---
title: Configuring the pattern
weight: 20
aliases: /rag-llm-cpu/configure/
---

# **Configuring the pattern**

This guide covers common customizations, such as changing the default LLM, adding a second LLM, and configuring RAG data sources and database providers.
It assumes you have already completed the [Getting Started](/rag-llm-cpu/getting-started/) guide.

## **How configuration works**

This pattern is managed by Argo CD using GitOps: all application configuration is defined in `values-prod.yaml`, and Argo CD keeps the cluster in sync with what is committed to your Git repository.
To customize a component, you typically:

1. **Enable an override:** In `values-prod.yaml`, find the application you want to change (for example, `llm-inference-service`) and add an `extraValueFiles:` entry pointing to a new override file (for example, `$patternref/overrides/llm-inference-service.yaml`).
2. **Create the override file:** Create the new `.yaml` file in the `overrides/` directory.
3. **Add your settings:** Put _only_ the values you want to change in this file. Everything else keeps the chart's defaults.
4. **Commit and sync:** Commit and push your changes, then let Argo CD sync the application.
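
The reason the override file should contain only the values you change is how Helm combines it with the chart defaults. Nested maps are merged key by key, while scalars and lists in the override replace the default outright. The Python sketch below is a simplified model of that behavior and ignores Helm details such as `null` removing a key. The helper name `merge_values` and the default values in the example are hypothetical and are not the chart's real defaults.

```python
from copy import deepcopy
from typing import Any


def merge_values(defaults: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Simplified model of Helm values merging.

    Maps are merged recursively; any other value in the override
    (string, number, list) replaces the default wholesale.
    """
    merged = deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults, for illustration only.
defaults = {
    "inferenceService": {"minReplicas": 1, "resources": {"requests": {"cpu": "4"}}},
    "servingRuntime": {"args": ["--model", "/models/a.gguf", "--port", "8080"]},
}

override = {
    "inferenceService": {"resources": {"requests": {"cpu": "8"}}},
    "servingRuntime": {"args": ["--model", "/models/b.gguf"]},
}

result = merge_values(defaults, override)
print(result["inferenceService"])
# {'minReplicas': 1, 'resources': {'requests': {'cpu': '8'}}}
print(result["servingRuntime"]["args"])
# ['--model', '/models/b.gguf']
```

Lists are replaced rather than merged. An override of a list such as `servingRuntime.args` must therefore repeat every argument the runtime needs, not only the one you change.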

## **Task: Change the default LLM**

By default, the pattern deploys the `mistral-7b-instruct-v0.2.Q5_0.gguf` model. You might want to switch to a different model (for example, a different quantization) or adjust its resource usage.
You can do this by creating an override file for the _existing_ `llm-inference-service` application.

1. **Enable the override:**
   In `values-prod.yaml`, update the `llm-inference-service` application to use an override file:

   ```yaml
   clusterGroup:
     # ...
     applications:
       # ...
       llm-inference-service:
         name: llm-inference-service
         namespace: rag-llm-cpu
         chart: llm-inference-service
         chartVersion: 0.3.*
         extraValueFiles: # <-- ADD THIS BLOCK
           - $patternref/overrides/llm-inference-service.yaml
   ```

2. **Create the override file:**
   Create a new file, `overrides/llm-inference-service.yaml`. The following example switches to a different model file (Q8_0) and increases the CPU and memory requests and limits:

   ```yaml
   inferenceService:
     resources: # <-- Increased allocated resources
       requests:
         cpu: "8"
         memory: 12Gi
       limits:
         cpu: "12"
         memory: 24Gi

   servingRuntime:
     args:
       - --model
       - /models/mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed model file

   model:
     repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
     files:
       - mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed file to download
   ```
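
Switching models means changing two values that must agree: the path passed to `--model` and the file listed under `model.files`. The Python sketch below is a hypothetical pre-commit check you could run against the parsed override. The function name `model_arg_is_downloaded` is not part of the pattern. The check assumes the downloader places each listed file under `/models`, as the example paths suggest.

```python
from pathlib import PurePosixPath

# Assumption: the download job places each entry of model.files under /models,
# as the paths in the example override suggest.
MODEL_DIR = PurePosixPath("/models")


def model_arg_is_downloaded(values: dict) -> bool:
    """Return True if the --model argument points at a file listed in model.files."""
    args = values["servingRuntime"]["args"]
    model_path = PurePosixPath(args[args.index("--model") + 1])
    expected = {MODEL_DIR / name for name in values["model"]["files"]}
    return model_path in expected


override = {
    "servingRuntime": {"args": ["--model", "/models/mistral-7b-instruct-v0.2.Q8_0.gguf"]},
    "model": {
        "repository": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        "files": ["mistral-7b-instruct-v0.2.Q8_0.gguf"],
    },
}

print(model_arg_is_downloaded(override))  # True
```

If you update `--model` but forget to update `model.files` (or the reverse), the check returns `False` before Argo CD ever syncs the change.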

## **Task: Add a second LLM**

You can also deploy an entirely separate, second LLM and add it to the demo user interface (UI). This example uses a different runtime, KServe's Hugging Face server (`kserve/huggingfaceserver`), instead of `llama.cpp`.

This is a two-step process:

1. Deploy the new LLM.
2. Register it with the front-end UI.

### **Step 1: Deploy the new LLM service**

1. **Define the new application:**
   In `values-prod.yaml`, add a new entry to the `applications` map. This example calls it `another-llm-inference-service`.

   ```yaml
   clusterGroup:
     # ...
     applications:
       # ...
       another-llm-inference-service: # <-- ADD THIS NEW APPLICATION
         name: another-llm-inference-service
         namespace: rag-llm-cpu
         chart: llm-inference-service
         chartVersion: 0.3.*
         extraValueFiles:
           - $patternref/overrides/another-llm-inference-service.yaml
   ```

2. **Create the override file:**
   Create the new file `overrides/another-llm-inference-service.yaml`. This file defines the new model and runtime. It also disables the creation of resources that the first LLM deployment already manages, such as the `dsc` initialization and the external secret.

   ```yaml
   dsc:
     initialize: false
   externalSecret:
     create: false

   # Define the new InferenceService
   inferenceService:
     name: hf-inference-service # <-- New service name
     minReplicas: 1
     maxReplicas: 1
     resources:
       requests:
         cpu: "8"
         memory: 32Gi
       limits:
         cpu: "12"
         memory: 32Gi

   # Define the new runtime (KServe Hugging Face server)
   servingRuntime:
     name: hf-runtime
     port: 8080
     image: docker.io/kserve/huggingfaceserver:latest
     modelFormat: huggingface
     args:
       - --model_dir
       - /models
       - --model_name
       - /models/Mistral-7B-Instruct-v0.3
       - --http_port
       - "8080"

   # Define the new model to download
   model:
     repository: mistralai/Mistral-7B-Instruct-v0.3
     files:
       - generation_config.json
       - config.json
       - model.safetensors.index.json
       - model-00001-of-00003.safetensors
       - model-00002-of-00003.safetensors
       - model-00003-of-00003.safetensors
       - tokenizer.model
       - tokenizer.json
       - tokenizer_config.json
   ```

   > **Warning:** A bug in the model-downloading container currently requires you to explicitly list _all_ files you want to download from the Hugging Face repository. Make sure you list every file the model needs to run.
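
Because of this bug, a forgotten weight shard only becomes visible when the model fails to load. For sharded safetensors checkpoints, the repository's `model.safetensors.index.json` contains a `weight_map` that maps each tensor to the shard file that holds it, so you can derive the shard list from it. The Python sketch below compares that map against your `model.files` list. The helper name `missing_shards` is hypothetical. The index content is a shortened, illustrative stand-in for the real file, which you would download from the repository first. The check only covers weight shards; config and tokenizer files still have to be listed by hand.

```python
import json

# Shortened, illustrative model.safetensors.index.json; the real file maps every tensor.
INDEX_JSON = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "lm_head.weight": "model-00003-of-00003.safetensors"
  }
}
"""


def missing_shards(index_json: str, files: list[str]) -> list[str]:
    """Return weight shards referenced by the index but absent from model.files."""
    shards = set(json.loads(index_json)["weight_map"].values())
    return sorted(shards - set(files))


files = [
    "config.json",
    "model.safetensors.index.json",
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
]
print(missing_shards(INDEX_JSON, files))  # ['model-00003-of-00003.safetensors']
```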

### **Step 2: Add the new LLM to the demo UI**

Now, tell the front end that the new LLM exists.

1. **Edit the front-end overrides:**
   Open `overrides/rag-llm-frontend-values.yaml` (this file should already exist from the initial setup).
2. **Update `LLM_URLS`:**
   Add the URL of your new service to the `LLM_URLS` environment variable. The URL follows the format `http://<service-name>-predictor/v1`, or `http://<service-name>-predictor/openai/v1` for the Hugging Face runtime, where `<service-name>` is the `inferenceService.name` from the override file.

   In `overrides/rag-llm-frontend-values.yaml`:

   ```yaml
   env:
     # ...
     - name: LLM_URLS
       value: '["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
   ```
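
`LLM_URLS` holds a JSON array serialized into a single string, so a missing quote or a trailing comma breaks it. The Python sketch below generates the value from the service names using the URL format described above. The helper `predictor_url` is hypothetical. `cpu-inference-service` is simply the name used by the existing entry in the example.

```python
import json


def predictor_url(service_name: str, openai_prefix: bool = False) -> str:
    """Build the in-cluster URL for an InferenceService predictor."""
    base = f"http://{service_name}-predictor"
    return f"{base}/openai/v1" if openai_prefix else f"{base}/v1"


urls = [
    predictor_url("cpu-inference-service"),                     # llama.cpp runtime
    predictor_url("hf-inference-service", openai_prefix=True),  # Hugging Face runtime
]

# Compact separators reproduce the single-line JSON string used for LLM_URLS.
llm_urls = json.dumps(urls, separators=(",", ":"))
print(llm_urls)
# ["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]
```

Paste the printed string between single quotes as the `value`. The single quotes keep YAML from parsing the brackets as a list.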

## **Task: Customize RAG data sources**

By default, the pattern loads data from the Validated Patterns documentation. You can point it at your own public Git repositories or web pages instead.

1. **Edit the Vector DB overrides:**
   Open `overrides/vector-db-values.yaml` (this file should already exist).
2. **Update sources:**
   Modify the `repoSources` and `webSources` keys. You can add any publicly available Git repository (using globs to select which files to load) or any public web URL. The embedding job also processes PDFs listed in `webSources`.

   In `overrides/vector-db-values.yaml`:

   ```yaml
   providers:
     qdrant:
       enabled: true
     mssql:
       enabled: true

   vectorEmbedJob:
     repoSources:
       - repo: https://github.com/your-org/your-docs.git # <-- Your repo
         globs:
           - "**/*.md"
     webSources:
       - https://your-company.com/product-manual.pdf # <-- Your PDF
     chunking:
       size: 4096
   ```
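
The two knobs in `vectorEmbedJob` are easiest to understand separately: `globs` decides which files in each repository are read, and `chunking.size` decides how large each embedded piece of text is. The Python sketch below models both under stated assumptions. It uses standard recursive glob semantics, where `**` also matches the repository root, and plain fixed-size character chunking. The helpers `select_files` and `chunk_text` are hypothetical. The pattern's actual job may match paths differently, measure size in tokens, or overlap chunks, so treat this as an illustration of the settings rather than the job's implementation.

```python
import tempfile
from pathlib import Path


def select_files(repo_root: Path, globs: list[str]) -> list[str]:
    """Return repo-relative paths of files matching any of the glob patterns."""
    matched = set()
    for pattern in globs:
        for path in repo_root.glob(pattern):
            if path.is_file():
                matched.add(path.relative_to(repo_root).as_posix())
    return sorted(matched)


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i : i + size] for i in range(0, len(text), size)]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "README.md").write_text("# Intro\n")
    (root / "docs" / "guide.md").write_text("x" * 10000)
    (root / "docs" / "logo.png").write_bytes(b"")
    selected = select_files(root, ["**/*.md"])
    chunks = chunk_text((root / "docs" / "guide.md").read_text(), 4096)

print(selected)  # ['README.md', 'docs/guide.md']
print([len(c) for c in chunks])  # [4096, 4096, 1808]
```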

## **Task: Add a new RAG database provider**

By default, the pattern enables _qdrant_ and _mssql_. You can also enable _redis_, _pgvector_ (PostgreSQL), or _elastic_ (Elasticsearch).
This is a three-step process: (1) add the credentials to your secrets file, (2) enable the database, and (3) register it with the front-end UI.

### **Step 1: Update your secrets file**

If your new database requires credentials (like _pgvector_ or _elastic_), add them to your main secrets file:

```sh
vim ~/values-secret-rag-llm-cpu.yaml
```

Add the necessary credentials. For example:

```yaml
secrets:
  # ...
  - name: pgvector
    fields:
      - name: user
        value: user # <-- Update the user
      - name: password
        value: password # <-- Update the password
      - name: db
        value: db # <-- Update the db
```

**Note:** Refer to the [`values-secret.yaml.template`](https://github.com/validatedpatterns-sandbox/rag-llm-cpu/blob/main/values-secret.yaml.template) file for the values each provider expects.

### **Step 2: Enable the provider in the Vector DB chart**

Edit `overrides/vector-db-values.yaml` and set `enabled: true` for each provider you want to add.

In `overrides/vector-db-values.yaml`:

```yaml
providers:
  qdrant:
    enabled: true
  mssql:
    enabled: true
  pgvector: # <-- ADD THIS
    enabled: true
  elastic: # <-- OR THIS
    enabled: true
```

### **Step 3: Add the provider to the demo UI**

Finally, edit `overrides/rag-llm-frontend-values.yaml` to configure the UI. You must:

1. Add the new provider's secrets to the `dbProvidersSecret.vault` list.
2. Add the new provider's connection details to the `dbProvidersSecret.providers` list.

The following complete example configures every supported provider, including the non-default ones.

In `overrides/rag-llm-frontend-values.yaml`:

```yaml
dbProvidersSecret:
  vault:
    - key: mssql
      field: sapassword
    - key: pgvector # <-- Add this block
      field: user
    - key: pgvector
      field: password
    - key: pgvector
      field: db
    - key: elastic # <-- Add this block
      field: user
    - key: elastic
      field: password
  providers:
    - type: qdrant # <-- Example for Qdrant
      collection: docs
      url: http://qdrant-service:6333
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: mssql # <-- Example for MSSQL
      table: docs
      connection_string: >-
        Driver={ODBC Driver 18 for SQL Server};
        Server=mssql-service,1433;
        Database=embeddings;
        UID=sa;
        PWD={{ .mssql_sapassword }};
        TrustServerCertificate=yes;
        Encrypt=no;
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: redis # <-- Example for Redis
      index: docs
      url: redis://redis-service:6379
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: elastic # <-- Example for Elastic
      index: docs
      url: http://elastic-service:9200
      user: "{{ .elastic_user }}"
      password: "{{ .elastic_password }}"
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: pgvector # <-- Example for PGVector
      collection: docs
      url: >-
        postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}@pgvector-service:5432/{{ .pgvector_db }}
      embedding_model: sentence-transformers/all-mpnet-base-v2
```
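
Each `{{ .name }}` placeholder in `providers` is filled from a `vault` entry. The example suggests that the variable name is the entry's `key` and `field` joined by an underscore (`mssql` plus `sapassword` gives `.mssql_sapassword`). The Python sketch below models that lookup so you can catch two kinds of mismatch before committing: a placeholder with no matching vault entry, and a vault entry with no matching field in your secrets file. It relies on that inferred naming convention. The helpers `build_context` and `render` are hypothetical. The regex is only a stand-in for the chart's real templating, which uses Go template syntax. The secret values are the placeholders from Step 1.

```python
import re

# Placeholders such as {{ .pgvector_user }}, as used in the providers list.
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def build_context(vault: list[dict], secrets: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map each vault key/field pair to a '<key>_<field>' variable.

    Raises KeyError if the secrets file lacks a referenced key or field.
    """
    return {f"{e['key']}_{e['field']}": secrets[e["key"]][e["field"]] for e in vault}


def render(template: str, context: dict[str, str]) -> str:
    """Substitute placeholders, failing loudly on any variable missing from context."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in context:
            raise KeyError(f"no vault entry provides {name!r}")
        return context[name]

    return PLACEHOLDER.sub(lookup, template)


# Placeholder secret values mirroring the pgvector entry from Step 1.
secrets = {"pgvector": {"user": "user", "password": "password", "db": "db"}}
vault = [
    {"key": "pgvector", "field": "user"},
    {"key": "pgvector", "field": "password"},
    {"key": "pgvector", "field": "db"},
]

url = (
    "postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}"
    "@pgvector-service:5432/{{ .pgvector_db }}"
)
print(render(url, build_context(vault, secrets)))
# postgresql+psycopg://user:password@pgvector-service:5432/db
```

With only the pgvector entries in `vault`, rendering the elastic provider's `{{ .elastic_password }}` raises a `KeyError`. That is the symptom of forgetting the elastic block in the vault list.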