|
175 | 175 | "source": [
|
176 | 176 | "The `openlayer` Python client comes with LLM runners, which are wrappers around common LLMs -- such as OpenAI's. The idea is that these LLM runners adhere to a common interface and can be called to make predictions on pandas dataframes. \n",
|
177 | 177 | "\n",
|
178 |
| - "To use `openlayer`'s LLM runners, we must follow the steps:\n", |
179 |
| - "\n", |
180 |
| - "**1. Create a new directory**\n", |
181 |
| - "\n", |
182 |
| - "This directory will house all the configs and files related to the LLM of our choice. Let's call ours `llm_package`:" |
183 |
| - ] |
184 |
| - }, |
185 |
| - { |
186 |
| - "cell_type": "code", |
187 |
| - "execution_count": null, |
188 |
| - "id": "ed8f150d", |
189 |
| - "metadata": {}, |
190 |
| - "outputs": [], |
191 |
| - "source": [ |
192 |
| - "!mkdir llm_package" |
| 178 | + "To use `openlayer`'s LLM runners, we must follow these steps:" |
193 | 179 | ]
|
194 | 180 | },
|
195 | 181 | {
|
196 | 182 | "cell_type": "markdown",
|
197 | 183 | "id": "f639ce93",
|
198 | 184 | "metadata": {},
|
199 | 185 | "source": [
|
200 |
| - "**2. Write a YAML config file**\n", |
| 186 | + "**1. Prepare the config**\n", |
201 | 187 | "\n",
|
202 |
| - "Now, we can write a YAML config file called `model_config.yaml` to our newly created directory:" |
| 188 | + "We need to prepare a config for the LLM:" |
203 | 189 | ]
|
204 | 190 | },
|
205 | 191 | {
|
|
245 | 231 | "metadata": {},
|
246 | 232 | "outputs": [],
|
247 | 233 | "source": [
|
248 |
| - "import yaml\n", |
249 |
| - "\n", |
250 | 234 | "# Note the camelCase for the keys\n",
|
251 | 235 | "model_config = {\n",
|
252 | 236 | " \"prompt\": prompt,\n",
|
|
256 | 240 | " \"modelParameters\": {\n",
|
257 | 241 | " \"temperature\": 0\n",
|
258 | 242 | " },\n",
|
259 |
| - " \"modelType\": \"api\",\n", |
260 |
| - " \"name\": \"Product name suggestor\",\n", |
261 |
| - " \"architectureType\": \"llm\",\n", |
262 |
| - "}\n", |
263 |
| - "\n", |
264 |
| - "with open(\"llm_package/model_config.yaml\", \"w\") as model_config_file:\n", |
265 |
| - " yaml.dump(model_config, model_config_file, default_flow_style=False)" |
| 243 | + "}" |
266 | 244 | ]
|
267 | 245 | },
|
268 | 246 | {
|
269 | 247 | "cell_type": "markdown",
|
270 | 248 | "id": "9543123e",
|
271 | 249 | "metadata": {},
|
272 | 250 | "source": [
|
273 |
| - "You can check out the details for every field of the `model_config.yaml` file in our documentation. \n", |
274 |
| - "\n", |
275 | 251 | "To highlight a few important fields:\n",
|
276 | 252 | "- `prompt`: this is the prompt that will get sent to the LLM. Notice that our variables are referred to in the prompt template with double handlebars `{{ }}`. When we make the request, the input variable values from the pandas dataframe will be injected into the prompt. Also, we follow OpenAI's convention of messages with `role` and `content`, regardless of the LLM provider you choose.\n",
|
277 |
| - "- `inputVariableNames`: this is a list with the names of the input variables. Each input variable should be a column in the pandas dataframe that we will use. Furthermore, these are the input variables referenced in the `promptTemplate` with the handlebars.\n", |
| 253 | + "- `inputVariableNames`: this is a list with the names of the input variables. Each input variable should be a column in the pandas dataframe that we will use. Furthermore, these are the input variables referenced in the `prompt` with the handlebars.\n", |
278 | 254 | "- `modelProvider`: one of the supported model providers, such as `OpenAI`.\n",
|
279 | 255 | "- `model`: name of the model from the `modelProvider`. In our case `gpt-3.5-turbo`.\n",
|
280 | 256 | "- `modelParameters`: a dictionary with the model parameters for that specific `model`. For `gpt-3.5-turbo`, for example, we could specify the `temperature`, the `tokenLimit`, etc."
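The handlebars injection described above can be sketched in plain Python. This is an illustrative stand-in only (openlayer performs the templating internally); `render_prompt` and the sample variable names are assumptions for the example:

```python
import re

# Illustrative sketch: fill a prompt's {{ variable }} placeholders
# from one row of input variable data.
def render_prompt(template: str, row: dict) -> str:
    """Replace each {{ name }} placeholder with the value of row[name]."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(row[m.group(1)]),
        template,
    )

row = {"description": "A solar-powered lamp", "seed_words": "sun, glow"}
template = "Suggest a product name for: {{ description }}. Seed words: {{ seed_words }}."
print(render_prompt(template, row))
# → Suggest a product name for: A solar-powered lamp. Seed words: sun, glow.
```

At request time, the runner does the equivalent of this substitution once per dataframe row before sending the message to the provider.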
|
|
285 | 261 | "id": "0d36b925",
|
286 | 262 | "metadata": {},
|
287 | 263 | "source": [
|
288 |
| - "**3. Get the model runner**\n", |
| 264 | + "**2. Get the model runner**\n", |
289 | 265 | "\n",
|
290 | 266 | "Now we can import `models` from `openlayer` and call the `get_model_runner` function, which will return a `ModelRunner` object. This is where we'll pass the OpenAI API key. For a different LLM `modelProvider` you might need to pass a different argument -- refer to our documentation for details."
|
291 | 267 | ]
|
|
301 | 277 | "\n",
|
302 | 278 | "llm_runner = models.get_model_runner(\n",
|
303 | 279 | " task_type=tasks.TaskType.LLM,\n",
|
304 |
| - " model_package=\"llm_package\",\n", |
305 |
| - " openai_api_key=\"YOUR_OPENAI_API_KEY_HERE\"\n", |
| 280 | + " openai_api_key=\"YOUR_OPENAI_API_KEY_HERE\",\n", |
| 281 | + " **model_config\n", |
306 | 282 | ")"
|
307 | 283 | ]
|
308 | 284 | },
|
|
321 | 297 | "id": "ca5d75e5",
|
322 | 298 | "metadata": {},
|
323 | 299 | "source": [
|
324 |
| - "**4. Run the LLM to get the predictions**\n", |
| 300 | + "**3. Run the LLM to get the predictions**\n", |
325 | 301 | "\n",
|
326 | 302 | "Every model runner comes with a `run` method. This method expects a pandas dataframe with the input variables as input and returns a pandas dataframe with a single column: the predictions.\n",
|
327 | 303 | "\n",
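The `run` contract (dataframe of input variables in, single-column dataframe of predictions out) can be sketched with a hypothetical stand-in runner. `FakeLLMRunner` and the `output` column name are illustrative assumptions, not openlayer's API; the real runner calls the LLM provider instead of this echo logic:

```python
import pandas as pd

# Hypothetical stand-in illustrating the run() contract: one prediction
# row per input row, returned as a single-column dataframe.
class FakeLLMRunner:
    def run(self, input_df: pd.DataFrame) -> pd.DataFrame:
        predictions = [
            f"NameFor({row.description})" for row in input_df.itertuples()
        ]
        return pd.DataFrame({"output": predictions})

input_df = pd.DataFrame(
    {"description": ["a solar lamp", "a folding bike"],
     "seed_words": ["sun, glow", "compact, city"]}
)
output_df = FakeLLMRunner().run(input_df)
print(output_df.shape)  # one prediction column, one row per input row
```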
|
|
464 | 440 | "source": [
|
465 | 441 | "### <a id=\"dataset\">Uploading datasets</a>\n",
|
466 | 442 | "\n",
|
467 |
| - "Before adding the datasets to a project, we need to do prepare a `dataset_config.yaml` file. \n", |
| 443 | + "Before adding the datasets to a project, we need to prepare a `dataset_config`. \n", |
468 | 444 | "\n",
|
469 |
| - "This is a file that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the fields of the `dataset_config.yaml` file, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).\n", |
| 445 | + "This is a Python dictionary that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the `dataset_config` items, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).\n", |
470 | 446 | "\n",
|
471 |
| - "Let's prepare the `dataset_config.yaml` files for our validation set:" |
| 447 | + "Let's prepare the `dataset_config` for our validation set:" |
472 | 448 | ]
|
473 | 449 | },
|
474 | 450 | {
|
|
478 | 454 | "metadata": {},
|
479 | 455 | "outputs": [],
|
480 | 456 | "source": [
|
481 |
| - "# Some variables that will go into the `dataset_config.yaml` file\n", |
482 |
| - "column_names = list(dataset.columns)\n", |
| 457 | + "# Some variables that will go into the `dataset_config`\n", |
483 | 458 | "input_variable_names = [\"description\", \"seed_words\"]\n",
|
484 | 459 | "output_column_name = \"model_output\""
|
485 | 460 | ]
|
|
491 | 466 | "metadata": {},
|
492 | 467 | "outputs": [],
|
493 | 468 | "source": [
|
494 |
| - "import yaml \n", |
495 |
| - "\n", |
496 | 469 | "validation_dataset_config = {\n",
|
497 |
| - " \"columnNames\": column_names,\n", |
498 | 470 | " \"inputVariableNames\": input_variable_names,\n",
|
499 | 471 | " \"label\": \"validation\",\n",
|
500 | 472 | " \"outputColumnName\": output_column_name,\n",
|
501 |
| - "}\n", |
502 |
| - "\n", |
503 |
| - "with open(\"validation_dataset_config.yaml\", \"w\") as dataset_config_file:\n", |
504 |
| - " yaml.dump(validation_dataset_config, dataset_config_file, default_flow_style=False)" |
| 473 | + "}" |
505 | 474 | ]
|
506 | 475 | },
|
507 | 476 | {
|
|
514 | 483 | "# Validation set\n",
|
515 | 484 | "project.add_dataframe(\n",
|
516 | 485 | " dataset_df=dataset,\n",
|
517 |
| - " dataset_config_file_path=\"validation_dataset_config.yaml\",\n", |
| 486 | + " dataset_config=validation_dataset_config,\n", |
518 | 487 | ")"
|
519 | 488 | ]
|
520 | 489 | },
|
|
563 | 532 | "\n",
|
564 | 533 | "Note that to use a direct-to-API model on the platform, you'll need to **provide your model provider's API key (such as the OpenAI API key) using the platform's UI**, under the project settings.\n",
|
565 | 534 | "\n",
|
566 |
| - "Since we used an LLM runner in this notebook, we already wrote a model config YAML file. We will write it again just for completeness:" |
| 535 | + "Since we used an LLM runner in this notebook, we already defined a model config for the LLM. We'll define it again here for completeness:" |
567 | 536 | ]
|
568 | 537 | },
|
569 | 538 | {
|
|
573 | 542 | "metadata": {},
|
574 | 543 | "outputs": [],
|
575 | 544 | "source": [
|
576 |
| - "import yaml\n", |
577 |
| - "\n", |
578 | 545 | "# Note the camelCase for the keys\n",
|
579 | 546 | "model_config = {\n",
|
580 | 547 | " \"prompt\": prompt,\n",
|
|
585 | 552 | " \"temperature\": 0\n",
|
586 | 553 | " },\n",
|
587 | 554 | " \"modelType\": \"api\",\n",
|
588 |
| - " \"name\": \"Product name suggestor\",\n", |
589 |
| - " \"architectureType\": \"llm\",\n", |
590 |
| - "}\n", |
591 |
| - "\n", |
592 |
| - "with open(\"llm_package/model_config.yaml\", \"w\") as model_config_file:\n", |
593 |
| - " yaml.dump(model_config, model_config_file, default_flow_style=False)" |
| 555 | + "}" |
594 | 556 | ]
|
595 | 557 | },
|
596 | 558 | {
|
|
602 | 564 | "source": [
|
603 | 565 | "# Adding the model\n",
|
604 | 566 | "project.add_model(\n",
|
605 |
| - " model_config_file_path=\"llm_package/model_config.yaml\",\n", |
| 567 | + " model_config=model_config,\n", |
606 | 568 | ")"
|
607 | 569 | ]
|
608 | 570 | },
|
|