From dea2ab85e9774480e30f03f9d0208765b54efae4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Tue, 11 Mar 2025 17:48:42 +0100 Subject: [PATCH 1/7] [E&A] Drafts initial conceptual docs for EIS. --- explore-analyze/elastic-inference/eis.md | 57 +++++++++++++++++++++++- 1 file changed, 55 insertions(+), 2 deletions(-) diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md index 9a2823744..96c884f6a 100644 --- a/explore-analyze/elastic-inference/eis.md +++ b/explore-analyze/elastic-inference/eis.md @@ -5,6 +5,59 @@ applies_to: navigation_title: Elastic Inference Service (EIS) --- -# Elastic {{infer-cap}} Service +# Elastic {{infer-cap}} Service [elastic-inference-service-eis] -This is the documentation of the Elastic Inference Service. +The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster. +With EIS, you don't need to manage the infrastructure and resources required for large language models (LLMs) by adding, configuring, and scaling {{ml}} nodes. +Instead, you can use {{ml}} models in high-throughput, low-latency scenarios independently of your {{es}} infrastructure. + +Currently, you can perform chat completion tasks through EIS using the {{infer}} API. + +% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) % + +## Default EIS endpoints [default-eis-inference-endpoints] + +Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API: + +* `rainbow-sprinkles-elastic`: uses Anthropic's Claude Sonnet 3.5 model for chat completion {{infer}} tasks. + +::::{note} + +* The model appears as `Elastic LLM` in the AI Assistant, Attack Discovery UI, preconfigured connectors list, and the Search Playground. 
+* To fine-tune prompts sent to `rainbow-sprinkles-elastic`, optimize them for Claude Sonnet 3.5. + +:::: + +% TO DO: Link to the AI assistant documentation in the different solutions and possibly connector docs. % + +## Regions [eis-regions] + +EIS is currently running on AWS and in the following regions: + +* `us-east-1` +* `us-west-2` + +For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) and the [supported cross-region {{infer}} profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html) documentation. + +## LLM hosts [llm-hosts] + +The LLM used with EIS is hosted by [Amazon Bedrock](https://aws.amazon.com/bedrock/). + +## Examples + +The following example demostrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint. + +```json +POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream +{ + "messages": [ + { + "role": "user", + "content": "Say yes if it works." + } + ], + "temperature": 0.7, + "max_completion_tokens": 300 + } +} +``` From 5c0499f94ce79ea1dc1b1f98de975ccb5b6f9eb7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Tue, 11 Mar 2025 17:55:29 +0100 Subject: [PATCH 2/7] [E&A] Small edits. 
--- .../inference-api/elastic-inference-service-eis.md | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md b/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md index d6127e53f..967c09d93 100644 --- a/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md +++ b/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md @@ -15,12 +15,10 @@ Refer to the [{{infer-cap}} APIs](https://www.elastic.co/docs/api/doc/elasticsea Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` service. - ## {{api-request-title}} [infer-service-elastic-api-request] `PUT /_inference//` - ## {{api-path-parms-title}} [infer-service-elastic-api-path-params] `` @@ -34,7 +32,6 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` se * `chat_completion`, * `sparse_embedding`. - ::::{note} The `chat_completion` task type only supports streaming and only through the `_stream` API. @@ -42,8 +39,6 @@ For more information on how to use the `chat_completion` task type, please refer :::: - - ## {{api-request-body-title}} [infer-service-elastic-api-request-body] `max_chunk_size` @@ -64,7 +59,6 @@ For more information on how to use the `chat_completion` task type, please refer `service_settings` : (Required, object) Settings used to install the {{infer}} model. - `model_id` : (Required, string) The name of the model to use for the {{infer}} task. @@ -77,9 +71,7 @@ For more information on how to use the `chat_completion` task type, please refer } ``` - - -## Elastic {{infer-cap}} Service example [inference-example-elastic] +## Elastic {{infer-cap}} Service example [inference-example-elastic] The following example shows how to create an {{infer}} endpoint called `elser-model-eis` to perform a `text_embedding` task type. 
@@ -104,4 +96,3 @@ PUT /_inference/chat_completion/chat-completion-endpoint } } ``` - From 92410c838239e70276c5c9d7793841c61852f7a5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Wed, 12 Mar 2025 14:53:36 +0100 Subject: [PATCH 3/7] Apply suggestions from code review Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> --- explore-analyze/elastic-inference/eis.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md index 96c884f6a..ec0f0ef55 100644 --- a/explore-analyze/elastic-inference/eis.md +++ b/explore-analyze/elastic-inference/eis.md @@ -32,7 +32,7 @@ Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier t ## Regions [eis-regions] -EIS is currently running on AWS and in the following regions: +EIS runs on AWS in the following regions: * `us-east-1` * `us-west-2` @@ -45,7 +45,7 @@ The LLM used with EIS is hosted by [Amazon Bedrock](https://aws.amazon.com/bedro ## Examples -The following example demostrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint. +The following example demonstrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint. ```json POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream From 705b0cf371aef3f0ac3467b9eaafd45423094b55 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Thu, 13 Mar 2025 11:57:51 +0100 Subject: [PATCH 4/7] Addresses feedback. 
--- explore-analyze/elastic-inference/eis.md | 39 +++++++++++++++++++----- 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md index ec0f0ef55..e8e05abf6 100644 --- a/explore-analyze/elastic-inference/eis.md +++ b/explore-analyze/elastic-inference/eis.md @@ -11,7 +11,18 @@ The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered searc With EIS, you don't need to manage the infrastructure and resources required for large language models (LLMs) by adding, configuring, and scaling {{ml}} nodes. Instead, you can use {{ml}} models in high-throughput, low-latency scenarios independently of your {{es}} infrastructure. -Currently, you can perform chat completion tasks through EIS using the {{infer}} API. +% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) % + +## Available task types + +EIS offers the following {{infer}} task types to perform: + +* Chat completion + +## How to use EIS [using-eis] + +Your Elastic deployment comes with default endpoints for EIS that you can use performing {{infer}} tasks. +You can either do it by calling the {{infer}} API or using the default `Elastic LLM` model in the AI Assistant, Attack Discovery UI, and Search Playground. % TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) % @@ -19,12 +30,11 @@ Currently, you can perform chat completion tasks through EIS using the {{infer}} Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API: -* `rainbow-sprinkles-elastic`: uses Anthropic's Claude Sonnet 3.5 model for chat completion {{infer}} tasks. +* `rainbow-sprinkles-elastic` ::::{note} * The model appears as `Elastic LLM` in the AI Assistant, Attack Discovery UI, preconfigured connectors list, and the Search Playground. 
-* To fine-tune prompts sent to `rainbow-sprinkles-elastic`, optimize them for Claude Sonnet 3.5. :::: @@ -39,10 +49,6 @@ EIS runs on AWS in the following regions: For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) and the [supported cross-region {{infer}} profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html) documentation. -## LLM hosts [llm-hosts] - -The LLM used with EIS is hosted by [Amazon Bedrock](https://aws.amazon.com/bedrock/). - ## Examples The following example demonstrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint. @@ -58,6 +64,23 @@ POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream ], "temperature": 0.7, "max_completion_tokens": 300 - } } ``` + +The request returns the following response: + +```json +(...) +{ + "role" : "assistant", + "content": "Yes", + "model" : "rainbow-sprinkles", + "object" : "chat.completion.chunk", + "usage" : { + "completion_tokens" : 4, + "prompt_tokens" : 13, + "total_tokens" : 17 + } +} +(...) 
+``` From 6937860416d9584ffa69f41866ec776082c3025e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Mon, 17 Mar 2025 09:43:01 +0100 Subject: [PATCH 5/7] Apply suggestions from code review Co-authored-by: Max Jakob --- explore-analyze/elastic-inference/eis.md | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md index e8e05abf6..cbf5f2784 100644 --- a/explore-analyze/elastic-inference/eis.md +++ b/explore-analyze/elastic-inference/eis.md @@ -8,8 +8,8 @@ navigation_title: Elastic Inference Service (EIS) # Elastic {{infer-cap}} Service [elastic-inference-service-eis] The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster. -With EIS, you don't need to manage the infrastructure and resources required for large language models (LLMs) by adding, configuring, and scaling {{ml}} nodes. -Instead, you can use {{ml}} models in high-throughput, low-latency scenarios independently of your {{es}} infrastructure. +With EIS, you don't need to manage the infrastructure and resources required for {{ml}} {{infer}} by adding, configuring, and scaling {{ml}} nodes. +Instead, you can use {{ml}} models for ingest, search and chat independently of your {{es}} infrastructure. % TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Comming soon) % @@ -17,12 +17,11 @@ Instead, you can use {{ml}} models in high-throughput, low-latency scenarios ind EIS offers the following {{infer}} task types to perform: -* Chat completion +* `chat_completion` -## How to use EIS [using-eis] +## AI features powered by EIS [ai-features-powered-by-eis] -Your Elastic deployment comes with default endpoints for EIS that you can use performing {{infer}} tasks. 
-You can either do it by calling the {{infer}} API or using the default `Elastic LLM` model in the AI Assistant, Attack Discovery UI, and Search Playground. +Your Elastic deployment or project comes with a default `Elastic LLM` connector. This connector is used in the AI Assistant, Attack Discovery, Automatic Import, and Search Playground. % TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) % @@ -30,11 +29,11 @@ Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier t Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API: -* `rainbow-sprinkles-elastic` +* `.rainbow-sprinkles-elastic` ::::{note} -* The model appears as `Elastic LLM` in the AI Assistant, Attack Discovery UI, preconfigured connectors list, and the Search Playground. +* This endpoint is used by the `Elastic LLM` AI connector, which in turn powers the AI Assistant, Attack Discovery, Automatic Import, and the Search Playground. :::: @@ -42,12 +41,12 @@ Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier t ## Regions [eis-regions] -EIS runs on AWS in the following regions: +All EIS requests are handled by one of these AWS regions: * `us-east-1` * `us-west-2` -For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) and the [supported cross-region {{infer}} profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html) documentation. +For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/). ## Examples @@ -67,7 +66,7 @@ POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream } ``` -The request returns the following response: +The request returns the following response as a stream: ```json (...) 
@@ -75,7 +74,6 @@ The request returns the following response: "role" : "assistant", "content": "Yes", "model" : "rainbow-sprinkles", - "object" : "chat.completion.chunk", "usage" : { "completion_tokens" : 4, "prompt_tokens" : 13, From df74a391a184c2173549b2c61dce09430d468851 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Mon, 17 Mar 2025 12:30:20 +0100 Subject: [PATCH 6/7] Restructures page. --- explore-analyze/elastic-inference/eis.md | 70 +++++++++++++++--------- 1 file changed, 44 insertions(+), 26 deletions(-) diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md index cbf5f2784..2654043cc 100644 --- a/explore-analyze/elastic-inference/eis.md +++ b/explore-analyze/elastic-inference/eis.md @@ -13,18 +13,18 @@ Instead, you can use {{ml}} models for ingest, search and chat independently of % TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Comming soon) % -## Available task types - -EIS offers the following {{infer}} task types to perform: - -* `chat_completion` - ## AI features powered by EIS [ai-features-powered-by-eis] Your Elastic deployment or project comes with a default `Elastic LLM` connector. This connector is used in the AI Assistant, Attack Discovery, Automatic Import and Search Playground. % TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Comming soon) % +## Available task types + +EIS offers the following {{infer}} task types to perform: + +* `chat_completion` + ## Default EIS endpoints [default-eis-inference-endpoints] Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API: @@ -39,16 +39,7 @@ Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier t % TO DO: Link to the AI assistant documentation in the different solutions and possibly connector docs. 
% -## Regions [eis-regions] - -All EIS requests are handled by one of these AWS regions: - -* `us-east-1` -* `us-west-2` - -For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/). - -## Examples +### Examples The following example demonstrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint. @@ -69,16 +60,43 @@ POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream The request returns the following response as a stream: ```json -(...) -{ - "role" : "assistant", - "content": "Yes", +event: message +data: { + "id" : "unified-45ecde2b-6293-4fd6-a195-4252de76ee63", + "choices" : [ + { + "delta" : { + "role" : "assistant" + }, + "index" : 0 + } + ], "model" : "rainbow-sprinkles", - "usage" : { - "completion_tokens" : 4, - "prompt_tokens" : 13, - "total_tokens" : 17 - } + "object" : "chat.completion.chunk" +} + + +event: message +data: { + "id" : "unified-45ecde2b-6293-4fd6-a195-4252de76ee63", + "choices" : [ + { + "delta" : { + "content" : "Yes" + }, + "index" : 0 + } + ], + "model" : "rainbow-sprinkles", + "object" : "chat.completion.chunk" } -(...) ``` + +## Regions [eis-regions] + +All EIS requests are handled by one of these AWS regions: + +* `us-east-1` +* `us-west-2` + +For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/). From d56e228264e0f0607ff084990af3ee9631154b68 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Mon, 24 Mar 2025 14:04:58 +0100 Subject: [PATCH 7/7] Addresses feedback. 
--- explore-analyze/elastic-inference/eis.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md index 2654043cc..6c88d9ab9 100644 --- a/explore-analyze/elastic-inference/eis.md +++ b/explore-analyze/elastic-inference/eis.md @@ -99,4 +99,7 @@ All EIS requests are handled by one of these AWS regions: * `us-east-1` * `us-west-2` +However, projects and deployments can use the Elastic LLM regardless of their cloud provider or region. +The request routing does not restrict the location of your deployments. + For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).
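A reviewer's note on the streaming example added in PATCH 6: the `chat_completion` response arrives as server-sent events, so a client has to reassemble the answer from the `delta.content` fields of successive `data:` chunks. The sketch below is illustrative only, based solely on the chunk shape shown in the patched docs; `collect_chat_content` and the sample `stream` list are hypothetical names, not part of the documented API.

```python
import json

def collect_chat_content(sse_lines):
    """Join the assistant's content deltas from a chat_completion
    event stream whose data lines look like the chunks in the docs."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style terminator, if the stream sends one
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content is not None:
                parts.append(content)
    return "".join(parts)

# Chunks shaped like the documented response (first delta carries only the
# role, the second carries the actual content):
stream = [
    "event: message",
    'data: {"id": "unified-45ec", "choices": [{"delta": {"role": "assistant"}, '
    '"index": 0}], "model": "rainbow-sprinkles", "object": "chat.completion.chunk"}',
    "event: message",
    'data: {"id": "unified-45ec", "choices": [{"delta": {"content": "Yes"}, '
    '"index": 0}], "model": "rainbow-sprinkles", "object": "chat.completion.chunk"}',
]
print(collect_chat_content(stream))  # -> Yes
```

In a real client the lines would come from iterating over the HTTP response body of the `POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream` request rather than from a list.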