Update inference API specification to include new Llama Service #5020

Jan-Kazlouski-elastic · 2025-07-22T17:02:47Z

This PR updates the specification to reflect the changes made in elastic/elasticsearch#130092:

Additional actions

Signed the CLA
Executed make contrib

Jan-Kazlouski-elastic · 2025-07-22T17:07:34Z

specification/inference/put_llama/PutLlamaRequest.ts

+ *
+ * Create an inference endpoint to perform an inference task with the `llama` service.
+ * @rest_spec_name inference.put_llama
+ * @availability stack since=9.2.0 stability=stable visibility=public


@jonathan-buttner could you please check whether this 9.2.0 version is set correctly here? I assume it is, but I want to be sure.

Yep this is correct 👍

github-actions · 2025-07-22T17:31:23Z

Following you can find the validation changes against the target branch for the APIs.

No changes detected.

You can validate these APIs yourself by using the make validate target.

Jan-Kazlouski-elastic · 2025-07-22T17:45:44Z

@jonathan-buttner since we didn't have a backport for the Llama integration, I added the skip-backport label so that the backport label GitHub Action check is skipped.

jonathan-buttner

Looking good, left a few questions

jonathan-buttner · 2025-07-22T19:28:06Z

specification/inference/_types/CommonTypes.ts

+   * After creating the inference model, you cannot change the associated API key.
+   * If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.
+   */
+  api_key?: string


I should have mentioned this on the elasticsearch PR. When did you need to supply an API key? In my testing of running the stack locally I didn't need to supply one 🤔

I was running it like this:

PUT _inference/text_embedding/llama-text-embedding
{
  "service": "llama",
  "service_settings": {
    "url": "http://localhost:8321/v1/inference/embeddings",
    "model_id": "all-MiniLM-L6-v2"
  }
}
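For reference, the same request can be sketched as a typed payload. This is a minimal illustration, not the specification's actual types: `LlamaServiceSettings`, `PutLlamaRequestSketch`, and `buildPutLlamaRequest` are hypothetical names, and placing the optional `api_key` under `service_settings` is an assumption based on the `api_key?: string` field under review.

```typescript
// Hypothetical shapes for illustration only; the real definitions live in the spec files.
interface LlamaServiceSettings {
  url: string
  model_id: string
  // Assumption: optional, mirroring the `api_key?: string` field under review.
  api_key?: string
}

interface PutLlamaRequestSketch {
  method: 'PUT'
  path: string
  body: { service: 'llama'; service_settings: LlamaServiceSettings }
}

// Builds a PUT _inference/{task_type}/{inference_id} request like the one above.
function buildPutLlamaRequest(
  taskType: string,
  inferenceId: string,
  settings: LlamaServiceSettings
): PutLlamaRequestSketch {
  return {
    method: 'PUT',
    path: `_inference/${taskType}/${inferenceId}`,
    // Copy the settings so later changes to the caller's object do not leak in.
    body: { service: 'llama', service_settings: { ...settings } }
  }
}

// The local setup from the comment above: no api_key is supplied.
const localRequest = buildPutLlamaRequest('text_embedding', 'llama-text-embedding', {
  url: 'http://localhost:8321/v1/inference/embeddings',
  model_id: 'all-MiniLM-L6-v2'
})
```

Because `api_key` is optional in this sketch, the local request type-checks without it, which matches the behaviour described in the comment.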

It is a good point to discuss; glad you're bringing it up.
Llama Stack doesn't have a built-in authorization check by default, so it is possible to use it without providing any tokens, especially when testing with Ollama locally.
However, I doubt that users are going to run Llama Stack without auth 100% of the time, so I added this api_key parameter as an option for clients that want to set up bearer auth. Authentication configuration is described in the Server Configuration section of the Distribution Overview in the official Llama Stack guide:
https://llama-stack.readthedocs.io/en/latest/distributions/configuration.html#authentication-configuration
I haven't investigated it in depth, but I think it is safe to assume that providing the ability to send a bearer token pretty much covers security concerns.
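The optional bearer token described above could be handled along these lines. This is only a sketch: `buildAuthHeaders` is a hypothetical helper, not code from the Elasticsearch integration, and it assumes the key is sent as a standard `Authorization: Bearer` header.

```typescript
// Hypothetical helper: builds the headers for a request to a Llama Stack server.
// When no API key is given (for example, a local server without auth),
// no Authorization header is added.
function buildAuthHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' }
  if (apiKey !== undefined && apiKey.length > 0) {
    headers['Authorization'] = `Bearer ${apiKey}`
  }
  return headers
}

const withoutAuth = buildAuthHeaders()
const withAuth = buildAuthHeaders('example-token')
```

Keeping the header conditional means an endpoint created without `api_key` behaves the same as before the parameter existed.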

jonathan-buttner · 2025-07-22T19:30:38Z

specification/inference/put_llama/PutLlamaRequest.ts

+ *
+ * Create an inference endpoint to perform an inference task with the `llama` service.
+ * @rest_spec_name inference.put_llama
+ * @availability stack since=9.2.0 stability=stable visibility=public


Yep this is correct 👍

Update inference API specification to include new Llama Service

8b89902

Jan-Kazlouski-elastic requested a review from a team as a code owner July 22, 2025 17:02

Jan-Kazlouski-elastic requested review from jonathan-buttner and removed request for a team July 22, 2025 17:02

github-actions bot added the specification label Jul 22, 2025

Jan-Kazlouski-elastic assigned jonathan-buttner Jul 22, 2025

Jan-Kazlouski-elastic added the ml label Jul 22, 2025

Jan-Kazlouski-elastic commented Jul 22, 2025

Jan-Kazlouski-elastic added 2 commits July 22, 2025 17:15

Fix typos

659c6ca

Merge remote-tracking branch 'origin/main' into feature/llama-embeddi…

a68e740

…ng-completion # Conflicts: # output/openapi/elasticsearch-openapi.json # output/openapi/elasticsearch-serverless-openapi.json # output/schema/schema.json

Jan-Kazlouski-elastic added the skip-backport label (This pull request should not be backported) Jul 22, 2025

jonathan-buttner reviewed Jul 22, 2025

Jan-Kazlouski-elastic added 3 commits July 23, 2025 12:03

Fixed Typo

23dd73f

Merge remote-tracking branch 'origin/main' into feature/llama-embeddi…

6b1c6d4

…ng-completion

Update json outputs

73fc8af

Jan-Kazlouski-elastic requested a review from jonathan-buttner July 25, 2025 12:16

Merge remote-tracking branch 'origin/main' into feature/llama-embeddi…

c527e7d

…ng-completion
