Update Llama Integration to support OpenAI API for embeddings and support user task setting field and dimensions service setting field #131827
Conversation
…upport user field
…dingsModel for enhanced customization
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
Pinging @elastic/ml-core (Team:ML)
@@ -20,15 +22,17 @@ public interface LlamaActionVisitor {
     * Creates an executable action for the given Llama embeddings model.
     *
     * @param model the Llama embeddings model
     * @param taskSettings the settings for the task, which may include parameters like user
Minor: Can you remove ", which may include parameters like user". Just to avoid having to update this in the future if the user parameter is removed from valid task settings. The same comment applies for other comments across these changes.
Makes sense. Fixed here and in other places that mention "user" task setting specifically.
@@ -48,6 +51,7 @@ public LlamaChatCompletionModel(
    taskType,
    service,
    LlamaChatCompletionServiceSettings.fromMap(serviceSettings, context),
    OpenAiChatCompletionTaskSettings.fromMap(taskSettings),
Are we only able to send `user` as a task setting for completion, or does Llama have some extra settings we can support that OpenAI does not? Also, do we already know that Llama will always support all task settings that OpenAI supports?
> Are we only able to send `user` as a task setting for completion, or does Llama have some extra settings we can support that OpenAI does not?

According to the Llama Stack specification, they don't support anything on top of OpenAI's `user` task parameter:
https://llama-stack.readthedocs.io/en/latest/references/api_reference/index.html#/paths/v1-openai-v1-embeddings/post
https://llama-stack.readthedocs.io/en/latest/references/api_reference/index.html#/paths/v1-openai-v1-chat-completions/post

> Also do we already know that Llama will always support all task settings that OpenAI supports?

The Llama Stack inference API is under active development, so we cannot make any solid predictions. However, according to their API reference they do support OpenAI's `user` task setting, which is the only task setting in the OpenAI integration.
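To make the discussion concrete, here is a minimal sketch (not the actual Java implementation; names are illustrative) of how the `user` task setting surfaces in an OpenAI-compatible chat-completion request body. The field names follow the OpenAI schema, and the optional field is omitted entirely when unset:

```python
import json

def build_chat_completion_body(model_id, messages, user=None):
    """Build an OpenAI-style chat completion request body.

    The optional `user` field is, per the Llama Stack API reference,
    the only task-setting-like parameter on top of the required fields.
    """
    body = {"model": model_id, "messages": messages}
    if user is not None:  # omit the field entirely when unset
        body["user"] = user
    return json.dumps(body)

print(build_chat_completion_body(
    "llama3", [{"role": "user", "content": "hi"}], user="abc-123"))
```

Any additional Llama-specific parameter would have to be added to this body explicitly; nothing in the spec today suggests one exists.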
@@ -50,6 +54,7 @@ public LlamaEmbeddingsModel(
    taskType,
    service,
    LlamaEmbeddingsServiceSettings.fromMap(serviceSettings, context),
    OpenAiEmbeddingsTaskSettings.fromMap(taskSettings, context),
Same question here: are there other Llama-specific settings we may want to support, and do we know that Llama will always support all OpenAI embeddings task settings?
Same answer. According to the Llama Stack specification, they don't support anything on top of OpenAI's `user` task parameter:
https://llama-stack.readthedocs.io/en/latest/references/api_reference/index.html#/paths/v1-openai-v1-embeddings/post
https://llama-stack.readthedocs.io/en/latest/references/api_reference/index.html#/paths/v1-openai-v1-chat-completions/post
The Llama Stack inference API is under active development, so we cannot make any solid predictions. However, according to their API reference they do support OpenAI's `user` task setting, which is the only task setting in the OpenAI integration.
Strings.toString(new LlamaEmbeddingsRequestEntity(model.getServiceSettings().modelId(), truncationResult.input()))
    .getBytes(StandardCharsets.UTF_8)
Strings.toString(
    new OpenAiEmbeddingsRequestEntity(
Are we deprecating support for the hugging face API entirely? Is there any reason why someone might want to use it over the OpenAI API? Are there any existing users of the hugging face API for which this could cause breaking changes?
> Are we deprecating support for the hugging face API entirely?

Correct, this PR deprecates Hugging Face API support.

> Is there any reason why someone might want to use it over the OpenAI API?

There is no visible reason to choose the Hugging Face API over the OpenAI API; the Hugging Face embedding API is not superior in any way to the OpenAI embedding API.

> Are there any existing users of the hugging face API for which this could cause breaking changes?

Although the Llama integration PR was merged last week, the specification changes that describe it are not yet merged, so according to the documentation Llama is not yet supported as a provider. There shouldn't be any existing users who have integrated with the Elastic Llama inference provider.
    );
}
dimensionsSetByUser = dimensions != null;
} else if (context == ConfigurationParseContext.PERSISTENT && dimensionsSetByUser == null) {
Just to clarify, is this only going to happen for endpoints created before these changes?
You're right. In practice there shouldn't be any endpoints created before these changes, since the Llama integration specification hasn't been merged yet. I've updated the logic to remove the default value, since users aren't aware that the Llama integration is available while the documentation is still pending.
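For readers following along, the two parse contexts being discussed can be sketched as follows. This is a hedged Python sketch with assumed names, not the Elasticsearch Java code; in particular, how a missing persisted flag is handled (here, rejected) is one plausible choice after the default value was removed:

```python
REQUEST, PERSISTENT = "request", "persistent"

def dimensions_set_by_user(settings: dict, context: str) -> bool:
    """Derive or read the dimensions_set_by_user flag.

    - In a REQUEST context the flag is never supplied by the caller;
      it is true exactly when the user provided a dimensions value.
    - In a PERSISTENT context the flag was stored at creation time;
      settings written before the field existed would lack it.
    """
    if context == REQUEST:
        return settings.get("dimensions") is not None
    flag = settings.get("dimensions_set_by_user")
    if flag is None:
        raise ValueError("persisted settings missing dimensions_set_by_user")
    return flag
```

The `PERSISTENT` branch is the one the question above targets: it only matters for endpoints persisted before the flag existed.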
}

@SuppressWarnings("unchecked")
private void assertEmbeddingsRequest() throws IOException {
private void assertEmbeddingsRequest(String user) throws IOException {
It seems this doesn't test any of the cases where dimensions are set either by the user or the system. Can we add tests for this?
Done, tests added. The `requestMap` validations for dimensions are included.
@@ -11,39 +11,43 @@
import org.elasticsearch.inference.EmptySecretSettings;
import org.elasticsearch.inference.TaskType;
import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsTaskSettings;
import org.elasticsearch.xpack.inference.services.settings.DefaultSecretSettings;

import static org.elasticsearch.xpack.inference.chunking.ChunkingSettingsTests.createRandomChunkingSettings;

public class LlamaEmbeddingsModelTests extends ESTestCase {
Can you add tests for the `of` function that was added in this change?
I added tests for this `of` function.
@@ -20,38 +21,41 @@

public class LlamaChatCompletionModelTests extends ESTestCase {
Do we have testing for overriding the `user` field in a model? Both when the user is set and we override it to another user, and when it's not set and we override it to a user?
I added tests for this `of` function.
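The override semantics under test here can be sketched roughly like this (illustrative Python, not the Java `of` method itself): a `user` supplied in per-request task settings wins over the one configured on the model, and an unset request value falls back to the model's configured value:

```python
def override_user(model_user, request_user):
    """Merge the user task setting: a value supplied on the request
    overrides the model's configured value; None means 'not supplied'."""
    return request_user if request_user is not None else model_user

# The two cases asked about in the review:
print(override_user("original", "override"))  # set -> overridden
print(override_user(None, "override"))        # unset -> set
print(override_user("original", None))        # no override -> keep model's
```

Both override cases, plus the fall-through, are the natural test matrix for the `of` function.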
@@ -39,6 +39,8 @@ public class LlamaEmbeddingsServiceSettingsTests extends AbstractWireSerializing
private static final SimilarityMeasure SIMILARITY_MEASURE = SimilarityMeasure.DOT_PRODUCT;
private static final int MAX_INPUT_TOKENS = 128;
private static final int RATE_LIMIT = 2;
private static final Boolean DIMENSIONS_SET_BY_USER = Boolean.TRUE;
Do we have any tests where the dimensions are not set by the user?
Now we do have such tests.
@@ -27,45 +27,84 @@
public class LlamaEmbeddingsRequestTests extends ESTestCase {
Do we have tests to ensure that dimensions are handled properly both when they are set by the user and when they are not? Since we are reusing `OpenAiEmbeddingsRequestEntity`, there is logic that changes how the request is formed across these cases (see here), so we should test that here as well.
The logic related to dimensions lives in `OpenAiEmbeddingsRequestEntity` and is tested in `OpenAiEmbeddingsRequestEntityTests`. However, I added tests here as well.
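The behavior under discussion, as I read it from the thread, is that the `dimensions` value is serialized into the request body only when it was explicitly set by the user (a system-inferred value is withheld). A hedged sketch, with assumed parameter names rather than the actual entity's fields:

```python
import json

def embeddings_request_body(model_id, inputs, dimensions, dimensions_set_by_user):
    """Build an OpenAI-style embeddings request body.

    dimensions may also be inferred by the system after endpoint creation;
    only a value the user explicitly supplied is forwarded to the service.
    """
    body = {"model": model_id, "input": inputs}
    if dimensions_set_by_user and dimensions is not None:
        body["dimensions"] = dimensions
    return json.dumps(body)

print(embeddings_request_body("llama-embed", ["hello"], 384, True))
print(embeddings_request_body("llama-embed", ["hello"], 384, False))
```

Those two branches are exactly the user-set / not-user-set cases the review asks to cover in `LlamaEmbeddingsRequestTests`.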
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
Hello @dan-rubinstein
There are a few issues with the current Llama Stack integration:

- The `dimensions` field from service settings is not being sent to the model, because that is only possible in the OpenAI embedding API.
- The `user` field cannot be sent to the model as part of `embeddings` and `completion` service requests.

This update:

- Sends the `dimensions` field from service settings to the model as part of the OpenAI-style API request structure for embedding services.
- Supports the `user` field as part of task settings so it is sent as part of the OpenAI-style API request structure for embedding and completion services.

Here are examples of the requests and responses the Llama model receives/returns (not the Elasticsearch inference API request/response structure).
Current Embedding Request Structure (Hugging Face style API):
Current Embedding Response Structure (Hugging Face style API):
New Embedding Request Structure (OpenAI style API):
New Embedding Response Structure (OpenAI style API):
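The collapsed example payloads above are not reproduced here. As a rough guide to the shape change being described, an OpenAI-style embeddings response nests vectors under `data[*].embedding`, whereas the previous Hugging Face-style response was a bare array of vectors. A hedged parsing sketch (field names follow the OpenAI schema; the sample payload is illustrative, not taken from the PR):

```python
def extract_embeddings(response):
    """Pull embedding vectors out of an OpenAI-style response body,
    i.e. {"object": "list", "data": [{"index": 0, "embedding": [...]}, ...]}."""
    return [item["embedding"] for item in response["data"]]

sample = {
    "object": "list",
    "data": [{"index": 0, "embedding": [0.1, 0.2, 0.3]}],
    "usage": {"prompt_tokens": 1, "total_tokens": 1},
}
print(extract_embeddings(sample))
```

Under the old Hugging Face style, the equivalent extraction would have been the response body itself.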
Testing was performed for the changed scenarios; here are the results:
Create Embedding with `dimensions` field set by user (dimensions of result is correct):
Create Embedding with `dimensions` field set by user (`dimensions` of result are NOT correct):
Perform Embedding with `user` field set:
Perform Non-Streaming Completion with `user` field set:
Perform Streaming Completion with `user` field set:
`gradle check`?