discussion: serving common AI features #186
Comments
cc @envoyproxy/ai-gateway-assignable @envoyproxy/ai-gateway-maintainers
We currently have this requirement. In our POC solution, we have a service that returns /v1/models, and we create an envoyproxy Backend pointing to this service. Wondering if AIGatewayRoute allows both AIServiceBackend and Backend?
Yeah, maybe adding a field with a list of available models to AIServiceBackend or AIGatewayRoute makes sense.
@yuzisun and I discussed this and propose that the gateway can aggregate this model information from AIServiceBackend + AIGatewayRoute and return an immediate response. In our use case, we can enrich the response from our internal DB to add more information.
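The aggregation proposed above could be sketched roughly as follows. Note this is a hypothetical illustration: `routeRule` and its fields are made-up stand-ins, not the actual AIGatewayRoute API.

```go
package main

import (
	"fmt"
	"sort"
)

// routeRule is a hypothetical stand-in for an AIGatewayRoute rule that
// matches requests on a model name; the real CRD fields may differ.
type routeRule struct {
	Model   string
	Backend string
}

// aggregateModels collects the distinct model names declared across rules.
// This is the information a /v1/models handler would serve back.
func aggregateModels(rules []routeRule) []string {
	seen := map[string]bool{}
	var models []string
	for _, r := range rules {
		if !seen[r.Model] {
			seen[r.Model] = true
			models = append(models, r.Model)
		}
	}
	sort.Strings(models)
	return models
}

func main() {
	rules := []routeRule{
		{Model: "gpt-4o-mini", Backend: "openai"},
		{Model: "llama-3.1-8b", Backend: "self-hosted"},
		{Model: "gpt-4o-mini", Backend: "openai-fallback"}, // duplicate model, second backend
	}
	fmt.Println(aggregateModels(rules)) // duplicates collapsed into one entry
}
```

Enriching the response from an internal DB, as mentioned above, would then just be a post-processing step over this list.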
Yep, let's do this. @nacx want to raise an API PR for
And I wonder what other endpoints would fall into a similar style.
Yeah, probably worth investigating. @nacx @wengyao04 @Krishanx92 - got any thoughts on what other information-type endpoints would make sense here?
@nacx do you have any update? |
Sharing from the community meeting today: this is a key use case for users. A common need would be:
@Rutledge this is not relevant here, as it is already implemented and working!
Sorry for not being clear: 1-3 are a single user workflow/story!
For this, users are the ones defining the matching rules, so they have the explicit list of models rather than it being given to them. Did you check the API as well as the example?
Sorry: I meant users == the ones deploying the Gateway.
Thanks for sharing the YAML. So then yes, I think the request is the same as @nacx's. The API providers have methods for model listing and retrieval, so what users (gateway deployers) would want is a way to look up the models / configure the YAML based on what the APIs return:
**Commit Message**

extproc: custom processors per path and serve /v1/models

Refactors the server processing to allow registering custom Processors for different request paths, and adds a custom processor for requests to `/v1/models` that returns an immediate response based on the models that are configured in the filter configuration.

**Related Issues/PRs (if applicable)**

Related discussion: #186

Signed-off-by: Ignasi Barrera <[email protected]>
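The refactor described in this commit message can be illustrated with a small sketch. The names here (`processor`, `registry`) are illustrative only, not the actual ai-gateway ext-proc types: a registry maps request paths to dedicated processors, with a default processor handling everything else.

```go
package main

import "fmt"

// processor is a hypothetical, simplified interface for handling a
// request path; the real ext-proc Processor interface is richer.
type processor interface {
	Process(path string) string
}

// chatProcessor stands in for the default behavior: forward upstream.
type chatProcessor struct{}

func (chatProcessor) Process(path string) string {
	return "proxied upstream: " + path
}

// modelsProcessor answers /v1/models immediately from static
// configuration instead of forwarding to a backend.
type modelsProcessor struct{ models []string }

func (m modelsProcessor) Process(path string) string {
	return fmt.Sprintf("immediate response: %v", m.models)
}

// registry dispatches by exact path, falling back to a default processor.
type registry struct {
	byPath   map[string]processor
	fallback processor
}

func (r registry) lookup(path string) processor {
	if p, ok := r.byPath[path]; ok {
		return p
	}
	return r.fallback
}

func main() {
	r := registry{
		byPath: map[string]processor{
			"/v1/models": modelsProcessor{models: []string{"gpt-4o-mini"}},
		},
		fallback: chatProcessor{},
	}
	fmt.Println(r.lookup("/v1/models").Process("/v1/models"))
	fmt.Println(r.lookup("/v1/chat/completions").Process("/v1/chat/completions"))
}
```

The key design point is that adding a new information-style endpoint later only requires registering another processor, without touching the default request path.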
I would like to discuss whether serving common AI features would be within this project's scope.
A good example would be the `/v1/models` endpoint. This is not implemented by every AI provider, but it is very commonly used by applications that let users choose their desired model. Right now, the ext-proc filter fails for requests to any endpoint other than the chat completions endpoint (#115 was created to address this), but the project could probably do more to ease adoption for existing apps that rely on such APIs.

In the case of the `/v1/models` endpoint, for example, it would make a lot of sense for the ai-gateway to serve the response for such requests based on what has been configured in the ConfigMap, returning those models that have been configured and are allowed to be used.

What is the general feeling about ai-gateway directly implementing common AI features?