discussion: serving common AI features #186
Comments
cc @envoyproxy/ai-gateway-assignable @envoyproxy/ai-gateway-maintainers
We currently have this requirement. In our POC solution, we have a service that returns /v1/models, and we create an envoyproxy Backend pointing to this service. Wondering if AIGatewayRoute allows both AIServiceBackend and Backend?
Yeah, maybe adding a field with a list of available models to AIServiceBackend or AIGatewayRoute makes sense.
@yuzisun and I discussed this and propose that the gateway can aggregate this model information from AIServiceBackend + AIGatewayRoute and return an immediate response. In our use case, we can enrich the response from our internal DB to add more information.
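The aggregation proposed above could be sketched roughly as follows. Note this is a hypothetical illustration: `routeRule` and its fields are made-up stand-ins, not the actual AIGatewayRoute API.

```go
package main

import (
	"fmt"
	"sort"
)

// routeRule is a hypothetical stand-in for an AIGatewayRoute rule that
// matches requests on a model name; the real CRD fields may differ.
type routeRule struct {
	Model   string
	Backend string
}

// aggregateModels collects the distinct model names declared across rules.
// This is the information a /v1/models handler would serve back.
func aggregateModels(rules []routeRule) []string {
	seen := map[string]bool{}
	var models []string
	for _, r := range rules {
		if !seen[r.Model] {
			seen[r.Model] = true
			models = append(models, r.Model)
		}
	}
	sort.Strings(models)
	return models
}

func main() {
	rules := []routeRule{
		{Model: "gpt-4o-mini", Backend: "openai"},
		{Model: "llama-3.1-8b", Backend: "self-hosted"},
		{Model: "gpt-4o-mini", Backend: "openai-fallback"}, // duplicate model, second backend
	}
	fmt.Println(aggregateModels(rules)) // duplicates collapsed into one entry
}
```

Enriching the response from an internal DB, as mentioned above, would then just be a post-processing step over this list.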
Yep, let's do this. @nacx want to raise an API PR for
And I wonder what other endpoints would fall into a similar style.
Yeah, probably worth investigating. @nacx @wengyao04 @Krishanx92 - got any thoughts on what other information-type endpoints would make sense here?
@nacx do you have any update? |
Sharing from the community meeting today: this is a key use case for users. A common need would be:
@Rutledge this is not relevant here, as it is already implemented and working!
Sorry for not being clear: 1-3 are a single user workflow/story!
For this, users are the ones defining the matching rules, so they have the explicit list of models rather than it being given to them. Did you check the API as well as the example?
Sorry: I meant users == the ones deploying the Gateway.
Thanks for sharing the YAML. So then yes, I think the request is the same as @nacx's. The API providers have methods for model listing and retrieval, so what users (gateway deployers) would want is a way to look up the models / configure the YAML based on what the APIs return:
**Commit Message**

extproc: custom processors per path and serve /v1/models

Refactors the server processing to allow registering custom Processors for different request paths, and adds a custom processor for requests to `/v1/models` that returns an immediate response based on the models that are configured in the filter configuration.

**Related Issues/PRs (if applicable)**

Related discussion: #186

Signed-off-by: Ignasi Barrera <[email protected]>
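The refactor described in this commit message can be illustrated with a small sketch. The names here (`processor`, `registry`) are illustrative only, not the actual ai-gateway ext-proc types: a registry maps request paths to dedicated processors, with a default processor handling everything else.

```go
package main

import "fmt"

// processor is a hypothetical, simplified interface for handling a
// request path; the real ext-proc Processor interface is richer.
type processor interface {
	Process(path string) string
}

// chatProcessor stands in for the default behavior: forward upstream.
type chatProcessor struct{}

func (chatProcessor) Process(path string) string {
	return "proxied upstream: " + path
}

// modelsProcessor answers /v1/models immediately from static
// configuration instead of forwarding to a backend.
type modelsProcessor struct{ models []string }

func (m modelsProcessor) Process(path string) string {
	return fmt.Sprintf("immediate response: %v", m.models)
}

// registry dispatches by exact path, falling back to a default processor.
type registry struct {
	byPath   map[string]processor
	fallback processor
}

func (r registry) lookup(path string) processor {
	if p, ok := r.byPath[path]; ok {
		return p
	}
	return r.fallback
}

func main() {
	r := registry{
		byPath: map[string]processor{
			"/v1/models": modelsProcessor{models: []string{"gpt-4o-mini"}},
		},
		fallback: chatProcessor{},
	}
	fmt.Println(r.lookup("/v1/models").Process("/v1/models"))
	fmt.Println(r.lookup("/v1/chat/completions").Process("/v1/chat/completions"))
}
```

The key design point is that adding a new information-style endpoint later only requires registering another processor, without touching the default request path.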
I would like to discuss whether serving common AI features would be within this project's scope.
A good example would be the `/v1/models` endpoint. This is not implemented by every AI provider, but it is very commonly used by applications that let users choose their desired model. Right now, the ext-proc filter fails for requests to any endpoint other than the chat completions endpoint (#115 was created to address this), but the project could probably do more to ease adoption for existing apps that rely on such APIs.

In the case of the `/v1/models` endpoint, for example, it would make a lot of sense for the ai-gateway to serve the response for such requests based on what has been configured in the ConfigMap, returning those models that have been configured and are allowed to be used.

What is the general feeling about ai-gateway directly implementing common AI features?