
discussion: serving common AI features #186

Open
nacx opened this issue Jan 27, 2025 · 14 comments
Labels
discussion (To be discussed in community)
Milestone
v0.2.0

Comments

@nacx
Contributor

nacx commented Jan 27, 2025

I would like to discuss whether serving common AI features would be within this project's scope.

A good example would be the /v1/models endpoint. This is not implemented by every AI provider, but it is very commonly used by applications that let users choose their desired model. Right now, the ext-proc filter fails for requests to any endpoint other than the chat completions endpoint (#115 was created to address this), but the project could probably do more to ease adoption for existing apps that rely on such APIs.

In the case of the /v1/models endpoint, for example, it would make a lot of sense for the ai-gateway to serve the response to such requests itself, based on what has been configured in the ConfigMap, returning the models that have been configured and are allowed to be used.
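
For illustration, here is a minimal sketch of how a gateway could build an OpenAI-compatible /v1/models response from the model names it already knows from its configuration. This is hypothetical Go, not the project's actual code; the types and the `buildModelsResponse` helper are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openAIModel mirrors one entry of the OpenAI /v1/models response.
type openAIModel struct {
	ID      string `json:"id"`
	Object  string `json:"object"`
	OwnedBy string `json:"owned_by"`
}

// modelList is the top-level /v1/models response body.
type modelList struct {
	Object string        `json:"object"`
	Data   []openAIModel `json:"data"`
}

// buildModelsResponse turns the model names found in the gateway
// configuration into an immediate /v1/models response, without ever
// forwarding the request to a provider.
func buildModelsResponse(configuredModels []string) ([]byte, error) {
	list := modelList{Object: "list"}
	for _, m := range configuredModels {
		list.Data = append(list.Data, openAIModel{ID: m, Object: "model", OwnedBy: "ai-gateway"})
	}
	return json.Marshal(list)
}

func main() {
	// Hypothetical model names taken from the filter configuration / ConfigMap.
	body, err := buildModelsResponse([]string{"gpt-4o-mini", "llama-3.3-70b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```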

What is the general feeling about ai-gateway directly implementing common AI features?

@mathetake
Member

mathetake commented Jan 27, 2025

cc @envoyproxy/ai-gateway-assignable @envoyproxy/ai-gateway-maintainers

mathetake added the discussion (To be discussed in community) label Jan 27, 2025
@wengyao04
Contributor

We currently have this requirement. In our POC solution, we have a service that returns /v1/models and we create an envoyproxy Backend pointing to this service. Wondering if AIGatewayRoute allows both AIServiceBackend and Backend?
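
(For context, a rough sketch of what such a stand-alone stub service could look like; the model names and port are hypothetical and this is not the actual POC code.)

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// A minimal stand-alone service that answers /v1/models with a fixed list,
// deployed behind an Envoy Gateway Backend as a stop-gap.
func main() {
	http.HandleFunc("/v1/models", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]any{
			"object": "list",
			"data": []map[string]string{
				{"id": "gpt-4o-mini", "object": "model", "owned_by": "internal"},
				{"id": "llama-3.3-70b", "object": "model", "owned_by": "internal"},
			},
		})
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```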

@mathetake
Member

Yeah, maybe adding a field with the list of available models to AIServiceBackend or AIGatewayRoute makes sense.
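
One possible shape for such a field, purely as a sketch; the field name and package below are hypothetical, not the actual envoyproxy/ai-gateway API:

```go
// Package v1alpha1 is used here only as a placeholder for the gateway's API types.
package v1alpha1

// AIServiceBackendSpec sketch: a hypothetical optional list of model names
// the backend exposes, which the gateway could aggregate to answer /v1/models.
type AIServiceBackendSpec struct {
	// Models lists the model names this backend serves and that should be
	// advertised by the gateway's /v1/models endpoint.
	// +optional
	Models []string `json:"models,omitempty"`
}
```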

@wengyao04
Contributor

wengyao04 commented Jan 29, 2025

@yuzisun and I discussed it and propose that the gateway can aggregate this model information from AIServiceBackend + AIGatewayRoute and return an immediate response.

In our use case, we can enrich the response with more information from our internal DB.
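
A rough sketch of the aggregation step being proposed, assuming (hypothetically) that each backend or route carries a list of model names; the enrichment lookup is also made up for the example:

```go
package main

import (
	"fmt"
	"sort"
)

// aggregateModels collects and dedupes the model names declared across the
// configured backends/routes; a hypothetical sketch, not the project's code.
func aggregateModels(perBackendModels map[string][]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, models := range perBackendModels {
		for _, m := range models {
			if !seen[m] {
				seen[m] = true
				out = append(out, m)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical configuration: model lists attached to two AIServiceBackends.
	configured := map[string][]string{
		"openai-backend": {"gpt-4o-mini"},
		"aws-backend":    {"llama-3.3-70b"},
	}
	// Hypothetical enrichment from an internal DB: extra display names per model.
	displayNames := map[string]string{"gpt-4o-mini": "GPT-4o mini"}

	for _, m := range aggregateModels(configured) {
		fmt.Println(m, displayNames[m])
	}
}
```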

@mathetake
Member

mathetake commented Jan 29, 2025

Yep, let's do this. @nacx, want to raise an API PR for the /models endpoint?

mathetake added this to the v0.2.0 milestone Jan 29, 2025
@mathetake
Member

And I wonder what other endpoints would fall into a similar style.

@missBerg
Contributor

Yeah, probably worth investigating. @nacx @wengyao04 @Krishanx92 - got any thoughts on what other information-type endpoints would make sense here?

@mathetake
Member

@nacx do you have any update?

@Rutledge

Rutledge commented Feb 6, 2025

Sharing from the community meeting today:

This is a key use case for users. A common need would be:

  1. User adds their API key for a provider to the gateway
  2. User is provided the list of models they have access to via that provider
  3. Per-model information is available, including per-token pricing, input/output window sizes, and capabilities of these models (tool use, vision, audio, etc.)
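
A sketch of what such per-model metadata could look like if the gateway were to expose it; the struct and all field names below are hypothetical, not an agreed API:

```go
// Package modelinfo is a placeholder name for this sketch.
package modelinfo

// ModelInfo is a hypothetical shape for the per-model metadata discussed
// above; none of these fields exist in the project today.
type ModelInfo struct {
	ID                  string   // model identifier, as returned by /v1/models
	InputPricePerToken  float64  // price per input token, e.g. in USD
	OutputPricePerToken float64  // price per output token, e.g. in USD
	ContextWindow       int      // maximum input window size, in tokens
	MaxOutputTokens     int      // maximum output size, in tokens
	Capabilities        []string // e.g. "tools", "vision", "audio"
}
```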

@mathetake
Member

User adds their API key for a provider to the gateway

@Rutledge this is not relevant here as already implemented and working!

@Rutledge

Rutledge commented Feb 6, 2025

User adds their API key for a provider to the gateway

@Rutledge this is not relevant here as already implemented and working!

Sorry for not being clear 1-3 are a single user workflow/story!

@mathetake
Member

User is provided the list of models they have access to via that provider

For this, users are the ones defining the matching rules, so they already have the explicit list of models rather than being given it. Did you check the API as well as the example?

@mathetake
Member

Sorry: I meant users == the ones deploying the Gateway

@Rutledge

Rutledge commented Feb 10, 2025

Thanks for sharing the YAML. So then yes, I think the request is the same as @nacx's. The API providers have methods for model listing and retrieval, so what users (gateway deployers) would want is a way to look up the models and configure the YAML based on what the APIs return.

mathetake pushed a commit that referenced this issue Feb 12, 2025
**Commit Message**

extproc: custom processors per path and serve /v1/models

Refactors the server processing to allow registering custom Processors for different request paths, and adds a custom processor for requests to `/v1/models` that returns an immediate response based on the models that are configured in the filter configuration.

**Related Issues/PRs (if applicable)**

Related discussion: #186

---------

Signed-off-by: Ignasi Barrera <[email protected]>
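
(A rough sketch of the per-path processor registration that the commit message above describes; the type and method names here are hypothetical and not the actual ai-gateway implementation.)

```go
// Package extprocsketch is a placeholder for this illustration.
package extprocsketch

import "strings"

// Processor handles requests for one API path.
type Processor interface {
	Process(body []byte) ([]byte, error)
}

// Router maps request paths to their registered Processors.
type Router struct {
	processors map[string]Processor
	fallback   Processor // e.g. the chat-completions processor
}

// Register attaches a Processor to a specific request path such as "/v1/models".
func (r *Router) Register(path string, p Processor) {
	if r.processors == nil {
		r.processors = map[string]Processor{}
	}
	r.processors[path] = p
}

// For returns the Processor registered for a path, falling back to the
// default one when no exact match exists.
func (r *Router) For(path string) Processor {
	if p, ok := r.processors[strings.TrimSuffix(path, "/")]; ok {
		return p
	}
	return r.fallback
}
```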
daixiang0 pushed a commit to daixiang0/ai-gateway that referenced this issue Feb 19, 2025