fix: list models only for active providers #3143
Conversation
Please note, ALL APIs can use […]. This is because […]
Thanks @bparees! I think this PR is a good stop-gap regardless. I might add some comments to your issue around the database inconsistencies.
There has been an error cropping up where we can retrieve a model when doing something like a chat completion, but then hit issues when trying to associate that model with an active provider.
This commonly happens when:
1. you run the stack with, say, `remote::ollama`
2. you register a model, say `llama3.2:3b`
3. you do some completions, etc.
4. you kill the server
5. you `unset OLLAMA_URL`
6. you re-start the stack
7. you do `llama-stack-client models list`
```
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ embedding │ all-minilm │ all-minilm:l6-v2 │ {'embedding_dimension': 384.0, │ ollama │
│ │ │ │ 'context_length': 512.0} │ │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ llm │ llama3.2:3b │ llama3.2:3b │ │ ollama │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ embedding │ ollama/all-minilm:l6-v2 │ all-minilm:l6-v2 │ {'embedding_dimension': 384.0, │ ollama │
│ │ │ │ 'context_length': 512.0} │ │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ llm │ ollama/llama3.2:3b │ llama3.2:3b │ │ ollama │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
```
This shouldn't be happening: `ollama` isn't a running provider, and the only reason the model shows up is because it's in the dist_registry (on disk).
While it's nice to have this static store so that if I `export OLLAMA_URL=..` again the stack can read from it, it shouldn't _always_ be reading and returning these models from the store.
Now, with this change, if you run `llama-stack-client models list`, `llama3.2:3b` no longer appears.
Signed-off-by: Charlie Doern <[email protected]>
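To make the intended behavior concrete, here is a minimal, hypothetical Python sketch of the filtering idea described above. The names (`ModelEntry`, `list_models`, `active_provider_ids`) are illustrative assumptions and are not the actual llama-stack internals touched by this PR.

```python
# Hypothetical sketch of the filtering idea only -- ModelEntry, list_models,
# and active_provider_ids are illustrative names, not llama-stack internals.
from dataclasses import dataclass


@dataclass
class ModelEntry:
    identifier: str
    provider_id: str
    model_type: str


def list_models(registry_entries: list[ModelEntry],
                active_provider_ids: set[str]) -> list[ModelEntry]:
    """Return only the models whose provider is currently running.

    Entries persisted in the on-disk registry for a provider that is not
    configured (e.g. ollama after `unset OLLAMA_URL`) are skipped instead
    of being surfaced to the client.
    """
    return [m for m in registry_entries if m.provider_id in active_provider_ids]
```

Note that in this sketch the on-disk registry entries are left untouched, so re-enabling the provider (e.g. re-exporting `OLLAMA_URL`) would make the same models visible again.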
@cdoern Thanks for adding this. Looks great! I would suggest adding a test case for inactive provider filtering and/or access control interactions.
I agree with @mattf here. I think #3198 (or its subsequent iteration, since even that isn't quite there yet) is the right direction to fix it. There are deeper problems with the registry state model, and it is best if we fix them cleanly rather than adding a patch for now. I am going to close this PR as such.
What does this PR do?
There has been an error cropping up where we can retrieve a model when doing something like a chat completion, but then hit issues when trying to associate that model with an active provider.

This commonly happens when:

1. you run the stack with, say, `remote::ollama`
2. you register a model, say `llama3.2:3b`
3. you do some completions, etc.
4. you kill the server
5. you `unset OLLAMA_URL`
6. you re-start the stack
7. you run `llama-stack-client models list`, or something like `llama-stack-client inference chat-completion --message hi`

This shouldn't be happening: `ollama` isn't a running provider, and the only reason the model shows up is because it's in the dist_registry (on disk). While it's nice to have this static store so that if I `export OLLAMA_URL=..` again the stack can read from it, it shouldn't _always_ be reading and returning these models from the store. Now, with this change, if you run `llama-stack-client models list`, `llama3.2:3b` no longer appears.

This seems crucial to me, because for folks who might be less aware of `OLLAMA_URL=..` and the other `_URL` env variables being the thing that enables a provider, getting zero output related to the disabled provider is a more straightforward way of saying "hey, `ollama` is not a provider you have running" than showing them `ollama` resources but not allowing them to use any of those resources for a chat completion.

Test Plan

`llama-stack-client models list` after `unset OLLAMA_URL` should no longer show `ollama` models.
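Following the earlier review suggestion about covering inactive-provider filtering with a test, a hypothetical pytest-style sketch against the illustrative `list_models` helper above might look like the following; it is not part of the actual llama-stack test suite, and the model/provider names are made up for illustration.

```python
# Hypothetical test sketch; assumes the illustrative ModelEntry and
# list_models definitions from the sketch earlier on this page.
def test_inactive_provider_models_are_hidden():
    entries = [
        ModelEntry(identifier="llama3.2:3b", provider_id="ollama", model_type="llm"),
        ModelEntry(identifier="gpt-4o", provider_id="openai", model_type="llm"),
    ]
    # Only the openai provider is active, e.g. because OLLAMA_URL is unset.
    visible = list_models(entries, active_provider_ids={"openai"})
    assert [m.identifier for m in visible] == ["gpt-4o"]
```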