
Conversation

@cdoern (Contributor) commented Aug 14, 2025

What does this PR do?

There has been a recurring error where we can retrieve a model for something like a chat completion, but then hit issues when trying to associate that model with an active provider.

This is a common thing that happens when:

  1. you run the stack with, say, `remote::ollama`
  2. you register a model, say `llama3.2:3b`
  3. you do some completions, etc.
  4. you kill the server
  5. you `unset OLLAMA_URL`
  6. you re-start the stack
  7. you run `llama-stack-client models list`, or something like `llama-stack-client inference chat-completion --message hi`, and see:

```
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ embedding     │ all-minilm                                                                       │ all-minilm:l6-v2                                                     │ {'embedding_dimension': 384.0,        │ ollama                   │
│               │                                                                                  │                                                                      │ 'context_length': 512.0}              │                          │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ llm           │ llama3.2:3b                                                                      │ llama3.2:3b                                                          │                                       │ ollama                   │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ embedding     │ ollama/all-minilm:l6-v2                                                          │ all-minilm:l6-v2                                                     │ {'embedding_dimension': 384.0,        │ ollama                   │
│               │                                                                                  │                                                                      │ 'context_length': 512.0}              │                          │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ llm           │ ollama/llama3.2:3b                                                               │ llama3.2:3b                                                          │                                       │ ollama                   │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
```

This shouldn't be happening: `ollama` isn't a running provider, and the only reason the model shows up is that it's in the dist_registry (on disk).

While it's nice to have this static store so that if I `export OLLAMA_URL=..` again it can read from the store, it shouldn't _always_ be reading and returning these models from the store.

Now if you run `llama-stack-client models list` with this change, llama3.2:3b no longer appears.

This seems crucial to me: for folks who might be less aware that `OLLAMA_URL=..` and the other `*_URL` env variables are what enable a provider, getting zero output related to the disabled provider is a more straightforward way of saying "hey, ollama is not a provider you have running" than showing them ollama resources they can't actually use for a chat completion.
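
As a rough mental model of what this filtering means (the function and field names below are illustrative, not the actual llama-stack internals), the list path should only return registry entries whose provider was actually instantiated for this server run:

```python
# Illustrative sketch only; the real change lives in the stack's routing tables.
def list_models(persisted_models, active_provider_ids):
    """persisted_models: entries read back from the dist_registry on disk.
    active_provider_ids: providers that actually started this run
    (ollama only ends up here if OLLAMA_URL was set)."""
    return [m for m in persisted_models if m["provider_id"] in active_provider_ids]

persisted = [
    {"identifier": "llama3.2:3b", "provider_id": "ollama", "model_type": "llm"},
    {"identifier": "all-minilm", "provider_id": "ollama", "model_type": "embedding"},
]

# With OLLAMA_URL unset, ollama never comes up, so nothing from it is listed:
print(list_models(persisted, active_provider_ids=set()))        # -> []
# Re-export OLLAMA_URL and restart, and the persisted entries become usable again:
print(list_models(persisted, active_provider_ids={"ollama"}))   # -> both models
```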

Test Plan

Run `llama-stack-client models list` after `unset OLLAMA_URL`; ollama models should no longer appear.

meta-cla bot added the CLA Signed label on Aug 14, 2025
@cdoern force-pushed the list-proper-models branch 4 times, most recently from cf85fe9 to 1c39ba9 on August 14, 2025 14:35
@cdoern (Author) commented Aug 14, 2025

Please note: ALL APIs can use `get_all_with_type_filtered`, which filters out results with inactive providers, EXCEPT for vector_io.

This is because vector_io goes "across API boundaries" and queries all models in `_get_first_embedding_model`. When the stack then checks whether the model is in `self.impls_by_provider_id`, the check fails, because for vector_io `impls_by_provider_id` contains the vector_io providers, not the inference providers the model is associated with.
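
A minimal sketch of that boundary problem, with hypothetical class shapes (only `get_all_with_type_filtered`, `_get_first_embedding_model`, and `impls_by_provider_id` are names from the code; everything else is assumed):

```python
# Hypothetical shapes to illustrate the API-boundary issue; not the real classes.
class InferenceRoutingTable:
    def __init__(self, registry_objects, impls_by_provider_id):
        self.registry_objects = registry_objects
        # keys are *inference* providers that actually started, e.g. {"ollama": ...}
        self.impls_by_provider_id = impls_by_provider_id

    def get_all_with_type_filtered(self, type_):
        # safe here: a model's provider_id is an inference provider id
        return [o for o in self.registry_objects
                if o["type"] == type_ and o["provider_id"] in self.impls_by_provider_id]


class VectorIORoutingTable:
    def __init__(self, impls_by_provider_id, models_api):
        # keys here are *vector_io* providers, e.g. {"faiss": ...} -- never "ollama"
        self.impls_by_provider_id = impls_by_provider_id
        self.models_api = models_api

    def _get_first_embedding_model(self):
        # crosses the API boundary: it lists *inference* models, so checking their
        # provider_id against this table's impls_by_provider_id would always fail
        return next((m for m in self.models_api.list_models()
                     if m["model_type"] == "embedding"), None)
```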

@cdoern force-pushed the list-proper-models branch 2 times, most recently from 51f8fdd to 78227a5 on August 14, 2025 16:06
@bparees commented Aug 14, 2025

@cdoern per the community discussion today, I've opened this RFE related to model inclusion behavior: #3151

@cdoern (Author) commented Aug 14, 2025

Thanks @bparees! I think this PR is a good stop-gap regardless. I might add some comments to your issue around the database inconsistencies.

@cdoern force-pushed the list-proper-models branch from 78227a5 to 3f58050 on August 14, 2025 17:52
@cdoern requested a review from ashwinb on August 15, 2025 21:46
@cdoern force-pushed the list-proper-models branch 3 times, most recently from 15a0443 to f91b2e4 on August 18, 2025 00:09
@cdoern requested a review from yanxi0830 as a code owner on August 18, 2025 00:09
@cdoern force-pushed the list-proper-models branch 2 times, most recently from c2b65a7 to 37246e3 on August 18, 2025 17:24
@cdoern force-pushed the list-proper-models branch from 37246e3 to 14f96d7 on August 18, 2025 20:28
@skamenan7 (Contributor) commented:
@cdoern Thanks for adding this. Looks great! I'd suggest adding a test case for inactive provider filtering and/or access control interactions.
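
A rough shape for such a test (the fake routing table below is a stand-in to show the assertion, not the real llama-stack test harness):

```python
# Sketch only: a stand-in routing table, not llama-stack's real one.
class FakeRoutingTable:
    def __init__(self, persisted_models, active_provider_ids):
        self.persisted_models = persisted_models
        self.active_provider_ids = active_provider_ids

    def list_models(self):
        return [m for m in self.persisted_models
                if m["provider_id"] in self.active_provider_ids]


def test_models_list_excludes_inactive_providers():
    table = FakeRoutingTable(
        persisted_models=[{"identifier": "llama3.2:3b", "provider_id": "ollama"}],
        active_provider_ids=set(),  # OLLAMA_URL unset -> ollama never started
    )
    assert table.list_models() == []


def test_models_list_includes_active_providers():
    table = FakeRoutingTable(
        persisted_models=[{"identifier": "llama3.2:3b", "provider_id": "ollama"}],
        active_provider_ids={"ollama"},
    )
    assert [m["identifier"] for m in table.list_models()] == ["llama3.2:3b"]
```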

@mattf (Collaborator) commented Aug 19, 2025

@cdoern will #3198 also address this?

@cdoern (Author) commented Aug 19, 2025

> @cdoern will #3198 also address this?

Haven't reviewed that yet but I'd imagine it should. Though these checks are probably still a good safeguard.

@mattf (Collaborator) commented Aug 20, 2025

> @cdoern will #3198 also address this?
>
> Haven't reviewed that yet but I'd imagine it should. Though these checks are probably still a good safeguard.

I'm concerned these safeguards could obscure a bug in the cleanup code.

@ashwinb (Contributor) commented Aug 21, 2025

> @cdoern will #3198 also address this?
>
> I'm concerned these safeguards could obscure a bug in the cleanup code.

I agree with @mattf here. I think #3198 (or its subsequent iteration, since even that isn't quite there yet) is the right direction to fix it. There are deeper problems with the registry state model, and it is best to fix them cleanly rather than adding a patch for now. I am going to close this PR as such.

@ashwinb closed this on Aug 21, 2025
