
Conversation

@cdoern (Contributor) commented Aug 14, 2025

What does this PR do?

There has been a recurring error where we can retrieve a model for something like a chat completion, but then hit issues when trying to associate that model with an active provider.

This is a common thing that happens when:

  1. you run the stack with, say, `remote::ollama`
  2. you register a model, say `llama3.2:3b`
  3. you do some completions, etc.
  4. you kill the server
  5. you `unset OLLAMA_URL`
  6. you re-start the stack
  7. you run `llama-stack-client models list`, or something like `llama-stack-client inference chat-completion --message hi`, and see:

```
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ embedding     │ all-minilm                                                                       │ all-minilm:l6-v2                                                     │ {'embedding_dimension': 384.0,        │ ollama                   │
│               │                                                                                  │                                                                      │ 'context_length': 512.0}              │                          │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ llm           │ llama3.2:3b                                                                      │ llama3.2:3b                                                          │                                       │ ollama                   │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ embedding     │ ollama/all-minilm:l6-v2                                                          │ all-minilm:l6-v2                                                     │ {'embedding_dimension': 384.0,        │ ollama                   │
│               │                                                                                  │                                                                      │ 'context_length': 512.0}              │                          │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
│ llm           │ ollama/llama3.2:3b                                                               │ llama3.2:3b                                                          │                                       │ ollama                   │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────────────────────────────────────┼──────────────────────────┤
```

This shouldn't be happening: `ollama` isn't a running provider, and the only reason the model shows up is that it's in the dist_registry (on disk).

While it's nice to have this static store so that if I `export OLLAMA_URL=..` again it can read from the store, it shouldn't _always_ be reading and returning these models from the store.

Now if you run `llama-stack-client models list` with this change, llama3.2:3b no longer appears.

This seems crucial to me: for folks who might be less aware that `OLLAMA_URL=..` and the other `*_URL` env variables are what enable a provider, getting zero output related to the disabled provider is a more straightforward way of saying "hey, ollama is not a provider you have running" than showing them ollama resources they can't actually use for a chat completion.
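
As a rough mental model of what this filtering means (the function and field names below are illustrative, not the actual llama-stack internals), the list path should only return registry entries whose provider was actually instantiated for this server run:

```python
# Illustrative sketch only; the real change lives in the stack's routing tables.
def list_models(persisted_models, active_provider_ids):
    """persisted_models: entries read back from the dist_registry on disk.
    active_provider_ids: providers that actually started this run
    (ollama only ends up here if OLLAMA_URL was set)."""
    return [m for m in persisted_models if m["provider_id"] in active_provider_ids]

persisted = [
    {"identifier": "llama3.2:3b", "provider_id": "ollama", "model_type": "llm"},
    {"identifier": "all-minilm", "provider_id": "ollama", "model_type": "embedding"},
]

# With OLLAMA_URL unset, ollama never comes up, so nothing from it is listed:
print(list_models(persisted, active_provider_ids=set()))        # -> []
# Re-export OLLAMA_URL and restart, and the persisted entries become usable again:
print(list_models(persisted, active_provider_ids={"ollama"}))   # -> both models
```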

Test Plan

Run `llama-stack-client models list` after `unset OLLAMA_URL`; ollama models should no longer appear.

meta-cla bot added the CLA Signed label on Aug 14, 2025
@cdoern force-pushed the list-proper-models branch 4 times, most recently from cf85fe9 to 1c39ba9 on August 14, 2025 14:35
@cdoern (Author) commented Aug 14, 2025

Please note: ALL APIs can use `get_all_with_type_filtered`, which filters out results with inactive providers, EXCEPT for vector_io.

This is because vector_io goes "across API boundaries" and queries all models in `_get_first_embedding_model`. When the stack then checks whether the model is in `self.impls_by_provider_id`, the check fails, because for vector_io `impls_by_provider_id` contains the vector_io providers, not the inference providers the model is associated with.
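
A minimal sketch of that boundary problem, with hypothetical class shapes (only `get_all_with_type_filtered`, `_get_first_embedding_model`, and `impls_by_provider_id` are names from the code; everything else is assumed):

```python
# Hypothetical shapes to illustrate the API-boundary issue; not the real classes.
class InferenceRoutingTable:
    def __init__(self, registry_objects, impls_by_provider_id):
        self.registry_objects = registry_objects
        # keys are *inference* providers that actually started, e.g. {"ollama": ...}
        self.impls_by_provider_id = impls_by_provider_id

    def get_all_with_type_filtered(self, type_):
        # safe here: a model's provider_id is an inference provider id
        return [o for o in self.registry_objects
                if o["type"] == type_ and o["provider_id"] in self.impls_by_provider_id]


class VectorIORoutingTable:
    def __init__(self, impls_by_provider_id, models_api):
        # keys here are *vector_io* providers, e.g. {"faiss": ...} -- never "ollama"
        self.impls_by_provider_id = impls_by_provider_id
        self.models_api = models_api

    def _get_first_embedding_model(self):
        # crosses the API boundary: it lists *inference* models, so checking their
        # provider_id against this table's impls_by_provider_id would always fail
        return next((m for m in self.models_api.list_models()
                     if m["model_type"] == "embedding"), None)
```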

@cdoern force-pushed the list-proper-models branch 2 times, most recently from 51f8fdd to 78227a5 on August 14, 2025 16:06
@bparees commented Aug 14, 2025

@cdoern per the community discussion today, I've opened this RFE related to model inclusion behavior: #3151

@cdoern (Author) commented Aug 14, 2025

Thanks @bparees! I think this PR is a good stop-gap regardless. I might add some comments to your issue around the database inconsistencies.

@cdoern force-pushed the list-proper-models branch from 78227a5 to 3f58050 on August 14, 2025 17:52
@cdoern requested a review from ashwinb on August 15, 2025 21:46
@cdoern force-pushed the list-proper-models branch 3 times, most recently from 15a0443 to f91b2e4 on August 18, 2025 00:09
@cdoern requested a review from yanxi0830 as a code owner on August 18, 2025 00:09
@cdoern force-pushed the list-proper-models branch 2 times, most recently from c2b65a7 to 37246e3 on August 18, 2025 17:24
@cdoern force-pushed the list-proper-models branch from 37246e3 to 14f96d7 on August 18, 2025 20:28
@skamenan7 (Contributor) commented:
@cdoern Thanks for adding this. Looks great! I'd suggest adding a test case for inactive provider filtering and/or access control interactions.
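
A rough shape for such a test (the fake routing table below is a stand-in to show the assertion, not the real llama-stack test harness):

```python
# Sketch only: a stand-in routing table, not llama-stack's real one.
class FakeRoutingTable:
    def __init__(self, persisted_models, active_provider_ids):
        self.persisted_models = persisted_models
        self.active_provider_ids = active_provider_ids

    def list_models(self):
        return [m for m in self.persisted_models
                if m["provider_id"] in self.active_provider_ids]


def test_models_list_excludes_inactive_providers():
    table = FakeRoutingTable(
        persisted_models=[{"identifier": "llama3.2:3b", "provider_id": "ollama"}],
        active_provider_ids=set(),  # OLLAMA_URL unset -> ollama never started
    )
    assert table.list_models() == []


def test_models_list_includes_active_providers():
    table = FakeRoutingTable(
        persisted_models=[{"identifier": "llama3.2:3b", "provider_id": "ollama"}],
        active_provider_ids={"ollama"},
    )
    assert [m["identifier"] for m in table.list_models()] == ["llama3.2:3b"]
```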

@mattf (Collaborator) commented Aug 19, 2025

@cdoern will #3198 also address this?

@cdoern (Author) commented Aug 19, 2025

> @cdoern will #3198 also address this?

Haven't reviewed that yet but I'd imagine it should. Though these checks are probably still a good safeguard.

@mattf (Collaborator) commented Aug 20, 2025

> @cdoern will #3198 also address this?
>
> Haven't reviewed that yet but I'd imagine it should. Though these checks are probably still a good safeguard.

I'm concerned these safeguards could obscure a bug in the cleanup code.

@ashwinb (Contributor) commented Aug 21, 2025

> @cdoern will #3198 also address this?
>
> I'm concerned these safeguards could obscure a bug in the cleanup code.

I agree with @mattf here. I think #3198 (or its subsequent iteration, since even that isn't quite there yet) is the right direction to fix it. There are deeper problems with the registry state model, and it is best to fix them cleanly rather than adding a patch for now. I am going to close this PR as such.

@ashwinb closed this on Aug 21, 2025
