feat(registry): enrich model metadata with OpenRouter context_length#2880
feat(registry): enrich model metadata with OpenRouter context_length#2880dwillitzer wants to merge 2 commits intorouter-for-me:devfrom
Conversation
Adds a background enrichment job that populates `context_length` metadata for registered models by fetching OpenRouter's public models catalog (https://openrouter.ai/api/v1/models). Runs once on startup and then refreshes every 24 hours. ## Why Clients that surface model capabilities (e.g. "supports 1M context") currently rely on hand-maintained entries in internal/registry/models/ models.json. OpenRouter already publishes authoritative context_length values for every routed model; consuming that upstream eliminates the maintenance burden for any model OpenRouter fronts and keeps the catalog accurate as providers bump windows. ## Behavior - Startup fetch populates the in-memory enrichment store; if the fetch fails (network, upstream outage, etc.) the registry falls back to the static JSON with no service impact. - 24-hour periodic refresh keeps the cache fresh without hot-pathing the OpenRouter API. - Management endpoint exposes enriched metadata so the UI and clients can display accurate context windows. - model_registry.RefreshModels() now writes through to the live registration so the enriched count is observable and returned. ## Timeout pattern Uses `http.Client{Timeout: 30s}` + `context.WithTimeout` — structurally identical to the existing precedent at `internal/misc/antigravity_version.go` (commits 3774b56, 8d5e470), which was accepted upstream for the same class of periodic metadata fetch. Happy to tighten the timeout or rewrite the pattern if the maintainer prefers — just point at the approved shape. ## Scope This PR is intentionally scoped to OpenRouter enrichment only. The temporal-anti-drift feature that was previously bundled in the same branch is now split out to router-for-me#2574. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This pull request targeted The base branch has been automatically changed to |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c491cf9050
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| // GetModelsHealth returns comprehensive health information for all registered models. | ||
| func (h *Handler) GetModelsHealth(c *gin.Context) { |
There was a problem hiding this comment.
Register new model-health management routes
This commit adds GetModelsHealth/RefreshModels handlers, but they are never reachable because the management route table is unchanged (checked internal/api/server.go, which still only wires /model-definitions/:channel in this area). As shipped, operators cannot call the new health or manual refresh APIs, so the feature added here is effectively inaccessible in production.
Useful? React with 👍 / 👎.
| func StartOpenRouterEnrichment(ctx context.Context) { | ||
| enrichmentOnce.Do(func() { | ||
| go runOpenRouterEnrichment(ctx) | ||
| }) |
There was a problem hiding this comment.
Start OpenRouter enrichment during service startup
StartOpenRouterEnrichment is introduced but never invoked anywhere in the repository (repo-wide search for StartOpenRouterEnrichment( returns only this definition), so the promised startup fetch and 24-hour refresh loop never run. Without a startup call, last_refresh stays zero and context enrichment will not occur automatically.
Useful? React with 👍 / 👎.
| } | ||
| reg.Info.ContextLength = contextLength | ||
| return true | ||
| } |
There was a problem hiding this comment.
Invalidate available-model cache on context updates
SetModelContextLength mutates live model metadata but does not invalidate availableModelsCache. GetAvailableModels serves cached copies and can keep them indefinitely when expiresAt is zero, so enrichment performed after cache warm-up will not be reflected in /models responses until some unrelated registry mutation clears cache.
Useful? React with 👍 / 👎.
| for modelID, ctxLen := range openRouterContextLengths { | ||
| openRouterStore.contextLength[modelID] = ctxLen | ||
| } |
There was a problem hiding this comment.
Store enrichment-source keys using registry model IDs
The enrichment store is populated with raw OpenRouter IDs (openai/..., anthropic/...), but source lookups later use local registry IDs (gpt-..., claude-...). For models matched via the heuristic path, GetOpenRouterContextLengthSource will miss and BuildModelSources reports the wrong source, so management health output can claim provider/static even when context came from OpenRouter.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces a metadata enrichment system that fetches model context lengths from OpenRouter and adds new administrative endpoints for monitoring model health and triggering manual refreshes. It also enhances the model registry to support static metadata lookups and includes additional context window information in API responses. The review feedback identifies several critical issues, including overly aggressive fuzzy matching for model IDs that could lead to incorrect metadata and a bug in the refresh endpoint that incorrectly returns an error when no new models are enriched. Additionally, improvements were suggested for performance and efficiency, such as reusing HTTP clients, utilizing memory-efficient JSON decoding, optimizing static metadata lookups with maps, and ensuring enrichment applies to all registered models regardless of their current health status.
| if strings.Contains(orID, modelID) || strings.Contains(modelID, orID) { | ||
| // Extract the base name from OpenRouter ID (after last slash) | ||
| orBase := orID | ||
| if slashIdx := strings.LastIndex(orID, "/"); slashIdx >= 0 { | ||
| orBase = orID[slashIdx+1:] | ||
| } | ||
| // Check if local ID matches the base or is a prefix/suffix | ||
| if orBase == modelID || strings.HasPrefix(orBase, modelID) || strings.HasPrefix(modelID, orBase) { | ||
| ctxLen = cl | ||
| found = true | ||
| break | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The fuzzy matching logic using strings.HasPrefix on the base name is too aggressive and will lead to incorrect metadata enrichment. For example, if the local modelID is gpt-4 and OpenRouter has openai/gpt-4o, the orBase will be gpt-4o. strings.HasPrefix("gpt-4o", "gpt-4") will return true, causing the standard gpt-4 (8k context) to be enriched with gpt-4o (128k context) metadata. Matching should be restricted to exact matches of the base name or a curated mapping.
for orID, cl := range openRouterContextLengths {
orBase := orID
if slashIdx := strings.LastIndex(orID, "/"); slashIdx >= 0 {
orBase = orID[slashIdx+1:]
}
if orBase == modelID {
ctxLen = cl
found = true
break
}
}| if count == 0 { | ||
| c.JSON(http.StatusBadRequest, gin.H{"error": "no models registered"}) | ||
| return | ||
| } |
There was a problem hiding this comment.
The logic here incorrectly assumes that if TriggerOpenRouterRefresh returns 0, it means no models are registered. However, TriggerOpenRouterRefresh returns the number of newly enriched models. If all registered models have already been enriched in a previous run, this will return 0 even if many models are registered. This results in a misleading 400 error for subsequent refresh requests.
| if count == 0 { | |
| c.JSON(http.StatusBadRequest, gin.H{"error": "no models registered"}) | |
| return | |
| } | |
| if count == 0 && len(registry.GetGlobalRegistry().GetAllRegisteredModels("openai")) == 0 { | |
| c.JSON(http.StatusBadRequest, gin.H{"error": "no models registered"}) | |
| return | |
| } |
| if model == nil || model.ID == "" { | ||
| continue | ||
| } | ||
| if staticInfo := LookupStaticModelInfo(model.ID); staticInfo != nil { |
There was a problem hiding this comment.
LookupStaticModelInfo is called inside a loop within RegisterClient. Since LookupStaticModelInfo performs a linear search over multiple slices in models.json, this results in O(N*M) complexity where N is the number of models being registered and M is the total number of static models. Given that RegisterClient is called frequently (e.g., for every credential file), this could become a performance bottleneck. Consider using a map for static model lookups.
| // registered models that lack context_length metadata. | ||
| // Returns the number of models actually enriched. | ||
| func fetchAndEnrichOpenRouter(ctx context.Context) int { | ||
| client := &http.Client{Timeout: openRouterFetchTimeout} |
There was a problem hiding this comment.
A new http.Client is created for every enrichment fetch. This is inefficient as it prevents the reuse of TCP connections (keep-alive). It is better to define a single package-level client or pass one in.
var enrichmentClient = &http.Client{Timeout: openRouterFetchTimeout}
func fetchAndEnrichOpenRouter(ctx context.Context) int {
client := enrichmentClient| } | ||
|
|
||
| var parsed openRouterModelsResponse | ||
| if err := json.Unmarshal(data, &parsed); err != nil { |
There was a problem hiding this comment.
Using io.ReadAll to load the entire API response into memory before unmarshaling can be memory-intensive if the OpenRouter models catalog grows significantly. Using json.NewDecoder with the response body is more memory-efficient.
| if err := json.Unmarshal(data, &parsed); err != nil { | |
| var parsed openRouterModelsResponse | |
| if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil { | |
| log.Warnf("OpenRouter enrichment: parse failed: %v", err) | |
| return 0 | |
| } |
| } | ||
|
|
||
| // Get all registered models and enrich those lacking context_length | ||
| allModels := registry.GetAvailableModels("openai") |
There was a problem hiding this comment.
GetAvailableModels only returns models that currently have active, non-suspended clients. This means that registered models that are temporarily unhealthy or have no clients yet will not be enriched with metadata. Enrichment should ideally apply to all registered models to ensure metadata is ready when they become available.
| allModels := registry.GetAvailableModels("openai") | |
| allModels := registry.GetAllRegisteredModels("openai") |
…chment Addresses all P1 and critical-severity findings from the initial Codex and Gemini bot reviews: 1. [P1, Codex] Wire StartOpenRouterEnrichment at startup cmd/server/main.go now calls registry.StartOpenRouterEnrichment in both the standalone-TUI path and the main service path (gated by !localModel, matching StartModelsUpdater). Previously the function was defined but never invoked anywhere in the tree — the feature was dead code. 2. [P1, Codex] Register management routes internal/api/server.go registers GET /v0/management/models/health and POST /v0/management/models/refresh so the new handlers are reachable. 3. [P1, Codex] Invalidate available-models cache on enrichment ModelRegistry.SetModelContextLength now calls invalidateAvailableModelsCacheLocked() when the context_length actually changes, so subsequent GetAvailableModels reads reflect the new value instead of serving a stale cached snapshot. 4. [high, Gemini] Correct RefreshModels semantics TriggerOpenRouterRefresh returns the number of *newly* enriched models on the call; a zero return is a legitimate outcome (everything already enriched) and must not be reported as `"no models registered"`. Handler now always returns 200 with the count. 5. [critical, Gemini] Tighten fuzzy model-ID matching The previous substring+prefix heuristic would incorrectly attach context_length values across model families — "gpt-4" matching "gpt-4o", "claude-3" matching "claude-3-opus", etc. Matching is now exact-only, either on the full OpenRouter ID or on the base name after the last slash. Wrong-model enrichment is worse than no enrichment. 6. [medium, Gemini] Reuse a package-level http.Client and stream-decode Single long-lived enrichmentClient replaces per-call http.Client so TCP keep-alive is reused; json.NewDecoder streams the response instead of buffering the full body into memory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed all P1 and critical-severity findings in
Remaining Gemini medium items I'd like to defer or discuss:
Re-review when you have time. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86e1b0b42a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| case strings.HasPrefix(modelID, "gemini-"), strings.HasPrefix(modelID, "models/gemini-"): | ||
| sources[modelID] = "static" |
There was a problem hiding this comment.
Detect enrichment source for Gemini model IDs
BuildModelSources hard-codes every gemini-*/models/gemini-* model as "static", so /v0/management/models/health misreports provenance after OpenRouter enrichment. This is reachable because enrichModelsFromOpenRouter can fill missing context_length for Gemini IDs (many registry Gemini entries start without context_length), but the source map will still claim static metadata, which gives operators incorrect health/debug information.
Useful? React with 👍 / 👎.
Summary
Adds a background enrichment job that populates
context_lengthmetadata for registered models by fetching OpenRouter's public models catalog (https://openrouter.ai/api/v1/models). Runs once on startup and then refreshes every 24 hours.Why
Clients that surface model capabilities (e.g. "supports 1M context") currently rely on hand-maintained entries in
internal/registry/models/models.json. OpenRouter already publishes authoritativecontext_lengthvalues for every routed model; consuming that upstream eliminates the maintenance burden for any model OpenRouter fronts and keeps the catalog accurate as providers bump windows.Behavior
model_registry.RefreshModels()now writes through to the live registration so the enriched count is observable and returned.Timeout pattern — note to @luispater
Uses
http.Client{Timeout: 30s}+context.WithTimeout— structurally identical to the existing precedent atinternal/misc/antigravity_version.go(commits3774b56e,8d5e470e), which was accepted upstream for the same class of periodic metadata fetch. If the repo's "small set of documented exceptions" covers antigravity-version but not model-enrichment, I'm happy to tighten or rewrite — just point at the approved shape.Scope
This PR is intentionally scoped to OpenRouter enrichment only. The temporal-anti-drift work that was previously bundled in the same branch is split out to #2574.
Test plan
go build ./...— compiles cleanly🤖 Generated with Claude Code