Pre-Phase-7: per-source LLM routing seam #7
Merged
Defer original Phase 7 (local inference / Ollama / vLLM) to post-launch and land the per-source LLM routing surface ahead of Phase 7 connectors, so that connectors and the eventual local-inference adapter plug into the same decision point without touching dispatch sites. See decisions.md (2026-05-15) and the renumbered build plan in CLAUDE.md.

- model_preference (nullable) columns on projects + documents
- backend/src/lib/llm/routing.ts: resolveModelRouting() with doc → project → request precedence, conflicts and unknown-model rejections captured for audit
- streamChatWithTools accepts optional routing context, dispatches with the resolved model, and records the policy into the existing audit_log.routing_policy_applied jsonb column
- main chat path wired to pass project + document IDs
- 8 unit tests covering precedence, conflicts, rejections, and db errors

Backend tsc --noEmit clean; Vitest 73/73 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
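To make the doc → project → request precedence concrete, here is a minimal in-memory sketch. It is not the code in backend/src/lib/llm/routing.ts: only the name resolveModelRouting and the model_preference columns come from this PR. The input and result shapes, the KNOWN_MODELS placeholder list, and the absence of any database access are assumptions made for illustration.

```typescript
// Hypothetical sketch: shapes and model IDs are assumptions, not the PR's code.
type RoutingSource = "document" | "project" | "request";

interface Rejection {
  source: RoutingSource; // layer that held the unknown model ID
  model: string;         // the ID that failed validation
}

interface RoutingPolicy {
  model: string;         // model that dispatch should use
  source: RoutingSource; // layer that supplied the winning model
  conflicts: string[];   // valid document prefs that disagreed with the winner
  rejected: Rejection[]; // unknown IDs that were skipped
}

interface RoutingInput {
  requestModel: string;                      // model the user asked for
  projectPreference: string | null;          // stands in for projects.model_preference
  documentPreferences: Array<string | null>; // stands in for documents.model_preference
}

// Placeholder for the canonical model set; these IDs are invented.
const KNOWN_MODELS: readonly string[] = ["model-a", "model-b", "model-c"];

function isKnownModel(id: string): boolean {
  return KNOWN_MODELS.indexOf(id) !== -1;
}

function resolveModelRouting(input: RoutingInput): RoutingPolicy {
  const rejected: Rejection[] = [];
  const conflicts: string[] = [];
  let docWinner: string | null = null;

  // Document layer: the first valid preference wins, later ones are recorded.
  for (const pref of input.documentPreferences) {
    if (pref === null) continue;
    if (!isKnownModel(pref)) {
      rejected.push({ source: "document", model: pref });
    } else if (docWinner === null) {
      docWinner = pref;
    } else if (pref !== docWinner && conflicts.indexOf(pref) === -1) {
      conflicts.push(pref);
    }
  }
  if (docWinner !== null) {
    return { model: docWinner, source: "document", conflicts, rejected };
  }

  // Project layer: consulted only when no document supplied a valid model.
  const projectPref = input.projectPreference;
  if (projectPref !== null) {
    if (isKnownModel(projectPref)) {
      return { model: projectPref, source: "project", conflicts, rejected };
    }
    rejected.push({ source: "project", model: projectPref });
  }

  // Request layer: fall back to the user's requested model unchanged.
  return { model: input.requestModel, source: "request", conflicts, rejected };
}

const policy = resolveModelRouting({
  requestModel: "model-a",
  projectPreference: "model-c",
  documentPreferences: [null, "not-a-model", "model-b", "model-c"],
});
console.log(policy.model, policy.source); // model-b document
```

In this sketch a rejected ID never blocks resolution; it is recorded and the next layer is consulted, which mirrors the "walk down the chain" behaviour the PR describes.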
Summary
- See `decisions.md` (2026-05-15) and the renumbered build plan in `CLAUDE.md`.
- `resolveModelRouting()` resolver with document → project → request precedence. Stored prefs are validated against the canonical model set; unknown IDs are rejected and recorded, then the resolver walks down the chain. Conflicts between documents are captured, but the first non-null wins (no policy engine yet).
- `streamChatWithTools` accepts an optional `routing` context, dispatches with the resolved model, and records the resolution into the existing `audit_log.routing_policy_applied` jsonb column shipped in Phase 6.
- … `docIndex`. Today this resolves to the user's requested model unchanged; Phase 7 connectors will populate `documents.model_preference` at ingest and the same code path will route accordingly without further changes.
- `model_preference` (nullable text) columns on `projects` and `documents`. No UI yet, by design.

Why now, not in Phase 7
The local-inference adapter is cheap to add later (a fourth file alongside `claude.ts` / `gemini.ts` / `openai.ts`). The routing policy surface is the part that would otherwise get baked into every connector and dispatch site in Phase 7+, and retrofitting it would force a much wider edit later. Landing the seam now keeps Phase 7 connectors free to declare a `model_preference` at ingest without inventing the surface from scratch.

Test plan
- `backend/` `npx tsc --noEmit`: clean
- `backend/` `npx vitest run`: 73/73 green (8 new routing tests covering precedence, conflicts, unknown-model rejection at doc + project layers, db-error tolerance, and missing-document-ids skip path)
- Set `documents.model_preference` to a known model id on one doc; confirm a chat that includes that doc resolves to the override and writes the policy into `audit_log.routing_policy_applied`
- … `audit_log.routing_policy_applied` records `source: "request"`

🤖 Generated with Claude Code
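The two manual checks in the test plan could be mirrored by a small stand-in like the one below. It is a sketch under stated assumptions, not the PR's dispatch code: the `buildAuditEntry` helper, the `AuditEntry` shape, and the `model_used` field are hypothetical, and so is the choice to record `source: "request"` when no routing context is passed at all. Only `audit_log.routing_policy_applied` and the `source: "request"` value come from this PR.

```typescript
// Hypothetical sketch of what the dispatch seam records; names are assumptions.
type PolicySource = "document" | "project" | "request";

interface ResolvedRouting {
  model: string;        // model chosen by the resolver
  source: PolicySource; // layer that supplied it
}

interface AuditEntry {
  model_used: string;                      // invented field for illustration
  routing_policy_applied: ResolvedRouting; // stands in for the jsonb column
}

// Builds the row that would be written alongside a dispatched chat call.
function buildAuditEntry(requestedModel: string, resolved?: ResolvedRouting): AuditEntry {
  // Assumption: with no routing context, the requested model is used as-is.
  const applied: ResolvedRouting =
    resolved !== undefined ? resolved : { model: requestedModel, source: "request" };
  return { model_used: applied.model, routing_policy_applied: applied };
}

// Check 1: a document override is dispatched and recorded.
const overridden = buildAuditEntry("model-a", { model: "model-b", source: "document" });
console.log(JSON.stringify(overridden.routing_policy_applied));

// Check 2: without overrides the audit entry records source "request".
const plain = buildAuditEntry("model-a");
console.log(plain.routing_policy_applied.source); // request
```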