Problem
Read-only operations (read_file, grep, list_files) currently consume the same expensive frontier-model tokens as complex code generation. For users running code_puppy at scale, this means significant token spend on tasks that require no high-level reasoning: file reads, searches, and directory scans that a cheap or local model handles equally well.
This is especially pronounced in non-developer use cases (dashboards, presentations, data analysis) where the majority of interactions are read-heavy with occasional edits. These users are paying frontier model prices for work that doesn't need frontier model intelligence.
Proposed Solution
A four-component framework that builds entirely on existing code_puppy infrastructure — no new dependencies, fully opt-in, zero telemetry, local-only.
Component 1 — Session Classifier
At session end, inspect which tools fired. Map tool call patterns to task type labels:
- Read-heavy (read_file, grep, list_files; no edit_file) → Routine
- Mixed read/write → Standard
- Write-heavy + shell commands + complex reasoning → Complex
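The mapping above can be sketched as a small heuristic. This is illustrative only: the tool names for reads and edits come from this proposal, while `run_shell_command` and the thresholds are assumptions, not actual code_puppy internals.

```python
# Hypothetical session classifier. Tool-name sets and thresholds are
# placeholders; the real tool registry would drive these.
READ_TOOLS = {"read_file", "grep", "list_files"}
WRITE_TOOLS = {"edit_file"}
SHELL_TOOLS = {"run_shell_command"}  # assumed name for the shell tool

def classify_session(tool_calls: list[str]) -> str:
    """Map the session's tool call history to a task-type label."""
    reads = sum(1 for t in tool_calls if t in READ_TOOLS)
    writes = sum(1 for t in tool_calls if t in WRITE_TOOLS)
    shells = sum(1 for t in tool_calls if t in SHELL_TOOLS)
    if writes == 0 and shells == 0:
        return "Routine"   # read-only session
    if writes > reads or shells > 0:
        return "Complex"   # write-heavy or shell-assisted session
    return "Standard"      # mixed read/write
```

A real implementation would likely weight tools rather than count them, but the shape of the decision is the same.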
Component 2 — Local User Profile
A simple JSON file in ~/.code_puppy/ that accumulates session classifications over time. After enough sessions you have a distribution — e.g. 70% Routine, 20% Standard, 10% Complex. Fully inspectable, deletable with one command, never leaves the machine.
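A minimal sketch of that profile file, assuming a `profile.json` filename and a flat label-count schema (both placeholders, not the actual code_puppy layout):

```python
import json
from collections import Counter
from pathlib import Path

# Assumed location and filename inside ~/.code_puppy/.
DEFAULT_PROFILE = Path.home() / ".code_puppy" / "profile.json"

def record_session(label: str, path: Path = DEFAULT_PROFILE) -> None:
    """Append one session classification to the local profile."""
    counts = Counter(json.loads(path.read_text())) if path.exists() else Counter()
    counts[label] += 1
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(counts))

def distribution(path: Path = DEFAULT_PROFILE) -> dict[str, float]:
    """Return the observed share of each task type, e.g. {'Routine': 0.7, ...}."""
    counts = Counter(json.loads(path.read_text())) if path.exists() else Counter()
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}
```

"Deletable with one command" then falls out for free: removing the file resets the profile.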
Component 3 — Model Tier Suggestion
On session start, the profile suggests a default model tier based on observed history. User can always override. A user who mostly reads files shouldn't default to Claude Sonnet — they should default to Gemini Flash or a local Ollama model.
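One way the suggestion could work, as a sketch: only recommend a cheaper tier when history clearly favors it, and otherwise default to the strongest tier. The model names and the 0.6 threshold below are placeholders, not project defaults.

```python
# Hypothetical tier-to-model mapping; names are illustrative only.
TIER_MODELS = {
    "Routine": "gemini-flash",    # cheap or local tier
    "Standard": "gpt-4o-mini",    # mid tier (placeholder)
    "Complex": "claude-sonnet",   # frontier tier
}

def suggest_model(dist: dict[str, float], threshold: float = 0.6) -> str:
    """Suggest a default model from the profile distribution.

    Cheaper tiers win only with a clear majority of observed sessions;
    ties and sparse history fall back to the frontier tier.
    """
    for label in ("Routine", "Standard"):
        if dist.get(label, 0.0) >= threshold:
            return TIER_MODELS[label]
    return TIER_MODELS["Complex"]
```

Defaulting to the strong tier when history is thin keeps the failure mode conservative: a new user never gets silently downgraded.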
Component 4 — Complexity Escalation Monitor
Watches the tool call sequence mid-session and detects inflection points: when a session transitions from read-heavy to write-heavy. It surfaces a single non-blocking suggestion:
"This looks like it's turning into active development. You're on [cheap model]. Want to switch to [strong model] for better results? /yes or /no"
No auto-switching. User confirms or dismisses. Human-in-the-loop at every escalation point.
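The detector itself can be a sliding window over recent tool calls. A sketch, with the window size, write-ratio threshold, and tool names all as illustrative assumptions:

```python
from collections import deque

WRITE_LIKE = {"edit_file", "run_shell_command"}  # assumed tool names

class EscalationMonitor:
    """Fire at most one suggestion when recent activity turns write-heavy."""

    def __init__(self, window: int = 10, ratio: float = 0.5):
        self.recent = deque(maxlen=window)  # last N tool calls
        self.ratio = ratio                  # write share that triggers the prompt
        self.fired = False                  # suggest once, then stay quiet

    def observe(self, tool: str) -> bool:
        """Record one tool call; return True exactly once at the inflection point."""
        self.recent.append(tool)
        writes = sum(1 for t in self.recent if t in WRITE_LIKE)
        full = len(self.recent) == self.recent.maxlen
        if not self.fired and full and writes / len(self.recent) >= self.ratio:
            self.fired = True
            return True
        return False
```

The one-shot `fired` flag matches the "single non-blocking suggestion" constraint: the user is never nagged twice in a session.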
Update: Found discussion #123 where Michael explicitly stated his preferred direction for model routing. Updating this proposal to align.
Key feedback from that thread:
- Rules-based routing was flagged as too simplistic
- Preferred approach: a lightweight ML classifier (TF-IDF Bag-of-Words → SVM) constrained to favor successful outcomes while routing cheaper models where possible
- No-telemetry constraint remains — any training data would need to be opt-in or synthetically generated
Revised direction for this issue:
Rather than static tier rules, the implementation would:
- Collect opt-in session outcome signals (model used, task completed, tokens consumed) — stored locally, never transmitted
- Train a lightweight TF-IDF/SVM classifier on local history + a bootstrapped synthetic dataset of representative prompts
- Route based on classifier confidence, defaulting to the global model when confidence is low
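A minimal sketch of that TF-IDF → SVM router using scikit-learn. Note this would be a new dependency, in tension with the no-new-dependencies goal stated earlier, so it is a trade-off to confirm. The prompts and labels below stand in for the bootstrapped synthetic dataset and are purely illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder synthetic training data; a real bootstrap set would be larger.
prompts = [
    "show me what's in src/main.py",
    "search the repo for TODO comments",
    "list the files in this directory",
    "refactor the auth module and add tests",
    "implement the new caching layer",
    "fix the race condition in the worker pool",
]
labels = ["Routine", "Routine", "Routine", "Complex", "Complex", "Complex"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(prompts, labels)

def route(prompt: str, floor: float = 0.1) -> str:
    """Route by SVM margin; fall back to the global model on low confidence."""
    score = clf.decision_function([prompt])[0]  # signed margin (binary case)
    if abs(score) < floor:
        return "global-default"
    return clf.classes_[1] if score > 0 else clf.classes_[0]
```

Using the decision-function margin as the confidence signal is one possible choice; calibrated probabilities would work too, at the cost of extra fitting.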
This keeps the privacy-first commitment intact while moving toward the data-driven approach Michael described. Still collecting usage pattern data to inform the training set before proposing an implementation.
Open question: Is a locally-trained classifier still the preferred direction, or has thinking evolved since discussion #123?