feat: task-aware model routing to reduce token costs #237

@RyanLHicks

Description

Problem

Read-only operations (read_file, grep, list_files) currently consume the same expensive frontier model tokens as complex code generation. For users running code_puppy at scale, this creates significant wasted token spend on tasks that require no high-level reasoning — file reads, searches, and directory scans that a cheap or local model handles equally well.
This is especially pronounced in non-developer use cases (dashboards, presentations, data analysis) where the majority of interactions are read-heavy with occasional edits. These users are paying frontier model prices for work that doesn't need frontier model intelligence.

Proposed Solution

A four-component framework that builds entirely on existing code_puppy infrastructure — no new dependencies, fully opt-in, zero telemetry, local-only.

Component 1 — Session Classifier

At session end, inspect which tools fired. Map tool call patterns to task type labels:

  • Read-heavy (read_file, grep, list_files; no edit_file) → Routine
  • Mixed read/write → Standard
  • Write-heavy + shell commands + complex reasoning → Complex
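The mapping above can be sketched as a small rules function. This is illustrative, not spec: the write/shell tool names (`edit_file`, `write_file`, `run_shell_command`) beyond those listed in the proposal are assumptions, as are the thresholds.

```python
# Hypothetical sketch: classify a finished session from its tool-call log.
# Tool names beyond read_file/grep/list_files/edit_file are assumed, and the
# thresholds are illustrative placeholders.

READ_TOOLS = {"read_file", "grep", "list_files"}
WRITE_TOOLS = {"edit_file", "write_file"}       # write_file is an assumed name
SHELL_TOOLS = {"run_shell_command"}             # assumed name

def classify_session(tool_calls: list[str]) -> str:
    """Map a session's tool-call sequence to a task-type label."""
    reads = sum(1 for t in tool_calls if t in READ_TOOLS)
    writes = sum(1 for t in tool_calls if t in WRITE_TOOLS)
    shells = sum(1 for t in tool_calls if t in SHELL_TOOLS)

    if writes == 0 and shells == 0:
        return "Routine"    # read-only session
    if writes > reads or (writes > 0 and shells > 0):
        return "Complex"    # write-heavy; shell use suggests active development
    return "Standard"       # mixed read/write

print(classify_session(["read_file", "grep", "list_files"]))  # Routine
```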

Component 2 — Local User Profile

A simple JSON file in ~/.code_puppy/ that accumulates session classifications over time. After enough sessions you have a distribution — e.g. 70% Routine, 20% Standard, 10% Complex. Fully inspectable, deletable with one command, never leaves the machine.
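A minimal sketch of that accumulation, assuming a filename like `task_profile.json` (the actual name is not specified in this proposal):

```python
import json
from collections import Counter
from pathlib import Path

# Illustrative location; the proposal only specifies "a JSON file in ~/.code_puppy/".
DEFAULT_PROFILE = Path.home() / ".code_puppy" / "task_profile.json"

def record_session(label: str, path: Path = DEFAULT_PROFILE) -> None:
    """Append one session classification to the local profile file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    counts = Counter(json.loads(path.read_text())) if path.exists() else Counter()
    counts[label] += 1
    path.write_text(json.dumps(dict(counts), indent=2))

def distribution(path: Path = DEFAULT_PROFILE) -> dict[str, float]:
    """Return the observed task-type distribution, e.g. {"Routine": 0.7, ...}."""
    if not path.exists():
        return {}
    counts = Counter(json.loads(path.read_text()))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}
```

"Deletable with one command" then falls out for free: removing the file resets the profile.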

Component 3 — Model Tier Suggestion

On session start, the profile suggests a default model tier based on observed history. User can always override. A user who mostly reads files shouldn't default to Claude Sonnet — they should default to Gemini Flash or a local Ollama model.
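A sketch of the suggestion step, with made-up thresholds (the 60%/40% cutoffs are assumptions for illustration, not part of the proposal):

```python
# Illustrative mapping from observed history to a default tier.
# The cutoff values are assumed placeholders.
def suggest_tier(dist: dict[str, float]) -> str:
    """Pick a default model tier from the session-label distribution."""
    if dist.get("Routine", 0.0) >= 0.6:
        return "cheap"      # e.g. Gemini Flash or a local Ollama model
    if dist.get("Complex", 0.0) >= 0.4:
        return "frontier"   # e.g. Claude Sonnet
    return "standard"

print(suggest_tier({"Routine": 0.7, "Standard": 0.2, "Complex": 0.1}))  # cheap
```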

Component 4 — Complexity Escalation Monitor

Watches the tool-call sequence mid-session and detects inflection points — when a session transitions from read-heavy to write-heavy. Surfaces a single non-blocking suggestion:

"This looks like it's turning into active development. You're on [cheap model]. Want to switch to [strong model] for better results? /yes or /no"

No auto-switching. User confirms or dismisses. Human-in-the-loop at every escalation point.
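One way to detect that inflection is a sliding window over recent tool calls, firing at most once per session. Window size, threshold, and the write-tool names are illustrative assumptions:

```python
# Hypothetical sliding-window monitor for the read→write inflection point.
# Fires at most once per session; window size and threshold are placeholders.
from collections import deque

WRITE_TOOLS = {"edit_file", "write_file", "run_shell_command"}  # assumed names

class EscalationMonitor:
    def __init__(self, window: int = 10, threshold: float = 0.5):
        self.recent = deque(maxlen=window)
        self.threshold = threshold
        self.suggested = False  # surface the prompt at most once

    def observe(self, tool: str) -> bool:
        """Record a tool call; return True when the escalation prompt should fire."""
        self.recent.append(tool in WRITE_TOOLS)
        if self.suggested or len(self.recent) < self.recent.maxlen:
            return False
        if sum(self.recent) / len(self.recent) >= self.threshold:
            self.suggested = True
            return True
        return False
```

The `suggested` flag is what keeps this a "single non-blocking suggestion" rather than a nag.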

Update: Found discussion #123 where Michael explicitly stated his preferred direction for model routing. Updating this proposal to align.

Key feedback from that thread:

  • Rules-based routing was flagged as too simplistic
  • Preferred approach: a lightweight ML classifier (TF-IDF Bag-of-Words → SVM) constrained to favor successful outcomes while routing cheaper models where possible
  • No-telemetry constraint remains — any training data would need to be opt-in or synthetically generated

Revised direction for this issue:

Rather than static tier rules, the implementation would:

  1. Collect opt-in session outcome signals (model used, task completed, tokens consumed) — stored locally, never transmitted
  2. Train a lightweight TF-IDF/SVM classifier on local history + a bootstrapped synthetic dataset of representative prompts
  3. Route based on classifier confidence, defaulting to the global model when confidence is low
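The three steps above could be sketched as follows. This is a minimal illustration, not the implementation: the training prompts are synthetic placeholders, the margin threshold is an assumption, and the confidence test uses LinearSVC's decision-function margin rather than calibrated probabilities. Requires scikit-learn.

```python
# Sketch of the revised TF-IDF → SVM router direction from discussion #123.
# Training prompts below are synthetic placeholders, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

prompts = [
    "show me the contents of config.py",
    "search the repo for TODO comments",
    "list files in the src directory",
    "refactor the auth module and add tests",
    "implement a new caching layer",
    "fix the failing build and rerun the test suite",
]
labels = ["cheap", "cheap", "cheap", "frontier", "frontier", "frontier"]

vec = TfidfVectorizer()
clf = LinearSVC()
clf.fit(vec.fit_transform(prompts), labels)

def route(prompt: str, default: str = "global-default", min_margin: float = 0.2) -> str:
    """Return a model tier; fall back to the global default when confidence is low."""
    margin = clf.decision_function(vec.transform([prompt]))[0]  # signed distance
    if abs(margin) < min_margin:
        return default  # low confidence → use the globally configured model
    return clf.classes_[1] if margin > 0 else clf.classes_[0]
```

The low-confidence fallback is the key property: an uncertain classifier should never silently downgrade the user to a weaker model.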

This keeps the privacy-first commitment intact while moving toward the data-driven approach Michael described. I'm still collecting usage-pattern data to inform the training set before proposing an implementation.

Open question: Is a locally-trained classifier still the preferred direction, or has thinking evolved since discussion #123?
