feat: task-aware model routing to reduce token costs #237

@RyanLHicks

Description

Problem

Read-only operations (read_file, grep, list_files) currently consume the same expensive frontier model tokens as complex code generation. For users running code_puppy at scale, this creates significant wasted token spend on tasks that require no high-level reasoning — file reads, searches, and directory scans that a cheap or local model handles equally well.
This is especially pronounced in non-developer use cases (dashboards, presentations, data analysis) where the majority of interactions are read-heavy with occasional edits. These users are paying frontier model prices for work that doesn't need frontier model intelligence.

Proposed Solution

A four-component framework that builds entirely on existing code_puppy infrastructure — no new dependencies, fully opt-in, zero telemetry, local-only.

Component 1 — Session Classifier

At session end, inspect which tools fired. Map tool call patterns to task type labels:

  • Read-heavy (read_file, grep, list_files; no edit_file) → Routine
  • Mixed read/write → Standard
  • Write-heavy + shell commands + complex reasoning → Complex
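The mapping above can be sketched as a small rules function. This is illustrative, not spec: the write/shell tool names (`edit_file`, `write_file`, `run_shell_command`) beyond those listed in the proposal are assumptions, as are the thresholds.

```python
# Hypothetical sketch: classify a finished session from its tool-call log.
# Tool names beyond read_file/grep/list_files/edit_file are assumed, and the
# thresholds are illustrative placeholders.

READ_TOOLS = {"read_file", "grep", "list_files"}
WRITE_TOOLS = {"edit_file", "write_file"}       # write_file is an assumed name
SHELL_TOOLS = {"run_shell_command"}             # assumed name

def classify_session(tool_calls: list[str]) -> str:
    """Map a session's tool-call sequence to a task-type label."""
    reads = sum(1 for t in tool_calls if t in READ_TOOLS)
    writes = sum(1 for t in tool_calls if t in WRITE_TOOLS)
    shells = sum(1 for t in tool_calls if t in SHELL_TOOLS)

    if writes == 0 and shells == 0:
        return "Routine"    # read-only session
    if writes > reads or (writes > 0 and shells > 0):
        return "Complex"    # write-heavy; shell use suggests active development
    return "Standard"       # mixed read/write

print(classify_session(["read_file", "grep", "list_files"]))  # Routine
```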

Component 2 — Local User Profile

A simple JSON file in ~/.code_puppy/ that accumulates session classifications over time. After enough sessions you have a distribution — e.g. 70% Routine, 20% Standard, 10% Complex. Fully inspectable, deletable with one command, never leaves the machine.
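A minimal sketch of that accumulation, assuming a filename like `task_profile.json` (the actual name is not specified in this proposal):

```python
import json
from collections import Counter
from pathlib import Path

# Illustrative location; the proposal only specifies "a JSON file in ~/.code_puppy/".
DEFAULT_PROFILE = Path.home() / ".code_puppy" / "task_profile.json"

def record_session(label: str, path: Path = DEFAULT_PROFILE) -> None:
    """Append one session classification to the local profile file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    counts = Counter(json.loads(path.read_text())) if path.exists() else Counter()
    counts[label] += 1
    path.write_text(json.dumps(dict(counts), indent=2))

def distribution(path: Path = DEFAULT_PROFILE) -> dict[str, float]:
    """Return the observed task-type distribution, e.g. {"Routine": 0.7, ...}."""
    if not path.exists():
        return {}
    counts = Counter(json.loads(path.read_text()))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}
```

"Deletable with one command" then falls out for free: removing the file resets the profile.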

Component 3 — Model Tier Suggestion

On session start, the profile suggests a default model tier based on observed history. User can always override. A user who mostly reads files shouldn't default to Claude Sonnet — they should default to Gemini Flash or a local Ollama model.
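A sketch of the suggestion step, with made-up thresholds (the 60%/40% cutoffs are assumptions for illustration, not part of the proposal):

```python
# Illustrative mapping from observed history to a default tier.
# The cutoff values are assumed placeholders.
def suggest_tier(dist: dict[str, float]) -> str:
    """Pick a default model tier from the session-label distribution."""
    if dist.get("Routine", 0.0) >= 0.6:
        return "cheap"      # e.g. Gemini Flash or a local Ollama model
    if dist.get("Complex", 0.0) >= 0.4:
        return "frontier"   # e.g. Claude Sonnet
    return "standard"

print(suggest_tier({"Routine": 0.7, "Standard": 0.2, "Complex": 0.1}))  # cheap
```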

Component 4 — Complexity Escalation Monitor

Watches the tool-call sequence mid-session and detects inflection points — when a session transitions from read-heavy to write-heavy. Surfaces a single non-blocking suggestion:

"This looks like it's turning into active development. You're on [cheap model]. Want to switch to [strong model] for better results? /yes or /no"

No auto-switching. User confirms or dismisses. Human-in-the-loop at every escalation point.
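One way to detect that inflection is a sliding window over recent tool calls, firing at most once per session. Window size, threshold, and the write-tool names are illustrative assumptions:

```python
# Hypothetical sliding-window monitor for the read→write inflection point.
# Fires at most once per session; window size and threshold are placeholders.
from collections import deque

WRITE_TOOLS = {"edit_file", "write_file", "run_shell_command"}  # assumed names

class EscalationMonitor:
    def __init__(self, window: int = 10, threshold: float = 0.5):
        self.recent = deque(maxlen=window)
        self.threshold = threshold
        self.suggested = False  # surface the prompt at most once

    def observe(self, tool: str) -> bool:
        """Record a tool call; return True when the escalation prompt should fire."""
        self.recent.append(tool in WRITE_TOOLS)
        if self.suggested or len(self.recent) < self.recent.maxlen:
            return False
        if sum(self.recent) / len(self.recent) >= self.threshold:
            self.suggested = True
            return True
        return False
```

The `suggested` flag is what keeps this a "single non-blocking suggestion" rather than a nag.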

Update: Found discussion #123 where Michael explicitly stated his preferred direction for model routing. Updating this proposal to align.

Key feedback from that thread:

  • Rules-based routing was flagged as too simplistic
  • Preferred approach: a lightweight ML classifier (TF-IDF Bag-of-Words → SVM) constrained to favor successful outcomes while routing cheaper models where possible
  • No-telemetry constraint remains — any training data would need to be opt-in or synthetically generated

Revised direction for this issue:

Rather than static tier rules, the implementation would:

  1. Collect opt-in session outcome signals (model used, task completed, tokens consumed) — stored locally, never transmitted
  2. Train a lightweight TF-IDF/SVM classifier on local history + a bootstrapped synthetic dataset of representative prompts
  3. Route based on classifier confidence, defaulting to the global model when confidence is low
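The three steps above could be sketched as follows. This is a minimal illustration, not the implementation: the training prompts are synthetic placeholders, the margin threshold is an assumption, and the confidence test uses LinearSVC's decision-function margin rather than calibrated probabilities. Requires scikit-learn.

```python
# Sketch of the revised TF-IDF → SVM router direction from discussion #123.
# Training prompts below are synthetic placeholders, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

prompts = [
    "show me the contents of config.py",
    "search the repo for TODO comments",
    "list files in the src directory",
    "refactor the auth module and add tests",
    "implement a new caching layer",
    "fix the failing build and rerun the test suite",
]
labels = ["cheap", "cheap", "cheap", "frontier", "frontier", "frontier"]

vec = TfidfVectorizer()
clf = LinearSVC()
clf.fit(vec.fit_transform(prompts), labels)

def route(prompt: str, default: str = "global-default", min_margin: float = 0.2) -> str:
    """Return a model tier; fall back to the global default when confidence is low."""
    margin = clf.decision_function(vec.transform([prompt]))[0]  # signed distance
    if abs(margin) < min_margin:
        return default  # low confidence → use the globally configured model
    return clf.classes_[1] if margin > 0 else clf.classes_[0]
```

The low-confidence fallback is the key property: an uncertain classifier should never silently downgrade the user to a weaker model.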

This keeps the privacy-first commitment intact while moving toward the data-driven approach Michael described. I'm still collecting usage-pattern data to inform the training set before proposing an implementation.

Open question: Is a locally-trained classifier still the preferred direction, or has thinking evolved since discussion #123?
