feat(config): per-operation LLM provider/model endpoints (extraction vs consolidation vs reflect)

## Use case

As a Hindsight operator running an active agent, I want to point extraction (retain) at a free local model while keeping consolidation and reflect on a stronger cloud model. Extraction is high-volume structured JSON work — it doesn't need frontier quality. Consolidation and reflect are lower-volume but quality-sensitive. Right now I have to pick one endpoint for everything, which means either overpaying for extraction or under-serving consolidation.

## Problem statement

Hindsight uses a single LLM endpoint for all operations. Extraction (retain) accounts for the vast majority of token burn during active use, but it's structured output work that fast local models handle well. Consolidation and reflect benefit from stronger reasoning. There's no way to route different operations to different endpoints, so you're forced into an all-or-nothing cost/quality tradeoff.

## How would this feature help

This would let operators cut token costs significantly (an estimated 60-80% for active agents) by running extraction on a free local model (llama.cpp, Ollama, vLLM) while keeping consolidation and reflect on a paid cloud model for quality. It also enables tiered setups: a fast, cheap model for extraction, a stronger model for consolidation, and the best model for reflect.

## Proposed solution

Add per-operation LLM provider/model env var overrides following the existing per-scope pattern (`HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY` in #1607):

```bash
# Global (existing, unchanged — fallback for anything not overridden)
HINDSIGHT_API_LLM_PROVIDER=openai
HINDSIGHT_API_LLM_MODEL=some-model

# Override extraction to a local/cheaper model
HINDSIGHT_API_RETAIN_LLM_PROVIDER=openai
HINDSIGHT_API_RETAIN_LLM_MODEL=some-efficient-model
HINDSIGHT_API_RETAIN_LLM_BASE_URL=http://localhost:8080/v1

# Override consolidation to a stronger model
HINDSIGHT_API_CONSOLIDATION_LLM_PROVIDER=anthropic
HINDSIGHT_API_CONSOLIDATION_LLM_MODEL=some-reasoning-model
```

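For each scope in `{RETAIN,REFLECT,CONSOLIDATION}`, the full set of per-scope variables would be as follows (the first four are new; the last three already exist):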

- `HINDSIGHT_API_<SCOPE>_LLM_PROVIDER`
- `HINDSIGHT_API_<SCOPE>_LLM_MODEL`
- `HINDSIGHT_API_<SCOPE>_LLM_BASE_URL`
- `HINDSIGHT_API_<SCOPE>_LLM_API_KEY`
- `HINDSIGHT_API_<SCOPE>_LLM_TIMEOUT`
- `HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY`
- `HINDSIGHT_API_<SCOPE>_LLM_LITELLMROUTER_CONFIG`

When a per-scope variable is unset, that scope falls back to the corresponding global `HINDSIGHT_API_LLM_*` value. Embedding generation always uses the configured embedding model, regardless of these overrides.
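The fallback rule above could be sketched roughly as follows. This is a minimal illustration, not Hindsight's actual code: the helper name `resolve_llm_setting` is hypothetical, and treating an empty string as "unset" is an assumption.

```python
import os
from typing import Mapping, Optional

SCOPES = ("RETAIN", "REFLECT", "CONSOLIDATION")
PREFIX = "HINDSIGHT_API"


def resolve_llm_setting(
    scope: str, field: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the per-scope value if set, else the global value, else None.

    Hypothetical helper: `field` is e.g. "PROVIDER", "MODEL", "BASE_URL".
    """
    env = os.environ if env is None else env
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    scoped = env.get(f"{PREFIX}_{scope}_LLM_{field}")
    if scoped:  # assumption: an empty string counts as unset
        return scoped
    # Fall back to the global HINDSIGHT_API_LLM_* variable.
    return env.get(f"{PREFIX}_LLM_{field}") or None


# Example environment mirroring the variables proposed in this issue.
example_env = {
    "HINDSIGHT_API_LLM_PROVIDER": "openai",
    "HINDSIGHT_API_LLM_MODEL": "some-model",
    "HINDSIGHT_API_RETAIN_LLM_MODEL": "some-efficient-model",
    "HINDSIGHT_API_RETAIN_LLM_BASE_URL": "http://localhost:8080/v1",
}

retain_model = resolve_llm_setting("RETAIN", "MODEL", example_env)
reflect_model = resolve_llm_setting("REFLECT", "MODEL", example_env)
```

Here `retain_model` resolves to the scoped override and `reflect_model` to the global value. One design question this sketch leaves open: field-by-field fallback can pair a scoped provider with a global base URL or API key, so an implementation might prefer to fall back per group of fields rather than per field.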

## Alternatives considered

- **LiteLLM Router as a single entrypoint** with routing rules: works, but adds operational complexity and another moving part, and every call still pays the cost of going through the proxy. Per-scope env vars are simpler and keep local traffic truly local.
- **Two separate Hindsight instances** sharing the same database: functional, but doubles operational overhead (two containers, two health checks, confusing worker coordination).
- **Setting `HINDSIGHT_API_RETAIN_EXTRACTION_MODE=chunks`** to skip extraction entirely: eliminates LLM cost for retain, but also eliminates structured fact extraction, which is the whole point.

## Priority

Medium

## Additional context

The plumbing already exists — `MemoryEngine.__init__` constructs separate `LLMConfig` instances per operation. Per-scope `*_LLM_TIMEOUT`, `*_LLM_EXTRA_BODY` (#1607), and `*_LLM_LITELLMROUTER_CONFIG` already follow this exact pattern. This would extend it to provider/model/base_url/api_key.
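As a rough sketch of what extending the pattern to provider/model/base_url/api_key might look like, the snippet below builds one settings object per operation. It assumes nothing about the real `LLMConfig` class: `ScopedLLMSettings`, `load_settings`, and `load_all` are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class ScopedLLMSettings:
    """Hypothetical stand-in for the fields a per-operation config would carry."""
    provider: Optional[str]
    model: Optional[str]
    base_url: Optional[str]
    api_key: Optional[str]


def load_settings(scope: str, env: Mapping[str, str]) -> ScopedLLMSettings:
    """Read the four proposed per-scope variables, falling back to the globals."""
    def pick(field: str) -> Optional[str]:
        scoped = env.get(f"HINDSIGHT_API_{scope}_LLM_{field}")
        return scoped or env.get(f"HINDSIGHT_API_LLM_{field}") or None

    return ScopedLLMSettings(
        provider=pick("PROVIDER"),
        model=pick("MODEL"),
        base_url=pick("BASE_URL"),
        api_key=pick("API_KEY"),
    )


def load_all(env: Optional[Mapping[str, str]] = None) -> Dict[str, ScopedLLMSettings]:
    """Build one settings object per operation, keyed by lowercase scope name."""
    env = os.environ if env is None else env
    return {s.lower(): load_settings(s, env) for s in ("RETAIN", "REFLECT", "CONSOLIDATION")}


# Environment taken from the example in the proposed solution.
settings = load_all({
    "HINDSIGHT_API_LLM_PROVIDER": "openai",
    "HINDSIGHT_API_LLM_MODEL": "some-model",
    "HINDSIGHT_API_RETAIN_LLM_PROVIDER": "openai",
    "HINDSIGHT_API_RETAIN_LLM_MODEL": "some-efficient-model",
    "HINDSIGHT_API_RETAIN_LLM_BASE_URL": "http://localhost:8080/v1",
    "HINDSIGHT_API_CONSOLIDATION_LLM_PROVIDER": "anthropic",
    "HINDSIGHT_API_CONSOLIDATION_LLM_MODEL": "some-reasoning-model",
})
```

With that environment, `settings["retain"]` points at the local endpoint, `settings["consolidation"]` at the stronger model, and `settings["reflect"]` inherits the global provider and model.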

feat(config): per-operation LLM provider/model endpoints (extraction vs consolidation vs reflect) #1646
