feat(config): add per-scope HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY#1607
feat(config): add per-scope HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY#1607connorblack wants to merge 2 commits into
Conversation
Hindsight previously only supported a single global llm_extra_body, set via HINDSIGHT_API_LLM_EXTRA_BODY. This forced an all-or-nothing choice for reasoning-model knobs like enable_thinking — useful for retain and consolidation (structured JSON extraction; thinking just adds latency) but breaks reflect (an agentic tool loop that needs reasoning to decide when to call tools). Add per-scope env vars matching the existing per-scope provider/model/ timeout pattern: HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY Each parses as JSON and falls back to the global llm_extra_body when unset. Wiring is a single-line change at each LLMConfig instantiation in MemoryEngine. Tests: - 4 new tests in tests/test_per_scope_llm_extra_body.py covering parse, fallback, and invalid-JSON paths - All 13 existing per_operation_llm_config tests still pass
There was a problem hiding this comment.
Pull request overview
This PR adds per-scope overrides for LLMConfig.extra_body so retain/reflect/consolidation can independently supply JSON extra_body values via environment variables, falling back to the existing global HINDSIGHT_API_LLM_EXTRA_BODY when the per-scope var is not set.
Changes:
- Added
HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODYenv vars and correspondingHindsightConfigfields parsed as JSON. - Wired per-scope
extra_bodyintoMemoryEngine’s per-operationLLMConfigconstruction with fallback to the global extra_body. - Added a new test module covering unset/global/per-scope parsing and invalid JSON.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
hindsight-api-slim/hindsight_api/config.py |
Adds per-scope env var constants + config fields and parses them from env as JSON. |
hindsight-api-slim/hindsight_api/engine/memory_engine.py |
Uses per-scope extra_body values (when present) when creating per-operation LLMConfigs. |
hindsight-api-slim/tests/test_per_scope_llm_extra_body.py |
Introduces tests for per-scope parsing, fallback behavior, and invalid JSON handling. |
Comments suppressed due to low confidence (2)
hindsight-api-slim/hindsight_api/engine/memory_engine.py:603
- Same issue as above:
config.reflect_llm_extra_body or config.llm_extra_bodytreats{}as falsy and will incorrectly fall back to the global extra_body. Use an explicitNonecheck so an empty per-scope dict can intentionally override the global config.
self._reflect_llm_config = LLMConfig(
provider=reflect_provider,
api_key=reflect_api_key,
base_url=reflect_base_url,
model=reflect_model,
extra_body=config.reflect_llm_extra_body or config.llm_extra_body,
default_headers=config.llm_default_headers,
litellmrouter_config=config.reflect_llm_litellmrouter_config or config.llm_litellmrouter_config,
)
hindsight-api-slim/hindsight_api/engine/memory_engine.py:627
- Same issue as above:
config.consolidation_llm_extra_body or config.llm_extra_bodywill not allow an explicit empty object ({}) to override the global extra_body. Switch to an explicitNonecheck to distinguish “unset” from “set to empty dict”.
self._consolidation_llm_config = LLMConfig(
provider=consolidation_provider,
api_key=consolidation_api_key,
base_url=consolidation_base_url,
model=consolidation_model,
extra_body=config.consolidation_llm_extra_body or config.llm_extra_body,
default_headers=config.llm_default_headers,
litellmrouter_config=config.consolidation_llm_litellmrouter_config or config.llm_litellmrouter_config,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| extra_body=config.retain_llm_extra_body or config.llm_extra_body, | ||
| default_headers=config.llm_default_headers, | ||
| litellmrouter_config=config.retain_llm_litellmrouter_config or config.llm_litellmrouter_config, |
| assert config.retain_llm_extra_body is None | ||
| assert config.reflect_llm_extra_body is None | ||
| assert config.consolidation_llm_extra_body is None | ||
|
|
vLLM v0.20.2 + nightly still ship open bugs #35936 and #33965 where
tool_choice="required" and tool_choice={"type":"function","function":
{"name":...}} both bypass the configured --tool-call-parser and use
JSON-only validation. For Qwen3 models that emit XML-style tool calls
via the qwen3_coder parser, this silently returns tool_calls=[] while
finish_reason still reports "tool_calls". Hindsight reflect's first
three iterations use forced specific-name tool_choice to drive the
hierarchical retrieval path (mental_models -> observations -> recall),
which means those iterations are no-ops when run against a current vLLM
serving any Qwen3 instruct/coder/MoE model.
Add HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE env var that, when
set, falls back to tool_choice="auto" for every iteration. The model
still calls retrieval tools when given factual queries — just without
API-level forcing.
Default behavior is unchanged. Operators on affected vLLM builds can
opt in until the upstream PRs land, at which point the env var becomes
a no-op and can be removed.
nicoloboschi
left a comment
There was a problem hiding this comment.
Two changes requested before merge:
1. Remove HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE from this PR
The reflect/agent.py change is scope creep — the PR title and summary advertise only the per-scope LLM_EXTRA_BODY env vars, but agent.py quietly adds a second, unrelated env var with no tests and no documentation. Please remove it from this PR.
If we still want that escape hatch, it should land separately and:
- Be parsed once in
HindsightConfig.from_env()with anENV_*constant (matching the established.lower() == "true"pattern atconfig.py:1664-1687), not via rawos.getenv()inside the per-iteration hot path atagent.py:579. - Be read off the engine's config rather than re-importing
osinagent.pyjust for this. - Have at least one test that monkeypatches the env var and asserts
iter_tool_choice == "auto".
2. Document the three new env vars
hindsight-docs/docs/developer/configuration.md already documents the global HINDSIGHT_API_LLM_EXTRA_BODY (line 190) and the full per-scope table for provider/model/timeout (lines 378–404). Please add the three new vars (HINDSIGHT_API_{RETAIN,REFLECT,CONSOLIDATION}_LLM_EXTRA_BODY) to that table following the existing "Falls back to HINDSIGHT_API_LLM_EXTRA_BODY" phrasing — CLAUDE.md requires docs updates for any new config flag.
The core extra-body change itself is clean and good to go once these two items are addressed.
Summary
Hindsight currently exposes only a single global
HINDSIGHT_API_LLM_EXTRA_BODY. This forces an all-or-nothing choice for reasoning-model knobs like Qwen3'schat_template_kwargs.enable_thinking— useful for retain/consolidation (structured JSON extraction; thinking adds latency without quality gain) but breaks reflect (an agentic tool loop that needs reasoning to decide whether to call tools).We hit this concretely with Qwen3.6-35B-A3B-FP8 served via vLLM: with
enable_thinking=falseset globally for consolidation throughput, every reflect call returned 0 tool calls and short-circuited to "I don't have information" without ever searching the bank.Change
Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:
HINDSIGHT_API_RETAIN_LLM_EXTRA_BODYHINDSIGHT_API_REFLECT_LLM_EXTRA_BODYHINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODYEach parses as JSON. When set, it overrides the global
HINDSIGHT_API_LLM_EXTRA_BODYfor that scope only. When unset, falls back to the global. When neither is set, no extra_body is sent (existing behavior).Wiring is a single-line change at each
LLMConfiginstantiation inMemoryEngine.__init__:Test plan
tests/test_per_scope_llm_extra_body.py:tests/test_per_operation_llm_config.pytests still passruff checkclean on touched files