feat(config): add per-scope HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY by connorblack · Pull Request #1607 · vectorize-io/hindsight

connorblack · 2026-05-13T09:17:35Z

Summary

Hindsight currently exposes only a single global HINDSIGHT_API_LLM_EXTRA_BODY. This forces an all-or-nothing choice for reasoning-model knobs like Qwen3's chat_template_kwargs.enable_thinking: disabling thinking helps retain/consolidation (structured JSON extraction, where thinking adds latency without quality gain) but breaks reflect (an agentic tool loop that needs reasoning to decide whether to call tools).

We hit this concretely with Qwen3.6-35B-A3B-FP8 served via vLLM: with enable_thinking=false set globally for consolidation throughput, every reflect call returned 0 tool calls and short-circuited to "I don't have information" without ever searching the bank.

Change

Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:

HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY
HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY
HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY

Each parses as JSON. When set, it overrides the global HINDSIGHT_API_LLM_EXTRA_BODY for that scope only. When unset, the scope falls back to the global value. When neither is set, no extra_body is sent (existing behavior).
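The parse rules above can be sketched in a few lines. This is a minimal illustration and not the code in config.py: `parse_extra_body` and the `env` mapping are hypothetical names, and rejecting non-object JSON is an assumption the PR does not state.

```python
import json
from typing import Any, Dict, Mapping, Optional

GLOBAL_VAR = "HINDSIGHT_API_LLM_EXTRA_BODY"
SCOPES = ("RETAIN", "REFLECT", "CONSOLIDATION")


def parse_extra_body(env: Mapping[str, str], name: str) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: parse one env var as a JSON object, or return None if unset."""
    raw = env.get(name)
    if raw is None:
        return None  # unset: the caller decides whether to fall back to the global
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Fail at parse time so a typo surfaces at startup, not on the first LLM call.
        raise ValueError(f"{name} is not valid JSON: {exc}") from exc
    if not isinstance(value, dict):
        # Assumption: only JSON objects make sense as an extra_body.
        raise ValueError(f"{name} must be a JSON object")
    return value


# Thinking disabled globally, re-enabled for reflect only.
env = {
    GLOBAL_VAR: '{"chat_template_kwargs": {"enable_thinking": false}}',
    "HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY": '{"chat_template_kwargs": {"enable_thinking": true}}',
}
global_body = parse_extra_body(env, GLOBAL_VAR)
per_scope = {s: parse_extra_body(env, f"HINDSIGHT_API_{s}_LLM_EXTRA_BODY") for s in SCOPES}
```

Here `per_scope["RETAIN"]` and `per_scope["CONSOLIDATION"]` stay `None` after parsing, which matches the test plan's note that the fallback happens at engine init and not at parse time.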

Wiring is a single-line change at each LLMConfig instantiation in MemoryEngine.__init__:

extra_body=config.<scope>_llm_extra_body or config.llm_extra_body,

Test plan

- 4 new tests in tests/test_per_scope_llm_extra_body.py:
  - unset → all fields None
  - per-scope env parses as JSON
  - global only → per-scope stays None (fallback happens at engine init, not parse time)
  - invalid JSON raises at parse time
- All 13 existing tests/test_per_operation_llm_config.py tests still pass
- ruff check clean on touched files

Hindsight previously only supported a single global llm_extra_body, set via HINDSIGHT_API_LLM_EXTRA_BODY. This forced an all-or-nothing choice for reasoning-model knobs like enable_thinking: disabling it is useful for retain and consolidation (structured JSON extraction; thinking just adds latency) but breaks reflect (an agentic tool loop that needs reasoning to decide when to call tools).

Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:

HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY
HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY
HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY

Each parses as JSON and falls back to the global llm_extra_body when unset. Wiring is a single-line change at each LLMConfig instantiation in MemoryEngine.

Tests:
- 4 new tests in tests/test_per_scope_llm_extra_body.py covering parse, fallback, and invalid-JSON paths
- All 13 existing per_operation_llm_config tests still pass

Copilot

Pull request overview

This PR adds per-scope overrides for LLMConfig.extra_body so retain/reflect/consolidation can independently supply JSON extra_body values via environment variables, falling back to the existing global HINDSIGHT_API_LLM_EXTRA_BODY when the per-scope var is not set.

Changes:

Added HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY env vars and corresponding HindsightConfig fields parsed as JSON.
Wired per-scope extra_body into MemoryEngine’s per-operation LLMConfig construction with fallback to the global extra_body.
Added a new test module covering unset/global/per-scope parsing and invalid JSON.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `hindsight-api-slim/hindsight_api/config.py` | Adds per-scope env var constants + config fields and parses them from env as JSON. |
| `hindsight-api-slim/hindsight_api/engine/memory_engine.py` | Uses per-scope extra_body values (when present) when creating per-operation `LLMConfig`s. |
| `hindsight-api-slim/tests/test_per_scope_llm_extra_body.py` | Introduces tests for per-scope parsing, fallback behavior, and invalid JSON handling. |

Comments suppressed due to low confidence (2)

hindsight-api-slim/hindsight_api/engine/memory_engine.py:603

Same issue as above: config.reflect_llm_extra_body or config.llm_extra_body treats {} as falsy and will incorrectly fall back to the global extra_body. Use an explicit None check so an empty per-scope dict can intentionally override the global config.

        self._reflect_llm_config = LLMConfig(
            provider=reflect_provider,
            api_key=reflect_api_key,
            base_url=reflect_base_url,
            model=reflect_model,
            extra_body=config.reflect_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.reflect_llm_litellmrouter_config or config.llm_litellmrouter_config,
        )

hindsight-api-slim/hindsight_api/engine/memory_engine.py:627

Same issue as above: config.consolidation_llm_extra_body or config.llm_extra_body will not allow an explicit empty object ({}) to override the global extra_body. Switch to an explicit None check to distinguish “unset” from “set to empty dict”.

        self._consolidation_llm_config = LLMConfig(
            provider=consolidation_provider,
            api_key=consolidation_api_key,
            base_url=consolidation_base_url,
            model=consolidation_model,
            extra_body=config.consolidation_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.consolidation_llm_litellmrouter_config or config.llm_litellmrouter_config,
        )
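The difference between the `or` fallback and an explicit None check can be shown in isolation. The sketch below is not taken from memory_engine.py; the two helper functions are hypothetical and exist only to contrast the behaviors the review comments describe.

```python
from typing import Any, Dict, Optional

ExtraBody = Optional[Dict[str, Any]]


def resolve_with_or(scoped: ExtraBody, global_body: ExtraBody) -> ExtraBody:
    # Mirrors the `scoped or global` wiring: any falsy value, including {}, falls through.
    return scoped or global_body


def resolve_with_none_check(scoped: ExtraBody, global_body: ExtraBody) -> ExtraBody:
    # Only a genuinely unset (None) per-scope value falls back to the global.
    return scoped if scoped is not None else global_body


global_body = {"chat_template_kwargs": {"enable_thinking": False}}

# An operator sets the per-scope var to "{}" to clear the global for that scope.
print(resolve_with_or({}, global_body))          # {'chat_template_kwargs': {'enable_thinking': False}}
print(resolve_with_none_check({}, global_body))  # {}
```

With the `or` form there is no way to say "send no extra_body for this scope" once a global value exists. The None check preserves that option while leaving the unset case unchanged.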

+            extra_body=config.retain_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.retain_llm_litellmrouter_config or config.llm_litellmrouter_config,


+        assert config.retain_llm_extra_body is None
+        assert config.reflect_llm_extra_body is None
+        assert config.consolidation_llm_extra_body is None
+


vLLM v0.20.2 + nightly still ship open bugs #35936 and #33965 where tool_choice="required" and tool_choice={"type":"function","function": {"name":...}} both bypass the configured --tool-call-parser and use JSON-only validation. For Qwen3 models that emit XML-style tool calls via the qwen3_coder parser, this silently returns tool_calls=[] while finish_reason still reports "tool_calls".

Hindsight reflect's first three iterations use forced specific-name tool_choice to drive the hierarchical retrieval path (mental_models -> observations -> recall), which means those iterations are no-ops when run against a current vLLM serving any Qwen3 instruct/coder/MoE model.

Add HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE env var that, when set, falls back to tool_choice="auto" for every iteration. The model still calls retrieval tools when given factual queries, just without API-level forcing.

Default behavior is unchanged. Operators on affected vLLM builds can opt in until the upstream PRs land, at which point the env var becomes a no-op and can be removed.
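The opt-out described in this commit message amounts to a small per-iteration decision. The following is a rough sketch and not the code in reflect/agent.py: the tool names in `FORCED_TOOL_ORDER` are hypothetical placeholders for the mental_models -> observations -> recall sequence, and returning "auto" after the forced iterations is an assumption.

```python
from typing import Any, Dict, List, Union

ToolChoice = Union[str, Dict[str, Any]]

# Hypothetical tool names standing in for the hierarchical retrieval order.
FORCED_TOOL_ORDER: List[str] = ["search_mental_models", "search_observations", "recall"]


def tool_choice_for_iteration(iteration: int, disable_forced: bool) -> ToolChoice:
    """Pick the tool_choice value for a zero-based reflect iteration."""
    if disable_forced or iteration >= len(FORCED_TOOL_ORDER):
        # Opt-out enabled, or past the forced iterations: let the model decide.
        return "auto"
    # Forced specific-name tool_choice in the OpenAI-compatible request shape.
    return {"type": "function", "function": {"name": FORCED_TOOL_ORDER[iteration]}}


default_choices = [tool_choice_for_iteration(i, disable_forced=False) for i in range(4)]
opted_out_choices = [tool_choice_for_iteration(i, disable_forced=True) for i in range(4)]
```

With the flag off, the first three entries of `default_choices` name a specific function and the fourth is "auto". With the flag on, every entry is "auto", so an affected vLLM build never sees a forced tool_choice.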

nicoloboschi

Two changes requested before merge:

1. Remove `HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE` from this PR

The reflect/agent.py change is scope creep — the PR title and summary advertise only the per-scope LLM_EXTRA_BODY env vars, but agent.py quietly adds a second, unrelated env var with no tests and no documentation. Please remove it from this PR.

If we still want that escape hatch, it should land separately and:

- Be parsed once in HindsightConfig.from_env() with an ENV_* constant (matching the established .lower() == "true" pattern at config.py:1664-1687), not via raw os.getenv() inside the per-iteration hot path at agent.py:579.
- Be read off the engine's config rather than re-importing os in agent.py just for this.
- Have at least one test that monkeypatches the env var and asserts iter_tool_choice == "auto".
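The three requirements above could look roughly like this. It is a sketch under stated assumptions, not the project's code: `ReflectSettings` is a hypothetical stand-in for the relevant slice of HindsightConfig, and `iter_tool_choice` here is a free function invented for illustration, named after the value the reviewer wants asserted.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Union

ENV_REFLECT_DISABLE_FORCED_TOOL_CHOICE = "HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE"


@dataclass(frozen=True)
class ReflectSettings:
    """Hypothetical stand-in for the config object the engine would carry."""

    disable_forced_tool_choice: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ReflectSettings":
        # Parsed once at startup, using the `.lower() == "true"` pattern.
        raw = env.get(ENV_REFLECT_DISABLE_FORCED_TOOL_CHOICE, "false")
        return cls(disable_forced_tool_choice=raw.lower() == "true")


def iter_tool_choice(
    settings: ReflectSettings, forced: Dict[str, Any]
) -> Union[str, Dict[str, Any]]:
    # The hot path reads a plain attribute; no os.getenv() per iteration.
    return "auto" if settings.disable_forced_tool_choice else forced


forced = {"type": "function", "function": {"name": "recall"}}
opted_out = ReflectSettings.from_env({ENV_REFLECT_DISABLE_FORCED_TOOL_CHOICE: "True"})
default = ReflectSettings.from_env({})
```

Passing the environment as a mapping keeps the parse step testable without touching the real process environment. A pytest version would use `monkeypatch.setenv` and call `from_env()` with no argument.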

2. Document the three new env vars

hindsight-docs/docs/developer/configuration.md already documents the global HINDSIGHT_API_LLM_EXTRA_BODY (line 190) and the full per-scope table for provider/model/timeout (lines 378–404). Please add the three new vars (HINDSIGHT_API_{RETAIN,REFLECT,CONSOLIDATION}_LLM_EXTRA_BODY) to that table following the existing "Falls back to HINDSIGHT_API_LLM_EXTRA_BODY" phrasing — CLAUDE.md requires docs updates for any new config flag.

The core extra-body change itself is clean and good to go once these two items are addressed.

Copilot AI review requested due to automatic review settings May 13, 2026 09:17

Copilot started reviewing on behalf of connorblack May 13, 2026 09:18

Copilot AI reviewed May 13, 2026

kevin-ho mentioned this pull request May 18, 2026

feat(config): per-operation LLM provider/model endpoints (extraction vs consolidation vs reflect) #1646

Closed

nicoloboschi requested changes May 25, 2026

connorblack wants to merge 2 commits into vectorize-io:main from connorblack:feat/per-scope-llm-extra-body

3 participants
