fix(compressor): keep truncated tool_call arguments as valid JSON#1

Merged
handsdiff merged 1 commit into main from fix/compressor-tool-args-valid-json on Apr 17, 2026
Conversation

@handsdiff
Owner

Summary

context_compressor.py Pass 3 truncates large tool_call arguments strings by byte-slicing to 200 chars and appending the literal sentinel ...[truncated]. The result is invalid JSON: the slice ends mid-value, leaving an unterminated string.

Some providers (notably Anthropic via LiteLLM) re-parse function.arguments when rebuilding tool_use blocks. Once a session is compressed with any tool_call whose original args exceeded 500 chars (routine for patch), the next API call fails with:

HTTP 400: litellm.BadRequestError: AnthropicException -
Failed to parse tool call arguments for tool 'patch' (Anthropic tool invoke).
Error: Unterminated string starting at: line 1 column 35 (char 34).

The 3 retries are identical payloads, so they all fail identically, and every subsequent user turn replays the corrupted history. The session is permanently stuck and the user sees only a canned error fallback.
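The failure mode is easy to reproduce in isolation. The sketch below (a hypothetical repro, not the compressor's actual code) byte-slices a serialized arguments string and appends the literal sentinel, then shows that json.loads rejects the result:

```python
import json

# Hypothetical repro of the Pass 3 bug: byte-slice a large tool_call
# arguments string to 200 chars and append the literal sentinel.
# The key names here are illustrative, not the real patch-tool schema.
args = json.dumps({"path": "src/app.py", "patch": "x" * 1000})
truncated = args[:200] + "...[truncated]"

try:
    json.loads(truncated)
except json.JSONDecodeError as exc:
    # The slice ends inside the "patch" value, so the parser sees an
    # unterminated string, just like the Anthropic-side 400 above.
    print(f"invalid JSON: {exc.msg}")
```

Any provider that re-parses function.arguments will hit the same parse error on every replay of the compressed history.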

Fix

Replace the byte slice with a compact JSON sentinel that preserves provenance (original length + 200-char preview) while staying parseable:

{"_truncated": true, "_original_length": 1247, "_preview": "..."}
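A minimal sketch of the replacement logic, assuming the 500-char threshold and 200-char preview from the description (the function name here is a stand-in, not the actual helper in context_compressor.py):

```python
import json

# Hypothetical sketch of the fix: instead of a raw byte slice plus a
# literal sentinel, emit a compact JSON object that stays parseable
# while preserving provenance (original length + preview).
def prune_tool_call_args(args: str, limit: int = 500, preview: int = 200) -> str:
    if len(args) <= limit:
        return args  # small args pass through untouched
    return json.dumps({
        "_truncated": True,
        "_original_length": len(args),
        "_preview": args[:preview],
    })
```

Because the result is produced by json.dumps, any provider that re-parses function.arguments gets a well-formed object rather than a mid-value cut.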

Observed in the wild

A production session had 11 prior patch tool_calls poisoned by this truncation after compression. Every subsequent user message triggered a new failure cycle — user saw 3 cryptic LiteLLM 400 error strings back-to-back with no way to recover (the poison is in server-side history).

Test plan

  • New unit test test_pruned_tool_call_args_remain_valid_json that asserts the pruned arguments are parseable via json.loads and carry a sensible sentinel shape.
  • All existing TestTokenBudgetTailProtection tests still pass.

When _prune_old_tool_results truncates a large tool_call arguments
string, it was byte-slicing to 200 chars and appending the literal
sentinel "...[truncated]". The resulting string is not valid JSON
(unterminated mid-value), and some providers — notably Anthropic via
LiteLLM — re-parse function.arguments when rebuilding tool_use
blocks. The next API call fails with HTTP 400 ("Unterminated string
starting at: line 1 column 35"), and every subsequent turn replays
the same corrupted history, so the session is permanently stuck.

Observed in the wild: a session that compressed with 11 prior patch
tool_calls (each >500 chars) then returned HTTP 400 on every inbound
message. The 3 retries are identical payloads, so they all fail the
same way. The user sees only a canned error fallback.

Replace the byte slice with a compact JSON sentinel that preserves
provenance (original length + 200-char preview) while staying
parseable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@handsdiff handsdiff merged commit 01ba396 into main Apr 17, 2026
4 of 5 checks passed
handsdiff pushed a commit that referenced this pull request Apr 18, 2026
…ts (NousResearch#11745)

Move moonshotai/kimi-k2.5 to position #1 in every model picker list:
- OPENROUTER_MODELS (with 'recommended' tag)
- _PROVIDER_MODELS: nous, kimi-coding, opencode-zen, opencode-go, alibaba, huggingface
- _model_flow_kimi() Coding Plan model list in main.py

kimi-coding-cn and moonshot lists already had kimi-k2.5 first.
handsdiff pushed a commit that referenced this pull request Apr 21, 2026
When the live Vercel AI Gateway catalog exposes a Moonshot model with
zero input AND output pricing, it's promoted to position #1 as the
recommended default — even if the exact ID isn't in the curated
AI_GATEWAY_MODELS list. This enables dynamic discovery of new free
Moonshot variants without requiring a PR to update curation.

Paid Moonshot models are unaffected; falls back to the normal curated
recommended tag when no free Moonshot is live.
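The promotion rule described above can be sketched as a small ordering function. This is an illustrative stand-in under assumed catalog-entry keys (input_price/output_price), not the gateway's actual data model:

```python
# Hypothetical sketch of the promotion rule: a live Moonshot model that
# is free on BOTH input and output jumps to position #1, even when its
# exact ID is absent from the curated list; otherwise the curated order
# (with its normal recommended tag) is returned unchanged.
def order_models(live_models: list[dict], curated: list[str]) -> list[str]:
    free_moonshot = next(
        (m["id"] for m in live_models
         if m["id"].startswith("moonshotai/")
         and m.get("input_price", 1) == 0
         and m.get("output_price", 1) == 0),
        None,
    )
    if free_moonshot is None:
        return curated                     # paid Moonshot models unaffected
    rest = [mid for mid in curated if mid != free_moonshot]
    return [free_moonshot] + rest          # promoted to position #1
```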
handsdiff pushed a commit that referenced this pull request Apr 21, 2026
…#13354)

Classic-CLI /steer typed during an active agent run was queued through
self._pending_input alongside ordinary user input.  process_loop, which
drains that queue, is blocked inside self.chat() for the entire run,
so the queued command was not pulled until AFTER _agent_running had
flipped back to False — at which point process_command() took the idle
fallback ("No agent running; queued as next turn") and delivered the
steer as an ordinary next-turn user message.

From Utku's bug report on PR NousResearch#13205: mid-run /steer arrived minutes
later at the end of the turn as a /queue-style message, completely
defeating its purpose.

Fix: add _should_handle_steer_command_inline() gating — when
_agent_running is True and the user typed /steer, dispatch
process_command(text) directly from the prompt_toolkit Enter handler
on the UI thread instead of queueing.  This mirrors the existing
_should_handle_model_command_inline() pattern for /model and is
safe because agent.steer() is thread-safe (uses _pending_steer_lock,
no prompt_toolkit state mutation, instant return).

No changes to the idle-path behavior: /steer typed with no active
agent still takes the normal queue-and-drain route so the fallback
"No agent running; queued as next turn" message is preserved.
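The gating described above can be sketched as follows. The class and queue here are hypothetical stand-ins; only the method name _should_handle_steer_command_inline and the inline-vs-queue split come from the commit message:

```python
# Minimal sketch of the busy-path /steer gating. Real code hooks this
# into the prompt_toolkit Enter handler; here on_enter is a stand-in.
class Cli:
    def __init__(self) -> None:
        self._agent_running = False
        self._pending_input: list[str] = []   # drained by process_loop
        self.dispatched: list[str] = []

    def _should_handle_steer_command_inline(self, text: str) -> bool:
        # Only bypass the queue for /steer while a run is active;
        # idle /steer keeps the normal queue-and-drain route.
        return self._agent_running and text.strip().startswith("/steer")

    def on_enter(self, text: str) -> None:
        if self._should_handle_steer_command_inline(text):
            self.process_command(text)        # dispatch on the UI thread now
        else:
            self._pending_input.append(text)  # idle path: queue as before

    def process_command(self, text: str) -> None:
        self.dispatched.append(text)
```

The inline branch is safe only because agent.steer() returns instantly and takes its own lock; a blocking command could not be dispatched from the Enter handler this way.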

Validation:
- 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering
  the detector, dispatch path, and idle-path control behavior.
- All 21 existing tests in tests/run_agent/test_steer.py still pass.
- Live PTY end-to-end test with real agent + real openrouter model:
    22:36:22 API call #1 (model requested execute_code)
    22:36:26 ENTER FIRED: agent_running=True, text='/steer ...'
    22:36:26 INLINE STEER DISPATCH fired
    22:36:43 agent.log: 'Delivered /steer to agent after tool batch'
    22:36:44 API call #2 included the steer; response contained marker
  Same test on the tip of main without this fix shows the steer
  landing as a new user turn ~20s after the run ended.

handsdiff pushed a commit that referenced this pull request Apr 21, 2026
Previously the breaker was only cleared when the post-reconnect retry
call itself succeeded (via _reset_server_error at the end of the try
block). If OAuth recovery succeeded but the retry call happened to
fail for a different reason, control fell through to the
needs_reauth path which called _bump_server_error — adding to an
already-tripped count instead of the fresh count the reconnect
justified. With fix #1 in place this would still self-heal on the
next cooldown, but we should not pay a 60s stall when we already
have positive evidence the server is viable.

Move _reset_server_error(server_name) up to immediately after the
reconnect-and-ready-wait block, before the retry_call. The
subsequent retry still goes through _bump_server_error on failure,
so a genuinely broken server re-trips the breaker as normal — but
the retry starts from a clean count (1 after a failure), not a
stale one.
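The reordering can be sketched as follows. The helper names mirror the commit message (_reset_server_error as a count reset, _bump_server_error as an increment); the function shape and counts dict are assumptions:

```python
# Hypothetical sketch of the reordering: clear the breaker as soon as
# the reconnect-and-ready-wait succeeds, BEFORE the retry call, so a
# failed retry re-trips from a clean count (1) rather than piling onto
# an already-tripped stale count.
def call_after_reconnect(server_name, reconnect, retry_call,
                         counts: dict) -> object:
    reconnect(server_name)        # reconnect-and-ready-wait block
    counts[server_name] = 0       # _reset_server_error: server proved viable
    try:
        return retry_call()
    except Exception:
        counts[server_name] = counts.get(server_name, 0) + 1
        raise                     # _bump_server_error: re-trip from 1
```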