test(e2e): accept multiturn Kimi tool calls by cv · Pull Request #5413 · NVIDIA/NemoClaw

cv · 2026-06-14T04:09:12Z

Summary

Stabilizes the live Kimi inference compatibility E2E by accepting Kimi tool calls emitted across multiple assistant turns. The full nightly run showed Kimi successfully executing all three expected tools and producing the final answer, but the trajectory checker only inspected the last assistant tool-call message and failed when the model emitted one command per assistant turn.

Related Issue

Related to #5401.

Changes

Flatten tool-call content from all assistant tool-call messages in test/e2e/test-kimi-inference-compat.sh before checking the expected hostname, date, uptime order.
Preserve the existing checks that the trace artifacts contain exactly three exec tool metas, no combined semicolon command remains, final status is success, and the final assistant response follows all tool results.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Git hooks passed during commit and push, or npx prek run --from-ref main --to-ref HEAD passes
Targeted tests pass for changed behavior
Full npm test passes (broad runtime changes only)
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Targeted verification:

bash -n test/e2e/test-kimi-inference-compat.sh

Nightly evidence:

Main full nightly run 27487294753 failed kimi-inference-compat-e2e only because sourceAssistantCommands was ['uptime'] while the same trajectory showed toolMetasCount: 3, command set date/hostname/uptime, final status success, and final assistant text was correct.

Docs review: no user-facing docs changes needed; this is E2E harness stabilization only.

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Enhanced end-to-end test validation to comprehensively examine tool call handling across all assistant messages with tool call blocks, ensuring more thorough assertion checks for command behavior.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-14T04:09:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c7b6fbce-3bde-4deb-b316-1d5bce48864c

📥 Commits

Reviewing files that changed from the base of the PR and between 1f75d03 and 94d7b0a.

📒 Files selected for processing (1)

test/e2e/test-kimi-inference-compat.sh

📝 Walkthrough

Walkthrough

The embedded Python inside check_trajectory_acceptance in the E2E test script is updated so that source_calls is built by iterating over all assistant messages containing tool-call blocks, rather than extracting content only from the last such message.

Changes

Trajectory Acceptance Tool-Call Aggregation

Layer / File(s)	Summary
Aggregate all assistant tool-call messages into `source_calls` `test/e2e/test-kimi-inference-compat.sh`	Replaces the single last-message extraction with a loop that appends tool-call content from every qualifying assistant message, producing a complete flattened `source_calls` list used for command-order and semicolon-join assertions.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐇 One message was never enough for me,
I hopped through each assistant turn to see,
All tool calls collected, none left behind,
The trajectory checks now cover each find.
✨ A loop replaces a single peek — hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(e2e): accept multiturn Kimi tool calls' directly and specifically summarizes the main change: updating the E2E test to accept tool calls emitted across multiple assistant turns instead of just the last message.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/kimi-trajectory-multiturn-tools

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-code-quality · 2026-06-14T04:09:46Z

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the fix/kimi-trajectory-... branch is 96%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	fix/kimi-trajectory-... `94d7b0a`	+/-
`nemoclaw/src/se...cret-scanner.ts`	—	100%	—
`nemoclaw/src/commands/slash.ts`	—	100%	—
`nemoclaw/src/li...bprocess-env.ts`	—	100%	—
`nemoclaw/src/bl...eprint/state.ts`	—	98%	—
`nemoclaw/src/onboard/config.ts`	—	98%	—
`nemoclaw/src/bl...int/snapshot.ts`	—	97%	—
`nemoclaw/src/bl...print/runner.ts`	—	95%	—
`nemoclaw/src/co...ration-state.ts`	—	94%	—
`nemoclaw/src/bl...ate-networks.ts`	—	94%	—
`nemoclaw/src/index.ts`	—	94%	—

TypeScript / code-coverage/cli

The overall coverage in the fix/kimi-trajectory-... branch is 44%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	fix/kimi-trajectory-... `94d7b0a`	+/-
`src/lib/state/o...oard-session.ts`	—	90%	—
`src/lib/inference/local.ts`	—	77%	—
`src/lib/sandbox/config.ts`	—	72%	—
`src/lib/inference/nim.ts`	—	72%	—
`src/lib/onboard/preflight.ts`	—	64%	—
`src/lib/state/sandbox.ts`	—	55%	—
`src/lib/onboard...er-gpu-patch.ts`	—	50%	—
`src/lib/actions...licy-channel.ts`	—	49%	—
`src/lib/policy/index.ts`	—	48%	—
`src/lib/onboard.ts`	—	17%	—

_{Updated June 14, 2026 04:12 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.}

github-actions · 2026-06-14T04:10:18Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: kimi-inference-compat-e2e

Dispatch hint: kimi-inference-compat-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No merge-blocking E2E is required because this is a tests-only change and does not affect NemoClaw runtime or user flows. Running the touched Kimi compatibility E2E is still useful to validate the updated test acceptance logic.

Optional E2E

kimi-inference-compat-e2e (medium): Optional confidence check for the modified E2E script itself. This job runs test/e2e/test-kimi-inference-compat.sh and validates Kimi K2.6 safe exec splitting through OpenClaw trajectories, including the acceptance logic changed by this PR.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: kimi-inference-compat-e2e

github-actions · 2026-06-14T04:10:19Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

None. Only a legacy test/e2e shell test changed outside test/e2e-scenario/ and the Vitest scenario workflow; this does not affect Vitest scenario behavior or dispatchable Vitest scenario coverage.

Optional Vitest E2E scenarios

None.

Relevant changed files

None.

github-actions · 2026-06-14T04:11:15Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Filter flattened source blocks to toolCall entries

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Kimi trajectory acceptance verifier: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The diff changes last assistant tool-call message inspection to aggregating all selected messages, but does not filter the flattened blocks to `type == "toolCall"`.
Filter flattened source blocks to toolCall entries (test/e2e/test-kimi-inference-compat.sh:652): The new loop extends each selected assistant message's entire content array. Because messages are selected when they contain at least one toolCall, any valid assistant message that also includes text, reasoning, or another non-tool content block will add an entry whose command is None, making the strict source_commands == ["hostname", "date", "uptime"] assertion fail even though the actual tool calls are correct.
- Recommendation: When flattening across assistant messages, append only blocks with block.get("type") == "toolCall" before deriving source_commands. A small fixture/helper test for mixed text-plus-tool-call content would make the intended acceptance rule explicit.
- Evidence: The prior predicate already checks any(block.get("type") == "toolCall" ...), but the new code does source_calls.extend(message.get("content", [])) without applying that same block-type filter.

🌱 Nice ideas

None.

Consider writing more tests for

**Acceptance clause:** No deterministic linked issue clauses or linked issue comments were available in the validation context. — add test evidence or identify existing coverage. validation_context.linkedIssues is empty. The PR body says it is related to fix(e2e): stabilize nightly recovery coverage #5401, but no linked issue clauses/comments were provided by the deterministic context to map literally.
**Acceptance clause:** Flatten tool-call content from all assistant tool-call messages in `test/e2e/test-kimi-inference-compat.sh` before checking the expected `hostname`, `date`, `uptime` order. — add test evidence or identify existing coverage. The diff now aggregates content from all assistant messages with tool calls, replacing last-message-only behavior. However, it flattens all content blocks from those messages rather than only tool-call blocks.
**Kimi trajectory acceptance verifier** — No dedicated fixture or helper test was added for split multi-turn source commands or mixed content-plus-tool-call messages; coverage currently depends on running this E2E script against live/mock behavior.. The diff changes last assistant tool-call message inspection to aggregating all selected messages, but does not filter the flattened blocks to `type == "toolCall"`.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-14T04:15:18Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27487923357
Target ref: fix/kimi-trajectory-multiturn-tools
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 0 cancelled, 0 skipped

Job	Result
kimi-inference-compat-e2e	✅ success

test(e2e): accept multiturn Kimi tool calls

94d7b0a

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 14, 2026

cv merged commit 618e978 into main Jun 14, 2026
116 of 117 checks passed

cv deleted the fix/kimi-trajectory-multiturn-tools branch June 14, 2026 16:11

cv added the v0.0.65 Release target label Jun 15, 2026

Conversation

cv commented Jun 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

github-code-quality Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage Overview

TypeScript / code-coverage/plugin

TypeScript / code-coverage/cli

Uh oh!

github-actions Bot commented Jun 14, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 14, 2026

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

github-actions Bot commented Jun 14, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 14, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cv commented Jun 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

github-code-quality Bot commented Jun 14, 2026 •

edited

Loading