Add Kimi-K2 tool simulation environment #1339

Open
poofeth wants to merge 2 commits into PrimeIntellect-ai:main from poofeth:bounty/kimi-k2-tool-sim


poofeth commented May 11, 2026

Claims https://algora.io/PrimeIntellect-ai/bounties/de4q6RZcDpDDQPXL

Summary

  • adds a deterministic kimi_k2_tool_sim SingleTurnEnv for tool-call sequence simulation
  • includes four provider-free scenarios covering lookup/action, refund, issue-summary, and currency/invoice workflows
  • scores completions with exact tool sequence, argument matching, and final-answer rewards
  • documents installation/usage and adds focused regression tests
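As a rough illustration of the sequence and final-answer rewards listed above, here is a minimal sketch. The field names (`name`, `final_answer`), the function names, and the case-insensitive answer comparison are assumptions made for this example and are not taken from the PR's code.

```python
def tool_sequence_reward(calls, expected_calls) -> float:
    """Return 1.0 only when the tool names match exactly and in order."""
    if not isinstance(calls, list):
        return 0.0
    # Non-dict entries become None so they can never match an expected name.
    names = [c.get("name") if isinstance(c, dict) else None for c in calls]
    expected_names = [c["name"] for c in expected_calls]
    return 1.0 if names == expected_names else 0.0


def final_answer_reward(answer, expected_answer: str) -> float:
    """Return 1.0 when the final answer matches after trimming and lowercasing.

    The normalization here is an assumption; the real environment may compare
    answers differently.
    """
    if not isinstance(answer, str):
        return 0.0
    same = answer.strip().lower() == expected_answer.strip().lower()
    return 1.0 if same else 0.0
```

Keeping each reward as a separate function makes partial credit easy to report: a completion can get the tool order right while still missing the final answer.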

Validation

  • uv run pytest tests/test_kimi_k2_tool_sim_environment.py -q
  • uv run ruff check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
  • uv run ruff format --check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
  • git diff --check

Note

Low Risk
Low risk: adds a new, self-contained example environment plus docs/tests, with no changes to shared runtime or security-sensitive logic.

Overview
Adds a new deterministic kimi-k2-tool-sim SingleTurnEnv that evaluates simulated tool use: completions must be JSON and are scored on exact tool-call sequence, argument matching, and final-answer matching across four built-in scenarios.

Registers the environment via a pyproject.toml entry point, documents install/usage and scoring, links it from environments/README.md, and adds targeted tests covering dataset construction, JSON extraction, reward partial credit, and error handling.
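The JSON extraction step mentioned above could be sketched as follows. This is an assumed approach, not the PR's implementation: the helper name `extract_json_object` is hypothetical, and it simply returns the first JSON object that parses from any `{` in the completion text.

```python
import json


def extract_json_object(text: str):
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it (prose, closing code fences, etc.).
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

Scanning from each opening brace tolerates models that wrap the JSON in prose or a fenced block, at the cost of a worst-case quadratic scan on long, malformed output.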

Reviewed by Cursor Bugbot for commit 169f468. Bugbot is set up for automated code reviews on this repo.

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 592fda7.

Comment thread environments/kimi_k2_tool_sim/kimi_k2_tool_sim.py
Comment thread environments/kimi_k2_tool_sim/pyproject.toml
poofeth commented May 11, 2026

Addressed both Bugbot findings in commit 169f468:

  • a non-dict arguments value now counts that call's expected arguments as misses instead of removing the call from the argument denominator
  • added a regression test for malformed argument maps
  • added Hatch build metadata and explicit package include files to pyproject.toml
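To make the denominator fix concrete, here is a minimal sketch of argument matching in which the expected arguments are counted before any malformed call is skipped. The function name, the call layout (`{"name": ..., "arguments": {...}}`), and the value returned when no arguments are expected are assumptions for illustration; only the name `argument_total` echoes the discussion in this PR.

```python
def argument_match_reward(actual_calls, expected_calls) -> float:
    """Fraction of expected arguments that the actual calls reproduce exactly."""
    if not isinstance(actual_calls, list):
        actual_calls = []
    argument_total = 0
    argument_hits = 0
    for index, expected in enumerate(expected_calls):
        expected_args = expected.get("arguments", {})
        # Count first, so a malformed call cannot shrink the denominator.
        argument_total += len(expected_args)
        actual = actual_calls[index] if index < len(actual_calls) else None
        actual_args = actual.get("arguments") if isinstance(actual, dict) else None
        if not isinstance(actual_args, dict):
            continue  # every expected argument of this call is a miss
        for key, value in expected_args.items():
            if key in actual_args and actual_args[key] == value:
                argument_hits += 1
    # Assumption: with nothing to match, treat the component as satisfied.
    return argument_hits / argument_total if argument_total else 1.0
```

With two expected calls carrying three arguments in total, a completion whose second call has `"arguments": "oops"` scores 2/3 here, whereas dropping that call from the denominator would have scored it 2/2.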

Validation:

$ uv run pytest tests/test_kimi_k2_tool_sim_environment.py -q
.......                                                                  [100%]

$ uv run ruff check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
All checks passed!

$ uv run ruff format --check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
2 files already formatted

$ git diff --check
# no output


poofeth commented May 11, 2026

The two Bugbot inline comments appear to be stale against the original diff, but I rechecked them against the current branch:

  • argument_total is incremented before malformed/non-dict actual arguments are skipped, so expected args now count as misses.
  • pyproject.toml includes [build-system] and [tool.hatch.build] with Hatchling config.

Fresh validation on current HEAD:

uv run pytest tests/test_kimi_k2_tool_sim_environment.py -q
# 7 passed

uv run ruff check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
# All checks passed

uv run ruff format --check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
# 2 files already formatted

git diff --check
# no output
