Add Kimi-K2 tool simulation environment by poofeth · Pull Request #1339 · PrimeIntellect-ai/verifiers

poofeth · 2026-05-11T13:16:57Z

Claims https://algora.io/PrimeIntellect-ai/bounties/de4q6RZcDpDDQPXL

Summary

adds a deterministic kimi_k2_tool_sim SingleTurnEnv for tool-call sequence simulation
includes four provider-free scenarios covering lookup/action, refund, issue-summary, and currency/invoice workflows
scores completions with exact tool sequence, argument matching, and final-answer rewards
documents installation/usage and adds focused regression tests
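
For readers who have not opened the diff, the sequence and final-answer rewards listed above might look roughly like the sketch below. It is not the code from this PR. The JSON shape (`tool_calls`, `name`, `final_answer`), the function names, and the case-insensitive answer comparison are all assumptions made for illustration.

```python
import json


def sequence_reward(parsed: dict, expected_names: list) -> float:
    """Return 1.0 only if the tool names match the expected order exactly."""
    calls = parsed.get("tool_calls")  # hypothetical key name
    if not isinstance(calls, list):
        return 0.0
    actual = [c.get("name") if isinstance(c, dict) else None for c in calls]
    return 1.0 if actual == expected_names else 0.0


def final_answer_reward(parsed: dict, expected_answer: str) -> float:
    """Return 1.0 if the final answer matches after trimming and lowercasing."""
    answer = parsed.get("final_answer")  # hypothetical key name
    if not isinstance(answer, str):
        return 0.0
    # Normalization here is an assumption; the PR may compare differently.
    same = answer.strip().lower() == expected_answer.strip().lower()
    return 1.0 if same else 0.0


# Hypothetical refund scenario; tool names are invented for this example.
completion = json.loads(
    '{"tool_calls": ['
    '{"name": "lookup_order", "arguments": {"order_id": "A1"}}, '
    '{"name": "issue_refund", "arguments": {"order_id": "A1"}}], '
    '"final_answer": "Refund issued."}'
)
in_order = sequence_reward(completion, ["lookup_order", "issue_refund"])
swapped = sequence_reward(completion, ["issue_refund", "lookup_order"])
answer_ok = final_answer_reward(completion, "refund issued.")
```

Because the sequence check is all-or-nothing, a completion that calls the right tools in the wrong order scores zero on that component, which is what "exact tool sequence" suggests.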

Validation

uv run pytest tests/test_kimi_k2_tool_sim_environment.py -q
uv run ruff check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
uv run ruff format --check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
git diff --check

Note

Low Risk
Low risk: adds a new, self-contained example environment plus docs/tests, with no changes to shared runtime or security-sensitive logic.

Overview
Adds a new deterministic kimi-k2-tool-sim SingleTurnEnv that evaluates simulated tool-use by requiring JSON output with an exact tool-call sequence, argument matching, and final answer matching across four built-in scenarios.

Registers the environment via pyproject.toml entry point, documents install/usage and scoring, links it from environments/README.md, and adds targeted tests covering dataset construction, JSON extraction, reward partial credit, and error handling.
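
The JSON extraction step mentioned above could be approached as in the following sketch, which scans a completion for the first decodable JSON object so that surrounding prose or a code fence does not break parsing. The function name and the "first valid object wins" policy are assumptions, not the behavior confirmed by this PR.

```python
import json
from typing import Any, Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        if isinstance(value, dict):
            return value
    return None


fenced = 'Here is my answer:\n```json\n{"final_answer": "ok"}\n```'
parsed: Any = extract_json_object(fenced)
```

Returning `None` rather than raising lets a reward function treat unparseable output as a zero score instead of an error.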

Reviewed by Cursor Bugbot for commit 169f468. Bugbot is set up for automated code reviews on this repo.

poofeth · 2026-05-11T13:18:48Z

/claim https://algora.io/PrimeIntellect-ai/bounties/de4q6RZcDpDDQPXL

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 592fda7.

poofeth · 2026-05-11T13:27:01Z

Addressed both Bugbot findings in commit 169f468:

non-dict arguments values now count the expected arguments as misses instead of removing that call from the argument denominator
added a regression test for malformed argument maps
added Hatch build metadata and explicit package include files to pyproject.toml

Validation:

$ uv run pytest tests/test_kimi_k2_tool_sim_environment.py -q
.......                                                                  [100%]

$ uv run ruff check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
All checks passed!

$ uv run ruff format --check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
2 files already formatted

$ git diff --check
# no output

poofeth · 2026-05-11T13:30:17Z

The two Bugbot inline comments appear to be stale against the original diff, but I rechecked them against the current branch:

argument_total is incremented before malformed/non-dict actual arguments are skipped, so expected args now count as misses.
pyproject.toml includes [build-system] and [tool.hatch.build] with Hatchling config.
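
To illustrate the first point, here is a minimal sketch of argument scoring in which the denominator is counted before any malformed call is skipped. Only the name argument_total comes from the comment above. The function name, the positional pairing of expected and actual calls, and the 1.0 result when no arguments are expected are assumptions.

```python
def argument_score(expected_calls: list, actual_calls: list) -> float:
    """Fraction of expected arguments that the actual calls reproduce."""
    argument_total = 0
    argument_hits = 0
    for index, expected in enumerate(expected_calls):
        expected_args = expected.get("arguments", {})
        # Count expected arguments first, so a malformed or missing call
        # cannot shrink the denominator.
        argument_total += len(expected_args)
        if index >= len(actual_calls):
            continue  # missing call: all its expected arguments are misses
        actual = actual_calls[index]
        actual_args = actual.get("arguments") if isinstance(actual, dict) else None
        if not isinstance(actual_args, dict):
            continue  # non-dict arguments: all expected arguments are misses
        argument_hits += sum(
            1
            for key, value in expected_args.items()
            if key in actual_args and actual_args[key] == value
        )
    # Assumption: a scenario with no expected arguments scores as fully matched.
    return argument_hits / argument_total if argument_total else 1.0


expected = [
    {"name": "a", "arguments": {"x": 1, "y": 2}},
    {"name": "b", "arguments": {"z": 3}},
]
malformed = [
    {"name": "a", "arguments": "oops"},  # non-dict arguments value
    {"name": "b", "arguments": {"z": 3}},
]
score = argument_score(expected, malformed)
```

With the increment placed after the skip instead, the malformed first call would drop out of the denominator and this completion would score 1.0 rather than one third.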

Fresh validation on current HEAD:

uv run pytest tests/test_kimi_k2_tool_sim_environment.py -q
# 7 passed

uv run ruff check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
# All checks passed

uv run ruff format --check environments/kimi_k2_tool_sim tests/test_kimi_k2_tool_sim_environment.py
# 2 files already formatted

git diff --check
# no output

Add Kimi-K2 tool simulation environment

592fda7

cursor Bot reviewed May 11, 2026

Comment thread environments/kimi_k2_tool_sim/kimi_k2_tool_sim.py

Comment thread environments/kimi_k2_tool_sim/pyproject.toml

Fix Kimi-K2 tool sim packaging and scoring

169f468

poofeth wants to merge 2 commits into PrimeIntellect-ai:main from poofeth:bounty/kimi-k2-tool-sim
