Add GSM-Infinite training environment #1336
Validation evidence for the bounty review: This PR targets https://algora.io/PrimeIntellect-ai/bounties/m4592Ap5jB1qm9yd.

Proactive follow-up in commit
Validation after the update:

Addressed the Bugbot answer-extraction feedback in commit
Validation:

Addressed the latest Bugbot follow-up in commit
Validation:

Addressed the latest Bugbot symbolic dataset finding in commit
Validation after the fix:

Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e1e9a26.

Addressed the latest Bugbot
Validation:

Summary
question/answer/info examples and score final Answer: ... values with an exact-answer reward
Bounty: https://algora.io/PrimeIntellect-ai/bounties/m4592Ap5jB1qm9yd
Submitted as the custom Algora bounty claim for GSM-Infinite.
Validation
uv run pytest tests/test_gsm_infinite_environment.py -q
uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
git diff --check
InfiniAILab/gsm_infinite_medium_0/ops_2 and verified symbolic dataset naming resolves to InfiniAILab/gsm_infinite_symbolic_example_0

Note
Medium Risk
Introduces a new HF-backed environment and answer parsing/reward logic; main risk is dataset naming/split parameters and answer extraction edge cases affecting scoring correctness.
Overview
Adds a new installable gsm-infinite SingleTurn environment backed by InfiniAILab’s Hugging Face GSM-Infinite datasets, with configurable subset/context-length/operation split and optional train/eval example limits.

Implements row normalization into question/answer/info plus an exact-match reward based on the final Answer: ... in the model output, and registers the environment via a new pyproject.toml entry point; includes docs and focused tests for dataset naming, mapping, limiting, and reward parsing.

Reviewed by Cursor Bugbot for commit 76ab25b.
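
For reviewers who want a concrete picture of the exact-match reward described above, here is a minimal sketch. It is not the code in this PR: the names extract_final_answer and exact_answer_reward, the regular expression, and the choice to take the last Answer: line and drop a trailing period are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern: capture the rest of the line after "Answer:".
_ANSWER_RE = re.compile(r"Answer:\s*(.+)", re.IGNORECASE)


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = _ANSWER_RE.findall(completion)
    if not matches:
        return None
    # Assumption: the last occurrence is the model's final answer,
    # and a trailing sentence period is not part of the value.
    return matches[-1].strip().rstrip(".")


def exact_answer_reward(completion: str, answer: str) -> float:
    """Score 1.0 only when the extracted answer equals the reference exactly."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == str(answer).strip() else 0.0
```

Taking the last match rather than the first is one way to tolerate intermediate Answer: lines in a chain of reasoning; whether the environment makes the same choice is exactly the kind of answer-extraction edge case the risk note flags.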