Add GSM-Infinite training environment by poofeth · Pull Request #1336 · PrimeIntellect-ai/verifiers

poofeth · 2026-05-11T05:37:44Z

Summary

add a GSM-Infinite train/eval environment backed by the public InfiniAILab Hugging Face datasets
support medium/hard/symbolic dataset variants, context-length suffixes, operation splits, and optional train/eval example limits
normalize GSM-Infinite rows into Verifiers question/answer/info examples and score the final Answer: ... value with an exact-match reward
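The optional train/eval example limits could work roughly as follows. This is a minimal sketch that assumes a plain list of rows and a hypothetical limit_examples helper. The PR may instead cut a Hugging Face Dataset, where Dataset.select(range(n)) is the row-preserving equivalent. Here None means "no limit", and 0 is treated as an explicit limit rather than a missing value.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def limit_examples(rows: Sequence[T], num_examples: Optional[int] = None) -> List[T]:
    """Return at most num_examples rows (hypothetical helper, not the PR's API).

    None keeps every row; 0 is an explicit limit and yields an empty list.
    """
    if num_examples is None:
        return list(rows)
    if num_examples < 0:
        raise ValueError("num_examples must be None or a non-negative integer")
    return list(rows[:num_examples])


train_rows = [{"id": i} for i in range(5)]
print(len(limit_examples(train_rows, 2)))  # 2
print(len(limit_examples(train_rows)))  # 5
```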

Bounty: https://algora.io/PrimeIntellect-ai/bounties/m4592Ap5jB1qm9yd

Submitted as the custom Algora bounty claim for GSM-Infinite.

Validation

uv run pytest tests/test_gsm_infinite_environment.py -q
uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
git diff --check
live smoke test: loaded one row from InfiniAILab/gsm_infinite_medium_0 (split ops_2) and verified that symbolic dataset naming resolves to InfiniAILab/gsm_infinite_symbolic_example_0

Note

Medium Risk
Introduces a new HF-backed environment and answer parsing/reward logic; the main risks are incorrect dataset naming/split parameters and answer-extraction edge cases that could affect scoring correctness.

Overview
Adds a new installable gsm-infinite SingleTurn environment backed by InfiniAILab’s Hugging Face GSM-Infinite datasets, with configurable subset/context-length/operation split and optional train/eval example limits.

Implements row normalization into question/answer/info plus an exact-match reward based on the final Answer: ... in the model output, and registers the environment via a new pyproject.toml entry point; includes docs and focused tests for dataset naming, mapping, limiting, and reward parsing.

Reviewed by Cursor Bugbot for commit 76ab25b. Bugbot is set up for automated code reviews on this repo.

poofeth · 2026-05-11T05:38:03Z

Validation evidence for the bounty review:

$ uv run pytest tests/test_gsm_infinite_environment.py -q
.....                                                                    [100%]

$ uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
All checks passed!

$ uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
2 files already formatted

$ git diff --check
# no output

$ uv run python - <<'PY'
# loaded one example through load_gsm_infinite_dataset(num_examples=1)
# result: 1 ['question', 'answer', 'info']; answer: 2
# symbolic name check: InfiniAILab/gsm_infinite_symbolic_example_0
PY

This PR targets https://algora.io/PrimeIntellect-ai/bounties/m4592Ap5jB1qm9yd.

poofeth · 2026-05-11T05:44:10Z

Proactive follow-up in commit a9d549a:

added the gsm-infinite environment entry point
included README.md in the package build
listed gsm_infinite in environments/README.md

Validation after the update:

$ uv run pytest tests/test_gsm_infinite_environment.py -q
.....                                                                    [100%]

$ uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
All checks passed!

$ uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
2 files already formatted

$ git diff --check
# no output

poofeth · 2026-05-11T05:55:19Z

Addressed the Bugbot answer-extraction feedback in commit 203cfb1:

collapsed the unreachable duplicate Answer: regex path into one case-insensitive prefix parser
avoided re-extracting expected answers during reward normalization, preserving multi-value list answers
added regression coverage for multi-value answers and prefixed numeric/symbolic extraction
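A minimal sketch of what a single case-insensitive prefix parser plus list-preserving normalization could look like. The helper names (strip_answer_prefix, normalize_value, normalize_expected, exact_answer_reward) and the normalization rules (drop all whitespace, strip trailing periods) are assumptions for illustration, not the code in commit 203cfb1. The key point it shows is that the gold answer is normalized directly and never passed back through the extractor, so a list answer is not truncated.

```python
import re

# Hypothetical: one leading "Answer:" prefix in any letter case, e.g. "ANSWER: 12".
ANSWER_PREFIX = re.compile(r"\s*answer\s*:\s*", re.IGNORECASE)


def normalize_value(value: object) -> str:
    """Drop all whitespace and any trailing periods from a single value."""
    return "".join(str(value).split()).rstrip(".")


def strip_answer_prefix(text: str) -> str:
    """Remove one leading 'Answer:' prefix (case-insensitive), if present."""
    match = ANSWER_PREFIX.match(text)
    return text[match.end():] if match else text


def normalize_expected(answer: object) -> str:
    """Normalize a gold answer WITHOUT re-extracting it.

    A multi-value list such as ["3", "5"] becomes "3,5" instead of being
    cut down to its first element by an extraction step.
    """
    if isinstance(answer, (list, tuple)):
        return ",".join(normalize_value(item) for item in answer)
    return normalize_value(answer)


def exact_answer_reward(predicted: str, expected: object) -> float:
    """Return 1.0 when the normalized prediction equals the normalized gold answer."""
    guess = normalize_value(strip_answer_prefix(predicted))
    return 1.0 if guess == normalize_expected(expected) else 0.0


print(exact_answer_reward("ANSWER: 3, 5", ["3", "5"]))  # 1.0
```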

Validation:

$ uv run pytest tests/test_gsm_infinite_environment.py -q
.......                                                                  [100%]

$ uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
All checks passed!

$ uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
2 files already formatted

$ git diff --check
# no output

poofeth · 2026-05-11T06:06:39Z

Addressed the latest Bugbot follow-up in commit 4f732d8:

extract_answer now uses the final Answer: marker instead of the first marker, matching the prompt instruction to end with the final result
eval_subset, eval_context_length, and eval_operations now fall back only when their value is None, so explicit falsy values (e.g. eval_operations=0) are preserved
added regression coverage for intermediate/final answer markers and eval_operations=0
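Both changes in this commit can be sketched in a few lines. Under the assumption that the answer runs from the last "Answer:" marker to the end of that line, the hypothetical extract_final_answer below picks the final marker rather than the first. The hypothetical resolve_eval_setting shows the None-only fallback, assuming each eval_* option falls back to its train-side counterpart; neither function is the PR's actual code.

```python
import re
from typing import Any, Optional

# Hypothetical case-insensitive "Answer:" marker; only its position is used.
ANSWER_MARKER = re.compile(r"answer\s*:", re.IGNORECASE)


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the LAST 'Answer:' marker, up to the end of that line."""
    matches = list(ANSWER_MARKER.finditer(completion))
    if not matches:
        return None
    tail = completion[matches[-1].end():]
    answer = tail.split("\n", 1)[0].strip()
    return answer or None


def resolve_eval_setting(eval_value: Any, train_value: Any) -> Any:
    """Use the train-side value only when the eval value is None.

    Writing `eval_value or train_value` would wrongly replace an explicit 0 or "".
    """
    return train_value if eval_value is None else eval_value


text = "Intermediate Answer: 3\nChecking again...\nAnswer: 7"
print(extract_final_answer(text))  # 7
print(resolve_eval_setting(0, 2))  # 0
```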

Validation:

$ uv run pytest tests/test_gsm_infinite_environment.py -q
........                                                                 [100%]

$ uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
All checks passed!

$ uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
2 files already formatted

$ git diff --check
# no output

poofeth · 2026-05-11T06:16:55Z

Addressed the latest Bugbot symbolic dataset finding in commit e1e9a26:

changed symbolic dataset resolution from InfiniAILab/gsm_infinite_symbolic_example_{context_length} to the full dataset names InfiniAILab/gsm_infinite_symbolic_{context_length}
added regression coverage for the symbolic variant with both context_length=0 and context_length=8k
verified the Hugging Face dataset API currently resolves InfiniAILab/gsm_infinite_symbolic_0, InfiniAILab/gsm_infinite_symbolic_8k, and InfiniAILab/gsm_infinite_symbolic_16k
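The naming scheme after this fix can be sketched as a small resolver. The symbolic pattern comes from the names verified above. The medium/hard pattern is inferred from InfiniAILab/gsm_infinite_medium_0 and is an assumption for "hard". The ops_{n} split name is inferred from the ops_2 split used in the smoke test and may not hold for every dataset. resolve_dataset is a hypothetical helper, not the PR's function.

```python
from typing import Tuple, Union

HF_ORG = "InfiniAILab"
SUBSETS = ("medium", "hard", "symbolic")


def resolve_dataset(
    subset: str, context_length: Union[str, int], operations: int
) -> Tuple[str, str]:
    """Map (subset, context length, op count) to a (dataset name, split) pair."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Symbolic now resolves to the full dataset, not the old *_symbolic_example_* name.
    name = f"{HF_ORG}/gsm_infinite_{subset}_{context_length}"
    split = f"ops_{operations}"  # assumed split naming, inferred from "ops_2"
    return name, split


print(resolve_dataset("symbolic", "8k", 3))
# ('InfiniAILab/gsm_infinite_symbolic_8k', 'ops_3')
```

The resulting pair could then be passed to datasets.load_dataset(name, split=split); whether every subset exposes the same ops_{n} splits has not been checked here.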

Validation after the fix:

$ uv run pytest tests/test_gsm_infinite_environment.py -q
........                                                                 [100%]

$ uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
All checks passed!

$ uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
2 files already formatted

$ git diff --check
# no output

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit e1e9a26.

poofeth · 2026-05-11T06:26:39Z

Addressed the latest Bugbot answer_list=None fallback finding in commit 76ab25b:

row_to_example now falls back to solution when the answer_list key exists but its value is None or otherwise empty
added regression coverage so a row with answer_list = None extracts the expected answer from solution instead of producing "None"
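A minimal sketch of the fallback described above. The names row_to_example_sketch and answer_from_solution, the "question" row key, joining multi-value lists with ", ", and what goes into info are all assumptions for illustration. Only the answer_list and solution fields come from the PR description. A missing key, None, an empty list, or an empty string all fall through to the solution, so the answer can never become the string "None".

```python
import re
from typing import Any, Dict

# Hypothetical case-insensitive "Answer:" marker inside the reference solution.
ANSWER_MARKER = re.compile(r"answer\s*:", re.IGNORECASE)


def answer_from_solution(solution: str) -> str:
    """Take the text after the last 'Answer:' marker, up to the end of that line."""
    matches = list(ANSWER_MARKER.finditer(solution))
    if not matches:
        return ""
    return solution[matches[-1].end():].split("\n", 1)[0].strip()


def row_to_example_sketch(row: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a raw row into question/answer/info (hypothetical field handling)."""
    answer_list = row.get("answer_list")
    if answer_list:  # a missing key, None, [] or "" all fall through
        if isinstance(answer_list, (list, tuple)):
            answer = ", ".join(str(item) for item in answer_list)
        else:
            answer = str(answer_list)
    else:
        answer = answer_from_solution(row.get("solution") or "")
    info = {k: v for k, v in row.items() if k not in ("question", "answer_list")}
    return {"question": row.get("question", ""), "answer": answer, "info": info}


row = {"question": "...", "answer_list": None, "solution": "So the total is 12.\nAnswer: 12"}
print(row_to_example_sketch(row)["answer"])  # 12
```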

Validation:

$ uv run pytest tests/test_gsm_infinite_environment.py -q
.........                                                                [100%]

$ uv run ruff check environments/gsm_infinite tests/test_gsm_infinite_environment.py
All checks passed!

$ uv run ruff format --check environments/gsm_infinite tests/test_gsm_infinite_environment.py
2 files already formatted

$ git diff --check
# no output

poofeth · 2026-05-11T09:53:20Z

/claim https://algora.io/PrimeIntellect-ai/bounties/m4592Ap5jB1qm9yd

0f1f9f2 Add GSM-Infinite training environment
a9d549a Register GSM-Infinite environment
cursor Bot reviewed May 11, 2026 (2 comment threads on environments/gsm_infinite/gsm_infinite.py, 1 outdated)
203cfb1 Fix GSM-Infinite answer normalization
cursor Bot reviewed May 11, 2026 (2 outdated comment threads on environments/gsm_infinite/gsm_infinite.py)
4f732d8 Use final GSM-Infinite answer marker
cursor Bot reviewed May 11, 2026 (1 outdated comment thread on environments/gsm_infinite/gsm_infinite.py)
e1e9a26 Fix GSM-Infinite symbolic dataset names
cursor Bot reviewed May 11, 2026 (1 outdated comment thread on environments/gsm_infinite/gsm_infinite.py)
76ab25b Handle missing GSM-Infinite answer lists

poofeth wants to merge 6 commits into PrimeIntellect-ai:main from poofeth:bounty/gsm-infinite-env