Add SWE-Swiss SFT environment#1337

Open
poofeth wants to merge 3 commits into
PrimeIntellect-ai:mainfrom
poofeth:bounty/swe-swiss-sft

Conversation

@poofeth poofeth commented May 11, 2026

Summary

  • add a lightweight SWE-Swiss SFT train/eval environment backed by public Hugging Face SWE-Swiss datasets
  • normalize each row's messages field into Verifiers prompt/question/answer/info examples, supporting both JSON-string and list message formats
  • score completions with normalized text similarity against the reference assistant response
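
A minimal sketch of the normalization step described above, not the PR's actual code. The helper names `parse_messages` and `row_to_example` are hypothetical, and so are two assumptions baked in here: that the last assistant turn is the reference answer, and that `info` carries only a message count.

```python
import json
from typing import Any


def parse_messages(raw: Any) -> list[dict[str, str]]:
    """Accept a list of message dicts or a JSON string encoding such a list."""
    if isinstance(raw, str):
        raw = json.loads(raw)
    if not isinstance(raw, list):
        raise ValueError("messages must be a list or a JSON-encoded list")
    # Keep only well-formed entries whose role and content are both strings.
    return [
        {"role": m["role"], "content": m["content"]}
        for m in raw
        if isinstance(m, dict)
        and isinstance(m.get("role"), str)
        and isinstance(m.get("content"), str)
    ]


def row_to_example(row: dict[str, Any]) -> dict[str, Any]:
    """Split a transcript into prompt, question, answer, and info fields."""
    messages = parse_messages(row["messages"])
    assistant_turns = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if not assistant_turns:
        raise ValueError("row has no assistant message to use as the answer")
    last = assistant_turns[-1]  # assumption: last assistant turn is the reference
    prompt = messages[:last]
    # The question is taken as the most recent user turn before the answer.
    question = next(
        (m["content"] for m in reversed(prompt) if m["role"] == "user"), ""
    )
    return {
        "prompt": prompt,
        "question": question,
        "answer": messages[last]["content"],
        "info": {"num_messages": len(messages)},
    }
```

Under these assumptions, a row whose `messages` value is a JSON string and a row holding the equivalent list produce the same example.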

Bounty: https://algora.io/PrimeIntellect-ai/bounties/FHr4fCjVPjFDmFED

Submitted as the custom Algora bounty claim for the SWE-Swiss training-data environment lane.

Scope note: this targets the existing SWE-Swiss training-data environment path, not the full auto-generation pipeline tier. It keeps runtime requirements light and does not require a sandboxed repository checkout.

Validation

  • uv run pytest tests/test_swe_swiss_sft_environment.py -q
  • uv run ruff check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
  • uv run ruff format --check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
  • git diff --check
  • live smoke test: loaded one row from SWE-Swiss/SWESwiss-SFT-Repair-4K into the prompt, question, answer, and info columns

Note

Low Risk
Low risk: adds a new standalone environment package plus tests and docs, without modifying core verifier/runtime logic. Main risk is dataset-shape assumptions (messages formatting) affecting usability for some SWE-Swiss dataset variants.

Overview
Adds a new installable swe-swiss-sft SingleTurn environment backed by Hugging Face SWE-Swiss SFT datasets, converting each row’s messages transcript into prompt/question/answer examples (supports list or JSON-string messages).

Introduces a lightweight reward (SequenceMatcher-based normalized text similarity) and exposes configurable dataset/split/example-limit/shuffle arguments via load_environment, with accompanying environment README, packaging (pyproject.toml entry point), a brief mention in environments/README.md, and new unit tests covering parsing, dataset loading, reward behavior, and lazy environment construction.
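
For illustration, a SequenceMatcher-based similarity reward could look roughly like the sketch below. The function names and the exact normalization (lowercasing and collapsing whitespace) are assumptions; the overview only says the reward is a normalized text similarity built on `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher


def normalize_text(text: str) -> str:
    """Lowercase and collapse whitespace runs (assumed normalization)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity_reward(completion: str, answer: str) -> float:
    """Return a score in [0, 1]; 1.0 means the normalized texts are identical."""
    a = normalize_text(completion)
    b = normalize_text(answer)
    # autojunk=False disables difflib's popularity heuristic, which can
    # otherwise distort ratios on long inputs such as code patches.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()
```

A reward of this kind measures surface overlap with the reference response. It does not check that a patch applies or that tests pass, which is consistent with the scope note about not requiring a sandboxed repository checkout.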

Reviewed by Cursor Bugbot for commit 16e512b. Bugbot is set up for automated code reviews on this repo.

poofeth commented May 11, 2026

Validation evidence for the bounty review:

$ uv run pytest tests/test_swe_swiss_sft_environment.py -q
.....                                                                    [100%]

$ uv run ruff check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
All checks passed!

$ uv run ruff format --check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
2 files already formatted

$ git diff --check
# no output

$ uv run python - <<'PY'
# loaded one example through load_swe_swiss_dataset(num_examples=1, shuffle_seed=None)
# result: 1 ['prompt', 'question', 'answer', 'info']
PY

This PR targets the existing-data path for https://algora.io/PrimeIntellect-ai/bounties/FHr4fCjVPjFDmFED.

poofeth commented May 11, 2026

Proactive follow-up in commit a2dc196:

  • listed swe_swiss_sft in environments/README.md

Validation after the update:

$ uv run pytest tests/test_swe_swiss_sft_environment.py -q
.....                                                                    [100%]

$ uv run ruff check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
All checks passed!

$ uv run ruff format --check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
2 files already formatted

$ git diff --check
# no output

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit a2dc196.

Comment thread environments/swe_swiss_sft/swe_swiss_sft.py Outdated
poofeth commented May 11, 2026

Addressed Bugbot feedback in commit 16e512b:

  • coalesce null role/content values before stringifying, so content: null is skipped instead of becoming the literal string "None"
  • added regression coverage for null-content rows
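
The failure mode behind this fix is easy to reproduce: in Python, `str(None)` yields the literal string "None". A rough sketch of the coalescing approach follows; `clean_message` is a hypothetical helper, not the function in the PR.

```python
from typing import Any, Optional


def coalesce(value: Any) -> str:
    """Map None to an empty string before stringifying."""
    return "" if value is None else str(value)


def clean_message(message: dict[str, Any]) -> Optional[dict[str, str]]:
    """Return a normalized message, or None if role or content is missing."""
    role = coalesce(message.get("role")).strip()
    content = coalesce(message.get("content"))
    # Skip the message instead of emitting an empty or "None" turn.
    if not role or not content.strip():
        return None
    return {"role": role, "content": content}
```

With this guard a row containing `content: null` is dropped from the transcript, whereas naive stringification would have fed the text "None" into the prompt or the reference answer.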

Validation:

$ uv run pytest tests/test_swe_swiss_sft_environment.py -q
......                                                                   [100%]

$ uv run ruff check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
All checks passed!

$ uv run ruff format --check environments/swe_swiss_sft tests/test_swe_swiss_sft_environment.py
2 files already formatted

$ git diff --check
# no output

poofeth commented May 11, 2026