Support custom issues #1
base: main
Conversation
…correct key from dataset
This PR is being reviewed by Cursor Bugbot
        ],
        *get_test_directives(instance),
    ]
)
KeyError for custom repos without test_cmds field
The PR adds support for custom repos by changing make_test_spec to use MAP_REPO_VERSION_TO_SPECS.get(repo, {}).get(version, {}). However, when an instance doesn't provide test_cmds, the fallback path in make_eval_script_list_py directly accesses MAP_REPO_VERSION_TO_SPECS[instance["repo"]][instance["version"]]["test_cmd"]. For custom repos not in this mapping, this will raise a KeyError, crashing the evaluation script generation. The fallback should use the same safe .get() pattern or handle the missing key gracefully.
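A minimal sketch of the safe fallback the review asks for, assuming the nested `{repo: {version: {"test_cmd": ...}}}` shape of `MAP_REPO_VERSION_TO_SPECS`; the helper name `resolve_test_cmd` and the sample entries are illustrative, not the project's actual code:

```python
# Stand-in for SWE-bench's MAP_REPO_VERSION_TO_SPECS (illustrative entry).
MAP_REPO_VERSION_TO_SPECS = {
    "astropy/astropy": {"5.1": {"test_cmd": "pytest -rA"}},
}

def resolve_test_cmd(instance: dict) -> str:
    # Custom repos are expected to ship their own test_cmds in the dataset.
    if instance.get("test_cmds"):
        return " && ".join(instance["test_cmds"])
    # Same .get() chain as make_test_spec, so unknown repos don't raise KeyError.
    specs = MAP_REPO_VERSION_TO_SPECS.get(instance["repo"], {}).get(
        instance.get("version", ""), {}
    )
    test_cmd = specs.get("test_cmd")
    if test_cmd is None:
        raise ValueError(
            f"No test_cmd for {instance['repo']} {instance.get('version')}; "
            "custom repos must provide test_cmds in the dataset"
        )
    return test_cmd
```

This keeps the existing behavior for known repos and turns the silent `KeyError` into a descriptive error for custom ones.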
    # Fallback to hardcoded mapping if custom parser not found
    log_parser = MAP_REPO_TO_PARSER[repo]
else:
    log_parser = MAP_REPO_TO_PARSER[repo]
KeyError for custom repos without parser_content field
When parser_content is not provided or doesn't contain a valid parse function, the code falls back to MAP_REPO_TO_PARSER[repo]. For custom repos not in this mapping, this will raise a KeyError and crash the grading process. Since the PR adds support for custom repos, there needs to be a safe fallback when the repo isn't in MAP_REPO_TO_PARSER, such as using .get() with a default parser or raising a more descriptive error.
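A sketch of the suggested safe fallback, assuming `MAP_REPO_TO_PARSER` maps repo names to parser callables; `resolve_log_parser` and the stand-in parser entry are hypothetical names for illustration:

```python
# Stand-in for SWE-bench's MAP_REPO_TO_PARSER (illustrative entry).
MAP_REPO_TO_PARSER = {
    "django/django": lambda log: {"test_smoke": "PASSED"},
}

def resolve_log_parser(repo: str, custom_parser=None):
    # A valid custom parser from parser_content always wins.
    if custom_parser is not None:
        return custom_parser
    parser = MAP_REPO_TO_PARSER.get(repo)
    if parser is None:
        # Descriptive error instead of a bare KeyError mid-grading.
        raise ValueError(
            f"No built-in log parser for {repo!r}; "
            "custom repos must provide parser_content in the dataset"
        )
    return parser
```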
Fix/2nd batch
if custom_log_parser:
    # Custom parser returns JSON string, convert to dict
    def custom_parser_wrapper(log_content: str, _: TestSpec) -> dict[str, str]:
        return custom_log_parser(log_content)
Comment claims JSON conversion but none performed
The comment states "Custom parser returns JSON string, convert to dict" but the custom_parser_wrapper function simply returns custom_log_parser(log_content) without any JSON conversion. If custom parsers actually return JSON strings as the comment indicates, the result would be a string instead of a dict, causing failures when downstream code tries to use it as a dictionary. Either the comment is incorrect (existing parsers return dicts) and needs to be fixed, or a json.loads() call is missing.
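One way to make the wrapper match its comment is to accept both return types; this sketch (factory name `make_parser_wrapper` is illustrative) decodes only when the custom parser actually returns a string:

```python
import json

def make_parser_wrapper(custom_log_parser):
    def custom_parser_wrapper(log_content: str) -> dict:
        result = custom_log_parser(log_content)
        # Tolerate either a dict or a JSON-encoded string, so the comment
        # and the behavior agree regardless of what the parser returns.
        if isinstance(result, str):
            result = json.loads(result)
        return result
    return custom_parser_wrapper
```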
| "source /opt/miniconda3/bin/activate", | ||
| f"conda activate {env_name}", | ||
| # "source /opt/miniconda3/bin/activate", | ||
| # f"conda activate {env_name}", |
Conda activation removed for all instances unconditionally
The conda activation commands (source /opt/miniconda3/bin/activate and conda activate {env_name}) are commented out for ALL instances, not just custom docker_image ones. The eval script runs via /bin/bash /eval.sh which is non-interactive and doesn't source .bashrc. For standard SWE-bench instances that don't use docker_image, tests will run without the conda environment activated, causing them to use the wrong Python version and missing dependencies. The removal of conda activation should be conditional on docker_image being present, similar to how make_env_script_list_py handles it.
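A sketch of the conditional activation the review suggests, mirroring how `make_env_script_list_py` gates on `docker_image`; the function name `eval_script_preamble` is illustrative:

```python
def eval_script_preamble(env_name: str, docker_image=None) -> list[str]:
    cmds = ["#!/bin/bash", "set -uxo pipefail"]
    if not docker_image:
        # Standard SWE-bench instances still need the conda env; custom
        # images ship their own environment, so skip activation only there.
        cmds += [
            "source /opt/miniconda3/bin/activate",
            f"conda activate {env_name}",
        ]
    return cmds
```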
if not docker_image and "install_config" in instance:
    docker_image = instance["install_config"].get("docker_image")
dockerfile = instance.get("dockerfile") or instance.get("DockerFile")  # Handle both cases
test_cmds = instance.get("test_cmds")
test_cmds not parsed from JSON string format
The test_cmds field is extracted with instance.get("test_cmds") without JSON parsing, unlike FAIL_TO_PASS and PASS_TO_PASS which use _from_json_or_obj() to handle JSON string inputs. If a dataset provides test_cmds as a JSON string (e.g., '["pytest", "test.py"]'), the subsequent " && ".join() operation in make_eval_script_list_py would join individual characters instead of commands, producing malformed output.
print(f"{'='*80}")
print(f"Total instances: {total_instances}")
print(f"Instances with at least 1 pass: {instances_with_at_least_one_pass} ({100*instances_with_at_least_one_pass/total_instances:.1f}%)")
print(f"Instances with all {k} passes: {instances_with_all_passes} ({100*instances_with_all_passes/total_instances:.1f}%)")
Division by zero when dataset is empty
The code divides by total_instances on lines 282-283 without checking if it's zero, while line 286 correctly guards against this with a conditional. If the dataset file is empty or contains no valid instances, total_instances will be 0 and a ZeroDivisionError will crash the script.
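The same guard used elsewhere in the script can be factored into a tiny helper (the name `pct` is illustrative):

```python
def pct(count: int, total: int) -> float:
    # Guard against an empty dataset instead of raising ZeroDivisionError.
    return 100 * count / total if total else 0.0
```

Both percentage lines can then print `pct(instances_with_at_least_one_pass, total_instances)` safely even when no valid instances were loaded.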
| # f"cd {repo_directory}", | ||
| #] | ||
| # if "eval_commands" in specs: | ||
| # eval_commands = specs["eval_commands"] |
Django eval_commands for locale settings now ignored
The code that processes eval_commands from specs is completely commented out. Django specs (versions 1.7-3.2) define eval_commands with critical locale settings (LANG, LC_ALL, PYTHONIOENCODING, LANGUAGE). These locale exports are required for Django tests to handle string encoding correctly. With this code commented out, Django tests will run without the required locale configuration, likely causing encoding-related test failures.
This PR can be easily reviewed commit by commit; here is the changelog:
- Reads the `docker_image` and `DockerFile` fields, so it pulls the pre-built Docker images, with a fallback to building the Dockerfile if the pull is unsuccessful.
- Uses the `test_cmds` and `parser_content` fields from the dataset to run the exact commands and correctly parse results customized for each test framework and instance.
- Pulls test selectors from the `FAIL_TO_PASS` and `PASS_TO_PASS` fields.
- Removed `conda activate` and the installation commands (`pip install -e .`) done for each instance, as this was unnecessary since the pre-built image already provides the appropriate environment.
- Replaced the `network` flag with `host` to support issues with tests that rely on setting up dummy web servers.

Note
Introduces configurable runtime and evaluation driven by dataset fields, plus a utility to aggregate multi-run results.
- `docker_build.py` now supports `docker_image` pull-and-tag with fallback to building from a provided `dockerfile`; removes pre-existing containers before create; sets container `network_mode` to `host`.
- Uses `test_cmds` when present and pulls test selectors from `FAIL_TO_PASS`/`PASS_TO_PASS`; skips conda env setup when a `docker_image` is provided; allows custom Python version; trims conda activation from eval.
- `grading.py` executes `parser_content` (if provided) to parse logs, with fallback to built-in parsers and whole-log parsing when markers are missing.
- Falls back to `.py` when repo ext/specs are missing; safer spec lookups; added `build_custom_instance_image`.
- Added `evaluate_pass_at_k.py` to run multiple evaluations and compute pass@k; `.gitignore` now ignores `.python-version`.

Written by Cursor Bugbot for commit a166590.