-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: fixed broken swe-bench docker image generation. #1262
Conversation
- created new images with updated composio sdk - fixes: 1155
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
futures.append(executor.submit(_build, file, tag_part, multi)) | ||
[fut.result() for fut in futures] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The futures.append()
and fut.result()
operations are not protected against stop_requested
, allowing in-progress builds to continue after stop signal. Should check stop_requested
before fut.result()
.
📝 Committable Code Suggestion
‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.
futures.append(executor.submit(_build, file, tag_part, multi)) | |
[fut.result() for fut in futures] | |
futures.append(executor.submit(_build, file, tag_part, multi)) | |
[fut.result() for fut in futures if not stop_requested] |
|
||
if process.returncode == 0: | ||
print(f"Finished build for {tag}") | ||
record_success(tag) | ||
else: | ||
print(f"Error building {tag} - {logs / log}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling for failed builds - successful builds are tracked but failures are silently ignored. Should maintain error state and handle failed builds appropriately.
📝 Committable Code Suggestion
‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if process.returncode == 0: | |
print(f"Finished build for {tag}") | |
record_success(tag) | |
else: | |
print(f"Error building {tag} - {logs / log}") | |
if process.returncode == 0: | |
print(f"Finished build for {tag}") | |
record_success(tag) | |
else: | |
error_msg = f"Error building {tag} - {logs / log}" | |
print(error_msg) | |
record_failure(tag, error_msg) | |
raise BuildError(error_msg) |
if not DEEPLAKE_PATH.exists(): | ||
shutil.copytree( | ||
f"{Path.home()}/.composio/tmp/{outname}/deeplake", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition when checking and creating directories. Use mkdir(parents=True, exist_ok=True)
instead of separate check and create.
📝 Committable Code Suggestion
‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if not DEEPLAKE_PATH.exists(): | |
shutil.copytree( | |
f"{Path.home()}/.composio/tmp/{outname}/deeplake", | |
DEEPLAKE_PATH.mkdir(parents=True, exist_ok=True) | |
shutil.copytree( | |
f"{Path.home()}/.composio/tmp/{outname}/deeplake", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to c2faf5d in 2 minutes and 8 seconds
More details
- Looked at
387
lines of code in4
files - Skipped
0
files when reviewing. - Skipped posting
11
drafted comments based on config settings.
1. python/swe/agent/benchmark.py:10
- Draft comment:
Updated import to ChatBedrock looks correct; ensure SDK change is fully compatible with client.invoke usage. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
The comment is asking the PR author to ensure compatibility, which violates the rule against asking for confirmation or testing. It doesn't provide a specific suggestion or point out a specific issue.
2. python/swe/dockerfiles/build.py:40
- Draft comment:
Graceful shutdown support with signal handlers is added; review that stop_requested flag usage is thread‐safe as needed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
In Python, reading/writing a boolean is atomic. The flag is only ever set from False to True once, and only read after that. Even without explicit synchronization, this pattern is actually thread-safe. The worst case is a thread might see the old value (False) for a short time, which just means it would do one more iteration - not a correctness issue.
I could be wrong about Python's guarantees around boolean operations being atomic. There could be subtle race conditions I'm not considering.
Even if there are subtle race conditions, they would only result in one extra iteration of work, not corruption or crashes. The graceful shutdown is still graceful enough for this use case.
The comment raises a valid concern but the current implementation is actually good enough for this use case. The comment should be removed as it would lead to unnecessary work.
3. python/swe/dockerfiles/build.py:59
- Draft comment:
Filename splitting may fail on unexpected formats; good that ValueError is caught, but consider logging full error context. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
The code already handles the error case appropriately by catching ValueError and printing the problematic filename. The error message is clear and provides the key information needed to debug the issue. Additional error context would be redundant since the ValueError would only occur if the filename doesn't contain a period, which is already clear from the error message.
Perhaps there could be value in logging the actual ValueError message in addition to the filename. The current error handling doesn't preserve the original exception details.
The original exception would just indicate that the string couldn't be split, which adds no meaningful information beyond what we already communicate by showing the invalid filename.
The comment should be deleted since the existing error handling is already appropriate and provides sufficient context through the filename.
4. python/swe/dockerfiles/build.py:127
- Draft comment:
Ensure the _build function correctly chooses between multi and single build args; consider capturing stdout for debugging errors. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. python/swe/dockerfiles/create_index.py:105
- Draft comment:
Opening fqdn_cache.json assumes the file exists; consider adding error handling or existence check. - Reason this comment was not posted:
Comment was on unchanged code.
6. python/swe/dockerfiles/create_index.py:115
- Draft comment:
The 'docker_outdir' directory is used for writing files but not ensured to exist; call mkdir to avoid potential errors. - Reason this comment was not posted:
Comment was on unchanged code.
7. python/tox.ini:10
- Draft comment:
Tox configuration updates for isort and black exclude paths look correct; ensure these patterns match the intended directories. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
This comment is asking the PR author to ensure that the exclude patterns match the intended directories, which is against the rules of asking the author to confirm or ensure something. It doesn't provide a specific suggestion or point out a specific issue.
8. python/swe/agent/benchmark.py:10
- Draft comment:
Refactored import: Ensure that switching from BedrockChat to ChatBedrock preserves API compatibility. - Reason this comment was not posted:
Marked as duplicate.
9. python/swe/dockerfiles/create_index.py:116
- Draft comment:
Ensure the output directory exists before writing files; add mkdir with parents=True for 'docker_outdir'. - Reason this comment was not posted:
Marked as duplicate.
10. python/swe/dockerfiles/create_index.py:53
- Draft comment:
Review directory usage: 'docker_outdir' is hard-coded to 'generated', which differs from the provided 'outdir' in IndexGenerator. Consider unifying the output directories. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The code appears to intentionally use two directories for different purposes - outdir for git/temporary operations and docker_outdir for final outputs. This separation of concerns seems deliberate. The comment suggests unifying them but doesn't make a strong case for why this would be better. Having separate directories for intermediate vs final outputs is a common and valid pattern.
I could be wrong about the intentionality - maybe this is actually technical debt that should be cleaned up. The separate directories could cause confusion.
While separate directories could potentially cause confusion, the code is clear about the purpose of each directory through its usage patterns. The separation provides useful isolation between temporary and final outputs.
This appears to be an intentional design choice rather than a problem that needs fixing. The comment should be removed as it's suggesting a change without clear benefits.
11. python/tox.ini:10
- Draft comment:
Verify that the updated exclude/skip patterns for isort, black, flake8, and mypy correctly match your directory structure. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_vuBKP3T6X7N9VMH8
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
print("\n".join(errors)) | ||
return | ||
print("==== Successful Builds (after pyenv) ====") | ||
print("\n".join(successful_builds)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider sorting the 'successful_builds' set before printing for reproducible output.
|
||
|
||
# Load successful builds from logs/success.log into the global set | ||
def load_successful_builds() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for file operations in load_successful_builds() to handle potential IO errors gracefully:
def load_successful_builds() -> None:
try:
success_file = logs / "success.log"
if success_file.exists():
with open(success_file, "r", encoding="utf-8") as f:
for line in f:
tag = line.strip()
if tag:
successful_builds.add(tag)
except Exception as e:
print(f"Error loading successful builds: {e}")
|
||
# Load successful builds from logs/success.log into the global set | ||
def load_successful_builds() -> None: | ||
success_file = logs / "success.log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using constants for frequently used strings and paths:
SUCCESS_LOG_FILE = "success.log"
DOCKER_IMAGE_PREFIX = "composio/swe"
# Then use them like:
success_file = logs / SUCCESS_LOG_FILE
full_tag = f"{DOCKER_IMAGE_PREFIX}:{tag_part}"
This makes the code more maintainable and reduces the risk of typos.
stop_requested = False | ||
|
||
|
||
def handle_stop(signum, frame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding proper logging instead of print statements for better observability and debugging:
def handle_stop(signum, frame):
global stop_requested
logger.info("Received stop signal. Gracefully stopping new builds...")
stop_requested = True
|
||
|
||
# Record a successful build by writing to the log file and updating the global set | ||
def record_success(tag: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding docstrings to the new functions for better code documentation:
def record_success(tag: str) -> None:
"""Record a successful build by writing to the success log file and updating the global set.
Args:
tag: The docker image tag that was successfully built
Thread-safe: Uses success_lock for synchronization
"""
with success_lock:
successful_builds.add(tag)
with open(logs / "success.log", "a", encoding="utf-8") as f:
f.write(tag + "\n")
|
||
# Record a successful build by writing to the log file and updating the global set | ||
def record_success(tag: str) -> None: | ||
with success_lock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider validating the docker tag format before recording success to prevent invalid entries:
def is_valid_docker_tag(tag: str) -> bool:
"""Validate docker tag format according to Docker's rules."""
# Basic validation - can be expanded based on needs
return bool(re.match(r'^[a-z0-9][a-z0-9._-]*$', tag))
def record_success(tag: str) -> None:
if not is_valid_docker_tag(tag):
logger.warning(f"Invalid docker tag format: {tag}")
return
with success_lock:
successful_builds.add(tag)
with open(logs / "success.log", "a", encoding="utf-8") as f:
f.write(tag + "\n")
Code Review SummaryOverall, the changes look good and significantly improve the docker build system's reliability and maintainability. Here's a breakdown: Strengths:✅ Added graceful shutdown handling with signal handlers Suggestions for Improvement:
Code Quality: 8/10The code is well-structured and introduces important improvements to the build system. The suggestions above would make it even better, but they're not critical for functionality. Testing:Consider adding:
The PR successfully addresses the broken swe-bench docker image generation issue while adding several quality-of-life improvements to the build system. |
Important
Refactor Docker image generation for swe-bench with updated Composio SDK, improved logging, and error handling.
build.py
to add logging for successful builds and handle stop signals for graceful termination.load_successful_builds()
andrecord_success()
to track successful builds._base()
and_swes()
to skip already built images.get_llm_response()
inbenchmark.py
for Claude model.create_index.py
.BedrockChat
toChatBedrock
inbenchmark.py
.tox.ini
to exclude certain directories from linting and type checking.This description was created by
for c2faf5d. It will automatically update as commits are pushed.