
@wassafshahzad (Contributor) commented Nov 30, 2025

Description

I added threading to the diff generation so that it runs concurrently. The design is the same as in retrieve_logs. The only difference is that each task returns a tuple (mapping_errors, diff_errors), and these are summed at the end to get the totals.
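A minimal sketch of the pattern described above, using only the standard library. The worker `generate_diff_for_bug` and its simulated error conditions are hypothetical stand-ins for the real per-bug logic; only the fan-out with `ThreadPoolExecutor` and the summing of `(mapping_errors, diff_errors)` tuples reflect the description.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple


def generate_diff_for_bug(bug_id: int) -> Tuple[int, int]:
    """Hypothetical per-bug worker; returns (mapping_errors, diff_errors)."""
    if bug_id % 5 == 0:
        return (1, 0)  # simulate a bug whose mapping failed
    if bug_id % 3 == 0:
        return (0, 1)  # simulate a bug whose diff could not be generated
    return (0, 0)


def generate_diffs_in_parallel(bug_ids: List[int], max_workers: int = 8) -> Tuple[int, int]:
    """Run the worker concurrently and sum the per-bug error counts."""
    mapping_errors = 0
    diff_errors = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(generate_diff_for_bug, bug_id) for bug_id in bug_ids]
        # Each future resolves to one (mapping_errors, diff_errors) tuple.
        for future in as_completed(futures):
            task_mapping_errors, task_diff_errors = future.result()
            mapping_errors += task_mapping_errors
            diff_errors += task_diff_errors
    return mapping_errors, diff_errors


if __name__ == "__main__":
    print(generate_diffs_in_parallel(list(range(1, 16))))  # (3, 4)
```

Summing the tuples after each task finishes keeps the workers free of shared counters, so no lock is needed.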

Linked PRs

Closes #5400

@suhaibmujahid (Member) left a comment


Thank you for taking care of this, and sorry for the late review! Please take a look at my comments.

utils.upload_s3([diff_zst_path])
else:
diff_errors += 1
for future in tqdm(as_completed(futures), total=len(futures)):

Shouldn't we use as_completed() within the ThreadPoolExecutor() context?
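One likely reason to keep the loop inside the block: leaving `with ThreadPoolExecutor()` calls `executor.shutdown(wait=True)`, so iterating `as_completed()` afterwards only starts once every task is done, and a `tqdm` bar around it would jump from 0 to 100%. A small stdlib sketch of both placements; `slow_task` is a hypothetical stand-in for the diff work, and `tqdm` is left out so the example has no third-party dependency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List


def slow_task(seconds: float) -> float:
    """Hypothetical stand-in for one diff task: sleep, then return the delay."""
    time.sleep(seconds)
    return seconds


def collect_inside_context(delays: List[float]) -> List[float]:
    """Consume results while the pool is still running."""
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as executor:
        futures = [executor.submit(slow_task, d) for d in delays]
        # as_completed() yields each future as soon as it finishes, so a
        # progress bar wrapped around it advances while work is in flight.
        for future in as_completed(futures):
            results.append(future.result())
    return results


def done_before_iterating_outside(delays: List[float]) -> int:
    """Count finished futures when iteration starts after the `with` block."""
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as executor:
        futures = [executor.submit(slow_task, d) for d in delays]
    # Exiting the block waited for shutdown, so every task is already done.
    return sum(future.done() for future in futures)
```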

return None


def process_diff(bug_id, obj, upload, repo_path):

I would add a docstring to explain the returned value, since it is not straightforward and could be confusing. I would also add a type hint for it.
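A sketch of what the docstring and type hint could look like. Returning a `typing.NamedTuple` instead of a bare tuple goes slightly beyond the comment: it still unpacks like `(mapping_errors, diff_errors)` but names each field. `DiffErrorCounts`, `generate_diff_for_bug`, and `sum_error_counts` are hypothetical names, and the body is reduced to the empty-diff check.

```python
from typing import List, NamedTuple, Optional


class DiffErrorCounts(NamedTuple):
    """Error counters for a single bug (hypothetical name)."""

    mapping_errors: int
    diff_errors: int


def generate_diff_for_bug(bug_id: int, diff: Optional[bytes]) -> DiffErrorCounts:
    """Generate and store the diff for one bug.

    Returns:
        A ``DiffErrorCounts(mapping_errors, diff_errors)`` tuple counting the
        failures of each kind for this bug only. The caller sums these over
        all bugs to report the totals.
    """
    if not diff:
        return DiffErrorCounts(mapping_errors=0, diff_errors=1)
    # Writing, compressing, and uploading the diff would happen here.
    return DiffErrorCounts(mapping_errors=0, diff_errors=0)


def sum_error_counts(results: List[DiffErrorCounts]) -> DiffErrorCounts:
    """Add up the per-bug counters field by field."""
    return DiffErrorCounts(
        mapping_errors=sum(r.mapping_errors for r in results),
        diff_errors=sum(r.diff_errors for r in results),
    )
```

Because a `NamedTuple` compares equal to a plain tuple, existing callers that unpack or compare against `(0, 1)` keep working.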


Using a descriptive function name would be helpful.

Comment on lines +314 to +322
with open(diff_path, "wb") as f:
    f.write(diff)

utils.zstd_compress(diff_path)

os.remove(diff_path)

if upload:
    utils.upload_s3([diff_zst_path])

It will return None on this path, but the caller expects a tuple[int, int].
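To illustrate the problem with simplified, hypothetical stand-ins for `process_diff`: when the success path has no `return`, Python returns `None` implicitly, and unpacking that result in the caller raises `TypeError`. An explicit `return (0, 0)` at the end of the success path fixes it.

```python
from typing import Optional, Tuple


def process_diff_buggy(diff: Optional[bytes], upload: bool):
    """Hypothetical simplified version of the reviewed code."""
    if diff:
        if upload:
            pass  # the upload would happen here
        # No return statement: the function implicitly returns None.
    else:
        return (0, 1)


def process_diff_fixed(diff: Optional[bytes], upload: bool) -> Tuple[int, int]:
    """Same logic, but every path now returns (mapping_errors, diff_errors)."""
    if diff:
        if upload:
            pass  # the upload would happen here
        return (0, 0)
    else:
        return (0, 1)
```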


I would keep the happy path to the left where possible; that is, I would flip the condition and return early for the falsy case, i.e., return (0, 1). This way, we won't need an else branch. That was not possible with the previous implementation, but now that the logic is wrapped in a function, it is better to do so.
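A sketch of the early-return (guard clause) shape, under stated assumptions: `write_diff_for_bug` is a hypothetical name, the zstd compression step is omitted, and `upload` is an optional callback standing in for `utils.upload_s3`.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple


def write_diff_for_bug(
    bug_id: int,
    diff: Optional[bytes],
    out_dir: str,
    upload: Optional[Callable[[str], None]] = None,
) -> Tuple[int, int]:
    """Write the diff for one bug; return (mapping_errors, diff_errors)."""
    # Guard clause: handle the failure case first and bail out early.
    if not diff:
        return (0, 1)

    # Happy path: stays at the left margin, with no `else:` branch.
    diff_path = os.path.join(out_dir, f"{bug_id}.patch")
    with open(diff_path, "wb") as f:
        f.write(diff)

    if upload is not None:
        upload(diff_path)

    return (0, 0)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        print(write_diff_for_bug(1, None, out_dir))  # (0, 1)
        print(write_diff_for_bug(2, b"diff --git a/x b/x\n", out_dir))  # (0, 0)
```

The final `return (0, 0)` also covers the earlier concern about the success path returning `None`.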



Development

Successfully merging this pull request may close these issues.

[retrieve_ci_failures] Generate diffs in parallel

2 participants