[autorevert] Extract test retries as separate events #7327
base: main
Conversation
```diff
 prev_key: Optional[Tuple[datetime, int, SignalStatus]] = None
 for e in c.events:  # already sorted by (started_at, wf_run_id)
-    key = (e.started_at, e.wf_run_id)
+    key = (e.started_at, e.wf_run_id, e.status)
```
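The effect of the one-line change can be sketched in isolation. This is a minimal, hypothetical reconstruction: `Event`, `dedup_events`, and the plain-string status are assumptions standing in for the repository's actual types, not its real API.

```python
# Sketch of deduplicating signal events by a composite key. Including
# `status` in the key keeps a failure and a success that share the same
# (started_at, wf_run_id) as two distinct events instead of collapsing them.
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Event:
    started_at: datetime
    wf_run_id: int
    status: str  # stand-in for SignalStatus


def dedup_events(events: List[Event]) -> List[Event]:
    """Drop consecutive events that share the same key.

    Assumes `events` is already sorted by (started_at, wf_run_id),
    as the loop comment in the diff states.
    """
    out: List[Event] = []
    prev_key: Optional[Tuple[datetime, int, str]] = None
    for e in events:
        key = (e.started_at, e.wf_run_id, e.status)
        if key != prev_key:
            out.append(e)
        prev_key = key
    return out
```

With the old two-field key, a rerun that flips from failure to success would be deduped away; with status in the key, both events survive.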
Do we want to dedup by status? For flakiness analysis it is relevant whether the job failed 3x before succeeding once, or failed 1x and succeeded once.
Adding status to the key prevents deduping events with different status.
There are two constraints:
- GitHub tends to reuse the same jobs (assigning them different ids) when "retry failed" is invoked for the workflow. This is what dedup is trying to solve.
- Test reruns come from the same job, so they share the same timestamp and workflow run_id.
I can think of an alternative way to extract test retries and avoid deduplicating them — we can artificially increase the synthetic event time by one second.
However, you're right to question how that change would affect signal processing. I think tests are retried only when the first run fails, so in theory this change only acts as a filter. Maybe we can simplify the whole thing and just not consider signals when we detect mixed retries at the test level (pre-filter them even before they are processed).
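The alternative floated above (artificially shifting synthetic event times so test reruns no longer collide with the original job's key) can be sketched as follows. This is a hypothetical illustration; `shift_retry_times` and its shape are assumptions, not code from the PR.

```python
# Sketch: give same-timestamp test reruns distinct synthetic times by
# bumping each repeat forward one second, so a (started_at, wf_run_id)
# dedup key no longer merges them. Assumes the input list is sorted and
# that equal timestamps indicate reruns from the same job.
from datetime import datetime, timedelta
from typing import List


def shift_retry_times(timestamps: List[datetime]) -> List[datetime]:
    out: List[datetime] = []
    last = None  # last *original* timestamp seen
    bump = 0     # seconds added to the current run of duplicates
    for t in timestamps:
        if t == last:
            bump += 1
        else:
            bump = 0
            last = t
        out.append(t + timedelta(seconds=bump))
    return out
```

Note this sketch can still collide with a genuine event one second later; it only illustrates the trade-off against putting `status` in the key.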
There are 3 things we need to discuss before moving on:
This helps filter out signals from flaky tests (those that succeed after a rerun) and avoids reverting based on them. Such tests are excluded from HUD, so this makes autorevert more consistent with HUD.
Changes:
Extraction result before:
2025-10-08T18-48-39.575589-00-00.html
Extraction result after:
2025-10-08T21-30-45.676074-00-00.html
Notice flaky signals like:
pull:inductor/test_compile_subprocess.py::test_remove_noop_slice_scatter_cpu
pull:inductor/test_compile_subprocess.py::test_remove_noop_slice1_cpu