
Conversation

izaitsevfb (Contributor)

This helps filter out signals from flaky tests (tests that fail and then succeed after a rerun) and avoids reverting based on them. Such tests are excluded from HUD, so this change makes autorevert more consistent with HUD.

Changes:

  • Extract individual test reruns as separate Signal events
  • Dedup retains both outcomes
  • For each attempt, emit at most one FAILURE and one SUCCESS event
  • Skipped tests are excluded from "success"
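
A minimal sketch of the per-attempt rule above. The TestRow and SignalStatus types and their field names are illustrative assumptions, not the PR's actual code:

# Hypothetical sketch of the per-attempt collapsing rule; TestRow and
# SignalStatus are stand-ins, not the PR's actual types.
from dataclasses import dataclass
from enum import Enum
from typing import List

class SignalStatus(Enum):
    FAILURE = "failure"
    SUCCESS = "success"

@dataclass
class TestRow:
    failing: bool
    skipped: bool

def collapse_attempt(rows: List[TestRow]) -> List[SignalStatus]:
    """Emit at most one FAILURE and one SUCCESS for a single attempt.
    Skipped rows never count toward SUCCESS."""
    out: List[SignalStatus] = []
    if any(r.failing for r in rows):
        out.append(SignalStatus.FAILURE)
    if any(not r.failing and not r.skipped for r in rows):
        out.append(SignalStatus.SUCCESS)
    return out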

Extraction result before:
2025-10-08T18-48-39.575589-00-00.html

Extraction result after:
2025-10-08T21-30-45.676074-00-00.html

Notice flaky signals like:
pull:inductor/test_compile_subprocess.py::test_remove_noop_slice_scatter_cpu
pull:inductor/test_compile_subprocess.py::test_remove_noop_slice1_cpu

pytorch-bot added the ci-no-td label (Oct 8, 2025)
meta-cla bot added the CLA Signed label (Oct 8, 2025)

Review thread (on the dedup key change):

prev_key: Optional[Tuple[datetime, int, SignalStatus]] = None
for e in c.events:  # already sorted by (started_at, wf_run_id)
-   key = (e.started_at, e.wf_run_id)
+   key = (e.started_at, e.wf_run_id, e.status)
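
For context, a runnable sketch of how the dedup loop might look with the status-aware key. Only the key line is from the diff; the Event dataclass and the dedup_events helper are assumptions for illustration:

# Illustrative sketch only: Event, SignalStatus, and dedup_events are
# hypothetical stand-ins for the real code around this diff.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional, Tuple

class SignalStatus(Enum):
    FAILURE = "failure"
    SUCCESS = "success"

@dataclass
class Event:
    started_at: datetime
    wf_run_id: int
    status: SignalStatus

def dedup_events(events: List[Event]) -> List[Event]:
    """Drop consecutive duplicates; input is sorted by (started_at, wf_run_id)."""
    out: List[Event] = []
    prev_key: Optional[Tuple[datetime, int, SignalStatus]] = None
    for e in events:
        # With status in the key, a FAILURE and a SUCCESS produced by the
        # same job (same timestamp and run id) survive as two events.
        key = (e.started_at, e.wf_run_id, e.status)
        if key != prev_key:
            out.append(e)
        prev_key = key
    return out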
Contributor

Do we want to dedup by status? I mean, for flakiness analysis it is relevant whether the job failed 3x before succeeding once or failed 1x and succeeded once.

izaitsevfb (Author)

Adding status to the key prevents deduping events with different statuses.

There are two constraints:

  1. GitHub tends to reuse the same jobs (assigning them different ids) when "retry failed" is invoked for the workflow; this is what dedup is trying to solve.
  2. Test reruns come from the same job, so they share the same timestamp and workflow run_id.

I can think of an alternative way to extract test retries and avoid deduplicating them: we could artificially increase each synthetic event's time by one second.
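
For instance (purely illustrative, reusing the hypothetical Event type from the sketch above; bumping each successive rerun by one more second is one reading of the idea):

from datetime import timedelta
from typing import List

def spread_rerun_times(rerun_events: List[Event]) -> None:
    # Offset each synthetic rerun event by its index, in seconds, so the
    # (started_at, wf_run_id) key no longer collides across reruns of the
    # same job. The first rerun (index 0) keeps its original time.
    for i, e in enumerate(rerun_events):
        e.started_at += timedelta(seconds=i)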

However, you're right to question how this change would affect signal processing. I think tests are retried only when the first run fails, so in theory this change only acts as a filter. Maybe we can simplify the whole thing and just not consider signals when we detect mixed retries at the test level (pre-filter them before they are processed).
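
A possible shape for that pre-filter, again as a hypothetical helper reusing the Event type from the sketch above; signals for which it returns True would be dropped before any autorevert processing:

from collections import defaultdict

def has_mixed_retries(events: List[Event]) -> bool:
    # True when a single job attempt (same timestamp and run id) produced
    # both a failure and a success, i.e. the test flaked within the attempt.
    statuses = defaultdict(set)
    for e in events:
        statuses[(e.started_at, e.wf_run_id)].add(e.status)
    return any(len(s) > 1 for s in statuses.values())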

jeanschmidt (Contributor) commented Oct 9, 2025

There are three things we need to discuss before moving on:

  1. If the goal is to extract information about individual test failures/successes for flakiness analysis, it is relevant to know how many times a test failed before succeeding, so maybe we need to change the approach;

  2. I see that tests are green, but this change has strong potential to impact the signal detection logic in unexpected ways. Can we have some other assurance that its behaviour is consistent? Maybe we're at the point where our capacity to backtest autorevert with real historical data is becoming more and more important.

  3. Maybe at this stage we also want to enable some sort of feature control, where we can switch from one behaviour to the other live, to quickly test code changes while still gathering feedback from both logics.
