
Retry All Tests with a Reduced Retry Limit of 1 #17939


Open · prudhvigodithi wants to merge 2 commits into main from agent

Conversation

@prudhvigodithi (Member) commented Apr 14, 2025

Description

Related Issues

Part of #17974

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@prudhvigodithi changed the title from "[Draft] Add more flaky tests to Gradle retry logic" to "[Draft] Onboard the known flaky tests to the Gradle retry logic" on Apr 14, 2025
@prudhvigodithi (Member Author)

Adding @mch2 @getsaurabh02 @andrross @reta @cwperks.


❌ Gradle check result for 7d01417: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 0a0f107: FAILURE

❌ Gradle check result for bfdd7fb: FAILURE

❌ Gradle check result for 3da6877: FAILURE

@prudhvigodithi force-pushed the agent branch 3 times, most recently from b49fd88 to 40c47d6 on April 15, 2025 18:10
❌ Gradle check result for 40c47d6: FAILURE

@prudhvigodithi force-pushed the agent branch 2 times, most recently from 97347a1 to 87b8b8d on April 15, 2025 20:44
build.gradle (Outdated)
@@ -557,6 +557,57 @@ subprojects {
includeClasses.add("org.opensearch.test.rest.ClientYamlTestSuiteIT")
includeClasses.add("org.opensearch.upgrade.DetectEsInstallationTaskTests")
includeClasses.add("org.opensearch.cluster.MinimumClusterManagerNodesIT")
includeClasses.add("org.opensearch.indices.IndicesRequestCacheIT")
Member

Obviously the plan to not automatically retry new flaky tests has not prevented the introduction of more flakiness. Should we just give up on this idea and automatically retry all tests?

@prudhvigodithi (Member Author) Apr 16, 2025

I'm open to retrying all tests, since we then don't have to maintain a specific list and remove entries again once they are fixed. I have created META issue #17974 to surface the tests that pass on retry; with this we will have information and metrics on those tests and can take action accordingly.

Member Author

I just removed the filter with includeClasses.add in my latest commit; let me run the Gradle check a few times on the PR.

Member Author

I have updated the list to remove the tests that I have not seen fail in the past year and to add the new flaky tests.

Member

I honestly think it might be better to remove the filter, particularly if we're going to add to the list without a good reason why we can't fix the flakiness. I don't like increasing the max retries though. How about removing the filter and setting max retries to 1? Rare flakiness won't block the build because a retry will almost always succeed. More severe flakiness will continue to block the build though. @prudhvigodithi What do you think?

Member Author

Agree with you on removing the list. To keep the same benefits, should we continue with 3 as maxRetries?

Member

@prudhvigodithi Can you remove the filter and set max retries to 1 in this PR and then run gradle check a few times to see what the success rate is?

The reason I'd like to keep retries to a minimum is so that retries only fix the really rare flakiness, but failures still happen for tests with a high likelihood of failure.
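For reference, a minimal sketch of what that proposal could look like in build.gradle, assuming the org.gradle.test-retry plugin (which provides the retry { } block) stays applied as it is today; the maxFailures value and the surrounding structure are illustrative, not the exact change in this PR:

subprojects {
    tasks.withType(Test).configureEach {
        retry {
            // Retry each failing test at most once: rare flakiness is absorbed,
            // while tests that fail frequently still fail the build on the retry.
            maxRetries = 1
            // Illustrative safety valve: stop retrying once too many tests fail in a single run.
            maxFailures = 10
            // No filter { includeClasses.add(...) } block, so every test is eligible for retry.
        }
    }
}

Assuming failOnPassedAfterRetry is left at its default (false), a test that passes on its retry does not fail the build; those runs surface as the UNSTABLE Gradle check results seen elsewhere in this PR.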

Member Author

I have retried a few times and got successful green CI runs in a row. I saw one UNSTABLE result, which indicates a retry.

❌ Gradle check result for 87b8b8d: FAILURE

@prudhvigodithi force-pushed the agent branch 2 times, most recently from 1ce3fe3 to 6fd5c74 on April 15, 2025 21:32
❌ Gradle check result for 6fd5c74: FAILURE


❕ Gradle check result for 572728a: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Apr 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.31%. Comparing base (cbaddd3) to head (3a74dc5).
Report is 11 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17939      +/-   ##
============================================
+ Coverage     72.51%   73.31%   +0.80%     
- Complexity    67108    67743     +635     
============================================
  Files          5475     5478       +3     
  Lines        309916   310034     +118     
  Branches      45060    45066       +6     
============================================
+ Hits         224725   227316    +2591     
+ Misses        66895    64776    -2119     
+ Partials      18296    17942     -354     


❕ Gradle check result for 572728a: UNSTABLE

❕ Gradle check result for 572728a: UNSTABLE

❌ Gradle check result for 572728a: FAILURE

@prudhvigodithi (Member Author)

I would like to see this merged, as it would give us some bandwidth and unblock contribution PRs. I have created META issue #17974, which discusses more mechanisms to control the flaky tests.
@andrross @reta @getsaurabh02

@prudhvigodithi force-pushed the agent branch 3 times, most recently from 1b38b33 to 5563172 on April 18, 2025 18:22
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for d9760c8: FAILURE


✅ Gradle check result for 3a74dc5: SUCCESS

❕ Gradle check result for 3a74dc5: UNSTABLE


✅ Gradle check result for 3a74dc5: SUCCESS


✅ Gradle check result for 3a74dc5: SUCCESS

@andrross (Member)

@prudhvigodithi You should probably update the PR description to reflect the latest approach here ("Retry all tests, but reduce the number of retries to 1").

@mch2 @msfroh @jainankitk @bugmakerrrrrr @ashking94 @cwperks @reta Tagging some other maintainers here. This is a pretty consequential change in terms of how we're enforcing code quality. I don't love the idea of new tests being added that fail quite often but the retry logic kicks in so it won't cause any friction and we'll never need to look at fixing it (*). At the same time, in practice we are introducing lots of flakiness and are just retrying them manually anyway. Overall, I'm sort of neutral about this change given the realities of the situation, but if someone else thinks this is the right way to go I wouldn't block it.

(*) If #17974 can mitigate this problem, I'd have no concerns with this change. But we haven't actually implemented anything yet.

@prudhvigodithi changed the title from "Update the gradle retry" to "Retry All Tests with a Reduced Retry Limit of 1" on Apr 21, 2025
@reta (Contributor) commented Apr 22, 2025

@mch2 @msfroh @jainankitk @bugmakerrrrrr @ashking94 @cwperks @reta Tagging some other maintainers here. This is a pretty consequential change in terms of how we're enforcing code quality. I don't love the idea of new tests being added that fail quite often but the retry logic kicks in so it won't cause any friction and we'll never need to look at fixing it (*).

Personally, I was always against this approach and still am. Realistically, every large codebase faces flaky tests every now and then, but promptly addressing those is the key to keeping codebases healthy. I do have one (somewhat radical idea) though: what if we remove all flaky tests from the test suites (isolate them in flakyXxxTest subfolders), establish clean baseline, and start working on them one by one?

@ashking94 (Member) commented Apr 22, 2025

@mch2 @msfroh @jainankitk @bugmakerrrrrr @ashking94 @cwperks @reta Tagging some other maintainers here. This is a pretty consequential change in terms of how we're enforcing code quality. I don't love the idea of new tests being added that fail quite often but the retry logic kicks in so it won't cause any friction and we'll never need to look at fixing it (*). At the same time, in practice we are introducing lots of flakiness and are just retrying them manually anyway. Overall, I'm sort of neutral about this change given the realities of the situation, but if someone else thinks this is the right way to go I wouldn't block it.

While this may reduce PR build failures in the near term, the retry would hide the lurking issues/flaky tests underneath. Also, with time, we may end up with the same ratio of failed builds if flaky tests are added unchecked. I also like the idea of running new tests for a higher number of iterations, as mentioned in #17974. At the same time, I think it may be okay to go ahead with this, but with a plan to undo the change in the near future after implementing #17974.

@prudhvigodithi (Member Author)

Thanks for the feedback. With the known list of flaky tests we can always update the includeClasses.add section to retry only the flaky ones. While the effort to fix and reduce the flaky tests continues in parallel, I'm fine with updating the PR to retry only the known flaky tests via the includeClasses.add section rather than a global retry.

Also, today every Gradle check failure, irrespective of whether the test passed on retry, is ingested into the OpenSearch Metrics cluster, so flaky-test information is not missed.

For example, for the build https://build.ci.opensearch.org/job/gradle-check/56861/, even though the CI would be green because one of the tests passed on retry, the dashboard will still have the failure information.

Thanks

@mch2 (Member) commented Apr 22, 2025

@mch2 @msfroh @jainankitk @bugmakerrrrrr @ashking94 @cwperks @reta Tagging some other maintainers here. This is a pretty consequential change in terms of how we're enforcing code quality. I don't love the idea of new tests being added that fail quite often but the retry logic kicks in so it won't cause any friction and we'll never need to look at fixing it (*).

Agree with what's been said so far; let's not sweep more under the rug. I'd also rather we not make a habit of continuously adding to this list, and instead make a focused effort to drive down the count.

I do have one (somewhat radical idea) though: what if we remove all flaky tests from the test suites (isolate them in flakyXxxTest subfolders), establish clean baseline, and start working on them one by one?

I like the intent here in creating a clear list to address, but I think we could use the existing list here in build.gradle for this rather than physically moving them? With that said the tests on this list have been left unaddressed for a while because the retries hide the pain.

@andrross (Member)

I do have one (somewhat radical idea) though: what if we remove all flaky tests from the test suites (isolate them in flakyXxxTest subfolders), establish clean baseline, and start working on them one by one?

I like the intent here in creating a clear list to address, but I think we could use the existing list here in build.gradle for this rather than physically moving them? With that said the tests on this list have been left unaddressed for a while because the retries hide the pain.

Yeah, the original list was intended to be pretty much this idea. Unfortunately the "clean baseline" didn't stay so clean...

@owaiskazi19 (Member)

remove all flaky tests from the test suites (isolate them in flakyXxxTest subfolders)

Instead of global retries, we could have retries only for the tests present in the above subfolder. This can give us the best of both worlds: stable CI for known issues while maintaining visibility of new flakiness. Retries can be treated as a temporary solution until we start clearing the list from flakyXxxTest.
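A rough sketch of that idea, assuming the isolated tests end up under a dedicated package (the org.opensearch.flaky name below is purely hypothetical) and that the existing test-retry filter in build.gradle is reused to target only that location:

subprojects {
    tasks.withType(Test).configureEach {
        retry {
            maxRetries = 3
            filter {
                // Only tests isolated as known-flaky are retried; everything else fails fast
                // and stays visible. The package/pattern below is hypothetical.
                includeClasses.add("org.opensearch.flaky.*")
            }
        }
    }
}

This keeps the retry budget scoped to the known-flaky set, while new flakiness elsewhere still breaks the build.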

@reta (Contributor) commented Apr 22, 2025

I like the intent here in creating a clear list to address, but I think we could use the existing list here in build.gradle for this rather than physically moving them? With that said the tests on this list have been left unaddressed for a while because the retries hide the pain.

Thanks, that's exactly the issue @mch2: the list is growing and the flakiness is unaddressed (we have tried many times to get back on track, as @andrross said, but no luck). With such a radical solution, we would revert the change(s) as fast as possible once flakiness arises.

@prudhvigodithi (Member Author)

what if we remove all flaky tests from the test suites (isolate them in flakyXxxTest subfolders),

@andrross Should we take an approach similar to https://github.com/opensearch-project/OpenSearch/pull/18057/files by adding the AwaitsFix annotation, and start clean while the flaky tests are fixed in parallel?

I see some valid points in not adding flaky tests to the includeClasses.add section.

@andrross (Member) commented Apr 23, 2025

@prudhvigodithi We should definitely make use of @AwaitsFix where appropriate. You can see the code base is currently littered with it:

% git grep '@AwaitsFix' | wc -l
55

However, we have no real mechanism to currently burn down the list of ignored tests. Also I believe the automation will automatically close the flaky test issues eventually once they stop failing due to being skipped.

In the case of #18057 I don't really have a concern with muting it because it is a new test. However, we have cases like SimpleSearchIT where a recent change introduces flakiness, and I would be a little more concerned about ignoring a general test like that because it might lead to introducing even more bugs.

@prudhvigodithi (Member Author) commented Apr 23, 2025

Also I believe the automation will automatically close the flaky test issues eventually once they stop failing due to being skipped.

Got it. The automation will not re-open an issue once it is closed and no new flaky failures are seen. Today the close action is on the maintainer or the assignee once the test is fixed.

So based on the discussion, the following are some options:

  • Go with @AwaitsFix for flaky tests so that contribution PRs are not blocked, with a plan to fix the flaky tests.
  • Mix @AwaitsFix and retries for some tests like SimpleSearchIT via the includeClasses.add section, still with a plan to fix the flaky tests.

Also, it looks to me like it's possible to add a workflow that retries the new or modified tests a few times before we merge the PR. This is for #17974, to have some mechanism to detect the failures early. At a high level, here is what I was trying to do:

# Run one test method 500 times, each iteration with a fresh random seed
for i in {1..500}; do
  # 8 random bytes -> 16 hex characters, uppercased to match Gradle's seed format
  SEED=$(od -vAn -N8 -tx8 < /dev/urandom | tr -d ' ' | tr 'a-f' 'A-F')
  SEED=$(printf "%016s" "$SEED")
  ./gradlew ':server:test' --tests "org.opensearch.index.query.MultiMatchQueryBuilderTests.testToQueryBoost" -Dtests.seed=$SEED
done
  • From the open PR find the new or modified tests.
  • Find the Gradle module of the test.
  • Run the entire test class or just the modified or new test methods.

We can make this more robust by extending it as a plugin and running each test in parallel on multiple containers (or EC2 instances).

@andrross (Member)

So based on the discussion, the following are some options:

  • Go with @AwaitsFix for flaky tests so that contribution PRs are not blocked, with a plan to fix the flaky tests.

  • Mix @AwaitsFix and retries for some tests like SimpleSearchIT via the includeClasses.add section, still with a plan to fix the flaky tests.

@prudhvigodithi Being pretty aggressive with muting failing tests with @AwaitsFix is good for newly introduced tests or tests of a specific feature that has recently been changed. Assuming the contributor is still engaged on the feature in question we can follow up with them to investigate the fix.

For issues like SimpleSearchIT it's a little more complex. That test is very broad and muting it might meaningfully reduce test coverage across big parts of the code base. We want to be a little more careful in those cases. For this case, we had a deterministic repro and I was able to use git bisect to pin it down to a specific commit, which meant I could ping the contributor to fix it. We could explore automation or tooling for making that process faster/easier. However, I don't think we should normalize expanding the list of tests to retry, for all the reasons mentioned above.
