Retry All Tests with a Reduced Retry Limit of 1 #17939
Conversation
Adding @mch2 @getsaurabh02 @andrross @reta @cwperks.
❌ Gradle check result for 7d01417: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 0a0f107: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for bfdd7fb: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 3da6877: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Force-pushed from b49fd88 to 40c47d6
❌ Gradle check result for 40c47d6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Force-pushed from 97347a1 to 87b8b8d
build.gradle (outdated)
@@ -557,6 +557,57 @@ subprojects {
includeClasses.add("org.opensearch.test.rest.ClientYamlTestSuiteIT")
includeClasses.add("org.opensearch.upgrade.DetectEsInstallationTaskTests")
includeClasses.add("org.opensearch.cluster.MinimumClusterManagerNodesIT")
includeClasses.add("org.opensearch.indices.IndicesRequestCacheIT")
Obviously the plan to not automatically retry new flaky tests has not prevented the introduction of more flakiness. Should we just give up on this idea and automatically retry all tests?
I'm open to retrying all tests, since then we don't have to maintain a specific list and remove entries once they are fixed. I have created META issue #17974 to surface the tests that pass on retry; with this we will have information and metrics on those tests and can take action accordingly.
I just removed the filter with includeClasses.add in my latest commit; let me run the Gradle check a few times on the PR.
I have updated the list to remove the tests that I have not seen fail in the past year and to add the new flaky tests.
I honestly think it might be better to remove the filter, particularly if we're going to add to the list without a good reason why we can't fix the flakiness. I don't like increasing the max retries though. How about removing the filter and setting max retries to 1? Rare flakiness won't block the build because a retry will almost always succeed. More severe flakiness will continue to block the build though. @prudhvigodithi What do you think?
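For illustration, that proposal amounts to roughly the following (a minimal sketch assuming the same test-retry `retry { }` DSL as the existing configuration):

```groovy
// Sketch only: retry every test once, with no per-class filter.
tasks.withType(Test).configureEach {
    retry {
        // A single retry absorbs rare flakiness, while tests that fail often still break the build.
        maxRetries = 1
        // No filter { includeClasses.add(...) } block: the retry applies to all tests.
    }
}
```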
I agree with you on removing the list. To see the same benefits, should we continue with 3 as maxRetries?
@prudhvigodithi Can you remove the filter and set max retries to 1 in this PR and then run gradle check a few times to see what the success rate is?
The reason I'd like to keep retries to a minimum is so that retries only fix the really rare flakiness, while failures still happen for tests with a high likelihood of failure.
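As a back-of-the-envelope illustration (assuming independent runs): a test that fails 5% of the time only breaks the build when it fails twice in a row, roughly 0.25% of runs, while a test that fails half the time still breaks the build about 25% of the time with a single retry. With three retries, that badly flaky test would only break the build about 6% of the time.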
I have retried a few times and got consecutive green CI runs. I saw one UNSTABLE result, which indicates a retry happened.
❌ Gradle check result for 87b8b8d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Force-pushed from 1ce3fe3 to 6fd5c74
❌ Gradle check result for 6fd5c74: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for 572728a: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report: all modified and coverable lines are covered by tests ✅

@@             Coverage Diff              @@
##               main   #17939      +/-   ##
============================================
+ Coverage     72.51%   73.31%    +0.80%
- Complexity    67108    67743      +635
============================================
  Files          5475     5478        +3
  Lines        309916   310034      +118
  Branches      45060    45066        +6
============================================
+ Hits         224725   227316     +2591
+ Misses        66895    64776     -2119
+ Partials      18296    17942      -354

☔ View full report in Codecov by Sentry.
❕ Gradle check result for 572728a: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
❕ Gradle check result for 572728a: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
❌ Gradle check result for 572728a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
I would like to see this merged, as it would give us some bandwidth and unblock contribution PRs. I have created META issue #17974, which covers adding more mechanisms to control the flaky tests.
Force-pushed from 1b38b33 to 5563172
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for d9760c8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for 3a74dc5: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@prudhvigodithi You should probably update the PR description to match the latest approach here ("Retry all tests, but reduce the number of retries to 1").

@mch2 @msfroh @jainankitk @bugmakerrrrrr @ashking94 @cwperks @reta Tagging some other maintainers here. This is a pretty consequential change in terms of how we're enforcing code quality. I don't love the idea that new tests could be added that fail quite often, but because the retry logic kicks in it causes no friction and we never get around to fixing them (*). At the same time, in practice we are introducing lots of flakiness and are just retrying manually anyway. Overall, I'm sort of neutral about this change given the realities of the situation, but if someone else thinks this is the right way to go I wouldn't block it.

(*) If #17974 can mitigate this problem, I'd have no concerns with this change. But we haven't actually implemented anything yet.
Personally, I was always against this approach and still am. Realistically, every large codebase faces flaky tests every now and then, but promptly addressing them is the key to keeping the codebase healthy. I do have one (somewhat radical) idea though: what if we remove all flaky tests from the test suites (isolate them in
While this may reduce PR build failures in the near term, the retry would hide the lurking issues and flaky tests underneath. Also, with time, we may end up with the same ratio of failed builds if flaky tests are added unchecked. I also like the idea of running new tests for more iterations, as mentioned in #17974. At the same time, I think it may be okay to go ahead with this, but with a plan to undo this change in the near future after implementing #17974.
Thanks for the feedback. With the known list
Also, today every Gradle check failure, irrespective of whether the test passed on retry, is ingested into the OpenSearch Metrics Cluster, so flaky test information is not missed. For example, for build https://build.ci.opensearch.org/job/gradle-check/56861/, even though the CI would be green because one of the tests passed on retry, the dashboard will still have the failure information. Thanks
Agree with what's been said so far; let's not sweep more under the rug. I'd also rather we not make a habit of continuously adding to this list, and instead make a focused effort to drive down the count.
I like the intent here in creating a clear list to address, but I think we could use the existing list in build.gradle for this rather than physically moving the tests? With that said, the tests on this list have been left unaddressed for a while because the retries hide the pain.
Yeah, the original list was intended to be pretty much this idea. Unfortunately the "clean baseline" didn't stay so clean... |
Instead of global retries, we could have retries only for the tests present in the above subfolder. This gives us the best of both worlds: stable CI for known issues while maintaining visibility into new flakiness. Retries can be treated as a temporary solution until we start clearing the list from
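As a sketch of that idea (the quarantine package name is hypothetical, and this assumes the wildcard support of the test-retry filter):

```groovy
// Sketch only: retry just the isolated flaky tests; everything else fails fast.
tasks.withType(Test).configureEach {
    retry {
        maxRetries = 3
        filter {
            // Hypothetical package/subfolder holding the quarantined flaky tests.
            includeClasses.add("org.opensearch.quarantine.*")
        }
    }
}
```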
Thanks, that's exactly the issue @mch2 - the list is growing and the flakiness goes unaddressed (we have tried many times to get back on track, as @andrross said, but no luck). With such a radical solution, we would revert the change(s) as fast as possible once flakiness arises.
@andrross Should we take a similar approach to https://github.com/opensearch-project/OpenSearch/pull/18057/files by adding
Because I see some valid points in not adding
@prudhvigodithi We should definitely make use of
However, we currently have no real mechanism to burn down the list of ignored tests. Also, I believe the automation will eventually close the flaky test issues automatically once they stop failing due to being skipped. In the case of #18057 I don't really have a concern with muting it because it is a new test. However, we have cases like SimpleSearchIT where a recent change introduced flakiness, and I would be a little more concerned about ignoring a general test like that because it might lead to introducing even more bugs.
Got it. The automation will not re-open an issue once it is closed and no new flaky failures are seen; the close action today is on the maintainer or the assignee once the test is fixed. So, based on the discussion, the following are some options.
It also looks to me like it's possible to add a workflow that retries the new or modified tests a few times before we merge the PR. This is for #17974, to have some mechanism for detecting failures early. At a high level, this is what I was trying to do.
We can make it more robust by extending this as a plugin and running each test in parallel on multiple containers (or EC2 instances).
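To make that idea concrete, here is a very rough sketch; the task name, merge base, test naming conventions, and iteration count are all assumptions, and in practice this would more likely live in a CI workflow than in build.gradle:

```groovy
// Sketch only: re-run test classes touched by a PR a few times to surface flakiness before merge.
tasks.register("stressChangedTests") {
    doLast {
        // Assumes origin/main is the merge base and the *Tests.java / *IT.java naming conventions.
        def changedClasses = ["git", "diff", "--name-only", "origin/main...HEAD"].execute().text
                .readLines()
                .findAll { it.endsWith("Tests.java") || it.endsWith("IT.java") }
                .collect { it.tokenize("/").last() - ".java" }
        changedClasses.each { testClass ->
            3.times {
                // --rerun-tasks forces a fresh execution on every iteration.
                def proc = ["./gradlew", "test", "--tests", "*" + testClass, "--rerun-tasks"].execute()
                proc.waitForProcessOutput(System.out, System.err)
                if (proc.exitValue() != 0) {
                    throw new GradleException("$testClass failed on a repeat run; investigate before merging")
                }
            }
        }
    }
}
```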
@prudhvigodithi Being pretty aggressive with muting failing tests with
For issues like
Description
Currently, retrying specific Gradle tests is broken due to the transition to the Gradle Develocity plugin (coming from "Update gradle config for develocity rebranding" #13942). Although we've retained the Gradle Enterprise plugin (now rebranded as Develocity) from the original fork, we don't use it since we lack the required license. As a result, the retry functionality tied to this plugin is not working as expected, and I suspect this may be contributing to the recent instability. I've created this PR ("Retry All Tests with a Reduced Retry Limit of 1" #17939 (comment)) attempting to address the issue. Using the OpenSearch Gradle Check Metrics Dashboard, I was able to see the top 100 failing tests that have been flaky over the past year.
This change enables Gradle to automatically retry known flaky tests, reducing the overall number of manual CI retries on open PRs. Since only maintainers can trigger retries, users are currently forced to push a new commit or reopen the PR, which this update helps avoid.
Closed the following flaky test issues, as I have seen a PR linked with the fix for each. If they recur, the automation will re-open them and we can add them back.
[AUTOCUT] Gradle Check Flaky Test Report for IndexingIT #14302 (comment)
Related Issues
Part of #17974
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.