[AUTOCUT] Gradle Check Flaky Test Report for SimpleSearchIT #16851

opensearch-ci-bot · 2024-12-16T12:40:28Z

Flaky Test Report for `SimpleSearchIT`

Noticed the SimpleSearchIT has some flaky, failing tests that failed during post-merge actions.

Details

Git Reference	Merged Pull Request	Build Details	Test Name
`069c2d8`	18097	57263	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`0d1bb9b`	18035	56909	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`15b702d`	18042	56934	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`18a3b75`	17796	56046	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`3a7d8fa`	17976	56634	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`436038d`	18041	56989	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`59a6efd`	18036	57036	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`8cc58b5`	18058	57023	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`967eee1`	17216	56041	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`b842e48`	17902	56242	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`c44d230`	17749	56412	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`e61b6ab`	18005	56703	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`e7ed33f`	17978	57115	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}` `org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`693c788`	17921	56337	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`caf5d71`	18039	56924	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`d18982c`	17912	56693	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}`
`d29e95c`	17882	56063	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}`
`d37cc9b`	16831	51449	`org.opensearch.search.simple.SimpleSearchIT.classMethod`
`eba12fa`	17370	56995	`org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"false"}}`

The other pull requests, besides those involved in post-merge actions, that contain failing tests with the SimpleSearchIT class are:

For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard.

The text was updated successfully, but these errors were encountered:

sandeshkr419 · 2024-12-18T17:29:22Z

Looking into build logs:

[2024-12-16T08:20:25,037][ERROR][o.o.i.s.IndexShard       ] [node_t0] [test][0] Error Fetching SegmentInfos and latest checkpoint
org.apache.lucene.store.AlreadyClosedException: engine is closed
	at org.opensearch.index.shard.IndexShard.getEngine(IndexShard.java:3728) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.getSegmentInfosSnapshot(IndexShard.java:5367) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.getLatestSegmentInfosAndCheckpoint(IndexShard.java:1728) [main/:?]
	at org.opensearch.index.shard.IndexShard.updateReplicationCheckpoint(IndexShard.java:4868) [main/:?]
	at org.opensearch.index.shard.IndexShard$ReplicationCheckpointUpdater.afterRefresh(IndexShard.java:4862) [main/:?]
	at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) [lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) [lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) [lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1771) [main/:?]
	at org.opensearch.index.engine.InternalEngine.flush(InternalEngine.java:1886) [main/:?]
	at org.opensearch.index.engine.Engine.flush(Engine.java:1214) [main/:?]
	at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:2002) [main/:?]
	at org.opensearch.index.shard.IndexShard.close(IndexShard.java:2056) [main/:?]
	at org.opensearch.index.IndexService.closeShard(IndexService.java:805) [main/:?]
	at org.opensearch.index.IndexService.removeShard(IndexService.java:781) [main/:?]
	at org.opensearch.index.IndexService.close(IndexService.java:495) [main/:?]
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1160) [main/:?]
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:443) [main/:?]
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:284) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:637) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:623) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:589) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:510) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:932) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [main/:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1575) [?:?]
[2024-12-16T08:20:25,057][WARN ][o.o.c.r.a.AllocationService] [node_t0] Falling back to single shard assignment since batch mode

From initial look, it looks more like OpenSearch process failure or corrupted index failure.

bowenlan-amzn · 2025-04-14T17:04:20Z

SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize is flaky

prudhvigodithi · 2025-04-17T00:23:15Z

I was able to re-produce this with a seed

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT"  -Dtests.seed=7560B20A7AC502D

@getsaurabh02 @andrross

andrross · 2025-04-18T20:21:57Z

This is another case that looks like it was introduced by #17769. FYI @harshavamsi

prudhvigodithi · 2025-04-25T01:03:05Z

Coming from the Dashboard seen a new failure in #18069 (reference), not sure if the impacted PR has rebased the fix.

prudhvigodithi · 2025-04-28T18:45:12Z

Seen a failure in post merge action, coming from 069c2d8. Build failure log https://build.ci.opensearch.org/job/gradle-check/57263/.
Able to re-produce with ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.seed=9D84159A50CA98F1

harshavamsi · 2025-04-28T18:49:30Z

#18075 -- the fix has not been backported to 3.0 yet, that's why we are still seeing flakiness

andrross · 2025-05-07T21:02:24Z

I just saw what looks like a new kind of failure here:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.method="testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=F888A6BE313526C9 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=se-SE -Dtests.timezone=Etc/GMT+0 -Druntime.java=21

SimpleSearchIT > testSimpleTerminateAfterTrackTotalHitsUpToSize {p0={"search.concurrent_segment_search.enabled":"true"}} FAILED
    java.lang.AssertionError: expected:<GREATER_THAN_OR_EQUAL_TO> but was:<EQUAL_TO>
        at __randomizedtesting.SeedInfo.seed([F888A6BE313526C9:161A85B12E222BF5]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:120)
        at org.junit.Assert.assertEquals(Assert.java:146)
        at org.opensearch.search.simple.SimpleSearchIT.doTestSimpleTerminateAfterTrackTotalHitsUpTo(SimpleSearchIT.java:435)
        at org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterTrackTotalHitsUpToSize(SimpleSearchIT.java:443)

@prudhvigodithi Any chance this is related to your recent change?

prudhvigodithi · 2025-05-07T22:41:10Z

@prudhvigodithi Any chance this is related to your recent change?

I see what is happening with a range query (when approximation was not introduced) its always "relation": "gte". Through approximation the query execution is optimized to count more efficiently and since the count is exactly 5 (matching the trackTotalHitsUpTo value), the relation is set to "eq"

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/big5/_search -d '{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2023-01-01T00:00:00",
        "lt": "2023-01-03T00:00:00"
      }
    }
  },
  "track_total_hits": 5,
  "size": 4
}'

This is not a bug but the behavior of approximation as it only counts and returns the documents as required, we see this issue after this PR #18189 because the hack of adding +1 is removed. With +1 in the code and with approximation now the total return docs is 6, hence the tests are green with "relation": "gte".

prudhvigodithi · 2025-05-07T22:47:48Z

From my previous comment #16851 (comment)

 curl -XPOST -H'Content-type: application/json' -kv localhost:9200/big5/_search -d '{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2023-01-01T00:00:00",
        "lt": "2023-01-03T00:00:00"
      }
    }
  },
  "track_total_hits": 5,
  "size": 6
}' | jq '.'

The searchResponse.getHits().getTotalHits().relation() depends on track_total_hits value, even with approximation when size is greater than track_total_hits, the relation": "gte" which is expected because with approximation the logic collects number of docs max of track_total_hits or size.

Let me see if I can update the tests based on this behavior or just add +1 to the code.
Thanks
@getsaurabh02 @harshavamsi

harshavamsi · 2025-05-07T22:50:48Z

@prudhvigodithi Let's update the test instead of updating the approximation. Approximation is default now and any test using MatchAll or RangeQuery will need to be updated to reflect that.

prudhvigodithi · 2025-05-07T22:53:55Z

Yes I'm more inclined to update the test, but going over the PR #18018 and comments to see why just +1 was added.
Thanks

bowenlan-amzn · 2025-05-07T23:59:04Z

Note the assertion is the expected behavior, the relation should always be GREATER_THAN_OR_EQUAL_TO. Because there are more hits than 5.

The reason why +1 works is because the ensures the scorer is larger than size by 1, so the TotalHitCountCollector can flip the relation from equal to GREATER_THAN_OR_EQUAL_TO.

@prudhvigodithi your last change make sure the hits.total is short cut for match all query. But not range query, which called out #18189 (comment)
I was suggesting to also short cut for approximate range query. #18189 (comment)

But I know I feel +1 is a better approach in general, also doesn't add any cost anyway.

bowenlan-amzn · 2025-05-08T00:01:48Z

On the other hand, I thought range query theoretically should hit this https://github.com/apache/lucene/blob/fafd6af004e0c39582043b797555d6eeb9aa7638/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java#L47-L52
And use that shortcut to give the hits.total, weight.count(context)
But somehow the test is not hitting that, this worth to check more @prudhvigodithi

bowenlan-amzn · 2025-05-08T00:05:13Z

We should refactor this test by separating test size < track_total_size and >, then it won't be a flaky one, but consistent failing.

prudhvigodithi · 2025-05-08T03:45:27Z

From my previous comments #16851 (comment) #16851 (comment) I was able to get the failure with E6CD5109E5DA1B20 seed as well, this time even with size is 4, which is less that the trackTotalHitsUpTo (which is 5). What is expected is the relation to be "relation": "eq", but got "relation": "gte".

So the test is not flaky here but the visit methods part of ApproximatePointRangeQuery does not exactly cap the size defined on the query (or when the default is 10k), there are sometimes a little extra docs collected part of the BKD walk hence the above test with size set to 4 and trackTotalHitsUpTo is 5, I can see "relation": "gte".

Having this I would go for adding +1 which consistently gives "relation": "gte" and honors the existing behavior (similar to before approximation). I have updated the PR to add the +1 #18235.

Thanks

prudhvigodithi · 2025-05-08T03:48:06Z

Thanks @bowenlan-amzn yes today even before approximation the range queries does not go short cut, only the match_all queries does, hence in my PR #18189 I have fixed the match_all behavior to be the same with or without approximation. So it should not only add some more performance improvement (early termination + the early competitive iterator benefit) but also will get back the same behavior.

Next I will take a look at short cut of approximate range query.

opensearch-ci-bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run untriaged labels Dec 16, 2024

sandeshkr419 removed the untriaged label Dec 18, 2024

kkewwei mentioned this issue Apr 17, 2025

fix simultaneously creating a snapshot and updating the repository can potentially trigger an infinite loop #17532

Merged

3 tasks

andrross mentioned this issue Apr 22, 2025

Disable approximation when there is terminate after #18018

Merged

3 tasks

gargmanik13 mentioned this issue Apr 23, 2025

Enabled default throttling for all tasks submitted to cluster manager #17711

Merged

3 tasks

andrross mentioned this issue Apr 23, 2025

Retry All Tests with a Reduced Retry Limit of 1 #17939

Open

3 tasks

andrross closed this as completed in #18018 Apr 24, 2025

opensearch-ci-bot reopened this Apr 25, 2025

github-actions bot added the untriaged label Apr 25, 2025

prudhvigodithi closed this as completed Apr 25, 2025

opensearch-ci-bot reopened this Apr 28, 2025

prudhvigodithi mentioned this issue May 6, 2025

Improve performance for approximated match_all sort queries #18206

Closed

andrross mentioned this issue May 7, 2025

[META] Eliminate flakiness in :server:internalClusterTest #18108

Open

prudhvigodithi mentioned this issue May 7, 2025

Fix SimpleSearchIT.doTestSimpleTerminateAfterTrackTotalHitsUpTo flaky test #18235

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AUTOCUT] Gradle Check Flaky Test Report for SimpleSearchIT #16851

[AUTOCUT] Gradle Check Flaky Test Report for SimpleSearchIT #16851

opensearch-ci-bot commented Dec 16, 2024 •

edited

Loading

sandeshkr419 commented Dec 18, 2024

bowenlan-amzn commented Apr 14, 2025

prudhvigodithi commented Apr 17, 2025

andrross commented Apr 18, 2025

prudhvigodithi commented Apr 25, 2025 •

edited

Loading

prudhvigodithi commented Apr 28, 2025 •

edited

Loading

harshavamsi commented Apr 28, 2025

andrross commented May 7, 2025

prudhvigodithi commented May 7, 2025

prudhvigodithi commented May 7, 2025

harshavamsi commented May 7, 2025

prudhvigodithi commented May 7, 2025 •

edited

Loading

bowenlan-amzn commented May 7, 2025

bowenlan-amzn commented May 8, 2025

bowenlan-amzn commented May 8, 2025

prudhvigodithi commented May 8, 2025 •

edited

Loading

prudhvigodithi commented May 8, 2025 •

edited

Loading

[AUTOCUT] Gradle Check Flaky Test Report for SimpleSearchIT #16851

[AUTOCUT] Gradle Check Flaky Test Report for SimpleSearchIT #16851

Comments

opensearch-ci-bot commented Dec 16, 2024 • edited Loading

Flaky Test Report for SimpleSearchIT

Details

sandeshkr419 commented Dec 18, 2024

bowenlan-amzn commented Apr 14, 2025

prudhvigodithi commented Apr 17, 2025

andrross commented Apr 18, 2025

prudhvigodithi commented Apr 25, 2025 • edited Loading

prudhvigodithi commented Apr 28, 2025 • edited Loading

harshavamsi commented Apr 28, 2025

andrross commented May 7, 2025

prudhvigodithi commented May 7, 2025

prudhvigodithi commented May 7, 2025

harshavamsi commented May 7, 2025

prudhvigodithi commented May 7, 2025 • edited Loading

bowenlan-amzn commented May 7, 2025

bowenlan-amzn commented May 8, 2025

bowenlan-amzn commented May 8, 2025

prudhvigodithi commented May 8, 2025 • edited Loading

prudhvigodithi commented May 8, 2025 • edited Loading

opensearch-ci-bot commented Dec 16, 2024 •

edited

Loading

Flaky Test Report for `SimpleSearchIT`

prudhvigodithi commented Apr 25, 2025 •

edited

Loading

prudhvigodithi commented Apr 28, 2025 •

edited

Loading

prudhvigodithi commented May 7, 2025 •

edited

Loading

prudhvigodithi commented May 8, 2025 •

edited

Loading

prudhvigodithi commented May 8, 2025 •

edited

Loading