Skip search shards with INDEX_REFRESH_BLOCK #129132

benchaplin · 2025-06-09T05:56:43Z

#117543 introduced a ClusterBlock that is applied to new Serverless indices whose search shards are not yet up. We should skip searches for indices with this block to avoid meaningless 503s.

elasticsearchmachine · 2025-06-09T05:57:12Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

tlrx

Left some comments. The search part should be reviewed by the ES Search team.

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java

tlrx · 2025-06-12T09:13:29Z

server/src/internalClusterTest/java/org/elasticsearch/search/SearchWithIndexBlocksIT.java

+        var addIndexBlockRequest = new AddIndexBlockRequest(IndexMetadata.APIBlock.REFRESH, "test");
+        client().execute(TransportAddIndexBlockAction.TYPE, addIndexBlockRequest).actionGet();


The refresh block should be added automatically to newly created indices as long as they have replicas and the "use refresh block" setting is enabled in the node settings. We should remove the ability to add the refresh block through the Add Index Block API.

Thanks for taking a look @tlrx!

I was hoping to test this change outside of the context of Serverless. But I agree it's not appropriate to add the refresh block to that API for testing purposes only, so I will see if I can construct the scenario in some other way.

Alright, I was able to get the setup I was looking for by adding the block directly to cluster state in the tests.

tlrx · 2025-06-12T09:16:17Z

server/src/internalClusterTest/java/org/elasticsearch/search/SearchWithIndexBlocksIT.java

+        assertHitCount(prepareSearch().setQuery(QueryBuilders.matchAllQuery()), 0);
+    }
+
+    public void testSearchMultipleIndicesEachWithAnIndexRefreshBlock() {


I think this could be folded into a single test where one or more indices are randomly created, most of them with replicas but some without. The test would then allocate zero or more search shards and check the expected results, and finally assign all search shards and check the results again.

I've folded this into a single test with some additional randomization. My goal is to keep the integration tests in the Serverless PR, so I'll add the test scenario you're proposing there.

cbuescher

I did a first pass on the search related side of things and left a few questions and comments.

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

cbuescher · 2025-06-18T18:23:08Z

server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java

@@ -186,6 +186,12 @@ private void runCoordinatorRewritePhase() {
        assert assertSearchCoordinationThread();
        final List<SearchShardIterator> matchedShardLevelRequests = new ArrayList<>();
        for (SearchShardIterator searchShardIterator : shardsIts) {
+            if (searchShardIterator.prefiltered() == false && searchShardIterator.skip()) {


As far as I understand this is what actually skips the shards being searched. Why is this done here in the CanMatchPreFilterSearchPhase? My understanding is that we don't always use this phase, e.g. "shouldPreFilterSearchShards" returns false for all searches that are not QUERY_THEN_FETCH (and other cases). Wouldn't we still run into 503s for those cases?

Actually, what really skips the shards for search is this code:

elasticsearch/server/src/main/java/org/elasticsearch/action/search/AbstractSearchAsyncAction.java

Lines 123 to 129 in 01b6de3

    for (final SearchShardIterator iterator : shardsIts) {
        if (iterator.skip()) {
            skipped++;
        } else {
            iterators.add(iterator);
        }
    }

I added this check to avoid running can-match on shards that are skipped.
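
To make the distinction concrete, here is a minimal, self-contained sketch of that partitioning step. `ShardIt`, `Partition`, and `SkipPartitionSketch` are hypothetical stand-ins invented for illustration, not the real `SearchShardIterator` or `AbstractSearchAsyncAction` types; only the loop shape mirrors the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipPartitionSketch {
    // Hypothetical stand-in for SearchShardIterator: only the fields this sketch needs.
    record ShardIt(String index, int shardId, boolean skip) {}

    // Iterators that will actually be searched, plus a count of the skipped ones.
    record Partition(List<ShardIt> toSearch, int skipped) {}

    // Same shape as the quoted loop: skipped iterators are counted, the rest are kept.
    static Partition partition(List<ShardIt> shardsIts) {
        List<ShardIt> iterators = new ArrayList<>();
        int skipped = 0;
        for (ShardIt iterator : shardsIts) {
            if (iterator.skip()) {
                skipped++;
            } else {
                iterators.add(iterator);
            }
        }
        return new Partition(iterators, skipped);
    }

    public static void main(String[] args) {
        // One shard of an index assumed to carry the refresh block, two ordinary shards.
        Partition p = partition(List.of(
            new ShardIt("blocked-index", 0, true),
            new ShardIt("ready-index", 0, false),
            new ShardIt("ready-index", 1, false)
        ));
        System.out.println("skipped=" + p.skipped() + " toSearch=" + p.toSearch().size());
    }
}
```

Because skipped iterators are dropped here regardless of search type, a shard marked as skipped before this point is never sent a request, whether or not the can-match phase runs.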

This got my attention too. Is it a necessary change for this PR? I was trying to figure out how it ties to the refresh block.

cbuescher

@benchaplin thanks for the last changes, I left one more small comment but nothing that should block this PR from my end. I didn't take a closer look at the tests since @tlrx seems to have made a few suggestions there, LGTM from my end though.

server/src/main/java/org/elasticsearch/action/search/SearchShardIterator.java

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java


tlrx

LGTM but I know nothing about search shard iterators :)

tlrx · 2025-07-07T09:18:47Z

server/src/internalClusterTest/java/org/elasticsearch/search/SearchWithIndexBlocksIT.java

+            ClusterService clusterService = internalCluster().getInstance(ClusterService.class, dataNode.getName());
+            ClusterState currentState = clusterService.state();
+            ClusterState newState = ClusterState.builder(currentState).blocks(blocksBuilder).build();
+            setState(clusterService, newState);


This method is not intended to be used in integration tests, as it overrides the data node's current cluster state.

For testing the INDEX_REFRESH_BLOCK I think it makes sense to only have unit tests in stateful elasticsearch.

javanna · 2025-07-10T12:18:55Z

server/src/main/java/org/elasticsearch/action/search/SearchShardIterator.java

+        ShardId shardId,
+        List<ShardRouting> shards,
+        OriginalIndices originalIndices,
+        boolean skip


skip is mutable, did we need to add a new constructor variant to mutate it?

javanna · 2025-07-10T15:15:12Z

server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java

@@ -186,6 +186,12 @@ private void runCoordinatorRewritePhase() {
        assert assertSearchCoordinationThread();
        final List<SearchShardIterator> matchedShardLevelRequests = new ArrayList<>();
        for (SearchShardIterator searchShardIterator : shardsIts) {
+            if (searchShardIterator.prefiltered() == false && searchShardIterator.skip()) {


This got my attention too. Is it a necessary change for this PR? I was trying to figure out how it ties to the refresh block.

javanna · 2025-07-10T15:15:44Z

server/src/main/java/org/elasticsearch/action/search/SearchShardIterator.java

@@ -83,7 +93,6 @@ public SearchShardIterator(
        assert searchContextKeepAlive == null || searchContextId != null;
        this.prefiltered = prefiltered;
        this.skip = skip;
-        assert skip == false || prefiltered : "only prefiltered shards are skip-able";


I am trying to remember what prefiltered was all about. I think it was only for bw comp, to support two variants of the search shards API, the new one that supports coordinator rewrite, and the other one that does not.

Looking at the code, I wonder if we can actually remove prefiltered in main (as a follow-up?).

But my actual question is: why remove this assert?

javanna · 2025-07-10T15:17:40Z

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

@@ -1338,6 +1345,33 @@ private void executeSearch(
        );
    }

+    /**
+     * Determines if only one (or zero) search shard iterators will be searched.
+     * (At this point, iterators may be marked as skipped due to index level blockers).


replace blockers with blocks?
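
For readers following along, the check described in that javadoc could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: `ShardIt` and `atMostOneUnskipped` are hypothetical names, and the real method operates on `SearchShardIterator` instances.

```java
import java.util.List;

final class SingleShardCheckSketch {
    // Hypothetical stand-in for SearchShardIterator.
    record ShardIt(String index, boolean skip) {}

    // Returns true when zero or one iterators remain once skipped ones are ignored.
    // Stops early as soon as a second unskipped iterator is seen.
    static boolean atMostOneUnskipped(List<ShardIt> shardIterators) {
        int unskipped = 0;
        for (ShardIt it : shardIterators) {
            if (it.skip() == false) {
                unskipped++;
                if (unskipped > 1) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<ShardIt> its = List.of(new ShardIt("blocked", true), new ShardIt("ready", false));
        System.out.println(atMostOneUnskipped(its));
    }
}
```

Counting only unskipped iterators matters because an index marked as skipped should not prevent a search that effectively targets a single shard from being treated as one.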

javanna · 2025-07-10T15:26:11Z

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

@@ -234,6 +237,10 @@ private Map<String, OriginalIndices> buildPerIndexOriginalIndices(
        for (String index : indices) {
            if (hasBlocks) {
                blocks.indexBlockedRaiseException(projectId, ClusterBlockLevel.READ, index);
+                if (blocks.hasIndexBlock(projectState.projectId(), index, IndexMetadata.INDEX_REFRESH_BLOCK)) {
+                    res.put(index, SKIPPED_INDICES);


I wonder if we are doing the filtering in the right place. It is a bit counterintuitive that we would resolve the shards given the indices and then skip some of them. Can we not filter the indices to start with? Maybe that removes the need to use that SKIPPED_INDICES marker thing too :)

Do we need to check for this block only in the search API?

By the way, something probably needs to happen in ES|QL too around this (does not need to be in this PR, but I am raising the need to track that).
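
A rough sketch of the "filter the indices to start with" idea, under the assumption that blocks can be looked up per index name. The `blocksByIndex` map, the `searchableIndices` helper, and the block-name string are hypothetical stand-ins; the real code would consult `ClusterBlocks` (for example via the `hasIndexBlock` call shown in the diff above).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RefreshBlockFilterSketch {
    // Hypothetical identifier standing in for IndexMetadata.INDEX_REFRESH_BLOCK.
    static final String INDEX_REFRESH_BLOCK = "index_refresh_block";

    // Drops indices that carry the refresh block before any shard resolution happens,
    // so no marker value is needed to remember which indices were skipped.
    static List<String> searchableIndices(List<String> resolvedIndices, Map<String, Set<String>> blocksByIndex) {
        return resolvedIndices.stream()
            .filter(index -> blocksByIndex.getOrDefault(index, Set.of()).contains(INDEX_REFRESH_BLOCK) == false)
            .toList();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> blocks = Map.of("new-index", Set.of(INDEX_REFRESH_BLOCK));
        System.out.println(searchableIndices(List.of("new-index", "old-index"), blocks));
    }
}
```

One trade-off to weigh: filtering this early means the dropped indices never produce shard iterators, so they would not show up in the skipped-shard count of the response unless that is accounted for separately.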

benchaplin added 5 commits June 4, 2025 19:00

Skip indices that have an index refresh block

2aa74e3

Merge branch 'main' into skip_search_shards_with_index_block

12b6b81

Construct the iterator skipped

9c705cd

Fix javadocs

1c75721

Add unit test

1ecc447

benchaplin requested a review from tlrx June 9, 2025 05:56

benchaplin added the >non-issue, Team:Search Foundations, :Search Foundations/Search, and v9.1.0 labels Jun 9, 2025

elasticsearchmachine added the serverless-linked label Jun 9, 2025

benchaplin requested a review from a team as a code owner June 9, 2025 14:39

github-actions bot deployed to docs-preview June 9, 2025 14:40

benchaplin force-pushed the skip_search_shards_with_index_block branch from 998cdd5 to 1ecc447 Compare June 9, 2025 17:05

benchaplin removed the request for review from a team June 9, 2025 17:06

elasticsearchmachine and others added 9 commits June 9, 2025 17:13

[CI] Auto commit changes from spotless

cdb4bc1

Merge branch 'main' into skip_search_shards_with_index_block

b7ade2d

Merge branch 'main' into skip_search_shards_with_index_block

cd991c2

Merge branch 'main' into skip_search_shards_with_index_block

5f50d5c

Rewrite DFS if processing one or zero unskipped shard iterators

3f86fb8

Make can-match support already skipped shard iterators

0edc27c

Add IT for executing search and PIT against refresh blocked indices

9de6f06

Fix resource leak by using decRef assertion

be37bf6

[CI] Auto commit changes from spotless

17706e2

benchaplin requested review from a team as code owners June 11, 2025 21:03

benchaplin force-pushed the skip_search_shards_with_index_block branch from e233cc7 to 17706e2 Compare June 11, 2025 21:04

Merge branch 'main' into skip_search_shards_with_index_block

8759a07

tlrx reviewed Jun 12, 2025

cbuescher reviewed Jun 18, 2025

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

benchaplin added 2 commits June 30, 2025 22:12

Improve names of valid shard check method and variables

bf8a2be

Merge branch 'main' into skip_search_shards_with_index_block

8887609

cbuescher approved these changes Jul 1, 2025

benchaplin added 5 commits July 1, 2025 12:35

Remove constructor used only in tests

0f0200a

Fix missed merge conflict

7689263

Remove ability to set INDEX_REFRESH_BLOCK from API, add directly to c…

598e906

…luster state in tests

Merge branch 'main' into skip_search_shards_with_index_block

4cdfbd0

Merge branch 'main' into skip_search_shards_with_index_block

76ecade

tlrx reviewed Jul 7, 2025

javanna reviewed Jul 10, 2025
