-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Skip search shards with INDEX_REFRESH_BLOCK #129132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Skip search shards with INDEX_REFRESH_BLOCK #129132
Conversation
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
998cdd5
to
1ecc447
Compare
e233cc7
to
17706e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. The search part should be reviewed by the ES Search team.
server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
Outdated
Show resolved
Hide resolved
var addIndexBlockRequest = new AddIndexBlockRequest(IndexMetadata.APIBlock.REFRESH, "test"); | ||
client().execute(TransportAddIndexBlockAction.TYPE, addIndexBlockRequest).actionGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refresh block should be added automatically to newly created indices as long as they have replicas and the "use refresh block" setting is enabled in the node setting. We should remove the ability to add the refresh block through the Add Index Block API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @tlrx!
I was hoping to test this change outside of the context of Serverless. But I agree it's not appropriate to add the refresh block to that API for testing purposes only, so I will see if I can construct the scenario in some other way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I was able to get the setup I was looking for by adding the block directly to cluster state in the tests.
assertHitCount(prepareSearch().setQuery(QueryBuilders.matchAllQuery()), 0); | ||
} | ||
|
||
public void testSearchMultipleIndicesEachWithAnIndexRefreshBlock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be folded into a single test, where one or more indices are randomly created, most of some with replicas but other without replicas, and then allocate zero or more search shards and check the expected results, finally assigning all search shards and check the results again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've folded this into a single test with some additional randomization. My goal is to keep the integration tests in the Serverless PR, so I'll add the test scenario you're proposing there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass on the search related side of things and left a few questions and comments.
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
Outdated
Show resolved
Hide resolved
@@ -186,6 +186,12 @@ private void runCoordinatorRewritePhase() { | |||
assert assertSearchCoordinationThread(); | |||
final List<SearchShardIterator> matchedShardLevelRequests = new ArrayList<>(); | |||
for (SearchShardIterator searchShardIterator : shardsIts) { | |||
if (searchShardIterator.prefiltered() == false && searchShardIterator.skip()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand this is what actually skips the shards being searched. Why is this done here in the CanMatchPreFilterSearchPhase? My understanding is that we don't always use this phase, e.g. "shouldPreFilterSearchShards" returns false
for all searches that are not QUERY_THEN_FETCH (and other cases). Wouldn't we still run into 503s for those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what really skips the shards for search is this code:
elasticsearch/server/src/main/java/org/elasticsearch/action/search/AbstractSearchAsyncAction.java
Lines 123 to 129 in 01b6de3
for (final SearchShardIterator iterator : shardsIts) { | |
if (iterator.skip()) { | |
skipped++; | |
} else { | |
iterators.add(iterator); | |
} | |
} |
I added this check to avoid running can-match on shards that are skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This got my attention too. Is it a necessary change for this PR? I was trying to figure out how it ties to the refresh block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benchaplin thanks for the last changes, I left one more small comment but nothing that should block this PR from my end. I didn't take a closer look at the tests since @tlrx seems to have made a few suggestions there, LGTM from my end though.
server/src/main/java/org/elasticsearch/action/search/SearchShardIterator.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
Show resolved
Hide resolved
…luster state in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I know nothing about search shard iterators :)
ClusterService clusterService = internalCluster().getInstance(ClusterService.class, dataNode.getName()); | ||
ClusterState currentState = clusterService.state(); | ||
ClusterState newState = ClusterState.builder(currentState).blocks(blocksBuilder).build(); | ||
setState(clusterService, newState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is not intended to be used in integration test as it overrides the current data node cluster state.
For testing the INDEX_REFRESH_BLOCK I think it makes sense to only have unit tests in stateful elasticsearch.
ShardId shardId, | ||
List<ShardRouting> shards, | ||
OriginalIndices originalIndices, | ||
boolean skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip is mutable, did we need to add a new constructor variant to mutate it?
@@ -186,6 +186,12 @@ private void runCoordinatorRewritePhase() { | |||
assert assertSearchCoordinationThread(); | |||
final List<SearchShardIterator> matchedShardLevelRequests = new ArrayList<>(); | |||
for (SearchShardIterator searchShardIterator : shardsIts) { | |||
if (searchShardIterator.prefiltered() == false && searchShardIterator.skip()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This got my attention too. Is it a necessary change for this PR? I was trying to figure out how it ties to the refresh block.
@@ -83,7 +93,6 @@ public SearchShardIterator( | |||
assert searchContextKeepAlive == null || searchContextId != null; | |||
this.prefiltered = prefiltered; | |||
this.skip = skip; | |||
assert skip == false || prefiltered : "only prefiltered shards are skip-able"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to remember what prefiltered was all about. I think it was only for bw comp, to support two variants of the search shards API, the new one that supports coordinator rewrite, and the other one that does not.
Looking at the code, i wonder if we can remove prefiltered actually (as a follow up?) in main.
But my actual question is: why remove this assert?
@@ -1338,6 +1345,33 @@ private void executeSearch( | |||
); | |||
} | |||
|
|||
/** | |||
* Determines if only one (or zero) search shard iterators will be searched. | |||
* (At this point, iterators may be marked as skipped due to index level blockers). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace blockers with blocks?
@@ -234,6 +237,10 @@ private Map<String, OriginalIndices> buildPerIndexOriginalIndices( | |||
for (String index : indices) { | |||
if (hasBlocks) { | |||
blocks.indexBlockedRaiseException(projectId, ClusterBlockLevel.READ, index); | |||
if (blocks.hasIndexBlock(projectState.projectId(), index, IndexMetadata.INDEX_REFRESH_BLOCK)) { | |||
res.put(index, SKIPPED_INDICES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we are doing the filtering in the right place. It is a bit counter intuitive that we would resolve the shards given the indices and the skip some of them. Can we not filter the indices to start with? Maybe that removes the need to use that SKIPPED_INDICES marker thing too :)
Do we need to check for this block only in the search API?
By the way, something probably needs to happen in ES|QL too around this (does not need to be in this PR, but I am raising the need to track that).
#117543 introduced a ClusterBlock which is applied to new indices in Serverless which do not yet have search shards up. We should skip searches for indices with this block in order to avoid meaningless 503s.