-
Notifications
You must be signed in to change notification settings - Fork 311
feat: trim fragment reuse index after remapping #3911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1f6bcc1
to
5240fef
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3911 +/- ##
==========================================
+ Coverage 78.24% 78.25% +0.01%
==========================================
Files 283 284 +1
Lines 109206 109376 +170
Branches 109206 109376 +170
==========================================
+ Hits 85450 85595 +145
- Misses 20388 20408 +20
- Partials 3368 3373 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5240fef
to
f318047
Compare
build_frag_reuse_index_metadata(dataset, frag_reuse_index_details, fragment_bitmaps) | ||
.await?; | ||
|
||
let transaction = Transaction::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that this PR does not include the conflict resolution logic when there is a CreateIndex transaction vs a Rewrite transaction that both tries to create a new frag reuse index. That is handled in the next PR.
f318047
to
dfa16c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Do you think we will eventually tie this in with cleanup_old_versions
or will it remain a separate cleanup function even up at the user level?
let indices = dataset.load_indices().await?; | ||
let frag_reuse_index_meta = match indices.iter().find(|idx| idx.name == FRAG_REUSE_INDEX_NAME) { | ||
None => return Ok(()), | ||
Some(idx) => idx, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a load_indices_by_name
method. Maybe we should convert it to load_index_by_name
and then you can use it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I originally used that but ended up doing this because that one returns a list and I still need to check empty or select the first one. I think it is definitely worth adding a load_index_by_name which can remove a lot of similar code like this even for other indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should even start considering name
to be unique. We have APIs like drop_index
which take the index name as a parameter and sort of imply that a unique index name is already the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The APIs are a bit inconsistent at this point, for example drop_index
will drop all the indices sharing the same name, so technically drop_indices(&mut self, name: &str)
is a more correct name:
let transaction = Transaction::new(
self.manifest.version,
Operation::CreateIndex {
new_indices: vec![],
removed_indices: indices.clone(),
},
/*blobs_op= */ None,
None,
);
But in prewarm_index
, it assumes there is only 1 index per name and only warms the first index of that name:
let index = self
.open_generic_index(name, &indices[0].uuid.to_string(), &NoOpMetricsCollector)
.await?;
index.prewarm().await?;
Maybe we need to just re-examine all of them to make them at least consistent. But my understanding is that there could be delta indices that share the same name but are different UUIDs, so we cannot really enforce unique index name?
|
||
let mut retained_versions = Vec::new(); | ||
let mut fragment_bitmaps = RoaringBitmap::new(); | ||
for version in frag_reuse_details.versions.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep track of whether any versions were cleaned up? If nothing is cleaned up we could probably return early instead of proceeding with a transaction?
} | ||
None => { | ||
warn!( | ||
"Index {} ({}) missing fragment bitmap, cannot determine if it is caught up with the fragment reuse version, consider it caught up to unblock cleanup", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Index {} ({}) missing fragment bitmap, cannot determine if it is caught up with the fragment reuse version, consider it caught up to unblock cleanup", | |
"Index {} ({}) missing fragment bitmap, cannot determine if it is caught up with the fragment reuse version, consider retraining the index", |
We should maybe suggest a retrain here so users have a way to remove the warning.
if old_frag_in_index != frag_reuse_version.old_frags.len() { | ||
// This should never happen because we always commit a full rewrite group | ||
// and we always reindex either the entire group or nothing. | ||
// We use invalid input to be consistent with | ||
// dataset::transaction::recalculate_fragment_bitmap | ||
return Err(Error::invalid_input( | ||
format!("The compaction plan included a rewrite group that was a split of indexed and non-indexed data: {:?}", | ||
frag_reuse_version.old_frags.iter().map(|frag| frag.id).collect::<Vec<_>>()), | ||
location!())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be "return error" or just "log warning and skip"? Hopefully it just never happens, but if it did happen what would be the cleanup? Should the user just remove the frag reuse index at that point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah if this ever happens that means the reuse version is corrupted. I kept the logic here, but made the cleanup function remove the version when this error is triggered.
Yeah I was considering doing that, but I also felt it might be more flexible to keep them separated because you can cleanup this index much more eagerly as soon as you know you have updated all indices, compared to the old versions cleanup the user might not really run that often if they keep the default of 14 days. |
dfa16c4
to
6bb53b4
Compare
Closes #3836