-
Notifications
You must be signed in to change notification settings - Fork 918
Skip page should also support skip dict page #7409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and nice sleuthing @zhuqi-lucas
What is weird is that we haven't hit this in production given how widely used this parquet-rs reader is. There are several systems I know of (like coralogix, pydantic and influxdb_iox that have filter pushdown enabled for years now) and skipping pages is done all the time via statistics.
Is there any way to trigger this issue from the public APIs (the unit test in this PR is using the low level APIs (like skip_next_page
directly) 🤔
Thank you @alamb for review and good question, i try to explain:
/// Skips over `num_records` records, where records are delimited by repetition levels of 0
///
/// # Returns
///
/// Returns the number of records skipped
pub fn skip_records(&mut self, num_records: usize) -> Result<usize> {
let mut remaining_records = num_records;
while remaining_records != 0 {
if self.num_buffered_values == self.num_decoded_values {
let metadata = match self.page_reader.peek_next_page()? {
None => return Ok(num_records - remaining_records),
Some(metadata) => metadata,
};
// If dictionary, we must read it
if metadata.is_dict {
self.read_dictionary_page()?;
continue;
}
// If page has less rows than the remaining records to
// be skipped, skip entire page
let rows = metadata.num_rows.or_else(|| {
// If no repetition levels, num_levels == num_rows
self.rep_level_decoder
.is_none()
.then_some(metadata.num_levels)?
});
if let Some(rows) = rows {
if rows <= remaining_records {
self.page_reader.skip_next_page()?;
remaining_records -= rows;
continue;
}
}
..............
} And the logic here, we already exclude the dict page case for above high level API, so we will not hit the bug: // If dictionary, we must read it
if metadata.is_dict {
self.read_dictionary_page()?;
continue;
} So the following code will not trigger the dict page skip: if let Some(rows) = rows {
if rows <= remaining_records {
self.page_reader.skip_next_page()?;
remaining_records -= rows;
continue;
}
}
|
I'm not sure this is correct, skip_next_page skips the next page regardless of its type? It seems especially odd as there isn't equivalent logic in the non-offset index case. Is this possibly a bug in whatever benchmark/cache logic you're referring to? |
Thank you @tustvold for review: More details in this ticket: We are trying to add benchmark for row filter push down: And also try to benchmark with the page cache PR: The benchmark will work well for main branch, but stuck for #6921 page cache branch, so i try to debug it and found the low level skip page API will not skip dict page. And the page cache PR will use this low level API. And main branch will also use this API, but it will exclude the dict page before use it:
/// Skips over `num_records` records, where records are delimited by repetition levels of 0
///
/// # Returns
///
/// Returns the number of records skipped
pub fn skip_records(&mut self, num_records: usize) -> Result<usize> {
let mut remaining_records = num_records;
while remaining_records != 0 {
if self.num_buffered_values == self.num_decoded_values {
let metadata = match self.page_reader.peek_next_page()? {
None => return Ok(num_records - remaining_records),
Some(metadata) => metadata,
};
// If dictionary, we must read it
if metadata.is_dict {
self.read_dictionary_page()?;
continue;
}
// If page has less rows than the remaining records to
// be skipped, skip entire page
let rows = metadata.num_rows.or_else(|| {
// If no repetition levels, num_levels == num_rows
self.rep_level_decoder
.is_none()
.then_some(metadata.num_levels)?
});
if let Some(rows) = rows {
if rows <= remaining_records {
self.page_reader.skip_next_page()?;
remaining_records -= rows;
continue;
}
}
..............
} And the logic here, we already exclude the dict page case for above high level API, so we will not hit the bug: // If dictionary, we must read it
if metadata.is_dict {
self.read_dictionary_page()?;
continue;
} So the following code will not trigger the dict page skip: if let Some(rows) = rows {
if rows <= remaining_records {
self.page_reader.skip_next_page()?;
remaining_records -= rows;
continue;
}
}
|
Right so this is a bug in the cache page PR which is using this API incorrectly, this is not a bug in skip_next_page, it is behaving correctly as documented. |
Got it @tustvold thanks, so you mean we don't need to make the skip page support dict page, it's the right behaviour. If so, we can fix it in the page cache PR. |
But when we call the get next page, we will also include the dict page for low level, so i was wandering if we can make the skip page consistent, and it will not make main branch any error. impl<R: ChunkReader> PageReader for SerializedPageReader<R> {
fn get_next_page(&mut self) -> Result<Option<Page>> {
loop {
let page = match &mut self.state {
SerializedPageReaderState::Values {
offset,
remaining_bytes: remaining,
next_page_header,
page_ordinal,
require_dictionary,
} => {
......
page
}
SerializedPageReaderState::Pages {
page_locations,
dictionary_page,
..
} => {
let front = match dictionary_page
.take()
.or_else(|| page_locations.pop_front())
{
Some(front) => front,
None => return Ok(None),
};
}
}
} |
Yes, skip_next_page skips the next page regardless of its type, as documented. TBC we could change this, but it would be a breaking API change that we'd need to discuss and evaluate. |
Oh wait I think I misread this PR, standby |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for some reason I read this as skipping two pages at once
Got it, thank you @tustvold for double check. |
This is a nice catch, thank you @zhuqi-lucas |
Thank you @zhuqi-lucas and @tustvold and @XiangpengHao |
Which issue does this PR close?
Closes part of #7363
Rationale for this change
We hit the bug during the benchmark testing for the cache page branch.
The bug is we skip page not skipping the dict page which will make the corner case dead lock for the benchmark testing.
What changes are included in this PR?
Fix the skip page also support skip dict page.
Are there any user-facing changes?
No, the skip dict page use case only happen for the cache page branch.