@@ -291,8 +291,6 @@ private static void writeTwoColumnEncryptedDictionaryFile(java.nio.file.Path pat
         .filter(column -> column.getPath().equals(ID_PATH))
         .findFirst().orElseThrow();

-        assertThat(ageChunk.getDictionaryPageOffset()).isGreaterThan(0);
-        assertThat(idChunk.getDictionaryPageOffset()).isGreaterThan(0);
Member (comment on lines -294 to -295):

I'm not sure this will address the problem.
If a column uses dictionary encoding, then it must write a dictionary page, and that page has to be at a non-zero offset from the beginning of the file (because a Parquet file has a compulsory header of a few bytes).
So if this assertion was failing sometimes, the next check for use of dictionary encoding should also fail, since the dictionary page offset can be zero only if there is no dictionary page at all.

Member Author:

I cannot really reproduce the issue locally.

Take a look at https://stackoverflow.com/a/55226688.

> If a column uses dictionary encoding, then it must write a dictionary page and that has to be a non-zero offset from beginning of file (because parquet file has a compulsory header of few bytes).

Assuming that example is correct (where parquet-mr puts 0 even when dictionaries are present), does it mean we do not support the dictionary page offset correctly?
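One way to settle the question is to inspect what parquet-mr actually records in the footer for a given file. A minimal sketch, assuming the standard parquet-mr `ParquetFileReader` API and a Hadoop `Configuration` on the classpath (the file path argument and class name are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.Encoding;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class DictionaryOffsetInspector {
    public static void main(String[] args) throws Exception {
        // Hypothetical argument: path to any Parquet file to inspect
        Path path = new Path(args[0]);
        try (ParquetFileReader reader = ParquetFileReader.open(
                HadoopInputFile.fromPath(path, new Configuration()))) {
            for (BlockMetaData block : reader.getFooter().getBlocks()) {
                for (ColumnChunkMetaData chunk : block.getColumns()) {
                    boolean usesDictionary = chunk.getEncodings().stream()
                            .anyMatch(Encoding::usesDictionary);
                    // A Parquet file begins with the 4-byte magic "PAR1", so a
                    // real dictionary page cannot sit at offset 0; a recorded 0
                    // here for a dictionary-encoded column would mean the writer
                    // simply did not populate the field.
                    System.out.printf("%s usesDictionary=%b dictionaryPageOffset=%d%n",
                            chunk.getPath(), usesDictionary,
                            chunk.getDictionaryPageOffset());
                }
            }
        }
    }
}
```

Running this against files produced by different writers would show whether `dictionaryPageOffset` is reliably populated, or whether (as the StackOverflow answer suggests) some writers leave it at 0 even when a dictionary page exists.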

assertThat(ageChunk.getEncodings()).anyMatch(org.apache.parquet.column.Encoding::usesDictionary);
assertThat(idChunk.getEncodings()).anyMatch(org.apache.parquet.column.Encoding::usesDictionary);
}