Commit 8eca76d
Move ParquetMetadata decoder state machine into ParquetMetadataPushDecoder (#8340)
# Which issue does this PR close?
- part of #8000
- Follow on to #8080
- Closes #8439
# Rationale for this change
The current ParquetMetadataDecoder intermixes three things:
1. The state machine for decoding parquet metadata (footer, then
metadata, then (optional) indexes)
2. Orchestrating IO (i.e. calling read, etc.)
3. Decoding thrift-encoded bytes into objects
This makes it almost impossible to add features like "only decode a
subset of the columns in the ColumnIndex" and other potentially advanced
use cases.
Now that we have a "push" style API for metadata decoding that avoids
IO, the next step is to extract out the actual work into this API so
that the existing ParquetMetadataDecoder just calls into the PushDecoder
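To make the separation concrete, here is a minimal sketch of a "push" style decoder driven by a caller that owns all IO. It is not the code from this PR: `PushDecoder`, `DecodeResult`, `State`, and `decode_from_slice` are hypothetical names, thrift decoding is replaced by returning the raw metadata bytes, and the optional index stage is left out. The only format detail it relies on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic.

```rust
use std::ops::Range;

/// Hypothetical result of one decode attempt (illustrative, not the crate's API).
#[derive(Debug, PartialEq)]
enum DecodeResult {
    /// The decoder cannot proceed until the caller supplies this byte range.
    NeedsData(Range<u64>),
    /// Decoding finished; the sketch returns raw metadata bytes instead of objects.
    Finished(Vec<u8>),
}

/// State machine: footer first, then the metadata the footer points to.
#[derive(Clone)]
enum State {
    ReadingFooter,
    ReadingMetadata(Range<u64>),
    Done,
}

/// 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_LEN: u64 = 8;

struct PushDecoder {
    file_len: u64,
    state: State,
    /// Byte ranges pushed by the caller so far.
    buffered: Vec<(Range<u64>, Vec<u8>)>,
}

impl PushDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, state: State::ReadingFooter, buffered: Vec::new() }
    }

    /// The caller performs IO elsewhere and hands the bytes in here.
    fn push_range(&mut self, range: Range<u64>, data: Vec<u8>) {
        self.buffered.push((range, data));
    }

    /// Return the requested bytes if some pushed range fully covers them.
    fn get(&self, want: &Range<u64>) -> Option<&[u8]> {
        self.buffered
            .iter()
            .find(|(r, _)| r.start <= want.start && want.end <= r.end)
            .map(|(r, d)| &d[(want.start - r.start) as usize..(want.end - r.start) as usize])
    }

    /// Advance as far as the buffered data allows; never performs IO.
    fn try_decode(&mut self) -> Result<DecodeResult, String> {
        loop {
            let next = match self.state.clone() {
                State::ReadingFooter => {
                    if self.file_len < FOOTER_LEN {
                        return Err("file too small for a footer".to_string());
                    }
                    let want = self.file_len - FOOTER_LEN..self.file_len;
                    let Some(footer) = self.get(&want) else {
                        return Ok(DecodeResult::NeedsData(want));
                    };
                    if !footer.ends_with(b"PAR1") {
                        return Err("bad magic".to_string());
                    }
                    let len =
                        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as u64;
                    let end = self.file_len - FOOTER_LEN;
                    let start = end.checked_sub(len).ok_or("metadata length exceeds file")?;
                    State::ReadingMetadata(start..end)
                }
                State::ReadingMetadata(range) => {
                    let Some(bytes) = self.get(&range) else {
                        return Ok(DecodeResult::NeedsData(range));
                    };
                    let decoded = bytes.to_vec(); // stand-in for thrift decoding
                    self.state = State::Done;
                    return Ok(DecodeResult::Finished(decoded));
                }
                State::Done => return Err("decoder already finished".to_string()),
            };
            self.state = next;
        }
    }
}

/// The IO-orchestrating side: fetch whatever the decoder asks for, then retry.
fn decode_from_slice(file: &[u8]) -> Result<Vec<u8>, String> {
    let mut decoder = PushDecoder::new(file.len() as u64);
    loop {
        match decoder.try_decode()? {
            DecodeResult::NeedsData(r) => {
                let data = file[r.start as usize..r.end as usize].to_vec();
                decoder.push_range(r, data);
            }
            DecodeResult::Finished(meta) => return Ok(meta),
        }
    }
}
```

In this arrangement the wrapper (`decode_from_slice` here) is the only place that touches the underlying storage, so swapping a slice for a file or an async object store would change the wrapper and leave the state machine untouched.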
# What changes are included in this PR?
1. Extract decoding state machine into PushMetadataDecoder
2. Extract thrift parsing into its own `parser` module
3. Update ParquetMetadataDecoder to use the PushMetadataDecoder
4. Extract the bytes --> object code into its own module
This almost certainly will conflict with @etseidl 's plans in
thrift-remodel.
# Are these changes tested?
By existing tests.
# Are there any user-facing changes?
Not really -- this is an internal change that will make it easier to add
features like "only decode a subset of the columns in the ColumnIndex",
for example.
1 parent 07ae1dd commit 8eca76d
File tree
4 files changed, +472 -175 lines changed
- parquet
  - src/file/metadata
  - tests/arrow_reader/io