Home
Selfeer edited this page Dec 18, 2024
| Versions | Releases |
|---|---|
| License | Apache-2.0 |
Parquetify is a lightweight tool leveraging the parquet-java library to generate Apache Parquet files based on the file definition provided in a JSON file.
| Feature | Description |
|---|---|
| Physical Data Types: | All physical data types: INT32, INT64, BOOLEAN, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY. |
| Logical Data Types: | Most logical types: UTF8, DECIMAL, DATE, TIME_MILLIS, TIME_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, ENUM, NONE, MAP, LIST, STRING, MAP_KEY_VALUE, TIME, INTEGER, JSON, BSON, UUID, INTERVAL, UINT_8, UINT_16, UINT_32, UINT_64, INT_8, INT_16, INT_32, INT_64, FLOAT16. |
| Precision & Scale: | Precision and scale for DECIMAL types. |
| Compression: | NONE, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD. |
| Encodings: | Automatically set by the writer for a given column. |
| Bloom Filter: | Apply a bloom filter to specific columns or all columns (including those within groups). |
| Writer Version: | Specify writer version (1.0, 2.0). |
| Customizable Sizes: | Row group and page sizes. |
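
The feature table above can be read as a set of allowed values. As a minimal sketch, and not Parquetify's actual schema, the following Python checks a hypothetical file definition against the compression codecs, writer versions, and physical types listed in the table. The key names (`options`, `compression`, `writerVersion`, `schema`, `name`, `physicalType`) and the `check_definition` helper are assumptions made for illustration only; the individual wiki pages describe the real JSON format.

```python
import json

# Values copied from the feature table; everything else here is illustrative.
PHYSICAL_TYPES = {
    "INT32", "INT64", "BOOLEAN", "FLOAT", "DOUBLE",
    "BINARY", "FIXED_LEN_BYTE_ARRAY",
}
COMPRESSION_CODECS = {"NONE", "SNAPPY", "GZIP", "LZO", "BROTLI", "LZ4", "ZSTD"}
WRITER_VERSIONS = {"1.0", "2.0"}


def check_definition(definition):
    """Return a list of problems found in a hypothetical file definition."""
    problems = []
    options = definition.get("options", {})

    # A missing option is not flagged; only unsupported values are reported.
    codec = options.get("compression")
    if codec is not None and codec not in COMPRESSION_CODECS:
        problems.append(f"unsupported compression: {codec}")

    version = options.get("writerVersion")
    if version is not None and version not in WRITER_VERSIONS:
        problems.append(f"unsupported writer version: {version}")

    for column in definition.get("schema", []):
        physical_type = column.get("physicalType")
        if physical_type not in PHYSICAL_TYPES:
            problems.append(
                f"column {column.get('name')!r}: "
                f"unsupported physical type {physical_type}"
            )
    return problems


# Hypothetical definition: the key names are assumptions, not Parquetify's schema.
EXAMPLE = json.loads("""
{
  "options": {"compression": "SNAPPY", "writerVersion": "2.0"},
  "schema": [
    {"name": "id", "physicalType": "INT64"},
    {"name": "payload", "physicalType": "BINARY"}
  ]
}
""")

if __name__ == "__main__":
    print(check_definition(EXAMPLE))
```

A check like this only covers the values named in the table; logical types, bloom filter settings, and size options would need their own rules.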
- Parquet File Name
- Options of the File
- File Compression
- Writer Version
- Row and Page Size
- Bloom Filter
- Configure with Hadoop
- Integer Columns
- Unsigned Integer Columns
- UTF8 Columns
- Decimal Columns
- Date Columns
- Time and Timestamp Columns
- JSON and BSON Columns
- String Columns
- Enum Columns
- UUID Columns
- Float16 Column
- Array Columns
- Nested Array Columns
- Tuple Columns
- Nested Tuple Columns
- Schema Types
- Encodings
- File Encryption
- Extra Metadata Entries
Developed and maintained by the Altinity team.