MINOR: Add links to projects on README and update parquet-compatibility to parquet-testing #482

Merged: 2 commits, Feb 10, 2025
14 changes: 7 additions & 7 deletions README.md
@@ -43,11 +43,15 @@ Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data pr

 ## Modules

-The `parquet-format` project contains format specifications and Thrift definitions of metadata required to properly read Parquet files.
+The [parquet-format] project contains format specifications and Thrift definitions of metadata required to properly read Parquet files.

-The `parquet-java` project contains multiple sub-modules, which implement the core components of reading and writing a nested, column-oriented data stream, map this core onto the parquet format, and provide Hadoop Input/Output Formats, Pig loaders, and other java-based utilities for interacting with Parquet.
+The [parquet-java] project contains multiple sub-modules, which implement the core components of reading and writing a nested, column-oriented data stream, map this core onto the parquet format, and provide Hadoop Input/Output Formats, Pig loaders, and other java-based utilities for interacting with Parquet.

-The `parquet-compatibility` project contains compatibility tests that can be used to verify that implementations in different languages can read and write each other's files.
+The [parquet-testing] project contains a set of files that can be used to verify that implementations in different languages can read and write each other's files.

+[parquet-format]: https://github.com/apache/parquet-format
+[parquet-java]: https://github.com/apache/parquet-java
+[parquet-testing]: https://github.com/apache/parquet-testing
+
 ## Building

@@ -295,10 +299,6 @@ There are many places in the format for compatible extensions:
 Parquet Thrift IDL reserves field-id `32767` of every Thrift struct for extensions.
 The (Thrift) type of this field is always `binary`.

-## Testing
-
-The [apache/parquet-testing](https://github.com/apache/parquet-testing) contains a set of Parquet files for testing purposes.
-
 ## Contributing
 Comment on the issue and/or contact [the parquet-dev mailing list](http://mail-archives.apache.org/mod_mbox/parquet-dev/) with your questions and ideas.
 Changes to this core format definition are proposed and discussed in depth on the mailing list. You may also be interested in contributing to the Parquet-Java subproject, which contains all the Java-side implementation and APIs. See the "How To Contribute" section of the [Parquet-Java project](https://github.com/apache/parquet-java#how-to-contribute)