Add files to add existing Parquet files to a table #932

ZENOTME · 2025-02-01T10:36:04Z

In #345, we added support for writing new data files and appending them to the table. However, we don't yet support appending existing data files, which requires reading those files and generating the corresponding DataFile metadata.
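To illustrate what "generating the corresponding DataFile metadata" could involve, here is a minimal sketch that folds per-row-group statistics (the kind of information a Parquet footer carries) into a single per-file record. Every name in it (`ColumnStats`, `RowGroup`, `DataFileEntry`, `data_file_from_row_groups`) is hypothetical and not part of iceberg-rust or pyiceberg. It also keys statistics by column name for readability, whereas Iceberg's data file metadata keys them by field ID.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical statistics for one column within one row group."""
    null_count: int
    min_value: Optional[Any]
    max_value: Optional[Any]


@dataclass
class RowGroup:
    """Hypothetical summary of one row group, as read from a file footer."""
    num_rows: int
    columns: Dict[str, ColumnStats]


@dataclass
class DataFileEntry:
    """Hypothetical stand-in for the per-file metadata a table would track."""
    file_path: str
    file_size_in_bytes: int
    record_count: int = 0
    null_value_counts: Dict[str, int] = field(default_factory=dict)
    lower_bounds: Dict[str, Any] = field(default_factory=dict)
    upper_bounds: Dict[str, Any] = field(default_factory=dict)


def data_file_from_row_groups(
    file_path: str, file_size_in_bytes: int, row_groups: List[RowGroup]
) -> DataFileEntry:
    """Aggregate row-group statistics into one file-level record."""
    entry = DataFileEntry(file_path=file_path, file_size_in_bytes=file_size_in_bytes)
    for rg in row_groups:
        entry.record_count += rg.num_rows
        for name, stats in rg.columns.items():
            # Null counts add up across row groups.
            entry.null_value_counts[name] = (
                entry.null_value_counts.get(name, 0) + stats.null_count
            )
            # Bounds are the min of mins and the max of maxes; a row group
            # holding only nulls contributes no bound.
            if stats.min_value is not None:
                current = entry.lower_bounds.get(name)
                entry.lower_bounds[name] = (
                    stats.min_value if current is None else min(current, stats.min_value)
                )
            if stats.max_value is not None:
                current = entry.upper_bounds.get(name)
                entry.upper_bounds[name] = (
                    stats.max_value if current is None else max(current, stats.max_value)
                )
    return entry
```

A real implementation would also need to check that the file's schema is compatible with the table schema and to derive partition values, both of which this sketch leaves out.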

jonathanc-n · 2025-02-02T22:31:19Z

I would like to try working on this.

ZENOTME · 2025-02-05T05:02:08Z

Thanks @jonathanc-n! Feel free to send the PR for this.

jonathanc-n · 2025-02-05T21:58:50Z

@ZENOTME When appending existing data files, should the system load file metadata by reading the current snapshot’s manifest lists from an existing Iceberg table, or would you prefer that the user specify a file path from which the system scans and infers metadata? Based on the answer, I'm planning to perform a TableScan and have it add the resulting DataFiles with add_data_file.

ZENOTME · 2025-02-06T06:27:46Z

Hi @jonathanc-n, I think we can refer to the pyiceberg implementation: https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L669C9-L669C18.

> should the system load file metadata by reading the current snapshot’s manifest lists from an existing Iceberg table, or would you prefer to specify a file path from which the system scans and infers metadata?

I think the user will add files through the transaction API, so we will already know which table they are being appended to, along with its metadata.
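As a rough illustration of that flow, the sketch below models a transaction that is created from a table, so the target table and its currently referenced files are known when files are added. The `Table` and `Transaction` classes and their methods are hypothetical stand-ins, not the iceberg-rust or pyiceberg API, and the duplicate checks are an assumption about sensible behavior rather than a description of what either library does.

```python
from typing import List


class Table:
    """Hypothetical in-memory table that only tracks referenced file paths."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data_files: List[str] = []

    def new_transaction(self) -> "Transaction":
        # The transaction is bound to this table, so it can consult the
        # table's metadata when validating files.
        return Transaction(self)


class Transaction:
    """Hypothetical transaction that stages existing files for an append."""

    def __init__(self, table: Table) -> None:
        self._table = table
        self._pending: List[str] = []

    def add_files(self, file_paths: List[str]) -> "Transaction":
        # Assumed check 1: the caller must not pass the same path twice.
        if len(set(file_paths)) != len(file_paths):
            raise ValueError("file paths must be unique")
        # Assumed check 2: a file already referenced (or staged) is rejected.
        known = set(self._table.data_files) | set(self._pending)
        already = [p for p in file_paths if p in known]
        if already:
            raise ValueError(f"files already referenced by table: {already}")
        self._pending.extend(file_paths)
        return self

    def commit(self) -> Table:
        # Publish the staged files to the table in one step.
        self._table.data_files.extend(self._pending)
        self._pending = []
        return self._table
```

With this shape, a call such as `table.new_transaction().add_files(["a.parquet"]).commit()` needs no separate table argument, which is the point made above about the transaction already knowing its table.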

ZENOTME mentioned this issue Feb 1, 2025: Iceberg-rust Write support #700 (open, 28 tasks)

jonathanc-n mentioned this issue Feb 9, 2025: feat: Add existing parquet files #960 (open)
