
Conversation

@mchataigner

**Context**: Reads are slow when a table has many delete files.

**TL;DR**: We can leverage the metadata already available in DuckLake to improve the load time of delete files.

**Problem & Motivation:**

DuckLake stores `file_size` metadata for both data and delete files. For data files, there is already a mechanism to forward this metadata to the `MultiFileReader` and the underlying filesystem. The Parquet reader requires this `file_size` to access the footer metadata. When using an `HTTPFileSystem` instance (e.g., for S3 or Azure), it performs a HEAD request on the file if the metadata fields (`file_size`, `etag`, `last_modified`) are not present. Since all files in DuckLake are immutable, we can apply the same optimization to delete files and avoid these unnecessary HEAD requests.

**Solution:**

Implements a custom multi-file reading path that pre-populates file metadata, eliminating redundant HEAD requests to storage when scanning delete files.

**Key Changes:**

1. **New `DeleteFileFunctionInfo` struct**: Extends `TableFunctionInfo` to carry `DuckLakeFileData` metadata through the table function binding process.

2. **Custom `DeleteFileMultiFileReader` class**:
   - Extends DuckDB's `MultiFileReader` to intercept file list creation
   - Pre-populates `ExtendedOpenFileInfo` with metadata already available from DuckLake:
     - File size (`file_size_bytes`)
     - ETag (empty string as placeholder)
     - Last modified timestamp (set to epoch)
     - Encryption key (if present)
   - Creates a `SimpleMultiFileList` with this extended info upfront
   - Overrides `CreateFileList()` to return the pre-built list, bypassing DuckDB's default file discovery

3. **Modified `ScanDeleteFile()` method**:
   - Changes `parquet_scan` from a const reference to a mutable copy so it can be modified
   - Attaches the `DeleteFileFunctionInfo` and the custom reader factory to the table function
   - Passes the actual `parquet_scan` function to `TableFunctionBindInput` instead of a dummy function, ensuring proper function context

**Performance Impact**: Eliminates HEAD requests to object storage when opening Parquet delete files. This is particularly beneficial with remote storage (S3, Azure, etc.) and tables with many delete files, where the HEAD requests were a significant bottleneck.

@mchataigner force-pushed the mbc/improve_scan_delete_files branch 2 times, most recently from 8dfed69 to c45e07f on December 22, 2025 16:48
@mchataigner changed the title from "Add custom MultiFileReader to avoid HEAD requests when scanning delete files" to "Add custom MultiFileReader for reading delete files" on Dec 22, 2025
Collaborator

@pdet left a comment


Hi @mchataigner, thanks for the PR!
Could you add a MinIO test that demonstrates fewer requests are made?
Could you also retarget it to v1.4?

@mchataigner
Author

@pdet you're welcome, I will make the changes.
Thanks for your feedback.

@mchataigner force-pushed the mbc/improve_scan_delete_files branch from c45e07f to b625d74 on December 26, 2025 23:29
@mchataigner force-pushed the mbc/improve_scan_delete_files branch from b625d74 to db066de on December 26, 2025 23:31
@mchataigner changed the base branch from main to v1.4-andium on December 26, 2025 23:31
@mchataigner requested a review from pdet on December 26, 2025 23:32
@mchataigner
Author

@pdet sorry for the delay, I updated the PR with a formatting fix and added a test with MinIO.

