ducklake_delete_orphaned_files fails on S3 with non‑DNS bucket, even with path‑style + explicit endpoint #562

@phmu16ab

Description

What happens?

Environment

  • DuckDB: 1.4.2
  • Extensions: httpfs and ducklake (matching 1.4.2)
  • Object storage: S3
  • Bucket name: not DNS-compliant (contains uppercase letters and/or underscores)
  • Data prefix: s3://<BUCKET_NON_DNS>/path%20with%20spaces/prefix/
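For context: AWS only supports virtual-hosted-style requests for bucket names that are DNS-compatible. For general purpose buckets, that means 3 to 63 characters drawn from lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit, and with no adjacent dots. This is why path-style addressing matters for a bucket like the one above. The sketch below checks only those basic rules. `is_dns_compatible_bucket` is a hypothetical helper, not part of DuckDB or httpfs, and it deliberately skips other AWS restrictions such as IP-address-shaped names.

```python
import re

# Basic DNS-compatibility rules for S3 bucket names (assumption: only the
# length/character rules are checked; other AWS restrictions are not).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_dns_compatible_bucket(name: str) -> bool:
    """Return True if `name` passes the basic rules (hypothetical helper)."""
    # fullmatch enforces 3-63 chars and the allowed alphabet end to end.
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Adjacent dots would produce an empty DNS label.
    return ".." not in name


for bucket in ["my-data-bucket", "LIB", "my_bucket", "ab"]:
    print(bucket, is_dns_compatible_bucket(bucket))
```

Only `my-data-bucket` passes. Uppercase letters, underscores, and names shorter than 3 characters are all rejected.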

Expected
With s3_url_style='path' and an explicit regional endpoint (e.g. s3.eu-central-1.amazonaws.com), all maintenance operations, including ducklake_delete_orphaned_files, should succeed using path-style URLs.

Actual

  • Reads/writes and all other maintenance steps succeed.
  • ducklake_delete_orphaned_files(...) fails with an SSL error during ListObjectsV2.
  • The error shows a request path of the form:
    '/?encoding-type=url&list-type=2&prefix=...'
    → The bucket name is missing from the path, which suggests virtual-hosted-style addressing was used despite s3_url_style='path'.
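To make this diagnosis concrete, the sketch below builds the host and request path of a ListObjectsV2 call in both addressing styles. It assumes a layout of `/<bucket>/?<query>` against the regional endpoint for path style, and `/?<query>` against `<bucket>.<endpoint>` for virtual-hosted style, with the query parameters in the order seen in the error. `list_objects_v2_request` is a hypothetical helper and does not reproduce httpfs's actual code. With the prefix from the error message, only the virtual-hosted-style variant yields the bucket-less path reported above.

```python
from urllib.parse import quote


def list_objects_v2_request(bucket: str, prefix: str, endpoint: str, url_style: str) -> tuple[str, str]:
    """Return (host, path_and_query) for a ListObjectsV2 call (hypothetical helper).

    url_style is 'path' or 'vhost'. The parameter order mirrors the request
    in the DuckLake error message; S3 itself does not require that order.
    """
    # safe="" percent-encodes '/' as %2F and ' ' as %20, as in the error.
    query = f"encoding-type=url&list-type=2&prefix={quote(prefix, safe='')}"
    if url_style == "path":
        # Path style: host is the regional endpoint, bucket is the first path segment.
        return endpoint, f"/{quote(bucket, safe='')}/?{query}"
    if url_style == "vhost":
        # Virtual-hosted style: bucket moves into the hostname, path is just '/'.
        return f"{bucket}.{endpoint}", f"/?{query}"
    raise ValueError(f"unknown url_style: {url_style!r}")


endpoint = "s3.eu-central-1.amazonaws.com"
prefix = "Data Tools/Data/parquet/"
for style in ("path", "vhost"):
    host, path = list_objects_v2_request("LIB", prefix, endpoint, style)
    print(style, host, path)
```

The `vhost` line prints `/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F`, the same path as in the error. The `path` line starts with `/LIB/?` instead.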

Full error message (raised from CHECKPOINT, which internally calls ducklake_delete_orphaned_files, as LINE 1 shows):

Exception has occurred: IOException
IO Error: Failed to perform CHECKPOINT; in DuckLake:  Failed to get files scheduled for deletion from DuckLake: SSL connection failed error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'

LINE 2: FROM read_blob('s3://LIB/Data Tools/Data/parquet...
             ^

LINE 1: CALL ducklake_delete_orphaned_files('lake')
                             ^
  File "/ducklake_script.py", line 87, in <module>
    lm.con.execute("CHECKPOINT;")
_duckdb.IOException: IO Error: Failed to perform CHECKPOINT; in DuckLake:  Failed to get files scheduled for deletion from DuckLake: SSL connection failed error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'

To Reproduce

import duckdb

con = duckdb.connect()

# 1) Install/load httpfs; force PATH-STYLE + explicit region endpoint
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_endpoint='s3.eu-central-1.amazonaws.com'")  
con.execute("SET s3_region='eu-central-1'")
con.execute("SET s3_url_style='path'")
con.execute("SET s3_use_ssl=true")

# 2) Load ducklake and attach (metadata local; data on S3 with encoded spaces)
con.execute("INSTALL ducklake; LOAD ducklake;")
meta = 'ducklake:/memory_meta/metadata.ducklake' 
data = "s3://<BUCKET_NON_DNS>/path%20with%20spaces/prefix/"      # non-DNS bucket name here

con.execute(f"ATTACH '{meta}' AS lake (DATA_PATH '{data}');")
con.execute("USE lake;")

# 3) Sanity check: listing the data path via httpfs (glob, path-style) should succeed
con.execute(f"SELECT COUNT(*) FROM glob('{data}*')").fetchall()

# 4) Maintenance: all succeed except orphan deletion
for stmt in [
    "CALL ducklake_flush_inlined_data('lake');",
    "CALL ducklake_expire_snapshots('lake');",
    "CALL ducklake_merge_adjacent_files('lake');",
    "CALL ducklake_rewrite_data_files('lake');",
    "CALL ducklake_cleanup_old_files('lake', cleanup_all => true);"
]:
    con.execute(stmt)

# 5) Repro: orphan deletion (requires ListObjectsV2) -> FAILS with SSL error
con.execute("CALL ducklake_delete_orphaned_files('lake');")

# Also fails:
# con.execute("CALL ducklake_delete_orphaned_files('lake', dry_run => true, older_than => now() - INTERVAL '1 week');")
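Orphan deletion needs a ListObjectsV2 listing, as the comment in step 5 notes. Conceptually, it has to compare the files actually stored under the data path with the files the catalog still references, and keep only unreferenced files older than the `older_than` cutoff. A `dry_run` then reports those files instead of deleting them. The sketch below illustrates that comparison with in-memory stand-ins. `find_orphaned_files`, `listed`, and `referenced` are hypothetical names, and this is not DuckLake's implementation.

```python
from datetime import datetime, timedelta, timezone


def find_orphaned_files(
    listed: dict[str, datetime],
    referenced: set[str],
    older_than: datetime,
) -> list[str]:
    """Return listed files that are unreferenced and older than the cutoff
    (hypothetical, conceptual sketch)."""
    return sorted(
        path
        for path, last_modified in listed.items()
        if path not in referenced and last_modified < older_than
    )


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
# Stand-in for a storage listing: path -> last-modified time.
listed = {
    "prefix/a.parquet": now - timedelta(days=30),
    "prefix/b.parquet": now - timedelta(days=30),
    "prefix/tmp.parquet": now - timedelta(days=1),
}
# Stand-in for the files the catalog still tracks.
referenced = {"prefix/a.parquet"}

# Cutoff mirrors older_than => now() - INTERVAL '1 week'.
orphans = find_orphaned_files(listed, referenced, older_than=now - timedelta(weeks=1))
print("would delete:", orphans)  # prints: would delete: ['prefix/b.parquet']
```

If the listing step fails, as it does here with the SSL error, there is nothing to compare against the catalog. So the operation cannot proceed even in dry-run mode.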

OS:

Linux - x86_64

DuckDB Version:

1.4.2

DuckLake Version:

Extension matching DuckDB 1.4.2

DuckDB Client:

Python

Hardware:

No response

Full Name:

Peter Schmidt

Affiliation:

Private

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
