Closed
Description
What happens?
Environment
- DuckDB: 1.4.2
- Extensions: httpfs and ducklake (matching 1.4.2)
- Object storage: S3
- Bucket name: not DNS-compliant (contains uppercase letters and/or underscores)
- Data prefix: s3://<BUCKET_NON_DNS>/path%20with%20spaces/prefix/
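To make "not DNS-compliant" concrete, here is a minimal sketch of a bucket-name check. The rule set is a simplified assumption (3 to 63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit), not the complete AWS specification, and `is_dns_compliant_bucket` is a hypothetical helper, not part of DuckDB or httpfs.

```python
import re

# Simplified rule set (an assumption, not the full AWS specification):
# 3-63 characters, lowercase letters, digits, dots and hyphens only,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_dns_compliant_bucket(name: str) -> bool:
    """Return True if `name` passes the simplified DNS-compliance rules above."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Reject adjacent dots and names formatted like IPv4 addresses.
    if ".." in name or _IPV4_RE.fullmatch(name):
        return False
    return True


if __name__ == "__main__":
    # "LIB" is the bucket name that appears in the error message of this report.
    for candidate in ["my-data-bucket", "LIB", "my_bucket"]:
        print(candidate, is_dns_compliant_bucket(candidate))
```

A name that fails this check cannot safely be placed in a host name, which is why path-style addressing matters for such buckets.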
Expected
With s3_url_style='path' and a regional endpoint set (e.g. s3.eu-central-1.amazonaws.com), all maintenance operations, including ducklake_delete_orphaned_files, should succeed using path-style URLs.
Actual
- Reads/writes and all other maintenance steps succeed.
- ducklake_delete_orphaned_files(...) fails with an SSL error during ListObjectsV2.
- Error shows a request path like:
'/?encoding-type=url&list-type=2&prefix=...'
→ The bucket name is missing from the request path, which suggests virtual-hosted-style addressing was used instead of path-style.
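The difference between the two addressing styles can be sketched as follows. This is an illustration of how a ListObjectsV2 URL is generally laid out in each style, not the httpfs implementation; `list_objects_url` and `request_target` are hypothetical helpers, and the style label `"vhost"` is an assumed name for the virtual-hosted variant.

```python
from urllib.parse import quote, urlencode, urlsplit


def list_objects_url(endpoint: str, bucket: str, prefix: str, url_style: str) -> str:
    """Build a ListObjectsV2 URL for the given addressing style (illustrative only)."""
    query = urlencode(
        {"encoding-type": "url", "list-type": "2", "prefix": prefix},
        quote_via=quote,  # encode spaces as %20 rather than '+'
        safe="",          # also encode '/' as %2F, as seen in the error message
    )
    if url_style == "path":
        # Path-style: the bucket is the first path segment.
        return f"https://{endpoint}/{quote(bucket)}/?{query}"
    if url_style == "vhost":
        # Virtual-hosted-style: the bucket becomes part of the host name.
        return f"https://{bucket}.{endpoint}/?{query}"
    raise ValueError(f"unknown url_style: {url_style!r}")


def request_target(url: str) -> str:
    """Return the path and query portion of a URL, as it would appear in a request line."""
    parts = urlsplit(url)
    return f"{parts.path}?{parts.query}"


if __name__ == "__main__":
    endpoint = "s3.eu-central-1.amazonaws.com"
    for style in ("path", "vhost"):
        url = list_objects_url(endpoint, "LIB", "Data Tools/Data/parquet/", style)
        print(style, request_target(url))
```

Under these assumptions, only the virtual-hosted variant yields a request target that begins with `/?`, matching the path shown in the error. In that variant the non-DNS-compliant bucket name ends up in the host name, which may be related to the SSL failure, although the report does not confirm the cause.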
Full error message:
Exception has occurred: IOException
IO Error: Failed to perform CHECKPOINT; in DuckLake: Failed to get files scheduled for deletion from DuckLake: SSL connection failed error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'
LINE 2: FROM read_blob('s3://LIB/Data Tools/Data/parquet...
^
LINE 1: CALL ducklake_delete_orphaned_files('lake')
^
File "/ducklake_script.py", line 87, in <module>
lm.con.execute("CHECKPOINT;")
_duckdb.IOException: IO Error: Failed to perform CHECKPOINT; in DuckLake: Failed to get files scheduled for deletion from DuckLake: SSL connection failed error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'
LINE 2: FROM read_blob('s3://LIB/Data Tools/Data/parquet...
^
LINE 1: CALL ducklake_delete_orphaned_files('lake')
^
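For readers who want to inspect the failing request, the request target quoted in the error can be decoded with the standard library. This is a small sketch; `describe_request_target` is a hypothetical helper, and the target string is copied from the error message above.

```python
from urllib.parse import parse_qs, urlsplit

# Request target copied verbatim from the error message in this report.
REQUEST_TARGET = "/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F"


def describe_request_target(target: str) -> dict:
    """Split a request target into its path and percent-decoded query parameters."""
    parts = urlsplit(target)
    # parse_qs returns a list per key; each key occurs once here, so keep the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"path": parts.path, "params": params}


if __name__ == "__main__":
    info = describe_request_target(REQUEST_TARGET)
    print(info["path"])              # the path carries no bucket segment
    print(info["params"]["prefix"])  # the decoded listing prefix
```

The decoded prefix corresponds to the `Data Tools/Data/parquet` portion of the `read_blob` path in the error, while the path component is just `/`.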
To Reproduce
import duckdb
con = duckdb.connect()
# 1) Install/load httpfs; force PATH-STYLE + explicit region endpoint
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_endpoint='s3.eu-central-1.amazonaws.com'")
con.execute("SET s3_region='eu-central-1'")
con.execute("SET s3_url_style='path'")
con.execute("SET s3_use_ssl=true")
# 2) Load ducklake and attach (metadata local; data on S3 with encoded spaces)
con.execute("INSTALL ducklake; LOAD ducklake;")
meta = 'ducklake:/memory_meta/metadata.ducklake'
data = "s3://<BUCKET_NON_DNS>/path%20with%20spaces/prefix/" # non-DNS bucket name here
con.execute(f"ATTACH '{meta}' AS lake (DATA_PATH '{data}');")
con.execute("USE lake;")
# 3) Sanity check: listing via HTTPFS (path-style) should succeed
con.execute(f"SELECT COUNT(*) FROM list_files('{data}')").fetchall()
# 4) Maintenance: all succeed except orphan deletion
for stmt in [
    "CALL ducklake_flush_inlined_data('lake');",
    "CALL ducklake_expire_snapshots('lake');",
    "CALL ducklake_merge_adjacent_files('lake');",
    "CALL ducklake_rewrite_data_files('lake');",
    "CALL ducklake_cleanup_old_files('lake', cleanup_all => true);",
]:
    con.execute(stmt)
# 5) Repro: orphan deletion (requires ListObjectsV2) -> FAILS with SSL error
con.execute("CALL ducklake_delete_orphaned_files('lake');")
# Also fails:
# con.execute("CALL ducklake_delete_orphaned_files('lake', dry_run => true, older_than => now() - INTERVAL '1 week');")
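To confirm which maintenance call fails without stopping at the first exception, the statements can be run one at a time and the errors collected. The sketch below is one possible way to do that; `run_each` is a hypothetical helper that accepts any callable taking a SQL string, so it can be passed `con.execute` from the script above.

```python
from typing import Callable, Iterable, List, Tuple

# Maintenance calls taken from the reproduction script in this report.
MAINTENANCE_CALLS = [
    "CALL ducklake_flush_inlined_data('lake');",
    "CALL ducklake_expire_snapshots('lake');",
    "CALL ducklake_merge_adjacent_files('lake');",
    "CALL ducklake_rewrite_data_files('lake');",
    "CALL ducklake_cleanup_old_files('lake', cleanup_all => true);",
    "CALL ducklake_delete_orphaned_files('lake');",
]


def run_each(
    execute: Callable[[str], object], statements: Iterable[str]
) -> List[Tuple[str, str]]:
    """Run every statement and return (statement, error text) for each one that raised."""
    failures = []
    for stmt in statements:
        try:
            execute(stmt)
        except Exception as exc:  # keep going so every statement is attempted
            failures.append((stmt, f"{type(exc).__name__}: {exc}"))
    return failures


# Usage with the connection from the reproduction script (assumes `con` exists):
#   for stmt, error in run_each(con.execute, MAINTENANCE_CALLS):
#       print(stmt, "->", error)
```

If the behaviour described in this report holds, only the `ducklake_delete_orphaned_files` entry would be expected in the returned list.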
OS:
Linux - x86_64
DuckDB Version:
1.4.2
DuckLake Version:
Extension matching DuckDB 1.4.2
DuckDB Client:
Python
Hardware:
No response
Full Name:
Peter Schmidt
Affiliation:
Private
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have