What happens?
If you create a column in a DuckLake table whose catalog is backed by Postgres, DuckLake places no limit on the column name length, but Postgres truncates identifiers to 63 bytes (NAMEDATALEN is 64). This causes queries to fail when data is inlined in the Postgres table and has to be unioned with the DuckLake materialized data.
The issue takes some time to present. I was initially chasing whether this was Azure-specific or a DuckLake problem; I can now reproduce it locally with DuckLake.
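The mismatch is visible in the catalog itself: the Postgres table that DuckLake creates to hold inlined rows ends up with the truncated identifier, while DuckLake expects the full name. A minimal way to see the truncated side (a sketch, using the pg attachment set up in the repro below; the LIKE pattern simply matches the test column used there):
-- List Postgres columns starting with the test prefix and show their lengths;
-- the inlined-data table reports a 63-character name.
SELECT * FROM postgres_query('pg', $$
  SELECT table_name, column_name, length(column_name) AS name_length
  FROM information_schema.columns
  WHERE column_name LIKE 'this_is_a_very_long%'
$$);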
To Reproduce
Environment variables (update as needed)
export DUCKLAKE_HOST=localhost
export DUCKLAKE_PORT=5432
export DUCKLAKE_USER=ducklake
export DUCKLAKE_PASSWORD=ducklake
export DUCKLAKE_DB=ducklake_catalog
export PG_CONN="dbname=${DUCKLAKE_DB} host=${DUCKLAKE_HOST} port=${DUCKLAKE_PORT} user=${DUCKLAKE_USER} password=${DUCKLAKE_PASSWORD}"
export DUCKLAKE_CONN="ducklake:postgres:${PG_CONN}"
Init DuckDB
cat > init.sql <<EOF
INSTALL ducklake;
INSTALL postgres;
LOAD ducklake;
LOAD postgres;
ATTACH '${DUCKLAKE_CONN}' AS lakehouse (DATA_PATH 'column_name_too_long_test', DATA_INLINING_ROW_LIMIT 10, OVERRIDE_DATA_PATH true);
ATTACH '${PG_CONN}' AS pg (TYPE postgres);
EOF
duckdb -init init.sql
Local DuckLake Tests
-- Check the Postgres identifier limit (returns 63, i.e. NAMEDATALEN - 1)
SELECT * FROM postgres_query('pg', $$
SELECT current_setting('max_identifier_length')
$$);
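As an aside, the truncation can be demonstrated directly against the Postgres connection (illustration only, not part of the original repro); Postgres silently cuts identifiers down to 63 bytes:
SELECT * FROM postgres_query('pg', $$
  -- Postgres truncates this quoted alias to its first 63 bytes
  SELECT 1 AS "this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit"
$$);
-- The result column comes back as "this_is_a_very_long_column_name_that_exceeds_sixty_four_charact"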
-- Create a table with the specified column name
CREATE TABLE lakehouse.test_long_column (
id INTEGER,
"this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit" VARCHAR
);
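For reference, this column name is 72 characters long, well past the 63-byte limit:
SELECT length('this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit');
-- 72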
-- Insert 15 rows one at a time
INSERT INTO lakehouse.test_long_column VALUES (1, 'First row data');
INSERT INTO lakehouse.test_long_column VALUES (2, 'Second row data');
INSERT INTO lakehouse.test_long_column VALUES (3, 'Third row data');
INSERT INTO lakehouse.test_long_column VALUES (4, 'Fourth row data');
INSERT INTO lakehouse.test_long_column VALUES (5, 'Fifth row data');
INSERT INTO lakehouse.test_long_column VALUES (6, 'Sixth row data');
INSERT INTO lakehouse.test_long_column VALUES (7, 'Seventh row data');
INSERT INTO lakehouse.test_long_column VALUES (8, 'Eighth row data');
INSERT INTO lakehouse.test_long_column VALUES (9, 'Ninth row data');
INSERT INTO lakehouse.test_long_column VALUES (10, 'Tenth row data');
INSERT INTO lakehouse.test_long_column VALUES (11, 'Eleventh row data');
INSERT INTO lakehouse.test_long_column VALUES (12, 'Twelfth row data');
INSERT INTO lakehouse.test_long_column VALUES (13, 'Thirteenth row data');
INSERT INTO lakehouse.test_long_column VALUES (14, 'Fourteenth row data');
INSERT INTO lakehouse.test_long_column VALUES (15, 'Fifteenth row data');
-- Flush all records
CALL ducklake_flush_inlined_data('lakehouse', schema_name := 'main', table_name := 'test_long_column');
-- Add one more record so it is inlined
INSERT INTO lakehouse.test_long_column VALUES (16, 'Sixteenth row data');
-- Query the table, which has to combine inlined rows in Postgres with the data files
SELECT * FROM lakehouse.main.test_long_column;
-- Everything above here works
-- Wait some time
.shell sleep 300
-- This now fails: Failed to read inlined data from DuckLake (the long column is reported as not found)
CALL ducklake_flush_inlined_data('lakehouse', schema_name := 'main', table_name := 'test_long_column');
-- This also fails with the same binder error
select "this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit" from lakehouse.main.test_long_column;
Binder Error:
Failed to read inlined data from DuckLake: Referenced column "this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit" not found in FROM clause!
Candidate bindings: "this_is_a_very_long_column_name_that_exceeds_sixty_four_charact", "begin_snapshot", "end_snapshot", "id"
LINE 2: SELECT id::INTEGER, this_is_a_very_long_column_name_that_exceeds_sixty_four_cha...
^
LINE 1: CALL ducklake_flush_inlined_data('lakehouse', schema_name := 'main', table_name...
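For context, the DuckLake catalog side still holds the full name; one way to compare it with what Postgres stored (a sketch, assuming the default DuckLake catalog layout where column metadata lives in a ducklake_column table reachable from the attached Postgres database):
SELECT * FROM postgres_query('pg', $$
  SELECT column_name, length(column_name) AS name_length
  FROM ducklake_column
  WHERE column_name LIKE 'this_is_a_very_long%'
$$);
-- Expected: the full 72-character name here, whereas information_schema.columns
-- for the inlined-data table only shows the 63-character truncation that the
-- binder error above lists as a candidate binding.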
OS:
Mac M5
DuckDB Version:
1.4.3
DuckLake Version:
0.3
DuckDB Client:
Multiple (python 3.12, dbt, CLI)
Hardware:
No response
Full Name:
James Winegar
Affiliation:
CorrDyn
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have