
Data Inlining Causing Binder Error for columns > 64 characters #619

@jameswinegar

Description

What happens?

DuckLake does not bound column-name length, but Postgres limits identifiers to 63 characters (NAMEDATALEN is 64). If you create a DuckLake table with a longer column name against a Postgres catalog, queries fail once inlined data stored in the Postgres catalog has to be unioned with the materialized DuckLake data.

The issue takes some time to present. I was initially chasing whether this was Azure-specific or a DuckLake problem; I can now reproduce it locally with DuckLake alone.
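
The truncation is visible directly in the Postgres catalog. A quick check using the pg attachment from the repro below, assuming DuckLake names its inlined-data tables with a ducklake_inlined_data prefix (adjust the pattern to whatever tables DuckLake actually creates):

SELECT * FROM postgres_query('pg', $$
    SELECT table_name, column_name, length(column_name) AS len
    FROM information_schema.columns
    WHERE table_name LIKE 'ducklake_inlined_data%'
$$);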

To Reproduce

Environment variables (update as needed):

export DUCKLAKE_HOST=localhost
export DUCKLAKE_PORT=5432
export DUCKLAKE_USER=ducklake
export DUCKLAKE_PASSWORD=ducklake
export DUCKLAKE_DB=ducklake_catalog

export PG_CONN="dbname=${DUCKLAKE_DB} host=${DUCKLAKE_HOST} port=${DUCKLAKE_PORT} user=${DUCKLAKE_USER} password=${DUCKLAKE_PASSWORD}"
export DUCKLAKE_CONN="ducklake:postgres:${PG_CONN}"
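
If you do not have a Postgres instance handy, a throwaway container is enough for the repro (assumes Docker; the container name and image tag are illustrative):

docker run -d --name ducklake-pg \
  -e POSTGRES_USER="${DUCKLAKE_USER}" \
  -e POSTGRES_PASSWORD="${DUCKLAKE_PASSWORD}" \
  -e POSTGRES_DB="${DUCKLAKE_DB}" \
  -p "${DUCKLAKE_PORT}:5432" \
  postgres:16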

Init DuckDB

cat > init.sql <<EOF
INSTALL ducklake;
INSTALL postgres;
LOAD ducklake;
LOAD postgres;
ATTACH '${DUCKLAKE_CONN}' AS lakehouse (DATA_PATH 'column_name_too_long_test', DATA_INLINING_ROW_LIMIT 10, OVERRIDE_DATA_PATH true);
ATTACH '${PG_CONN}' AS pg (TYPE postgres);
EOF

duckdb -init init.sql
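
Optional sanity check that both attachments succeeded:

SHOW DATABASES;
-- Expect lakehouse and pg in the output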

Local DuckLake Tests

-- Get the Postgres identifier limit (returns 63)
SELECT * FROM postgres_query('pg', $$
      SELECT current_setting('max_identifier_length')
  $$);
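-- (max_identifier_length reports 63: NAMEDATALEN is 64 bytes including the
--  trailing NUL terminator, so Postgres silently truncates identifiers to
--  63 characters.)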

-- Create a table with the specified column name
CREATE TABLE lakehouse.test_long_column (
    id INTEGER,
    "this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit" VARCHAR
);
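-- (The quoted name above is 72 characters; as the error below shows, Postgres
--  truncates it to 63, ending in "..._sixty_four_charact".)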

-- Insert 15 rows one at a time
INSERT INTO lakehouse.test_long_column VALUES (1, 'First row data');
INSERT INTO lakehouse.test_long_column VALUES (2, 'Second row data');
INSERT INTO lakehouse.test_long_column VALUES (3, 'Third row data');
INSERT INTO lakehouse.test_long_column VALUES (4, 'Fourth row data');
INSERT INTO lakehouse.test_long_column VALUES (5, 'Fifth row data');
INSERT INTO lakehouse.test_long_column VALUES (6, 'Sixth row data');
INSERT INTO lakehouse.test_long_column VALUES (7, 'Seventh row data');
INSERT INTO lakehouse.test_long_column VALUES (8, 'Eighth row data');
INSERT INTO lakehouse.test_long_column VALUES (9, 'Ninth row data');
INSERT INTO lakehouse.test_long_column VALUES (10, 'Tenth row data');
INSERT INTO lakehouse.test_long_column VALUES (11, 'Eleventh row data');
INSERT INTO lakehouse.test_long_column VALUES (12, 'Twelfth row data');
INSERT INTO lakehouse.test_long_column VALUES (13, 'Thirteenth row data');
INSERT INTO lakehouse.test_long_column VALUES (14, 'Fourteenth row data');
INSERT INTO lakehouse.test_long_column VALUES (15, 'Fifteenth row data');
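-- (Each single-row INSERT stays below DATA_INLINING_ROW_LIMIT (10), so the
--  rows should be stored inline in the Postgres catalog rather than written
--  to Parquet.)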

-- Flush all records
CALL ducklake_flush_inlined_data('lakehouse', schema_name := 'main', table_name := 'test_long_column');
-- Add one more record so it is inlined
INSERT INTO lakehouse.test_long_column VALUES (16, 'Sixteenth row data');

-- Query the table, which has to combine inlined rows and data files
SELECT * FROM lakehouse.main.test_long_column;

-- Everything above here works
-- Wait some time
.shell sleep 300
-- Breaks (Binder Error: Failed to read inlined data from DuckLake; the long column is not found)
CALL ducklake_flush_inlined_data('lakehouse', schema_name := 'main', table_name := 'test_long_column');
-- Breaks with the same binder error
select "this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit" from lakehouse.main.test_long_column;
Binder Error:
Failed to read inlined data from DuckLake: Referenced column "this_is_a_very_long_column_name_that_exceeds_sixty_four_characters_limit" not found in FROM clause!
Candidate bindings: "this_is_a_very_long_column_name_that_exceeds_sixty_four_charact", "begin_snapshot", "end_snapshot", "id"

LINE 2: SELECT id::INTEGER, this_is_a_very_long_column_name_that_exceeds_sixty_four_cha...
                            ^

LINE 1: CALL ducklake_flush_inlined_data('lakehouse', schema_name := 'main', table_name...
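
My reading, which is an assumption rather than something I verified in the DuckLake source: when DuckLake creates the inlined-data table, Postgres silently truncates the 72-character column to 63 characters, but the query DuckLake later generates to combine inlined and materialized data still references the full name, so the binder cannot resolve it. Presumably keeping names at or below 63 characters sidesteps the problem entirely; a control case to confirm (hypothetical table name):

CREATE TABLE lakehouse.test_short_column (
    id INTEGER,
    name_within_the_postgres_limit VARCHAR
);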

OS:

macOS (Apple M5)

DuckDB Version:

1.4.3

DuckLake Version:

0.3

DuckDB Client:

Multiple (Python 3.12, dbt, CLI)

Hardware:

No response

Full Name:

James Winegar

Affiliation:

CorrDyn

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Yes, I have
