in Google Sheets column droped if first value is null #591

korbash · 2024-12-27T09:10:21Z

dlt version

1.4.1

Source name

google_sheets

Describe the problem

Issue:

When working with this table in Google Sheets:

Col0	Col1
a
b	c

The second column (Col1) droped by source.

What should happen:

Both columns (Col0 and Col1) should keep their data as entered.

What actually happens:

The second column (Col1) gets dropped.

Expected behavior

No response

Steps to reproduce

load this csv to google sheets
col0, col1
a,
b, c
then read it with
google_spreadsheet(
spreadsheet_url_or_id=url,
credentials=creds,
range_names=["list1!a1:b3"]
)

How you are using the source?

I run this source in production.

Operating system

Linux

Runtime environment

Local

Python version

3.12.3

dlt destination

snowflake

Additional information

No response

The text was updated successfully, but these errors were encountered:

neuromantik33 · 2025-01-16T19:52:28Z

I have also reproduced this bug. The problem lies in this block of code:

# first row contains headers and is skipped
data_row_metadata = metadata["rowData"][1]["values"]
rows_data = values[1:]

The problem is that it looks at the first row to get the data_row_metadata. If the 1st data row is smaller than the header row, as it is in my case, the column is silently ignored.

Unfortunately this trick cannot work. It would be probably best to simply find the 1st row with all columns, or pad the data_types so that it's length is equal to the number of headers.

neuromantik33 · 2025-01-16T20:05:39Z

If it helps, this is my very ugly workaround 😅

diff --git a/intra_dlt/helpers/google_sheets/__init__.py b/intra_dlt/helpers/google_sheets/__init__.py
index 5bdb623..5a5db60 100644
--- a/intra_dlt/helpers/google_sheets/__init__.py
+++ b/intra_dlt/helpers/google_sheets/__init__.py
@@ -138,6 +138,10 @@ def google_spreadsheet(
             rows_data = values[1:]
 
         data_types = get_data_types(data_row_metadata)
+        len_diff = len(headers) - len(data_types)
+        if len_diff > 0:
+            for i in range(len_diff):
+                data_types.append(None)
 
         yield dlt.resource(
             process_range(rows_data, headers=headers, data_types=data_types),

rudolfix · 2025-01-20T14:17:00Z

@neuromantik33 this will prevent an error but also prevent getting correct metadata for the column where the value is missing. A better solution would be to take more metadata rows and find one best matching the list of columns. This is OFC way more work than fix you do here

github-project-automation bot added this to Verified Sources Dec 27, 2024

github-project-automation bot moved this to Planned in Verified Sources Dec 27, 2024

korbash changed the title ~~in Google Sheets drop column if first value is null~~ in Google Sheets column droped if first value is null Dec 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

in Google Sheets column droped if first value is null #591

in Google Sheets column droped if first value is null #591

korbash commented Dec 27, 2024 •

edited

Loading

neuromantik33 commented Jan 16, 2025 •

edited

Loading

neuromantik33 commented Jan 16, 2025

rudolfix commented Jan 20, 2025

in Google Sheets column droped if first value is null #591

in Google Sheets column droped if first value is null #591

Comments

korbash commented Dec 27, 2024 • edited Loading

dlt version

Source name

Describe the problem

Expected behavior

Steps to reproduce

How you are using the source?

Operating system

Runtime environment

Python version

dlt destination

Additional information

neuromantik33 commented Jan 16, 2025 • edited Loading

neuromantik33 commented Jan 16, 2025

rudolfix commented Jan 20, 2025

korbash commented Dec 27, 2024 •

edited

Loading

neuromantik33 commented Jan 16, 2025 •

edited

Loading