-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in Google Sheets column droped if first value is null #591
Comments
I have also reproduced this bug. The problem lies in this block of code: # first row contains headers and is skipped
data_row_metadata = metadata["rowData"][1]["values"]
rows_data = values[1:] The problem is that it looks at the first row to get the Unfortunately this trick cannot work. It would be probably best to simply find the 1st row with all columns, or pad the |
If it helps, this is my very ugly workaround 😅 diff --git a/intra_dlt/helpers/google_sheets/__init__.py b/intra_dlt/helpers/google_sheets/__init__.py
index 5bdb623..5a5db60 100644
--- a/intra_dlt/helpers/google_sheets/__init__.py
+++ b/intra_dlt/helpers/google_sheets/__init__.py
@@ -138,6 +138,10 @@ def google_spreadsheet(
rows_data = values[1:]
data_types = get_data_types(data_row_metadata)
+ len_diff = len(headers) - len(data_types)
+ if len_diff > 0:
+ for i in range(len_diff):
+ data_types.append(None)
yield dlt.resource(
process_range(rows_data, headers=headers, data_types=data_types), |
@neuromantik33 this will prevent an error but also prevent getting correct metadata for the column where the value is missing. A better solution would be to take more metadata rows and find one best matching the list of columns. This is OFC way more work than fix you do here |
dlt version
1.4.1
Source name
google_sheets
Describe the problem
Issue:
When working with this table in Google Sheets:
The second column (
Col1
) droped by source.What should happen:
Both columns (
Col0
andCol1
) should keep their data as entered.What actually happens:
The second column (
Col1
) gets dropped.Expected behavior
No response
Steps to reproduce
load this csv to google sheets
col0, col1
a,
b, c
then read it with
google_spreadsheet(
spreadsheet_url_or_id=url,
credentials=creds,
range_names=["list1!a1:b3"]
)
How you are using the source?
I run this source in production.
Operating system
Linux
Runtime environment
Local
Python version
3.12.3
dlt destination
snowflake
Additional information
No response
The text was updated successfully, but these errors were encountered: