You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
pg_query = \
"""
SELECT column_name, udt_name
FROM information_schema.columns
WHERE table_schema = '{0}' AND table_name = '{1}';
""".format(self.redshift_schema, self.table)
pg_schema = dict(pg_hook.get_records(pg_query))
incoming_keys = [column['name'] for column in schema]
diff = list(set(incoming_keys) - set(pg_schema.keys()))
In above snippet:
If "schema" column name contains any uppercase character, the column differential (diff) will erroneously be a non-empty set. This will in turn cause logic to attempt to insert a column that is already present in created table.
pg_query will report column_name == "columnname" (automatically lowercased by redshift) but incoming keys will leverage column['name'] == "ColumnName" so:
In [1]: diff = list(set(["ColumnName"]) - set(["columnname"]))
In [2]: diff
Out[2]: ['ColumnName']
This will cause subsequent logic to try to insert a new column called 'ColumnName' which will fail since 'columnname' already exists in created table.
The text was updated successfully, but these errors were encountered:
gavinpaes
changed the title
S3ToRedshiftOperator.reconcile_schemas: Parameter "schema" column names should be lowercased before column differential check
reconcile_schemas: Parameter "schema" column names should be lowercased before column differential check
May 22, 2020
gavinpaes
added a commit
to gavinpaes/redshift_plugin
that referenced
this issue
May 22, 2020
operators/s3_to_redshift_operator.py (Lines 189-191)
In above snippet:
If "schema" column name contains any uppercase character, the column differential (diff) will erroneously be a non-empty set. This will in turn cause logic to attempt to insert a column that is already present in created table.
Example
Assume schema = {"name": "ColumnName", "type": _ }
pg_query will report column_name == "columnname" (automatically lowercased by redshift) but incoming keys will leverage column['name'] == "ColumnName" so:
This will cause subsequent logic to try to insert a new column called 'ColumnName' which will fail since 'columnname' already exists in created table.
The text was updated successfully, but these errors were encountered: