fix: do not remove new columns values #3181
base: main
Hi @Braalfa, thank you for spotting this. We have a test case for new columns, but it seems to check only the length, not the values. Would you mind also updating the test case to verify that the fix works as intended?
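To illustrate the gap between a length check and a value check, here is a minimal sketch. It is not the project's actual test: the frames `expected` and `actual` are hypothetical stand-ins for the written data and a read-back in which the new column came back empty.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical data: what was written, including a new column.
expected = pd.DataFrame({"id": [1, 2], "new_col": [1.5, 2.5]})

# Hypothetical read-back where the new column exists but holds no values.
actual = pd.DataFrame({"id": [1, 2], "new_col": [float("nan"), float("nan")]})

# A length-only check cannot tell the two frames apart.
length_check_passes = len(actual) == len(expected)

# A full comparison raises AssertionError because the values differ.
try:
    assert_frame_equal(actual, expected)
    value_check_passes = True
except AssertionError:
    value_check_passes = False
```

Under these assumptions the length check passes while the value comparison fails, which is why comparing whole DataFrames is the stronger test.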
Hi @kukushking! Thanks for reviewing my PR. I included a DataFrame comparison in the test to make sure the remote values are as expected. Also, I made some additional changes to the test: I removed the column aws-sdk-pandas/awswrangler/athena/_write_iceberg.py Lines 542 to 551 in e939741
Also, I used |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Feature or Bugfix
Detail
When calling `to_iceberg` with a DataFrame that has new columns in it, the process adds the new columns to the schema but doesn't upload their values. Therefore, a second call to the function is needed to actually upload the new columns' values. This happens because the statement `df = df[catalog_cols]` removes the values of the new columns from the DataFrame.
Relates
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
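The column-dropping behaviour described under Detail can be reproduced with plain pandas. This is a minimal sketch, not the library's code: `catalog_cols` here is a hypothetical list standing in for the columns already registered in the table, and the "keep" step is one possible approach rather than necessarily the change made in this PR.

```python
import pandas as pd

# Hypothetical stand-in for the columns already present in the table schema.
catalog_cols = ["id", "name"]

# DataFrame to write; "new_col" is not yet part of the schema.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "new_col": [10.5, 20.5]})

# Selecting only the catalog columns silently drops the new column's values.
trimmed = df[catalog_cols]
dropped = [col for col in df.columns if col not in trimmed.columns]

# One possible way to keep them (an assumption for illustration):
# select the catalog columns first, then append any columns not in the catalog.
new_cols = [col for col in df.columns if col not in catalog_cols]
kept = df[catalog_cols + new_cols]
```

Here `dropped` lists the columns lost by the selection, while `kept` retains the catalog column order and still carries the new column's values.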