
GCSToBQLoadRunnable doesn't respect GCS folder #256

Open
zinok opened this issue Dec 24, 2022 · 0 comments


zinok commented Dec 24, 2022

I have run into a problem when using the GCS->BQ batch mode of the BigQuerySinkConnector. Each connector schedules its own instance of GCSToBQLoadRunnable, and that runnable ignores the configured GCS folder when it lists the objects to load into BigQuery.

Because of this, if several connectors use the same bucket but different folders, each of them loads every object in the bucket regardless of which folder it is in, and BigQuery ends up with many duplicate rows. On top of that, only one instance succeeds in deleting a given object; the other instances fail to delete it and simply retry again and again.
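For illustration only, here is a minimal sketch of the folder scoping the runnable appears to be missing: keep only the object names that sit under the connector's own folder, so connectors sharing a bucket never pick up each other's objects. The class and method names (`GcsFolderFilter`, `objectsInFolder`) are hypothetical and not taken from the connector's code. If the runnable lists objects through the google-cloud-storage Java client (an assumption here), the same filtering could likely be done server-side by passing `Storage.BlobListOption.prefix(...)` to the list call.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not the connector's actual code): keeps only the
 * GCS object names that live under a connector's configured folder.
 */
public final class GcsFolderFilter {

    private GcsFolderFilter() {}

    /** Normalizes "folder", "folder/" and "" into the prefix to match. */
    static String toPrefix(String folder) {
        if (folder == null || folder.isEmpty()) {
            return ""; // no folder configured: every object in the bucket matches
        }
        // Append "/" so that "orders" does not also match "orders-old/..."
        return folder.endsWith("/") ? folder : folder + "/";
    }

    /** Returns only the object names that start with the folder prefix. */
    public static List<String> objectsInFolder(List<String> objectNames, String folder) {
        String prefix = toPrefix(folder);
        List<String> result = new ArrayList<>();
        for (String name : objectNames) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "orders/1.avro", "users/2.avro", "orders/2022/3.avro", "orders-old/4.avro");
        // Prints [orders/1.avro, orders/2022/3.avro]
        System.out.println(objectsInFolder(names, "orders"));
    }
}
```

With this scoping, the connector configured for `orders` would neither load nor try to delete objects under `users/`, which would avoid both the duplicate rows and the endless delete retries described above.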
