Add better S3 test cases to hubverse-transform #30

Open
1 task
bsweger opened this issue Nov 14, 2024 · 0 comments
bsweger commented Nov 14, 2024

We've hit a few errors when syncing hub files to S3 because our test suite isn't robust enough for S3-style file syncing.

Moto, my usual go-to for AWS mocking, doesn't help because pyarrow's S3 filesystem object (`S3FileSystem`) isn't based on boto (though it's possible that moto server could help).

We should do one of the following:

  • Figure out how to mock S3 in a way that works with pyarrow
  • Stand up a live S3 test hub with a "raw/model-output" folder (similar to how our R tools test S3)

**Definition of Done**

  • all of hubverse-transform's test cases are run against S3 as well as the local filesystem

bsweger converted this from a draft issue Nov 14, 2024
bsweger added this to the hubverse cloud sync milestone Nov 14, 2024
bsweger added a commit that referenced this issue Nov 14, 2024
When reading parquet files from S3, hubverse-transform
does an initial read to get the schema (so we can override
it if necessary). However, the read fails because it's
reading the wrong thing, and the transform process tries
to open the model-output data on the local filesystem instead
of on S3.

I opened an issue to address the lack of S3 test cases,
which resulted in this bug hitting production:
#30
bsweger self-assigned this Feb 26, 2025
Projects
Status: Wishlist