Read s3 time-series target data #78

Open
wants to merge 13 commits into
base: main

Conversation

annakrystalli
Member

This PR resolves #75 by expanding on #71, adding class-based methods and adapting higher-level functions to access target time-series data from the cloud.

This approach should make it easier to mirror cloud functionality when we move on to oracle data.

The PR also adds some cloud/target data utilities.

I've also memoised some SubTreeFileSystem methods to reduce calls to S3. We should keep an eye on this to make sure it doesn't cause any unexpected behaviour.
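
For readers unfamiliar with the pattern, here is a minimal base-R sketch of memoisation and of the caveat mentioned above. It is not the code from this PR: `memoise_simple()`, `slow_ls()` and the returned path are hypothetical stand-ins, and the PR may well use a dedicated package instead.

```r
# Minimal base-R memoiser (hypothetical; a stand-in for whatever the PR uses).
memoise_simple <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    # Only call the wrapped function the first time a key is seen.
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical stand-in for a remote listing call; counts how often it runs.
n_calls <- 0
slow_ls <- function(path) {
  n_calls <<- n_calls + 1
  paste0(path, "/time-series.csv")
}

cached_ls <- memoise_simple(slow_ls)
first <- cached_ls("target-data")
second <- cached_ls("target-data") # served from the cache, no second call
```

The trade-off is that a cached result is never refreshed within the session, so a listing that changes on the remote side after the first call would go unnoticed. That is the kind of unexpected behaviour worth watching for.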


github-actions bot commented Mar 5, 2025

@annakrystalli annakrystalli linked an issue Mar 5, 2025 that may be closed by this pull request
@annakrystalli annakrystalli requested a review from zkamvar March 5, 2025 15:27
@annakrystalli annakrystalli marked this pull request as ready for review March 5, 2025 15:27
Member

@zkamvar zkamvar left a comment

This LGTM.

I made a couple of suggestions to make the code a bit more specific.

target_data_path <- hub_path$path("target-data")

td_files <- target_data_path$ls(allow_not_found = TRUE)
ts_files <- td_files[grepl(target_type, td_files, ignore.case = TRUE)]
Member

Suggested change
- ts_files <- td_files[grepl(target_type, td_files, ignore.case = TRUE)]
+ ts_files <- td_files[startsWith(td_files, target_type)]

Since we are being prescriptive on the names, I don't think we need to be case insensitive. Also, startsWith() is faster than grepl() and prevents situations where we end up matching something like showtime-series1E02.mkv.
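
A small sketch of the difference between the two lines, using a made-up listing (the file names below are illustrative only and are not taken from any real hub):

```r
# Hypothetical file listing; the names are illustrative only.
td_files <- c(
  "time-series.csv",
  "time-series",
  "Time-Series.parquet",
  "showtime-series1E02.mkv",
  "oracle-output.csv"
)
target_type <- "time-series"

# Original line: case-insensitive match anywhere in the name.
grepl_matches <- td_files[grepl(target_type, td_files, ignore.case = TRUE)]

# Suggested line: case-sensitive match at the start of the name only.
prefix_matches <- td_files[startsWith(td_files, target_type)]
```

With this listing, `grepl_matches` includes `showtime-series1E02.mkv` and `Time-Series.parquet`, while `prefix_matches` keeps only `time-series.csv` and `time-series`.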

Member Author

Good catch. To make sure we also catch trailing characters and apply the same filter to local file systems, I implemented more thorough and consistent checks in f2a471d.
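
As a rough illustration of what a stricter check could look like (this is not the contents of f2a471d): the helper name `is_target_file()`, the allowed extensions and the example paths below are all assumptions. The sketch anchors the match at both ends of the base name, so it behaves the same for local and cloud paths.

```r
# Hypothetical helper, not taken from the PR.
# Assumes a target file is named exactly <target_type>.<ext>, or is a
# directory named <target_type>, and that target_type contains no regex
# metacharacters that would need escaping.
is_target_file <- function(paths, target_type = "time-series",
                           exts = c("csv", "parquet")) {
  pattern <- paste0(
    "^", target_type, "(\\.(", paste(exts, collapse = "|"), "))?$"
  )
  # Compare against the base name so any leading directories are ignored.
  grepl(pattern, basename(paths))
}

paths <- c(
  "target-data/time-series.csv",
  "time-series",
  "time-series-old.csv",
  "showtime-series1E02.mkv",
  "time-series.parquet"
)
keep <- is_target_file(paths)
```

Here `time-series-old.csv` is rejected because of its trailing characters, which a plain `startsWith()` check would let through.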

@annakrystalli annakrystalli requested a review from zkamvar March 12, 2025 15:18
Development

Successfully merging this pull request may close these issues.

Create function to open cloud (S3) timeseries target data
2 participants