Skip to content

Conversation

@bwvdnbro
Copy link
Member

@bwvdnbro bwvdnbro commented Feb 9, 2022

This adds support for reading distributed VR catalogues, as requested in #72. Metadata is present in each file of a distributed catalogue, so the only significant difference between a normal and a distributed catalogue is the way you read datasets. This is handled by using a new VelociraptorCatalogueReader object that takes care of reading and distinguishes between single file and distributed file reads.

Currently I have only implemented this for the actual catalogue (*properties.* files), but I assume something similar would need to be done for the particle and SO files.

@bwvdnbro bwvdnbro marked this pull request as draft February 9, 2022 12:42
@JBorrow
Copy link
Member

JBorrow commented Feb 14, 2022

A couple of general comments:

  1. Don't forget your type hints!
  2. I think the filename=x.filenames[0] is pretty dirty. Maybe have x.filename be a property that returns that internally?
  3. You should use pathlib.Path objects, then you can use e.g. the glob() method on that to search rather than using regex.

@bwvdnbro
Copy link
Member Author

I have attempted to add type hints, let me know if you want more. I agree that the filenames[0] was ugly and I have created a property.

Regarding your last comment: the issue here is not so much finding all the files, but generating an ordered list of files. While glob() would work to generate the list, there is no clean way to sort that list. glob() does not sort files and because numbers for VR are not padded, sort() sorts them wrong (so you get x.properties.1, x.properties.11... x.properties.2 etc). So in order to sort correctly, you need to write your own comparison function and that requires... a regex. I thought just using the regex from the start would be easier.

Any thoughts on the particle and SO list files? Where are those used?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants