Ensure encoding["source"] is available for a pathlib.Path object #6974

Merged
merged 8 commits into from
Sep 13, 2022

ColemanTom
Contributor

@ColemanTom ColemanTom commented Sep 1, 2022

Closes #5888. At the moment, encoding["source"] is only set when the filename is passed in as a string; if you pass a Path object, it is not set. Given there is already a function to handle this, _normalize_path, I've simplified the logic to just run that over the str/Path so the source is always stored.

I'm not sure if this needs including in the whats-new.rst. I'm happy to add it if desired.

Resolves Issue pydata#5888.

With this change, the "source" encoding will be stored whether the filename is
a string or a Path object.
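
To illustrate the idea described above, here is a minimal sketch and not xarray's actual code. `normalize_path_sketch` is a hypothetical stand-in for the private `_normalize_path` helper; it is assumed here to expand `~` and make the path absolute, which may differ from what the real helper does.

```python
import os
from pathlib import Path


def normalize_path_sketch(path):
    """Hypothetical stand-in for xarray's private _normalize_path helper."""
    # os.fspath turns a Path into a str and leaves a str unchanged,
    # so both input types follow the same code path from here on.
    as_str = os.fspath(path)
    # Assumption: normalizing means expanding "~" and making the path absolute.
    return os.path.abspath(os.path.expanduser(as_str))


# The same file, given once as a string and once as a Path object.
from_str = normalize_path_sketch("data/file.nc")
from_path = normalize_path_sketch(Path("data") / "file.nc")

# Both inputs end up as the same plain string, suitable for encoding["source"].
encoding = {"source": from_path}
print(from_str == from_path, type(encoding["source"]).__name__)
```

Because both inputs are reduced to a plain string first, the value stored under "source" no longer depends on which type the caller used.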
Collaborator

@mathause mathause left a comment

I think it's also possible to pass the whole dataset as bytes, and we don't want this to end up in ds.encoding["source"]. I hope my suggestion takes care of this.

@ColemanTom
Contributor Author

> I think it's also possible to pass the whole dataset as bytes, and we don't want this to end up in ds.encoding["source"]. I hope my suggestion takes care of this.

Thanks. I had forgotten that you can also pass, for example, file-like objects, as I don't tend to do that myself. I've confirmed your suggestion works and have added it to the change.
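
The concern in this exchange can be sketched as a type guard. This is an assumption about the general shape of such a check, not the code that was merged; `maybe_source` is a hypothetical helper name.

```python
import io
import os
from pathlib import Path


def maybe_source(filename_or_obj):
    """Hypothetical helper: return a source string only for path-like inputs."""
    # Strings and os.PathLike objects (such as pathlib.Path) name a file.
    if isinstance(filename_or_obj, (str, os.PathLike)):
        return os.fspath(filename_or_obj)
    # Raw bytes and file-like objects carry the data itself, not a file name,
    # so nothing should be recorded as the source for them.
    return None


candidates = [
    "file.nc",               # str path
    Path("file.nc"),         # Path object
    b"raw dataset bytes",    # whole dataset passed as bytes
    io.BytesIO(b"content"),  # in-memory file-like object
]

for candidate in candidates:
    encoding = {}
    source = maybe_source(candidate)
    if source is not None:
        encoding["source"] = source
    print(type(candidate).__name__, "source" in encoding)
```

With a guard like this, the raw bytes of a dataset never reach the encoding dictionary, while str and Path inputs are treated identically.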

@mathause mathause enabled auto-merge (squash) September 12, 2022 22:14
@mathause
Collaborator

Let's see if that works now - should auto-merge on green.

Thanks for your contribution and welcome to xarray!

@mathause mathause merged commit e00e51f into pydata:main Sep 13, 2022
@kwodzicki

It would be great to further improve this by handling file-like objects. I am running into an issue when trying to read some data from S3 buckets. I can pass in a file-like object and read the data, but there is no .encoding['source'] in the subsequent dataset.

This 'issue' can be simulated using local data:

import xarray as xr

file = '/path/to/file.nc'
fid = open(file, 'rb')
ds = xr.open_dataset(fid, engine='h5netcdf')

This will have no source encoding in the dataset, but it could be grabbed from fid.name.

Could a check be added for file-like objects, and then insert the .path or .name attribute into .encoding['source']?
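
A rough sketch of what such a check might look like, using only the standard library. `source_from_file_like` is a hypothetical helper, not an xarray function, and the assumption that remote file objects expose a `.path` attribute would need to be verified for the file system library in use.

```python
import io
import os
import tempfile


def source_from_file_like(obj):
    """Hypothetical helper: guess a source string from a file-like object."""
    # Assumption: some remote file objects expose .path, local ones expose .name.
    for attr in ("path", "name"):
        value = getattr(obj, attr, None)
        # A file opened from a file descriptor has an int .name; skip it.
        if isinstance(value, (str, os.PathLike)):
            return os.fspath(value)
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.nc")
    with open(path, "wb") as f:
        f.write(b"")  # empty placeholder file, not a real netCDF file
    with open(path, "rb") as fid:
        local_source = source_from_file_like(fid)

# An in-memory buffer has neither attribute, so no source can be recovered.
in_memory_source = source_from_file_like(io.BytesIO(b""))

print(local_source == path, in_memory_source)
```

One design question this leaves open is whether a `.name` taken from a file object is trustworthy enough to store, since it reflects whatever string was passed to `open` and may be a relative path.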

Development

Successfully merging this pull request may close these issues.

open_[mf]dataset ds.encoding['source'] for pathlib.Path?
4 participants