Change local_path to save_path in get_bsrn #1282
@kanderso-nrel Wanna add labels? :) |
@AdamRJensen as long as we're thinking about clarity... what do you think about editing the parameter description to make it explicit that |
I don't recall any other |
It's true that you haven't seen it in the other functions, but I think that is a mistake! Particularly for research projects (which need to be reproducible), I find it essential to have the raw files available locally. The main reason is that many data suppliers are not always consistent (e.g., BSRN updates files when mistakes are found, and CAMS computes irradiance on the fly using the most recent inputs). Sure, users could save the returned dataframe, but there's no straightforward method for packing the associated metadata. The alternative is for users to get the files themselves, but then all the super nice retrieval functions are obsolete. Personally, this would reduce the value of the If it's because you're worried about requests to add it, then I'd be happy to implement this feature for all the retrieval functions, or at least the ones it makes sense for. I'm very interested in hearing other opinions on this. Do pvlib users not save data, and if you do, how do you do it?
I'd be ok with functions like What other libraries can we look to for patterns? I found only one example of
I typically value the parsers more than the fetchers, so maybe we're coming in with different perspectives. The BSRN parser that you wrote is a great example of this - the file is horrible to parse but not difficult to get. In the interest of not holding up a release, I see two options:
Then we take up discussion elsewhere for the next release cycle. |
I like the pvlib pattern of functions only doing one thing, so my preference is for a separate |
They're not iotools functions yet, but #1264 and #1274 (WIP ERA5 and MERRA2 PRs) also have this local_path option, so BSRN would have some company if we decide to go forward with this. I see the point about functions doing one thing. But I'm also not enthusiastic about adding
If you think that chance is nontrivial, and the way forward isn't clear, I'd lean towards removing the feature for now until the way forward becomes clear. Deprecations are a pain for us and for users. |
I know that other solar-related libraries save local data, e.g., SolarData and irradpy, so it seems others also consider this useful. While it is easy to manually download a single file from one of the services, that does not hold for many use cases, e.g., simulating PV systems using irradiance data from multiple sources or getting all BSRN files for the past 5 years for Europe. Besides the ECMWF files that @mikofski mentioned, ERA5 and MERRA2 files also cannot be downloaded from a user interface but require API usage. Taking A thing that I don't like about the option of just saving the |
I do think the chance is nontrivial. I'm less concerned about deprecation because it feels like we might need a larger reorganization of the |
I don't think this is necessarily so. For example, how should a user reuse a file that they have already downloaded? Would they rerun the retrieval function, downloading another copy? Can't we move the parsing code to a separate function? This isolates each step into its own function, to be grouped as needed. E.g.:
@mikofski just to be clear, are you also thinking of a

    def _get_raw_bsrn(lat, lon, ...):
        # get the data
        return data

    def download_bsrn(filepath_or_buffer, lat, lon, ...):
        data = _get_raw_bsrn(lat, lon, ...)
        with open(filepath_or_buffer, 'w') as f:
            f.write(data)

    def get_bsrn(lat, lon, ...):
        data = _get_raw_bsrn(lat, lon, ...)
        parsed_data, metadata = parse_bsrn(data)  # might need to be buffered
        return parsed_data, metadata

    def read_bsrn(filepath):
        with open(filepath) as f:
            # pass the open file handle to the parser
            parsed_data, metadata = parse_bsrn(f)
        return parsed_data, metadata

    def parse_bsrn(fbuf):
        # do the parsing
        return parsed_data, metadata

That could work. Here are some attempts to simplify the

Alternative 1:

    from contextlib import nullcontext
    from pathlib import Path

    def get_raw_bsrn(lat, lon, ...):
        # get the data
        # plain text easier for most users when API returns plain text
        # but bad fit for consistency with APIs that return binary data
        return buffered_data

    def read_bsrn(filepath_or_buffer):
        # https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext
        if isinstance(filepath_or_buffer, (str, Path)):
            cm = open(filepath_or_buffer)
        else:
            # nullcontext must be instantiated; it yields the buffer unchanged
            cm = nullcontext(filepath_or_buffer)
        with cm as buffered_data:
            parsed_data, metadata = parse_bsrn(buffered_data)
        return parsed_data, metadata

    def parse_bsrn(buffered_data):
        # do the parsing
        return parsed_data, metadata

    # my code
    buffered_data = get_raw_bsrn(lat, lon, ...)
    parsed_data, metadata = parse_bsrn(buffered_data)

    # adam's code
    buffered_data = get_raw_bsrn(lat, lon, ...)
    with open('myfile', 'w') as f:
        # or copyfileobj https://stackoverflow.com/a/3253819/2802993
        f.write(buffered_data.read())
    parsed_data, metadata = read_bsrn('myfile')

Alternative 2:

    from contextlib import nullcontext
    from io import StringIO
    from pathlib import Path

    def get_raw_bsrn(lat, lon, ...):
        # get the data
        # plain text easier for most users when API returns csv
        # but bad fit for consistency with APIs that return binary data
        return data.read()

    def read_bsrn(filepath_or_buffer):
        # https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext
        if isinstance(filepath_or_buffer, (str, Path)):
            cm = open(filepath_or_buffer)
        else:
            # nullcontext must be instantiated; it yields the buffer unchanged
            cm = nullcontext(filepath_or_buffer)
        with cm as buffered_data:
            # parse_bsrn expects a string in this alternative, so read first
            parsed_data, metadata = parse_bsrn(buffered_data.read())
        return parsed_data, metadata

    def parse_bsrn(data):
        buffered_data = StringIO(data)
        # do the parsing
        return parsed_data, metadata

    # my code
    data = get_raw_bsrn(lat, lon, ...)
    parsed_data, metadata = parse_bsrn(data)

    # adam's code
    data = get_raw_bsrn(lat, lon, ...)
    with open('myfile', 'w') as f:
        f.write(data)
    parsed_data, metadata = read_bsrn('myfile')

Reiterating the comment in the To support direct calls to APIs in |
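For anyone who wants to try the fetch/read/parse split from the alternatives above, here is a minimal runnable sketch. All names (`get_raw_toy`, `read_toy`, `parse_toy`) and the toy file layout are hypothetical stand-ins, not pvlib functions and not the BSRN format; the only point is to show a reader that accepts either a path or an open buffer via `contextlib.nullcontext`.

```python
import io
from contextlib import nullcontext
from pathlib import Path


def parse_toy(buffer):
    """Toy parser (hypothetical format): first line is metadata, rest is data."""
    metadata = {"header": buffer.readline().strip()}
    data = [line.strip() for line in buffer if line.strip()]
    return data, metadata


def read_toy(filepath_or_buffer):
    """Accept either a path or an already-open text buffer."""
    if isinstance(filepath_or_buffer, (str, Path)):
        cm = open(filepath_or_buffer)  # closed when the with-block exits
    else:
        # nullcontext yields the buffer unchanged and does not close it
        cm = nullcontext(filepath_or_buffer)
    with cm as buffer:
        return parse_toy(buffer)


def get_raw_toy():
    """Stand-in for a network fetch; returns an in-memory text buffer."""
    return io.StringIO("station=tam\n1.0\n2.0\n")


# Parse straight from the fetched buffer, without touching disk.
data, metadata = read_toy(get_raw_toy())
```

A user who wants a local copy would write `get_raw_toy().read()` to a file first and then pass that path to `read_toy`, so saving stays outside the fetch function.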
Given further discussion in #1214, I recommend that we move ahead with this PR (rather than removing the feature) despite the relatively high risk of deprecation.
Is the save file feature actually tested? I don't see anything asserting that a file was written. Perhaps write to a temporary directory and assert that the file is there. Not sure if it's needed to assert the contents are as expected. I guess you could point the parse function at the (buffered) local file to ensure that it parses.
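A rough sketch of what such a test could look like, using only the standard library. `fake_get_bsrn` and the filename are hypothetical stand-ins so the example runs without BSRN credentials; the real test would call `get_bsrn` instead, and under pytest the `tmp_path` fixture could replace `tempfile`.

```python
import tempfile
from pathlib import Path


def fake_get_bsrn(save_path=None):
    """Hypothetical stand-in for a retrieval function with a save_path option."""
    raw = "raw file contents\n"
    if save_path is not None:
        # Save the raw file into the given directory (illustrative filename).
        (Path(save_path) / "tam0120.dat").write_text(raw)
    return raw


def test_save_path_writes_file():
    with tempfile.TemporaryDirectory() as tmp:
        returned = fake_get_bsrn(save_path=tmp)
        saved = Path(tmp) / "tam0120.dat"
        # The file must exist and hold exactly what was retrieved.
        assert saved.is_file()
        assert saved.read_text() == returned
```

Pointing the parser at the saved file, as suggested, would be one extra assertion on top of this.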
pvlib/tests/iotools/test_bsrn.py
@@ -85,7 +85,7 @@ def test_get_bsrn(expected_index, bsrn_credentials):
         station='tam',
         username=username,
         password=password,
-        local_path='')
+        save_path='')
What happens when save_path=''? An empty string is not None (as the function logic tests for). Is this supposed to save a file?
The idea is that save_path can either be a directory or file path (filename) depending on the nature of the function.
Since get_bsrn retrieves data from a number of actual files (and not a database), the original files are retrieved and saved to the specified directory path with their original filename.
save_path: str or path-like, optional
    If specified, a directory path of where to save files.
get_bsrn differs quite a bit from the other functions, as the other functions generally retrieve one request from an API (which doesn't have an associated file name) and then save_path would be a filename.
So in the get_bsrn case I assume the empty string represents the current working directory. I can definitely write a test that checks that (better yet modify the current test to check it).
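A quick illustration of why an empty string ends up meaning the current working directory, assuming the output path is built by joining `save_path` with the filename (an assumption about the implementation; the filename below is illustrative only):

```python
import os
from pathlib import Path

filename = "tam0120.dat.gz"  # illustrative filename, not taken from the thread

# Joining an empty directory with a filename leaves a bare relative path,
# which open() resolves against the current working directory.
joined = os.path.join("", filename)
joined_pathlib = str(Path("") / filename)

# The empty string is not None, so an `if save_path is not None` check passes.
save_path = ""
would_save = save_path is not None
```

Both joins give back just the filename, and `would_save` is true, so the file would be written next to wherever the script is run.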
I guess it's ok here since the filename includes the station name and we're not matching existing API nor committing to retaining this API.
The idea is that save_path can either be a directory or file path (filename) depending on the nature of the function.
I doubt this will be clear to most users.
Test looks good, thanks.
+1 for keeping functions mostly single-purpose |
@kanderso-nrel feel free to merge if you're happy with this
Just stumbled upon SunPy's Fido module which bears a strong resemblance to pvlib's iotools. The Fido functions have a keyword
Some explanation from SunPy:
It's not completely parallel, but I figured I'd mention it here in regards to the previous debate:
This pull request changes the input argument local_path to save_path in get_bsrn, as this is a more descriptive name. I am making this pull request now as this needs to be merged before 0.9, as otherwise it will be a breaking change.