diff --git a/docs/examples_overview.md b/docs/examples_overview.md index 6c102dc6..10f1f686 100644 --- a/docs/examples_overview.md +++ b/docs/examples_overview.md @@ -43,7 +43,7 @@ No manual setup is required - all necessary input files will be automatically ge ## Additional Resource Downloading and Upsampling Examples -Examples are provided in the `examples/inputs/` folder demonstrating how to download wind and solar data using the `hercules.resource.wind_solar_resource_downloader` module and upsample wind data using the `hercules.resource.upsample_wind_data` module to create inputs for Hercules simulations. +Examples are provided in the `examples/inputs/` folder demonstrating how to download wind and solar data using the `hercules.resource.nsrdb_downloader`, `hercules.resource.wtk_downloader`, and `hercules.resource.openmeteo_downloader` modules and upsample wind data using the `hercules.resource.upsample_wind_data` module to create inputs for Hercules simulations. - [03: Download NSRDB and WIND Toolkit Solar and Wind Data](../examples/inputs/03_download_small_nsrdb_wtk_solar_wind_example.py) - Downloads a subset of solar and wind data for a small grid of locations for a single year from the NSRDB and WIND Toolkit datasets, respectively - [04: Download and Upsample WIND Toolkit Wind Data](../examples/inputs/04_download_and_upsample_wtk_wind_example.py) - Downloads wind speed and direction for a small grid of locations for a single year from the WIND Toolkit dataset, then spatially interpolates the data at specific wind turbine locations and temporally upsamples the times series with added turbulence diff --git a/docs/resource_downloading.md b/docs/resource_downloading.md index b1daba46..1ee817fd 100644 --- a/docs/resource_downloading.md +++ b/docs/resource_downloading.md @@ -1,16 +1,16 @@ # Solar and Wind Resource Downloading and Upsampling -Functions are provided in the `hercules.resource.wind_solar_resource_downloading` module for downloading solar and wind 
time series data so they can be used as inputs to Hercules simulations. The `hercules.resource.upsample_wind_data` module is used to spatially interpolate downloaded wind data at specific wind turbine locations and temporally upsample the data. +Functions are provided in the `hercules.resource.nsrdb_downloader`, `hercules.resource.wtk_downloader`, and `hercules.resource.openmeteo_downloader` modules for downloading solar and wind time series data so they can be used as inputs to Hercules simulations. The `hercules.resource.upsample_wind_data` module is used to spatially interpolate downloaded wind data at specific wind turbine locations and temporally upsample the data. ## Overview -The `hercules.resource.wind_solar_resource_downloading` module contains functions for downloading solar data from the [National Solar Radiation Database (NSRDB)](https://nsrdb.nrel.gov), wind data from the [Wind Integration National Dataset (WIND) Toolkit](https://www.nrel.gov/grid/wind-toolkit), and solar and wind data from [Open-Meteo](https://open-meteo.com). +The `hercules.resource.nsrdb_downloader`, `hercules.resource.wtk_downloader`, and `hercules.resource.openmeteo_downloader` modules contain functions for downloading solar data from the [National Solar Radiation Database (NSRDB)](https://nsrdb.nrel.gov), wind data from the [Wind Integration National Dataset (WIND) Toolkit](https://www.nrel.gov/grid/wind-toolkit), and solar and wind data from [Open-Meteo](https://open-meteo.com), respectively. For downloaded wind data, the `hercules.resource.upsample_wind_data` module can be used to spatially interpolate the data at specific wind turbine locations and temporally upsample the data to represent realistic turbulent wind speeds. The downloaded and upsampled data can be saved as `.feather` files and used as inputs to Hercules simulations. 
## Solar and Wind Resource Downloading Module -This section describes the functions for downloading solar and wind resource data in the `hercules.resource.wind_solar_resource_downloading` module. +This section describes the functions for downloading solar and wind resource data in the `hercules.resource.nsrdb_downloader`, `hercules.resource.wtk_downloader`, and `hercules.resource.openmeteo_downloader` modules. ### API Key @@ -41,8 +41,8 @@ Arguments to the `download_nsrdb_data` function used to specify the data to down - `start_date`: If `year` is not used, the specific start date for which data are requested. - `end_date`: If `year` is not used, the specific end date for which data are requested. - `variables`: List of variables to download. Defaults to ["ghi", "dni", "dhi", "wind_speed", "air_temperature"]. -- `nsrdb_dataset_path`: Path name of NSRDB dataset. Available datasets are described [here](https://developer.nrel.gov/docs/solar/nsrdb/) and path names can be identified [here](https://data.openei.org/s3_viewer?bucket=nrel-pds-nsrdb). Defaults to the GOES Conus v4.0.0 dataset: "/nrel/nsrdb/GOES/conus/v4.0.0". -- `nsrdb_filename_prefix`: File name prefix for the NSRDB HDF5 files in the format "{nsrdb_filename_prefix}_{year}.h5". Information about file names can be found [here](https://data.openei.org/s3_viewer?bucket=nrel-pds-nsrdb). Defaults to "nsrdb_conus". +- `nsrdb_dataset_path`: Path name of NSRDB dataset. Available datasets are described [here](https://developer.nlr.gov/docs/solar/nsrdb/) and path names can be identified [here](https://data.openei.org/s3_viewer?bucket=nrel-pds-nsrdb). You can also identify path names of datasets directly by following the directions on [this page](https://github.com/NatLabRockies/rex/tree/main/examples/HSDS), as datasets on the NLR HPC may be named differently than on the AWS site. Defaults to the GOES Conus v4.0.0 dataset: "/nrel/nsrdb/GOES/conus/v4.0.0". 
+- `nsrdb_filename_prefix`: File name prefix for the NSRDB HDF5 files in the format "{nsrdb_filename_prefix}{year}.h5". Information about file names can be found [here](https://data.openei.org/s3_viewer?bucket=nrel-pds-nsrdb). Defaults to "nsrdb_conus_". - `coord_delta`: Coordinate delta for bounding box defining grid of points for which data are requested. Bounding box is defined as target_lat +/- coord_delta and target_lon +/- coord_delta. Defaults to 0.1 degrees. ### WIND Toolkit Wind Data @@ -75,7 +75,7 @@ Arguments to the `download_openmeteo_data` function used to specify the data to ## Wind Data Upsampling Module -After downloading wind data from WIND Toolkit or Open-Meteo, the `hercules.resource.upsample_wind_data` module can be used to spatially interpolate wind speeds and directions from the grid of downloaded points to specific wind turbine locations. The spatially interpolated wind speeds and directions are then upsampled to the desired temporal resolution and realistic turbulence is added to the wind speed time series. The upsampled data are then saved in the format used for wind inputs to Hercules simulations. +After downloading wind data from `hercules.resource.wtk_downloader` or `hercules.resource.openmeteo_downloader`, the `hercules.resource.upsample_wind_data` module can be used to spatially interpolate wind speeds and directions from the grid of downloaded points to specific wind turbine locations. The spatially interpolated wind speeds and directions are then upsampled to the desired temporal resolution and realistic turbulence is added to the wind speed time series. The upsampled data are then saved in the format used for wind inputs to Hercules simulations. 
### Spatial Interpolation Overview @@ -98,9 +98,9 @@ Note that in the current implementation, independent stochastic turbulence is ge The function `upsample_wind_data` is used to perform the above-mentioned steps and return the upsampled wind speeds and directions at each upsampled location as a pandas DataFrame, which is also saved as a `.feather` file. Arguments to the `upsample_wind_data` function used to specify the upsampling are as follows: -- `ws_data_filepath`: Filepath to the `.feather` file containing raw downloaded wind speed data saved by the `download_wtk_data` or `download_openmeteo_data` functions in the `wind_solar_resource_downloading` module. -- `wd_data_filepath`: Filepath to the `.feather` file containing raw downloaded wind direction data saved by the `download_wtk_data` or `download_openmeteo_data` functions in the `wind_solar_resource_downloading` module. -- `coords_filepath`: Filepath to the `.feather` file containing the coordinates corresponding to the downloaded wind data saved by the `download_wtk_data` or `download_openmeteo_data` functions in the `wind_solar_resource_downloading` module. +- `ws_data_filepath`: Filepath to the `.feather` file containing raw downloaded wind speed data saved by the `download_wtk_data` function in `hercules.resource.wtk_downloader` or the `download_openmeteo_data` function in `hercules.resource.openmeteo_downloader`. +- `wd_data_filepath`: Filepath to the `.feather` file containing raw downloaded wind direction data saved by the `download_wtk_data` function in `hercules.resource.wtk_downloader` or the `download_openmeteo_data` function in `hercules.resource.openmeteo_downloader`. +- `coords_filepath`: Filepath to the `.feather` file containing the coordinates corresponding to the downloaded wind data saved by the `download_wtk_data` function in `hercules.resource.wtk_downloader` or the `download_openmeteo_data` function in `hercules.resource.openmeteo_downloader`. 
- `x_locs_upsample`: The "x" (Easting) locations of the desired upsampled locations (e.g., corresponding to turbine locations) relative to the provided origin coordinates in meters. - `y_locs_upsample`: The "y" (Northing) locations of the desired upsampled locations (e.g., corresponding to turbine locations) relative to the provided origin coordinates in meters. - `origin_lat`: The "origin" latitude corresponding to a `y_locs_upsample` location of 0. diff --git a/examples/inputs/03_download_small_nsrdb_wtk_solar_wind_example.py b/examples/inputs/03_download_small_nsrdb_wtk_solar_wind_example.py index c7c4c794..e64570f2 100644 --- a/examples/inputs/03_download_small_nsrdb_wtk_solar_wind_example.py +++ b/examples/inputs/03_download_small_nsrdb_wtk_solar_wind_example.py @@ -15,10 +15,7 @@ import os import sys -from hercules.resource.wind_solar_resource_downloader import ( - download_nsrdb_data, - download_wtk_data, -) +import hercules.resource as resource from matplotlib import pyplot as plt sys.path.append(".") @@ -30,7 +27,9 @@ def run_small_example(): # ARM Southern Great Plains coordinates target_lat = 36.607322 target_lon = -97.487643 - year = 2020 + # Use 2022 for solar because the NSRDB TMY dataset only includes 2022-2024 data, + # and we want to demonstrate using a non-default dataset. 
+ year_solar = 2022 # Create data directory data_dir = "data/small_wtk_nsrdb_example" @@ -42,12 +41,13 @@ def run_small_example(): # Download a small sample of NSRDB data with plotting try: - nsrdb_data = download_nsrdb_data( + nsrdb_data = resource.nsrdb_downloader.download_nsrdb_data( target_lat=target_lat, target_lon=target_lon, - year=year, + year=year_solar, variables=["ghi"], # Just one variable - nsrdb_dataset_path="/nrel/nsrdb/conus", # Demonstrating using a non-default dataset + nsrdb_dataset_path="/nrel/nsrdb/GOES/tmy/v4.0.0", # Using a non-default dataset + nsrdb_filename_prefix="nsrdb_tmy-", # Downloading a typical meteorological year dataset coord_delta=0.05, # Small area output_dir=data_dir, filename_prefix="nsrdb_small_example", @@ -70,11 +70,14 @@ def run_small_example(): print("=" * 60) # Download a small sample of WTK data with plotting + # Use 2020 for wind because WTK data is only available for 2018-2020 + year_wind = 2020 + try: - wtk_data = download_wtk_data( + wtk_data = resource.wtk_downloader.download_wtk_data( target_lat=target_lat, target_lon=target_lon, - year=year, + year=year_wind, variables=["windspeed_100m"], # Just one variable coord_delta=0.05, # Small area output_dir=data_dir, diff --git a/examples/inputs/04_download_and_upsample_wtk_wind_example.py b/examples/inputs/04_download_and_upsample_wtk_wind_example.py index 1bc4cafd..6b6b60e9 100644 --- a/examples/inputs/04_download_and_upsample_wtk_wind_example.py +++ b/examples/inputs/04_download_and_upsample_wtk_wind_example.py @@ -21,7 +21,7 @@ import pandas as pd import utm from hercules.resource.upsample_wind_data import upsample_wind_data -from hercules.resource.wind_solar_resource_downloader import download_wtk_data +from hercules.resource.wtk_downloader import download_wtk_data from matplotlib import pyplot as plt sys.path.append(".") diff --git a/examples/inputs/05_download_small_openmeteo_solar_wind_example.py b/examples/inputs/05_download_small_openmeteo_solar_wind_example.py 
index 7ef4f5c3..1841701a 100644 --- a/examples/inputs/05_download_small_openmeteo_solar_wind_example.py +++ b/examples/inputs/05_download_small_openmeteo_solar_wind_example.py @@ -6,9 +6,7 @@ import sys import numpy as np -from hercules.resource.wind_solar_resource_downloader import ( - download_openmeteo_data, -) +from hercules.resource.openmeteo_downloader import download_openmeteo_data from matplotlib import pyplot as plt sys.path.append(".") diff --git a/examples/inputs/06_download_and_upsample_openmeteo_wind_example.py b/examples/inputs/06_download_and_upsample_openmeteo_wind_example.py index 4f462723..ae24e2f2 100644 --- a/examples/inputs/06_download_and_upsample_openmeteo_wind_example.py +++ b/examples/inputs/06_download_and_upsample_openmeteo_wind_example.py @@ -10,10 +10,8 @@ import numpy as np import pandas as pd import utm +from hercules.resource.openmeteo_downloader import download_openmeteo_data from hercules.resource.upsample_wind_data import upsample_wind_data -from hercules.resource.wind_solar_resource_downloader import ( - download_openmeteo_data, -) from matplotlib import pyplot as plt sys.path.append(".") diff --git a/hercules/resource/__init__.py b/hercules/resource/__init__.py new file mode 100644 index 00000000..b7dd98da --- /dev/null +++ b/hercules/resource/__init__.py @@ -0,0 +1,28 @@ +"""Resource downloading and processing utilities for Hercules. + +This subpackage contains modules for downloading and processing wind and +solar resource data from multiple sources (NSRDB, WTK, Open-Meteo) and +for upsampling wind data for use in Hercules simulations. 
+""" + +from .nsrdb_downloader import download_nsrdb_data +from .openmeteo_downloader import download_openmeteo_data +from .resource_utilities import ( + get_variable_colormap, + get_variable_label, + plot_spatial_map, + plot_timeseries, +) +from .upsample_wind_data import upsample_wind_data +from .wtk_downloader import download_wtk_data + +__all__ = [ + "download_nsrdb_data", + "download_wtk_data", + "download_openmeteo_data", + "plot_timeseries", + "plot_spatial_map", + "get_variable_label", + "get_variable_colormap", + "upsample_wind_data", +] diff --git a/hercules/resource/nsrdb_downloader.py b/hercules/resource/nsrdb_downloader.py new file mode 100644 index 00000000..ffd28155 --- /dev/null +++ b/hercules/resource/nsrdb_downloader.py @@ -0,0 +1,205 @@ +"""NSRDB solar irradiance data downloader. + +This module provides the `download_nsrdb_data` function, which was +previously defined in `wind_solar_resource_downloader`. The implementation +is moved here without functional changes to support a more modular +resource package layout. +""" + +import math +import os +import time +from typing import List, Optional + +import numpy as np +import pandas as pd +from rex import ResourceX + +from hercules.resource.resource_utilities import ( + plot_spatial_map, + plot_timeseries, +) +from hercules.utilities import hercules_float_type + + +def download_nsrdb_data( + target_lat: float, + target_lon: float, + year: Optional[int] = None, + start_date: Optional[str] = None, + end_date: Optional[str] = None, + variables: List[str] = ["ghi", "dni", "dhi", "wind_speed", "air_temperature"], + nsrdb_dataset_path="/nrel/nsrdb/GOES/conus/v4.0.0", + nsrdb_filename_prefix="nsrdb_conus_", + coord_delta: float = 0.1, + output_dir: str = "./data", + filename_prefix: str = "nsrdb", + plot_data: bool = False, + plot_type: str = "timeseries", +) -> dict: + """Download NSRDB solar irradiance data for a specified location and time period. 
+ + This function requires an NLR API key, which can be obtained by visiting + https://developer.nlr.gov/signup/. After receiving your API key, you must make a configuration + file at ~/.hscfg containing the following: + + hs_endpoint = https://developer.nlr.gov/api/hsds + + hs_api_key = YOUR_API_KEY_GOES_HERE + + More information can be found at: https://github.com/NatLabRockies/hsds-examples. + + Args: + target_lat (float): Target latitude coordinate. + target_lon (float): Target longitude coordinate. + year (int, optional): Year of data to download (if using full year approach). + start_date (str, optional): Start date in format 'YYYY-MM-DD' (if using date range + approach). + end_date (str, optional): End date in format 'YYYY-MM-DD' (if using date range + approach). + variables (List[str], optional): List of variables to download. + Defaults to ['ghi', 'dni', 'dhi', 'wind_speed', 'air_temperature']. + nsrdb_dataset_path (str, optional): Path name of NSRDB dataset. Available datasets at + https://developer.nlr.gov/docs/solar/nsrdb/. You can see what datasets are available by + using h5pyd to list the contents of the NSRDB folder by following the instructions here: + https://github.com/NatLabRockies/rex/tree/main/examples/HSDS. + Defaults to "/nrel/nsrdb/GOES/conus/v4.0.0". + nsrdb_filename_prefix (str, optional): File name prefix for the NSRDB HDF5 files in the + format {nsrdb_filename_prefix}{year}.h5. Defaults to "nsrdb_conus_". + coord_delta (float, optional): Coordinate delta for bounding box. Defaults to 0.1 degrees. + output_dir (str, optional): Directory to save output files. Defaults to "./data". + filename_prefix (str, optional): Prefix for output filenames. Defaults to "nsrdb". + plot_data (bool, optional): Whether to create plots of the data. Defaults to False. + plot_type (str, optional): Type of plot to create: 'timeseries' or 'map'. + Defaults to "timeseries". + + Returns: + dict: Dictionary containing DataFrames for each variable and coordinates. 
+ + Note: + Either 'year' OR both 'start_date' and 'end_date' must be provided. Date range approach + allows for more flexible time periods than full year. Plots are not automatically shown. + If plot_data is True, call matplotlib.pyplot.show() to display the figure. + """ + + os.makedirs(output_dir, exist_ok=True) + + if year is not None and (start_date is not None or end_date is not None): + raise ValueError( + "Please provide either 'year' OR both 'start_date' and 'end_date', not both approaches." + ) + + if year is None and (start_date is None or end_date is None): + raise ValueError("Please provide either 'year' OR both 'start_date' and 'end_date'.") + + if year is not None: + file_years = [year] + time_suffix = str(year) + time_description = f"year {year}" + else: + start_dt = pd.to_datetime(start_date) + end_dt = pd.to_datetime(end_date) + + if start_dt > end_dt: + raise ValueError("start_date must be before end_date") + + file_years = list(range(start_dt.year, end_dt.year + 1)) + time_suffix = f"{start_date}_to_{end_date}".replace("-", "") + time_description = f"period {start_date} to {end_date}" + + llcrn_lat = target_lat - coord_delta + llcrn_lon = target_lon - coord_delta + urcrn_lat = target_lat + coord_delta + urcrn_lon = target_lon + coord_delta + + print(f"Downloading NSRDB data for {time_description}") + print(f"Target coordinates: ({target_lat}, {target_lon})") + print(f"Bounding box: ({llcrn_lat}, {llcrn_lon}) to ({urcrn_lat}, {urcrn_lon})") + print(f"Variables: {variables}") + print(f"Years to process: {file_years}") + + t0 = time.time() + + data_dict: dict = {} + all_dataframes: dict = {var: [] for var in variables} + + try: + for file_year in file_years: + print(f"\nProcessing year {file_year}...") + fp = f"{nsrdb_dataset_path}/{nsrdb_filename_prefix}{file_year}.h5" + + with ResourceX(fp) as res: + for var in variables: + print(f" Downloading {var} for {file_year}...") + df_year = res.get_box_df( + var, + lat_lon_1=[llcrn_lat, llcrn_lon], + 
lat_lon_2=[urcrn_lat, urcrn_lon], + ) + + if start_date is not None and end_date is not None: + df_year = df_year.loc[start_date:end_date] + + all_dataframes[var].append(df_year) + + if "coordinates" not in data_dict: + gids = df_year.columns.values + coordinates = res.lat_lon[gids] + df_coords = pd.DataFrame(coordinates, index=gids, columns=["lat", "lon"]) + data_dict["coordinates"] = df_coords + + for var in variables: + if all_dataframes[var]: + print(f"Concatenating {var} data across {len(all_dataframes[var])} years...") + data_dict[var] = pd.concat(all_dataframes[var], axis=0).sort_index() + + for col in data_dict[var].columns: + if pd.api.types.is_numeric_dtype(data_dict[var][col]): + data_dict[var][col] = data_dict[var][col].astype(hercules_float_type) + + all_dataframes[var].clear() + + output_file = os.path.join( + output_dir, + f"{filename_prefix}_{var}_{time_suffix}.feather", + ) + data_dict[var].reset_index().to_feather(output_file) + print(f"Saved {var} data to {output_file}") + + coords_file = os.path.join(output_dir, f"{filename_prefix}_coords_{time_suffix}.feather") + data_dict["coordinates"].reset_index().to_feather(coords_file) + print(f"Saved coordinates to {coords_file}") + + except OSError as e: + print(f"Error downloading NSRDB data: {e}") + print("This could be caused by an invalid API key, NSRDB dataset path, or date range.") + raise + except Exception as e: + print(f"Error downloading NSRDB data: {e}") + raise + + total_time = (time.time() - t0) / 60 + decimal_part = math.modf(total_time)[0] * 60 + print( + "NSRDB download completed in " + f"{int(np.floor(total_time))}:{int(np.round(decimal_part, 0)):02d} minutes" + ) + + if plot_data and data_dict and "coordinates" in data_dict: + coordinates_array = data_dict["coordinates"][["lat", "lon"]].values + if plot_type == "timeseries": + plot_timeseries( + data_dict, + variables, + coordinates_array, + f"{filename_prefix} NSRDB Data", + ) + elif plot_type == "map": + plot_spatial_map( + 
data_dict, + variables, + coordinates_array, + f"{filename_prefix} NSRDB Data", + ) + + return data_dict diff --git a/hercules/resource/openmeteo_downloader.py b/hercules/resource/openmeteo_downloader.py new file mode 100644 index 00000000..d73a22e4 --- /dev/null +++ b/hercules/resource/openmeteo_downloader.py @@ -0,0 +1,279 @@ +"""Open-Meteo solar and wind data downloader. + +This module provides the `download_openmeteo_data` function, which was +previously defined in `wind_solar_resource_downloader`. The implementation +is moved here without functional changes to support a more modular resource +package layout. +""" + +import math +import os +import time +import warnings +from typing import List, Optional + +import numpy as np +import openmeteo_requests +import pandas as pd +import requests_cache +from retry_requests import retry + +from hercules.resource.resource_utilities import ( + plot_spatial_map, + plot_timeseries, +) +from hercules.utilities import hercules_float_type + + +def download_openmeteo_data( + target_lat: float | List[float], + target_lon: float | List[float], + year: Optional[int] = None, + start_date: Optional[str] = None, + end_date: Optional[str] = None, + variables: List[str] = [ + "wind_speed_80m", + "wind_direction_80m", + "temperature_2m", + "shortwave_radiation_instant", + "diffuse_radiation_instant", + "direct_normal_irradiance_instant", + ], + coord_delta: float = 0.1, + output_dir: str = "./data", + filename_prefix: str = "openmeteo", + plot_data: bool = False, + plot_type: str = "timeseries", + remove_duplicate_coords=True, +) -> dict: + """Download Open-Meteo weather data for specified location(s) and time period. + + Data are retrieved from the nearest weather grid cell to the requested locations. The grid cell + resolution varies with latitude, but at ~35 degrees latitude, the grid cell resolution is + approximately 0.027 degrees latitude (~2.4 km in the N-S direction) and 0.0333 degrees + longitude (~3.7 km in the E-W direction). 
+ + Args: + target_lat (float | List[float]): Target latitude coordinate or list of latitude + coordinates. + target_lon (float | List[float]): Target longitude coordinate or list of longitude + coordinates. + year (int, optional): Year of data to download (if using full year approach). + start_date (str, optional): Start date in format 'YYYY-MM-DD' (if using date range + approach). + end_date (str, optional): End date in format 'YYYY-MM-DD' (if using date range approach). + variables (List[str], optional): List of variables to download. Available options include + wind_speed_80m, wind_direction_80m, temperature_2m, shortwave_radiation_instant, + diffuse_radiation_instant, direct_normal_irradiance_instant. + coord_delta (float, optional): Not used for Open-Meteo (points specified individually), + kept for consistency. Defaults to 0.1. + output_dir (str, optional): Directory to save output files. Defaults to "./data". + filename_prefix (str, optional): Prefix for output filenames. Defaults to "openmeteo". + plot_data (bool, optional): Whether to create plots of the data. Defaults to False. + plot_type (str, optional): Type of plot to create: 'timeseries' or 'map'. + Defaults to "timeseries". + remove_duplicate_coords (bool, optional): Whether to remove data from duplicate coordinates. + Defaults to True. + + Returns: + dict: Dictionary containing DataFrames for each variable and coordinates. + + Note: + Either 'year' OR both 'start_date' and 'end_date' must be provided. Open-Meteo provides + point data (not gridded), so coord_delta is ignored. Available historical data typically + spans from 1940 to present. Plots are not automatically shown. If plot_data is True, call + matplotlib.pyplot.show() to display the figure. + """ + + os.makedirs(output_dir, exist_ok=True) + + if year is not None and (start_date is not None or end_date is not None): + raise ValueError( + "Please provide either 'year' OR both 'start_date' and 'end_date', not both approaches." 
+ ) + + if year is None and (start_date is None or end_date is None): + raise ValueError("Please provide either 'year' OR both 'start_date' and 'end_date'.") + + if year is not None: + start_date = f"{year}-01-01" + end_date = f"{year}-12-31" + time_suffix = str(year) + time_description = f"year {year}" + else: + start_dt = pd.to_datetime(start_date) + end_dt = pd.to_datetime(end_date) + + if start_dt > end_dt: + raise ValueError("start_date must be before end_date") + + time_suffix = f"{start_date}_to_{end_date}".replace("-", "") + time_description = f"period {start_date} to {end_date}" + + print(f"Downloading Open-Meteo data for {time_description}") + print(f"Target coordinates: ({target_lat}, {target_lon})") + print(f"Variables: {variables}") + print("Note: Open-Meteo provides point data (coord_delta ignored)") + + variable_mapping = { + "wind_speed_80m": "wind_speed_80m", + "wind_direction_80m": "wind_direction_80m", + "temperature_2m": "temperature_2m", + "shortwave_radiation_instant": "shortwave_radiation_instant", + "diffuse_radiation_instant": "diffuse_radiation_instant", + "direct_normal_irradiance_instant": "direct_normal_irradiance_instant", + "ghi": "shortwave_radiation_instant", + "dni": "direct_normal_irradiance_instant", + "dhi": "diffuse_radiation_instant", + "windspeed_80m": "wind_speed_80m", + "winddirection_80m": "wind_direction_80m", + } + + mapped_variables: list[str] = [] + for var in variables: + if var in variable_mapping: + mapped_variables.append(variable_mapping[var]) + else: + print(f"Warning: Variable '{var}' not available in Open-Meteo. 
Skipping.") + + if not mapped_variables: + raise ValueError("No valid variables found for Open-Meteo download.") + + t0 = time.time() + + try: + cache_session = requests_cache.CachedSession(".cache", expire_after=3600) + retry_session = retry(cache_session, retries=5, backoff_factor=0.2) + openmeteo = openmeteo_requests.Client(session=retry_session) + + url = "https://historical-forecast-api.open-meteo.com/v1/forecast" + params = { + "latitude": target_lat, + "longitude": target_lon, + "start_date": start_date, + "end_date": end_date, + "minutely_15": mapped_variables, + "wind_speed_unit": "ms", + } + + try: + responses = openmeteo.weather_api(url, params=params) + print("API request successful with SSL verification.") + except Exception as e: + print(f"SSL verification failed: {str(e)[:100]}...") + print("Trying with SSL verification disabled...") + + warnings.filterwarnings("ignore", message="Unverified HTTPS request") + + cache_session_no_ssl = requests_cache.CachedSession(".cache", expire_after=3600) + cache_session_no_ssl.verify = False + retry_session_no_ssl = retry(cache_session_no_ssl, retries=5, backoff_factor=0.2) + openmeteo_no_ssl = openmeteo_requests.Client(session=retry_session_no_ssl) + + responses = openmeteo_no_ssl.weather_api(url, params=params) + print("API request successful with SSL verification disabled.") + + data_dict: dict = {} + data_dict["coordinates"] = pd.DataFrame() + + original_var_names: list[str] = [] + for var in mapped_variables: + original_var_name = None + for orig, mapped in variable_mapping.items(): + if mapped == var and orig in variables: + original_var_name = orig + break + + var_name = original_var_name if original_var_name else var + data_dict[var_name] = pd.DataFrame() + + original_var_names.append(var_name) + + for gid, response in enumerate(responses): + print(f"Coordinates retrieved: {response.Latitude()}°N {response.Longitude()}°E") + print(f"Elevation: {response.Elevation()} m asl") + + minutely_15 = 
response.Minutely15() + + date_range = pd.date_range( + start=pd.to_datetime(minutely_15.Time(), unit="s", utc=True), + end=pd.to_datetime(minutely_15.TimeEnd(), unit="s", utc=True), + freq=pd.Timedelta(seconds=minutely_15.Interval()), + inclusive="left", + ) + + df_coords = pd.DataFrame( + [[response.Latitude(), response.Longitude()]], + index=[gid], + columns=["lat", "lon"], + ) + data_dict["coordinates"] = pd.concat([data_dict["coordinates"], df_coords], axis=0) + + for i, var_name in enumerate(original_var_names): + var_data = minutely_15.Variables(i).ValuesAsNumpy() + + df_var = pd.DataFrame( + var_data.astype(hercules_float_type), + index=date_range, + columns=[gid], + ) + df_var.index.name = "time_index" + + data_dict[var_name] = pd.concat([data_dict[var_name], df_var], axis=1) + + if remove_duplicate_coords and (len(data_dict["coordinates"]) > 1): + duplicate_mask = data_dict["coordinates"].duplicated( + subset=["lat", "lon"], + keep="first", + ) + data_dict["coordinates"] = data_dict["coordinates"][~duplicate_mask] + + for var_name in original_var_names: + data_dict[var_name] = data_dict[var_name][ + [c for c in data_dict["coordinates"].index] + ] + data_dict[var_name].columns = range(len(data_dict["coordinates"])) + + data_dict["coordinates"] = data_dict["coordinates"].reset_index(drop=True) + + for var_name in original_var_names: + output_file = os.path.join( + output_dir, + f"{filename_prefix}_{var_name}_{time_suffix}.feather", + ) + data_dict[var_name].reset_index().to_feather(output_file) + print(f"Saved {var_name} data to {output_file}") + + coords_file = os.path.join(output_dir, f"{filename_prefix}_coords_{time_suffix}.feather") + data_dict["coordinates"].reset_index().to_feather(coords_file) + print(f"Saved coordinates to {coords_file}") + + except Exception as e: + print(f"Error downloading Open-Meteo data: {e}") + raise + + total_time = (time.time() - t0) / 60 + decimal_part = math.modf(total_time)[0] * 60 + print( + "Open-Meteo download 
completed in " + f"{int(np.floor(total_time))}:{int(np.round(decimal_part, 0)):02d} minutes" + ) + + if plot_data and data_dict and "coordinates" in data_dict: + coordinates_array = data_dict["coordinates"][["lat", "lon"]].values + if plot_type == "timeseries": + plot_timeseries( + data_dict, + variables, + coordinates_array, + f"{filename_prefix} Open-Meteo Data", + ) + elif plot_type == "map": + plot_spatial_map( + data_dict, + variables, + coordinates_array, + f"{filename_prefix} Open-Meteo Data", + ) + + return data_dict diff --git a/hercules/resource/resource_utilities.py b/hercules/resource/resource_utilities.py new file mode 100644 index 00000000..2f24cde1 --- /dev/null +++ b/hercules/resource/resource_utilities.py @@ -0,0 +1,232 @@ +"""Shared utilities for resource data downloading and visualization. + +This module provides common plotting and labeling functions used by the +NSRDB, WTK, and Open-Meteo resource downloaders. Additional shared +utilities (e.g., time parameter validation, data I/O) can be added in +future changes as the resource modules are further modularized. +""" + +import math +from typing import List + +import cartopy.crs as ccrs +import cartopy.feature as cfeature +import matplotlib.pyplot as plt +import numpy as np +from scipy.interpolate import griddata + + +def plot_timeseries(data_dict: dict, variables: List[str], coordinates: np.ndarray, title: str): + """Create time-series plots for the downloaded data. + + Args: + data_dict (dict): Dictionary containing DataFrames for each variable. + variables (List[str]): List of variables to plot. + coordinates (np.ndarray): Array of coordinates for the data points. + title (str): Title for the plots. 
+ """ + + n_vars = len(variables) + if n_vars == 0: + return + + fig, axes = plt.subplots(n_vars, 1, figsize=(12, 4 * n_vars), sharex=True) + if n_vars == 1: + axes = [axes] + + for i, var in enumerate(variables): + if var in data_dict: + df = data_dict[var] + + for col in df.columns: + axes[i].plot(df.index, df[col], alpha=0.7, linewidth=0.8) + + axes[i].set_ylabel(get_variable_label(var)) + axes[i].set_title(f"{var.replace('_', ' ').title()}") + axes[i].grid(True, alpha=0.3) + + axes[-1].set_xlabel("Time") + plt.suptitle(f"{title} - Time Series", fontsize=14, fontweight="bold") + plt.tight_layout() + + +def plot_spatial_map(data_dict: dict, variables: List[str], coordinates: np.ndarray, title: str): + """Create spatial maps showing the mean values across the region. + + Args: + data_dict (dict): Dictionary containing DataFrames for each variable. + variables (List[str]): List of variables to plot. + coordinates (np.ndarray): Array of coordinates for the data points. + title (str): Title for the plots. 
+ """ + + n_vars = len(variables) + if n_vars == 0: + return + + n_cols = min(2, n_vars) + n_rows = math.ceil(n_vars / n_cols) + + plt.figure(figsize=(8 * n_cols, 6 * n_rows)) + + for i, var in enumerate(variables): + if var in data_dict: + df = data_dict[var] + + lats = coordinates[:, 0] + lons = coordinates[:, 1] + + mean_values = df.mean(axis=0).values + + ax = plt.subplot(n_rows, n_cols, i + 1, projection=ccrs.PlateCarree()) + + ax.add_feature(cfeature.COASTLINE, alpha=0.5) + ax.add_feature(cfeature.BORDERS, linestyle=":", alpha=0.5) + ax.add_feature(cfeature.LAND, edgecolor="black", facecolor="lightgray", alpha=0.3) + ax.add_feature(cfeature.OCEAN, facecolor="lightblue", alpha=0.3) + + if len(lats) > 4: + grid_lon = np.linspace(min(lons), max(lons), 50) + grid_lat = np.linspace(min(lats), max(lats), 50) + grid_lon, grid_lat = np.meshgrid(grid_lon, grid_lat) + + try: + grid_values = griddata( + (lons, lats), + mean_values, + (grid_lon, grid_lat), + method="cubic", + ) + contour = ax.contourf( + grid_lon, + grid_lat, + grid_values, + levels=15, + cmap=get_variable_colormap(var), + transform=ccrs.PlateCarree(), + ) + plt.colorbar( + contour, + ax=ax, + orientation="vertical", + label=get_variable_label(var), + shrink=0.8, + ) + except Exception: + sc = ax.scatter( + lons, + lats, + c=mean_values, + s=100, + cmap=get_variable_colormap(var), + transform=ccrs.PlateCarree(), + ) + plt.colorbar( + sc, + ax=ax, + orientation="vertical", + label=get_variable_label(var), + shrink=0.8, + ) + else: + sc = ax.scatter( + lons, + lats, + c=mean_values, + s=100, + cmap=get_variable_colormap(var), + transform=ccrs.PlateCarree(), + ) + plt.colorbar( + sc, + ax=ax, + orientation="vertical", + label=get_variable_label(var), + shrink=0.8, + ) + + ax.scatter(lons, lats, c="black", s=20, transform=ccrs.PlateCarree(), alpha=0.8) + + ax.set_title(f"{var.replace('_', ' ').title()}") + + ax.set_xticks(np.linspace(min(lons), max(lons), 5)) + ax.set_yticks(np.linspace(min(lats), max(lats), 
5)) + ax.set_xticklabels( + [f"{lon:.2f}°" for lon in np.linspace(min(lons), max(lons), 5)], + fontsize=8, + ) + ax.set_yticklabels( + [f"{lat:.2f}°" for lat in np.linspace(min(lats), max(lats), 5)], + fontsize=8, + ) + ax.set_xlabel("Longitude") + ax.set_ylabel("Latitude") + + plt.suptitle( + f"{title} - Spatial Distribution (Time-Averaged)", + fontsize=14, + fontweight="bold", + ) + plt.tight_layout() + + +def get_variable_label(variable: str) -> str: + """Get appropriate label and units for a variable. + + Args: + variable (str): Variable name. + + Returns: + str: Label with units for the variable. + """ + + labels = { + "ghi": "GHI (W/m²)", + "dni": "DNI (W/m²)", + "dhi": "DHI (W/m²)", + "windspeed_100m": "Wind Speed at 100m (m/s)", + "winddirection_100m": "Wind Direction at 100m (°)", + "turbulent_kinetic_energy_100m": "TKE at 100m (m²/s²)", + "temperature_100m": "Temperature at 100m (°C)", + "pressure_100m": "Pressure at 100m (Pa)", + "wind_speed_80m": "Wind Speed at 80m (m/s)", + "windspeed_80m": "Wind Speed at 80m (m/s)", + "wind_direction_80m": "Wind Direction at 80m (°)", + "winddirection_80m": "Wind Direction at 80m (°)", + "temperature_2m": "Temperature at 2m (°C)", + "shortwave_radiation_instant": "Shortwave Radiation (W/m²)", + "diffuse_radiation_instant": "Diffuse Radiation (W/m²)", + "direct_normal_irradiance_instant": "Direct Normal Irradiance (W/m²)", + } + return labels.get(variable, variable.replace("_", " ").title()) + + +def get_variable_colormap(variable: str) -> str: + """Get appropriate colormap for a variable. + + Args: + variable (str): Variable name. + + Returns: + str: Matplotlib colormap name for the variable. 
+ """ + + colormaps = { + "ghi": "plasma", + "dni": "plasma", + "dhi": "plasma", + "windspeed_100m": "viridis", + "winddirection_100m": "hsv", + "turbulent_kinetic_energy_100m": "cividis", + "temperature_100m": "RdYlBu_r", + "pressure_100m": "coolwarm", + "wind_speed_80m": "viridis", + "windspeed_80m": "viridis", + "wind_direction_80m": "hsv", + "winddirection_80m": "hsv", + "temperature_2m": "RdYlBu_r", + "shortwave_radiation_instant": "plasma", + "diffuse_radiation_instant": "plasma", + "direct_normal_irradiance_instant": "plasma", + } + return colormaps.get(variable, "viridis") diff --git a/hercules/resource/upsample_wind_data.py b/hercules/resource/upsample_wind_data.py index c53314a6..4d2a2be4 100644 --- a/hercules/resource/upsample_wind_data.py +++ b/hercules/resource/upsample_wind_data.py @@ -11,10 +11,11 @@ import numpy as np import pandas as pd import utm -from hercules.utilities import hercules_complex_type, hercules_float_type from scipy.interpolate import CloughTocher2DInterpolator from shapely.geometry import MultiPoint +from hercules.utilities import hercules_complex_type, hercules_float_type + def _spatially_interpolate_wind_data( x_locs_orig: np.ndarray, @@ -230,8 +231,9 @@ def upsample_wind_data( ) -> dict: """Spatially interpolate and temporally upsample wind speed and direction data. - Processes wind files generated using wind data downloading functions in the - wind_solar_resource_downloader module (e.g., for the Wind Toolkit or Open-Meteo datasets). + Processes wind files generated using the wind data downloading functions in the + `hercules.resource.wtk_downloader` or `hercules.resource.openmeteo_downloader` modules + (e.g., for the Wind Toolkit or Open-Meteo datasets). Spatial interpolation is achieved using 2D Clough-Tocher interpolation. Upsampling is accomplished by simple Nyquist upsampling to create a smooth signal. Lastly, for the wind speeds, stochastic, uncorrelated turbulence generated using the Kaimal spectrum is added. 
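Reviewer note on the `upsample_wind_data` docstring above: the spatial interpolation step it describes can be illustrated with SciPy's `CloughTocher2DInterpolator`, the class this module already imports. What follows is a minimal sketch under stated assumptions, not the module's actual implementation. The grid-point coordinates, the wind speeds, and the query locations are made-up values chosen for illustration, and the variable names only loosely mirror the `x_locs_orig` argument visible in the diff.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator

# Hypothetical grid-point locations (easting/northing in metres); illustrative only.
x_locs_orig = np.array([0.0, 2000.0, 0.0, 2000.0, 1000.0])
y_locs_orig = np.array([0.0, 0.0, 2000.0, 2000.0, 1000.0])

# Hypothetical wind speeds (m/s) at those grid points for a single time step.
wind_speeds = np.array([8.0, 9.0, 7.5, 8.5, 8.2])

# Build the piecewise-cubic Clough-Tocher interpolant over the scattered points.
interpolator = CloughTocher2DInterpolator(
    np.column_stack([x_locs_orig, y_locs_orig]),
    wind_speeds,
)

# Hypothetical query locations, one (x, y) pair per row:
#   row 0: a turbine inside the convex hull of the grid points
#   row 1: exactly on a grid point
#   row 2: outside the convex hull
query_points = np.array(
    [
        [500.0, 1400.0],
        [1000.0, 1000.0],
        [5000.0, 5000.0],
    ]
)
speeds = interpolator(query_points)

print(f"Inside hull:   {speeds[0]:.3f} m/s")
print(f"On grid point: {speeds[1]:.3f} m/s")
print(f"Outside hull:  {speeds[2]}")
```

The interpolant passes through the input data, so the second query reproduces the value at that grid point. With the default `fill_value`, queries outside the convex hull of the input points return NaN, which is why turbine locations need to fall inside the downloaded grid for this kind of interpolation to give usable values.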
diff --git a/hercules/resource/wind_solar_resource_downloader.py b/hercules/resource/wind_solar_resource_downloader.py deleted file mode 100644 index 51870963..00000000 --- a/hercules/resource/wind_solar_resource_downloader.py +++ /dev/null @@ -1,882 +0,0 @@ -""" -WTK, NSRDB, and Open-Meteo Data Downloader - -This script provides functions to download weather data from multiple sources: -- NLR's Wind Toolkit (WTK) for high-resolution wind data -- NLR's National Solar Radiation Database (NSRDB) for solar irradiance data -- Open-Meteo API for historical weather data with global coverage - -All three data sources provide consistent output formats (feather files) for easy integration -into renewable energy modeling workflows. - -Author: Andrew Kumler -Date: June 2025 -Updated: September 2025 (Added Open-Meteo support) -""" - -import math -import os -import time -import warnings -from typing import List, Optional - -import cartopy.crs as ccrs -import cartopy.feature as cfeature -import matplotlib.pyplot as plt -import numpy as np -import openmeteo_requests -import pandas as pd -import requests_cache -from hercules.utilities import hercules_float_type -from retry_requests import retry -from rex import ResourceX -from scipy.interpolate import griddata - - -def download_nsrdb_data( - target_lat: float, - target_lon: float, - year: Optional[int] = None, - start_date: Optional[str] = None, - end_date: Optional[str] = None, - variables: List[str] = ["ghi", "dni", "dhi", "wind_speed", "air_temperature"], - nsrdb_dataset_path="/nrel/nsrdb/GOES/conus/v4.0.0", - nsrdb_filename_prefix="nsrdb_conus", - coord_delta: float = 0.1, - output_dir: str = "./data", - filename_prefix: str = "nsrdb", - plot_data: bool = False, - plot_type: str = "timeseries", -) -> dict: - """Download NSRDB solar irradiance data for a specified location and time period. - - This function requires an NLR API key, which can be obtained by visiting - https://developer.nrel.gov/signup/. 
After receiving your API key, you must make a configuration - file at ~/.hscfg containing the following: - - hs_endpoint = https://developer.nrel.gov/api/hsds - - hs_api_key = YOUR_API_KEY_GOES_HERE - - More information can be found at: https://github.com/NREL/hsds-examples. - - Args: - target_lat (float): Target latitude coordinate. - target_lon (float): Target longitude coordinate. - year (int, optional): Year of data to download (if using full year approach). - start_date (str, optional): Start date in format 'YYYY-MM-DD' (if using date range - approach). - end_date (str, optional): End date in format 'YYYY-MM-DD' (if using date range - approach). - variables (List[str], optional): List of variables to download. - Defaults to ['ghi', 'dni', 'dhi', 'wind_speed', 'air_temperature']. - nsrdb_dataset_path (str, optional): Path name of NSRDB dataset. Available datasets at - https://developer.nrel.gov/docs/solar/nsrdb/. - Defaults to "/nrel/nsrdb/GOES/conus/v4.0.0". - nsrdb_filename_prefix (str, optional): File name prefix for the NSRDB HDF5 files in the - format {nsrdb_filename_prefix}_{year}.h5. Defaults to "nsrdb_conus". - coord_delta (float, optional): Coordinate delta for bounding box. Defaults to 0.1 degrees. - output_dir (str, optional): Directory to save output files. Defaults to "./data". - filename_prefix (str, optional): Prefix for output filenames. Defaults to "nsrdb". - plot_data (bool, optional): Whether to create plots of the data. Defaults to False. - plot_type (str, optional): Type of plot to create: 'timeseries' or 'map'. - Defaults to "timeseries". - - Returns: - dict: Dictionary containing DataFrames for each variable and coordinates. - - Note: - Either 'year' OR both 'start_date' and 'end_date' must be provided. Date range approach - allows for more flexible time periods than full year. Plots are not automatically shown. - If plot_data is True, call matplotlib.pyplot.show() to display the figure. 
- """ - - # Create output directory if it doesn't exist - os.makedirs(output_dir, exist_ok=True) - - # Validate input parameters - if year is not None and (start_date is not None or end_date is not None): - raise ValueError( - "Please provide either 'year' OR both 'start_date' and 'end_date', not both approaches." - ) - - if year is None and (start_date is None or end_date is None): - raise ValueError("Please provide either 'year' OR both 'start_date' and 'end_date'.") - - # Determine the approach and set up file paths and time info - if year is not None: - # Full year approach - file_years = [year] - time_suffix = str(year) - time_description = f"year {year}" - else: - # Date range approach - - start_dt = pd.to_datetime(start_date) - end_dt = pd.to_datetime(end_date) - - if start_dt > end_dt: - raise ValueError("start_date must be before end_date") - - # Get all years in the date range - file_years = list(range(start_dt.year, end_dt.year + 1)) - time_suffix = f"{start_date}_to_{end_date}".replace("-", "") - time_description = f"period {start_date} to {end_date}" - - # Create the bounding box - llcrn_lat = target_lat - coord_delta - llcrn_lon = target_lon - coord_delta - urcrn_lat = target_lat + coord_delta - urcrn_lon = target_lon + coord_delta - - print(f"Downloading NSRDB data for {time_description}") - print(f"Target coordinates: ({target_lat}, {target_lon})") - print(f"Bounding box: ({llcrn_lat}, {llcrn_lon}) to ({urcrn_lat}, {urcrn_lon})") - print(f"Variables: {variables}") - print(f"Years to process: {file_years}") - - t0 = time.time() - - data_dict = {} - all_dataframes = {var: [] for var in variables} - - try: - # Process each year in the range - for file_year in file_years: - print(f"\nProcessing year {file_year}...") - fp = f"{nsrdb_dataset_path}/{nsrdb_filename_prefix}_{file_year}.h5" - - with ResourceX(fp) as res: - # Download each variable for this year - for var in variables: - print(f" Downloading {var} for {file_year}...") - df_year = 
res.get_box_df( - var, lat_lon_1=[llcrn_lat, llcrn_lon], lat_lon_2=[urcrn_lat, urcrn_lon] - ) - - # Filter by date range if using date range approach - if start_date is not None and end_date is not None: - # Filter the DataFrame to the specified date range - df_year = df_year.loc[start_date:end_date] - - all_dataframes[var].append(df_year) - - # Get coordinates (only need to do this once) - if "coordinates" not in data_dict: - gids = df_year.columns.values - coordinates = res.lat_lon[gids] - df_coords = pd.DataFrame(coordinates, index=gids, columns=["lat", "lon"]) - data_dict["coordinates"] = df_coords - - # Concatenate all years for each variable - for var in variables: - if all_dataframes[var]: - print(f"Concatenating {var} data across {len(all_dataframes[var])} years...") - data_dict[var] = pd.concat(all_dataframes[var], axis=0).sort_index() - - # Convert numeric columns to float32 for memory efficiency - for col in data_dict[var].columns: - if pd.api.types.is_numeric_dtype(data_dict[var][col]): - data_dict[var][col] = data_dict[var][col].astype(hercules_float_type) - - # Clear intermediate DataFrames to free memory - all_dataframes[var].clear() - - # Save to feather format - output_file = os.path.join( - output_dir, f"{filename_prefix}_{var}_{time_suffix}.feather" - ) - data_dict[var].reset_index().to_feather(output_file) - print(f"Saved {var} data to {output_file}") - - # Save coordinates - coords_file = os.path.join(output_dir, f"{filename_prefix}_coords_{time_suffix}.feather") - data_dict["coordinates"].reset_index().to_feather(coords_file) - print(f"Saved coordinates to {coords_file}") - - except OSError as e: - print(f"Error downloading NSRDB data: {e}") - print("This could be caused by an invalid API key, NSRDB dataset path, or date range.") - raise - except Exception as e: - print(f"Error downloading NSRDB data: {e}") - raise - - total_time = (time.time() - t0) / 60 - decimal_part = math.modf(total_time)[0] * 60 - print( - "NSRDB download completed in " 
- f"{int(np.floor(total_time))}:{int(np.round(decimal_part, 0)):02d} minutes" - ) - - # Create plots if requested - if plot_data and data_dict and "coordinates" in data_dict: - coordinates_array = data_dict["coordinates"][["lat", "lon"]].values - if plot_type == "timeseries": - plot_timeseries( - data_dict, variables, coordinates_array, f"{filename_prefix} NSRDB Data" - ) - elif plot_type == "map": - plot_spatial_map( - data_dict, variables, coordinates_array, f"{filename_prefix} NSRDB Data" - ) - - return data_dict - - -def download_wtk_data( - target_lat: float, - target_lon: float, - year: Optional[int] = None, - start_date: Optional[str] = None, - end_date: Optional[str] = None, - variables: List[str] = ["windspeed_100m", "winddirection_100m"], - coord_delta: float = 0.1, - output_dir: str = "./data", - filename_prefix: str = "wtk", - plot_data: bool = False, - plot_type: str = "timeseries", -) -> dict: - """Download WTK wind data for a specified location and time period. - - This function requires an NLR API key, which can be obtained by visiting - https://developer.nrel.gov/signup/. After receiving your API key, you must make a configuration - file at ~/.hscfg containing the following: - - hs_endpoint = https://developer.nrel.gov/api/hsds - - hs_api_key = YOUR_API_KEY_GOES_HERE - - More information can be found at: https://github.com/NREL/hsds-examples. - - Args: - target_lat (float): Target latitude coordinate. - target_lon (float): Target longitude coordinate. - year (int, optional): Year of data to download (if using full year approach). - start_date (str, optional): Start date in format 'YYYY-MM-DD' (if using date range - approach). - end_date (str, optional): End date in format 'YYYY-MM-DD' (if using date range approach). - variables (List[str], optional): List of variables to download. - Defaults to ['windspeed_100m', 'winddirection_100m']. - coord_delta (float, optional): Coordinate delta for bounding box. Defaults to 0.1 degrees. 
- output_dir (str, optional): Directory to save output files. Defaults to "./data". - filename_prefix (str, optional): Prefix for output filenames. Defaults to "wtk". - plot_data (bool, optional): Whether to create plots of the data. Defaults to False. - plot_type (str, optional): Type of plot to create: 'timeseries' or 'map'. - Defaults to "timeseries". - - Returns: - dict: Dictionary containing DataFrames for each variable and coordinates. - - Note: - Either 'year' OR both 'start_date' and 'end_date' must be provided. Date range approach - allows for more flexible time periods than full year. Plots are not automatically shown. - If plot_data is True, call matplotlib.pyplot.show() to display the figure. - """ - - # Create output directory if it doesn't exist - os.makedirs(output_dir, exist_ok=True) - - # Validate input parameters - if year is not None and (start_date is not None or end_date is not None): - raise ValueError( - "Please provide either 'year' OR both 'start_date' and 'end_date', not both approaches." 
- ) - - if year is None and (start_date is None or end_date is None): - raise ValueError("Please provide either 'year' OR both 'start_date' and 'end_date'.") - - # Determine the approach and set up file paths and time info - if year is not None: - # Full year approach - file_years = [year] - time_suffix = str(year) - time_description = f"year {year}" - else: - # Date range approach - - start_dt = pd.to_datetime(start_date) - end_dt = pd.to_datetime(end_date) - - if start_dt > end_dt: - raise ValueError("start_date must be before end_date") - - # Get all years in the date range - file_years = list(range(start_dt.year, end_dt.year + 1)) - time_suffix = f"{start_date}_to_{end_date}".replace("-", "") - time_description = f"period {start_date} to {end_date}" - - # Create the bounding box - llcrn_lat = target_lat - coord_delta - llcrn_lon = target_lon - coord_delta - urcrn_lat = target_lat + coord_delta - urcrn_lon = target_lon + coord_delta - - print(f"Downloading WTK data for {time_description}") - print(f"Target coordinates: ({target_lat}, {target_lon})") - print(f"Bounding box: ({llcrn_lat}, {llcrn_lon}) to ({urcrn_lat}, {urcrn_lon})") - print(f"Variables: {variables}") - print(f"Years to process: {file_years}") - - t0 = time.time() - - data_dict = {} - all_dataframes = {var: [] for var in variables} - - try: - # Process each year in the range - for file_year in file_years: - print(f"\nProcessing year {file_year}...") - fp = f"/nrel/wtk/wtk-led/conus/v1.0.0/5min/wtk_conus_{file_year}.h5" - - with ResourceX(fp) as res: - # Download each variable for this year - for var in variables: - print(f" Downloading {var} for {file_year}...") - df_year = res.get_box_df( - var, lat_lon_1=[llcrn_lat, llcrn_lon], lat_lon_2=[urcrn_lat, urcrn_lon] - ) - - # Filter by date range if using date range approach - if start_date is not None and end_date is not None: - # Filter the DataFrame to the specified date range - df_year = df_year.loc[start_date:end_date] - - 
all_dataframes[var].append(df_year) - - # Get coordinates (only need to do this once) - if "coordinates" not in data_dict: - gids = df_year.columns.values - coordinates = res.lat_lon[gids] - df_coords = pd.DataFrame(coordinates, index=gids, columns=["lat", "lon"]) - data_dict["coordinates"] = df_coords - - # Concatenate all years for each variable - for var in variables: - if all_dataframes[var]: - print(f"Concatenating {var} data across {len(all_dataframes[var])} years...") - data_dict[var] = pd.concat(all_dataframes[var], axis=0).sort_index() - - # Convert numeric columns to float32 for memory efficiency - for col in data_dict[var].columns: - if pd.api.types.is_numeric_dtype(data_dict[var][col]): - data_dict[var][col] = data_dict[var][col].astype(hercules_float_type) - - # Clear intermediate DataFrames to free memory - all_dataframes[var].clear() - - # Save to feather format - output_file = os.path.join( - output_dir, f"{filename_prefix}_{var}_{time_suffix}.feather" - ) - data_dict[var].reset_index().to_feather(output_file) - print(f"Saved {var} data to {output_file}") - - # Save coordinates - coords_file = os.path.join(output_dir, f"{filename_prefix}_coords_{time_suffix}.feather") - data_dict["coordinates"].reset_index().to_feather(coords_file) - print(f"Saved coordinates to {coords_file}") - - except OSError as e: - print(f"Error downloading WTK data: {e}") - print("This could be caused by an invalid API key or date range.") - raise - except Exception as e: - print(f"Error downloading WTK data: {e}") - raise - - total_time = (time.time() - t0) / 60 - decimal_part = math.modf(total_time)[0] * 60 - print( - "WTK download completed in " - f"{int(np.floor(total_time))}:{int(np.round(decimal_part, 0)):02d} minutes" - ) - - # Create plots if requested - if plot_data and data_dict and "coordinates" in data_dict: - coordinates_array = data_dict["coordinates"][["lat", "lon"]].values - if plot_type == "timeseries": - plot_timeseries(data_dict, variables, 
coordinates_array, f"{filename_prefix} WTK Data") - elif plot_type == "map": - plot_spatial_map(data_dict, variables, coordinates_array, f"{filename_prefix} WTK Data") - - return data_dict - - -def download_openmeteo_data( - target_lat: float | List[float], - target_lon: float | List[float], - year: Optional[int] = None, - start_date: Optional[str] = None, - end_date: Optional[str] = None, - variables: List[str] = [ - "wind_speed_80m", - "wind_direction_80m", - "temperature_2m", - "shortwave_radiation_instant", - "diffuse_radiation_instant", - "direct_normal_irradiance_instant", - ], - coord_delta: float = 0.1, - output_dir: str = "./data", - filename_prefix: str = "openmeteo", - plot_data: bool = False, - plot_type: str = "timeseries", - remove_duplicate_coords=True, -) -> dict: - """Download Open-Meteo weather data for specified location(s) and time period. - - Data are retrieved from the nearest weather grid cell to the requested locations. The grid cell - resolution varies with latitude, but at ~35 degrees latitude, the grid cell resolution is - approximately 0.027 degrees latitude (~2.4 km in the N-S direction) and 0.0333 degrees - longitude (~3.7km in the E-W direction). - - Args: - target_lat (float | List[float]): Target latitude coordinate or list of latitude - coordinates. - target_lon (float | List[float]): Target longitude coordinate or list of longitude - coordinates. - year (int, optional): Year of data to download (if using full year approach). - start_date (str, optional): Start date in format 'YYYY-MM-DD' (if using date range - approach). - end_date (str, optional): End date in format 'YYYY-MM-DD' (if using date range approach). - variables (List[str], optional): List of variables to download. Available options include - wind_speed_80m, wind_direction_80m, temperature_2m, shortwave_radiation_instant, - diffuse_radiation_instant, direct_normal_irradiance_instant. 
- coord_delta (float, optional): Not used for Open-Meteo (points specified individually), - kept for consistency. Defaults to 0.1. - output_dir (str, optional): Directory to save output files. Defaults to "./data". - filename_prefix (str, optional): Prefix for output filenames. Defaults to "openmeteo". - plot_data (bool, optional): Whether to create plots of the data. Defaults to False. - plot_type (str, optional): Type of plot to create: 'timeseries' or 'map'. - Defaults to "timeseries". - remove_duplicate_coords (bool, optional): Whether to remove data from duplicate coordinates. - Defaults to True. - - Returns: - dict: Dictionary containing DataFrames for each variable and coordinates. - - Note: - Either 'year' OR both 'start_date' and 'end_date' must be provided. Open-Meteo provides - point data (not gridded), so coord_delta is ignored. Available historical data typically - spans from 1940 to present. Plots are not automatically shown. If plot_data is True, call - matplotlib.pyplot.show() to display the figure. - """ - - # Create output directory if it doesn't exist - os.makedirs(output_dir, exist_ok=True) - - # Validate input parameters - if year is not None and (start_date is not None or end_date is not None): - raise ValueError( - "Please provide either 'year' OR both 'start_date' and 'end_date', not both approaches." 
- ) - - if year is None and (start_date is None or end_date is None): - raise ValueError("Please provide either 'year' OR both 'start_date' and 'end_date'.") - - # Determine the approach and set up time info - if year is not None: - start_date = f"{year}-01-01" - end_date = f"{year}-12-31" - time_suffix = str(year) - time_description = f"year {year}" - else: - start_dt = pd.to_datetime(start_date) - end_dt = pd.to_datetime(end_date) - - if start_dt > end_dt: - raise ValueError("start_date must be before end_date") - - time_suffix = f"{start_date}_to_{end_date}".replace("-", "") - time_description = f"period {start_date} to {end_date}" - - print(f"Downloading Open-Meteo data for {time_description}") - print(f"Target coordinates: ({target_lat}, {target_lon})") - print(f"Variables: {variables}") - print("Note: Open-Meteo provides point data (coord_delta ignored)") - - # Map variable names to Open-Meteo API parameters - variable_mapping = { - "wind_speed_80m": "wind_speed_80m", - "wind_direction_80m": "wind_direction_80m", - "temperature_2m": "temperature_2m", - "shortwave_radiation_instant": "shortwave_radiation_instant", - "diffuse_radiation_instant": "diffuse_radiation_instant", - "direct_normal_irradiance_instant": "direct_normal_irradiance_instant", - "ghi": "shortwave_radiation_instant", # Alias for solar users - "dni": "direct_normal_irradiance_instant", # Alias for solar users - "dhi": "diffuse_radiation_instant", # Alias for solar users - "windspeed_80m": "wind_speed_80m", # Alias for wind users - "winddirection_80m": "wind_direction_80m", # Alias for wind users - } - - # Validate variables and map them - mapped_variables = [] - for var in variables: - if var in variable_mapping: - mapped_variables.append(variable_mapping[var]) - else: - print(f"Warning: Variable '{var}' not available in Open-Meteo. 
Skipping.") - - if not mapped_variables: - raise ValueError("No valid variables found for Open-Meteo download.") - - t0 = time.time() - - try: - # Setup the Open-Meteo API client with cache and retry on error - cache_session = requests_cache.CachedSession(".cache", expire_after=3600) - retry_session = retry(cache_session, retries=5, backoff_factor=0.2) - openmeteo = openmeteo_requests.Client(session=retry_session) - - # Setup API parameters - url = "https://historical-forecast-api.open-meteo.com/v1/forecast" - params = { - "latitude": target_lat, - "longitude": target_lon, - "start_date": start_date, - "end_date": end_date, - "minutely_15": mapped_variables, - "wind_speed_unit": "ms", - } - - # Try to make the API request with SSL verification first, then fallback to no verification - try: - responses = openmeteo.weather_api(url, params=params) - print("API request successful with SSL verification.") - except Exception as e: - print(f"SSL verification failed: {str(e)[:100]}...") - print("Trying with SSL verification disabled...") - - # Suppress SSL warnings since we're intentionally disabling verification - warnings.filterwarnings("ignore", message="Unverified HTTPS request") - - # Create a new session with SSL verification disabled - cache_session_no_ssl = requests_cache.CachedSession(".cache", expire_after=3600) - cache_session_no_ssl.verify = False - retry_session_no_ssl = retry(cache_session_no_ssl, retries=5, backoff_factor=0.2) - openmeteo_no_ssl = openmeteo_requests.Client(session=retry_session_no_ssl) - - responses = openmeteo_no_ssl.weather_api(url, params=params) - print("API request successful with SSL verification disabled.") - - # Create data dictionary in the same format as WTK/NSRDB and initialize dataframes - data_dict = {} - data_dict["coordinates"] = pd.DataFrame() - - # Initialize for each variable - original_var_names = [] - for var in mapped_variables: - # Use original variable name (not mapped name) for consistency - original_var_name = None - 
for orig, mapped in variable_mapping.items(): - if mapped == var and orig in variables: - original_var_name = orig - break - - var_name = original_var_name if original_var_name else var - data_dict[var_name] = pd.DataFrame() - - original_var_names.append(var_name) - - # Process the responses for each lat/lon - for gid, response in enumerate(responses): - print(f"Coordinates retrieved: {response.Latitude()}°N {response.Longitude()}°E") - print(f"Elevation: {response.Elevation()} m asl") - - # Process minutely_15 data - minutely_15 = response.Minutely15() - - # Create the date range - date_range = pd.date_range( - start=pd.to_datetime(minutely_15.Time(), unit="s", utc=True), - end=pd.to_datetime(minutely_15.TimeEnd(), unit="s", utc=True), - freq=pd.Timedelta(seconds=minutely_15.Interval()), - inclusive="left", - ) - - # Create coordinates DataFrame (single point, but match the format) - # Use a synthetic GID (grid ID) to match WTK/NSRDB format - df_coords = pd.DataFrame( - [[response.Latitude(), response.Longitude()]], index=[gid], columns=["lat", "lon"] - ) - data_dict["coordinates"] = pd.concat([data_dict["coordinates"], df_coords], axis=0) - - # Process each requested variable - for i, var_name in enumerate(original_var_names): - var_data = minutely_15.Variables(i).ValuesAsNumpy() - - # Create DataFrame with same structure as WTK/NSRDB (datetime index, gid columns) - # Convert to float32 for memory efficiency - df_var = pd.DataFrame( - var_data.astype(hercules_float_type), index=date_range, columns=[gid] - ) - df_var.index.name = "time_index" - - data_dict[var_name] = pd.concat([data_dict[var_name], df_var], axis=1) - - # Check for duplicates, remove if any exist, and rename locations indices consecutively - if remove_duplicate_coords & (len(data_dict["coordinates"]) > 1): - duplicate_mask = data_dict["coordinates"].duplicated( - subset=["lat", "lon"], keep="first" - ) - data_dict["coordinates"] = data_dict["coordinates"][~duplicate_mask] - - for var_name in 
original_var_names: - data_dict[var_name] = data_dict[var_name][ - [c for c in data_dict["coordinates"].index] - ] - data_dict[var_name].columns = range(len(data_dict["coordinates"])) - - data_dict["coordinates"] = data_dict["coordinates"].reset_index(drop=True) - - # Save variables to feather format - for var_name in original_var_names: - output_file = os.path.join( - output_dir, f"{filename_prefix}_{var_name}_{time_suffix}.feather" - ) - data_dict[var_name].reset_index().to_feather(output_file) - print(f"Saved {var_name} data to {output_file}") - - # Save coordinates - coords_file = os.path.join(output_dir, f"{filename_prefix}_coords_{time_suffix}.feather") - data_dict["coordinates"].reset_index().to_feather(coords_file) - print(f"Saved coordinates to {coords_file}") - - except Exception as e: - print(f"Error downloading Open-Meteo data: {e}") - raise - - total_time = (time.time() - t0) / 60 - decimal_part = math.modf(total_time)[0] * 60 - print( - "Open-Meteo download completed in " - f"{int(np.floor(total_time))}:{int(np.round(decimal_part, 0)):02d} minutes" - ) - - # Create plots if requested - if plot_data and data_dict and "coordinates" in data_dict: - coordinates_array = data_dict["coordinates"][["lat", "lon"]].values - if plot_type == "timeseries": - plot_timeseries( - data_dict, variables, coordinates_array, f"{filename_prefix} Open-Meteo Data" - ) - elif plot_type == "map": - plot_spatial_map( - data_dict, variables, coordinates_array, f"{filename_prefix} Open-Meteo Data" - ) - - return data_dict - - -def plot_timeseries(data_dict: dict, variables: List[str], coordinates: np.ndarray, title: str): - """Create time-series plots for the downloaded data. - - Args: - data_dict (dict): Dictionary containing DataFrames for each variable. - variables (List[str]): List of variables to plot. - coordinates (np.ndarray): Array of coordinates for the data points. - title (str): Title for the plots. 
- """ - - n_vars = len(variables) - if n_vars == 0: - return - - # Create subplots based on number of variables - fig, axes = plt.subplots(n_vars, 1, figsize=(12, 4 * n_vars), sharex=True) - if n_vars == 1: - axes = [axes] - - for i, var in enumerate(variables): - if var in data_dict: - df = data_dict[var] - - # Plot all time series (one for each spatial point) - for col in df.columns: - axes[i].plot(df.index, df[col], alpha=0.7, linewidth=0.8) - - axes[i].set_ylabel(get_variable_label(var)) - axes[i].set_title(f"{var.replace('_', ' ').title()}") - axes[i].grid(True, alpha=0.3) - - axes[-1].set_xlabel("Time") - plt.suptitle(f"{title} - Time Series", fontsize=14, fontweight="bold") - plt.tight_layout() - - -def plot_spatial_map(data_dict: dict, variables: List[str], coordinates: np.ndarray, title: str): - """Create spatial maps showing the mean values across the region. - - Args: - data_dict (dict): Dictionary containing DataFrames for each variable. - variables (List[str]): List of variables to plot. - coordinates (np.ndarray): Array of coordinates for the data points. - title (str): Title for the plots. 
- """ - - n_vars = len(variables) - if n_vars == 0: - return - - # Calculate subplot layout - n_cols = min(2, n_vars) - n_rows = math.ceil(n_vars / n_cols) - - plt.figure(figsize=(8 * n_cols, 6 * n_rows)) - - for i, var in enumerate(variables): - if var in data_dict: - df = data_dict[var] - - # Extract coordinates - lats = coordinates[:, 0] - lons = coordinates[:, 1] - - # Calculate mean values across time - mean_values = df.mean(axis=0).values - - # Create subplot with map projection - ax = plt.subplot(n_rows, n_cols, i + 1, projection=ccrs.PlateCarree()) - - # Add geographic features - ax.add_feature(cfeature.COASTLINE, alpha=0.5) - ax.add_feature(cfeature.BORDERS, linestyle=":", alpha=0.5) - ax.add_feature(cfeature.LAND, edgecolor="black", facecolor="lightgray", alpha=0.3) - ax.add_feature(cfeature.OCEAN, facecolor="lightblue", alpha=0.3) - - # Create interpolated grid for smoother visualization - if len(lats) > 4: # Only interpolate if we have enough points - grid_lon = np.linspace(min(lons), max(lons), 50) - grid_lat = np.linspace(min(lats), max(lats), 50) - grid_lon, grid_lat = np.meshgrid(grid_lon, grid_lat) - - try: - grid_values = griddata( - (lons, lats), mean_values, (grid_lon, grid_lat), method="cubic" - ) - contour = ax.contourf( - grid_lon, - grid_lat, - grid_values, - levels=15, - cmap=get_variable_colormap(var), - transform=ccrs.PlateCarree(), - ) - plt.colorbar( - contour, - ax=ax, - orientation="vertical", - label=get_variable_label(var), - shrink=0.8, - ) - except Exception: - # Fall back to scatter plot if interpolation fails - sc = ax.scatter( - lons, - lats, - c=mean_values, - s=100, - cmap=get_variable_colormap(var), - transform=ccrs.PlateCarree(), - ) - plt.colorbar( - sc, ax=ax, orientation="vertical", label=get_variable_label(var), shrink=0.8 - ) - else: - # Use scatter plot for few points - sc = ax.scatter( - lons, - lats, - c=mean_values, - s=100, - cmap=get_variable_colormap(var), - transform=ccrs.PlateCarree(), - ) - plt.colorbar( - 
sc, ax=ax, orientation="vertical", label=get_variable_label(var), shrink=0.8 - ) - - # Add points on top - ax.scatter(lons, lats, c="black", s=20, transform=ccrs.PlateCarree(), alpha=0.8) - - # Set title - ax.set_title(f"{var.replace('_', ' ').title()}") - - # Set coordinate labels - ax.set_xticks(np.linspace(min(lons), max(lons), 5)) - ax.set_yticks(np.linspace(min(lats), max(lats), 5)) - ax.set_xticklabels( - [f"{lon:.2f}°" for lon in np.linspace(min(lons), max(lons), 5)], fontsize=8 - ) - ax.set_yticklabels( - [f"{lat:.2f}°" for lat in np.linspace(min(lats), max(lats), 5)], fontsize=8 - ) - ax.set_xlabel("Longitude") - ax.set_ylabel("Latitude") - - plt.suptitle(f"{title} - Spatial Distribution (Time-Averaged)", fontsize=14, fontweight="bold") - plt.tight_layout() - - -def get_variable_label(variable: str) -> str: - """Get appropriate label and units for a variable. - - Args: - variable (str): Variable name. - - Returns: - str: Label with units for the variable. - """ - labels = { - "ghi": "GHI (W/m²)", - "dni": "DNI (W/m²)", - "dhi": "DHI (W/m²)", - "windspeed_100m": "Wind Speed at 100m (m/s)", - "winddirection_100m": "Wind Direction at 100m (°)", - "turbulent_kinetic_energy_100m": "TKE at 100m (m²/s²)", - "temperature_100m": "Temperature at 100m (°C)", - "pressure_100m": "Pressure at 100m (Pa)", - # Open-Meteo variables - "wind_speed_80m": "Wind Speed at 80m (m/s)", - "windspeed_80m": "Wind Speed at 80m (m/s)", - "wind_direction_80m": "Wind Direction at 80m (m/s)", - "winddirection_80m": "Wind Direction at 80m (m/s)", - "temperature_2m": "Temperature at 2m (°C)", - "shortwave_radiation_instant": "Shortwave Radiation (W/m²)", - "diffuse_radiation_instant": "Diffuse Radiation (W/m²)", - "direct_normal_irradiance_instant": "Direct Normal Irradiance (W/m²)", - } - return labels.get(variable, variable.replace("_", " ").title()) - - -def get_variable_colormap(variable: str) -> str: - """Get appropriate colormap for a variable. 
- - Args: - variable (str): Variable name. - - Returns: - str: Matplotlib colormap name for the variable. - """ - colormaps = { - "ghi": "plasma", - "dni": "plasma", - "dhi": "plasma", - "windspeed_100m": "viridis", - "winddirection_100m": "hsv", - "turbulent_kinetic_energy_100m": "cividis", - "temperature_100m": "RdYlBu_r", - "pressure_100m": "coolwarm", - # Open-Meteo variables - "wind_speed_80m": "viridis", - "windspeed_80m": "viridis", - "wind_direction_80m": "hsv", - "winddirection_80m": "hsv", - "temperature_2m": "RdYlBu_r", - "shortwave_radiation_instant": "plasma", - "diffuse_radiation_instant": "plasma", - "direct_normal_irradiance_instant": "plasma", - } - return colormaps.get(variable, "viridis") diff --git a/hercules/resource/wtk_downloader.py b/hercules/resource/wtk_downloader.py new file mode 100644 index 00000000..ecba44ea --- /dev/null +++ b/hercules/resource/wtk_downloader.py @@ -0,0 +1,195 @@ +"""WIND Toolkit (WTK) wind data downloader. + +This module provides the `download_wtk_data` function, which was previously +defined in `wind_solar_resource_downloader`. The implementation is moved +here without functional changes to support a more modular resource package +layout. +""" + +import math +import os +import time +from typing import List, Optional + +import numpy as np +import pandas as pd +from rex import ResourceX + +from hercules.resource.resource_utilities import ( + plot_spatial_map, + plot_timeseries, +) +from hercules.utilities import hercules_float_type + + +def download_wtk_data( + target_lat: float, + target_lon: float, + year: Optional[int] = None, + start_date: Optional[str] = None, + end_date: Optional[str] = None, + variables: List[str] = ["windspeed_100m", "winddirection_100m"], + coord_delta: float = 0.1, + output_dir: str = "./data", + filename_prefix: str = "wtk", + plot_data: bool = False, + plot_type: str = "timeseries", +) -> dict: + """Download WTK wind data for a specified location and time period. 
+ + This function requires an NLR API key, which can be obtained by visiting + https://developer.nrel.gov/signup/. After receiving your API key, you must make a configuration + file at ~/.hscfg containing the following: + + hs_endpoint = https://developer.nrel.gov/api/hsds + + hs_api_key = YOUR_API_KEY_GOES_HERE + + More information can be found at: https://github.com/NREL/hsds-examples. + + Args: + target_lat (float): Target latitude coordinate. + target_lon (float): Target longitude coordinate. + year (int, optional): Year of data to download (if using full year approach). + start_date (str, optional): Start date in format 'YYYY-MM-DD' (if using date range + approach). + end_date (str, optional): End date in format 'YYYY-MM-DD' (if using date range approach). + variables (List[str], optional): List of variables to download. + Defaults to ['windspeed_100m', 'winddirection_100m']. + coord_delta (float, optional): Coordinate delta for bounding box. Defaults to 0.1 degrees. + output_dir (str, optional): Directory to save output files. Defaults to "./data". + filename_prefix (str, optional): Prefix for output filenames. Defaults to "wtk". + plot_data (bool, optional): Whether to create plots of the data. Defaults to False. + plot_type (str, optional): Type of plot to create: 'timeseries' or 'map'. + Defaults to "timeseries". + + Returns: + dict: Dictionary containing DataFrames for each variable and coordinates. + + Note: + Either 'year' OR both 'start_date' and 'end_date' must be provided. Date range approach + allows for more flexible time periods than full year. Plots are not automatically shown. + If plot_data is True, call matplotlib.pyplot.show() to display the figure. + """ + + os.makedirs(output_dir, exist_ok=True) + + if year is not None and (start_date is not None or end_date is not None): + raise ValueError( + "Please provide either 'year' OR both 'start_date' and 'end_date', not both approaches." 
+ ) + + if year is None and (start_date is None or end_date is None): + raise ValueError("Please provide either 'year' OR both 'start_date' and 'end_date'.") + + if year is not None: + file_years = [year] + time_suffix = str(year) + time_description = f"year {year}" + else: + start_dt = pd.to_datetime(start_date) + end_dt = pd.to_datetime(end_date) + + if start_dt > end_dt: + raise ValueError("start_date must be before end_date") + + file_years = list(range(start_dt.year, end_dt.year + 1)) + time_suffix = f"{start_date}_to_{end_date}".replace("-", "") + time_description = f"period {start_date} to {end_date}" + + llcrn_lat = target_lat - coord_delta + llcrn_lon = target_lon - coord_delta + urcrn_lat = target_lat + coord_delta + urcrn_lon = target_lon + coord_delta + + print(f"Downloading WTK data for {time_description}") + print(f"Target coordinates: ({target_lat}, {target_lon})") + print(f"Bounding box: ({llcrn_lat}, {llcrn_lon}) to ({urcrn_lat}, {urcrn_lon})") + print(f"Variables: {variables}") + print(f"Years to process: {file_years}") + + t0 = time.time() + + data_dict: dict = {} + all_dataframes: dict = {var: [] for var in variables} + + try: + for file_year in file_years: + print(f"\nProcessing year {file_year}...") + fp = f"/nrel/wtk/wtk-led/conus/v1.0.0/5min/wtk_conus_{file_year}.h5" + + with ResourceX(fp) as res: + for var in variables: + print(f" Downloading {var} for {file_year}...") + df_year = res.get_box_df( + var, + lat_lon_1=[llcrn_lat, llcrn_lon], + lat_lon_2=[urcrn_lat, urcrn_lon], + ) + + if start_date is not None and end_date is not None: + df_year = df_year.loc[start_date:end_date] + + all_dataframes[var].append(df_year) + + if "coordinates" not in data_dict: + gids = df_year.columns.values + coordinates = res.lat_lon[gids] + df_coords = pd.DataFrame(coordinates, index=gids, columns=["lat", "lon"]) + data_dict["coordinates"] = df_coords + + for var in variables: + if all_dataframes[var]: + print(f"Concatenating {var} data across 
{len(all_dataframes[var])} years...") + data_dict[var] = pd.concat(all_dataframes[var], axis=0).sort_index() + + for col in data_dict[var].columns: + if pd.api.types.is_numeric_dtype(data_dict[var][col]): + data_dict[var][col] = data_dict[var][col].astype(hercules_float_type) + + all_dataframes[var].clear() + + output_file = os.path.join( + output_dir, + f"{filename_prefix}_{var}_{time_suffix}.feather", + ) + data_dict[var].reset_index().to_feather(output_file) + print(f"Saved {var} data to {output_file}") + + coords_file = os.path.join(output_dir, f"{filename_prefix}_coords_{time_suffix}.feather") + data_dict["coordinates"].reset_index().to_feather(coords_file) + print(f"Saved coordinates to {coords_file}") + + except OSError as e: + print(f"Error downloading WTK data: {e}") + print("This could be caused by an invalid API key or date range.") + raise + except Exception as e: + print(f"Error downloading WTK data: {e}") + raise + + total_time = (time.time() - t0) / 60 + decimal_part = math.modf(total_time)[0] * 60 + print( + "WTK download completed in " + f"{int(np.floor(total_time))}:{int(np.round(decimal_part, 0)):02d} minutes" + ) + + if plot_data and data_dict and "coordinates" in data_dict: + coordinates_array = data_dict["coordinates"][["lat", "lon"]].values + if plot_type == "timeseries": + plot_timeseries( + data_dict, + variables, + coordinates_array, + f"{filename_prefix} WTK Data", + ) + elif plot_type == "map": + plot_spatial_map( + data_dict, + variables, + coordinates_array, + f"{filename_prefix} WTK Data", + ) + + return data_dict diff --git a/pyproject.toml b/pyproject.toml index 2c80c4e3..44b3d87e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -37,7 +37,7 @@ dependencies = [ "polars~=1.0", "pyarrow", "h5py~=3.10", -"NREL-rex[hsds]", +"NLR-rex[hsds]", "utm", "cartopy", "openmeteo_requests",