diff --git a/docs/docs/LAI/intro.md b/docs/docs/LAI/intro.md
index 704436d..962ce6e 100644
--- a/docs/docs/LAI/intro.md
+++ b/docs/docs/LAI/intro.md
@@ -1,7 +1,12 @@
 # VeRCYe LAI Generation
-VeRCYE is designed to identify best matching APSIM simulations with actual remotely sensed Leaf Area Index (LAI) data. This documentation covers the LAI data generation pipeline, which is a separate but essential component of the VeRCYE workflow. The LAI is not a true remotely sensed value, but is rather estimated from a
-combination of bands using neural networks that were converted to Pytorch from the [Leaf Toolbox](https://github.com/rfernand387/LEAF-Toolbox).
+VeRCYE is designed to identify best-matching APSIM simulations based on actual remotely sensed Leaf Area Index (LAI) data. This documentation covers the LAI data generation pipeline, which is a separate but essential component of the VeRCYE workflow. The LAI is not a true remotely sensed value, but is rather estimated from a
+combination of bands using neural networks that were converted to PyTorch from the [Leaf Toolbox](https://github.com/rfernand387/LEAF-Toolbox) developed by Fernandes et al.
+
+Two different LAI models from the Leaf Toolbox were re-implemented more efficiently in PyTorch:
+- The 20m general-purpose and strongly validated model.
+- The 10m model, which should be considered experimental at the moment, as it has not yet been validated in depth. Check the LEAF Toolbox for details on the models.
+
 The LAI creation pipeline described in this documentation transforms Sentinel-2 satellite imagery into LAI estimates that can be used by the VeRCYE algorithm. Currently only Sentinel-2 data is supported, but a version using Harmonized Landsat-Sentinel Imagery is planned.
@@ -16,7 +21,7 @@ We provide two methods for exporting remotely sensed imagery and deriving LAI pr
 **A:** Exporting RS imagery from **Google Earth Engine**
-**B:** Downloading RS imagery through an open source **STAC catalog** and data hosted on AWS.
+**B:** Downloading RS imagery through an open source **STAC catalog** and data hosted on AWS/Azure.
 **C:** Using your own LAI data
@@ -28,29 +33,29 @@ Google Drive or a Google Cloud Storage Bucket, from which it can be downloaded t
 **Pro:**
-- Directly export mosaics with bands resampled to the same resolution and CRS
+- Directly export mosaics with bands resampled to the same resolution and CRS (EPSG:4326).
 - Strong cloud masking algorithm (Google Cloud Score Plus)
 **Con**
-- Slow for large regions, due to limited number of parallel export processes
+- Very slow for large regions, due to limited number of parallel export processes
 - Exported data is exported to either Google Drive (Free) or Google Cloud Storage (Fees apply), and downloaded from there, but requires more manual setup which might be tedious especially on remote systems.
-### B: STAC & AWS Export
-This approach queries a STAC catalog to identify all Sentinel-2 Tiles intersecting the region of interest within the timespan. The individual tiles are then downloaded from an AWS bucket.
+### B: STAC-Based Export
+This approach queries a STAC catalog to identify all Sentinel-2 Tiles intersecting the region of interest within the timespan. The individual tiles are then downloaded from an AWS or Microsoft Planetary Computer (Azure) bucket.
-You can choose between selecting data hosted by Element84 on AWS (`Sentinel-2 L2A Colection 1` ), in which all historial data was processed using `Processing Baseline 5.0`, however this collection is currently missing large timespans (e.g 2022,2023). Alternativeley, you can use the Microsoft Planetary Computer (`Sentinel-2 L2A`).
+You can choose between data hosted by Element84 on AWS (`Sentinel-2 L2A Collection 1`), in which all historical data was processed using `Processing Baseline 5.0`; however, this collection is currently missing large timespans (e.g. 2022, 2023). Alternatively, by default we use the Microsoft Planetary Computer (`Sentinel-2 L2A`). The data is downloaded and processed to match the `S2_SR_Harmonized` collection in GEE.
 **Pro**:
 - Very fast download in HPC environment due to high level of parallelism
 - Completely free download of data
-- Harmonized to data in `Sentinel-2 L2A Colection 1` - all data processed using modern Baseline 5.0.
 **Con**:
-- Less Accurate Cloudmask in comparison to Google Cloud Score Plus. Cloud mask is based on SCL + S2-Cloudless.
-- As of May 27th 2025, `Sentinel-2 L2A Colection 1` does not contain data for 2022 and parts of 2023. According to ESA this backfill is scheduled to be completed until Q2 2025.
+- Less accurate cloud mask in comparison to Google Cloud Score Plus. The cloud mask is based on SCL.
+- As of May 27th 2025, `Sentinel-2 L2A Collection 1` does not contain data for 2022 and parts of 2023. According to ESA this backfill is scheduled to be completed by Q2 2025; however, this needs to be validated.
+- Sentinel-2 data in MPC has inconsistent nodata values (should be all 0, but sometimes no nodata value is set), which might also hint at differences in processing.
 ### C: Bring your own LAI data
 If you already have LAI data or are planning to generate it with a different pipeline this is also possible. Simply ensure the file names match our required format. All files ned to be located in a single folder and the filename needs to satisfy the following format:
@@ -61,4 +66,4 @@ If you already have LAI data or are planning to generate it with a different pip
 - The `date` should be in the YYYY-MM-DD format.
 - The `file extension` can be either `.vrt` or `.tif`
-Additionally, you will have to ensure all your LAI files have exactly the same resolution and CRS and match the scale and offset as used in our inbuilt imagery.
+Additionally, you will have to ensure all your LAI files have exactly the same resolution, CRS, and extent, and match the scale and offset used in our inbuilt imagery. Currently all the imagery is scaled by multiplying with 0.001 to convert the int data to float.
diff --git a/docs/docs/LAI/running.md b/docs/docs/LAI/running.md
index fa3462b..a55fef9 100644
--- a/docs/docs/LAI/running.md
+++ b/docs/docs/LAI/running.md
@@ -2,7 +2,7 @@
 The pipeline produces LAI products for VERCYe and is intended for scaled deployment on servers or HPC with minimal human intervention. The Sentinel-2 LAI model is by Fernandes et al. from https://github.com/rfernand387/LEAF-Toolbox. We provide two methods for exporting remotely sensed imagery and deriving LAI products:
 - **A:** Exporting RS imagery from **Google Earth Engine** (slow, more setup required, better cloudmasks)
-- **B: **Downloading RS imagery through an open source **STAC catalog** and data hosted on AWS or MPC (fast, inferior cloudmasking).
+- **B:** Downloading RS imagery through an open source **STAC catalog** and data hosted on AWS or MPC/Azure (fast, inferior cloudmasking).
 The individual advantages are detailed in the [introduction](intro.md#lai-generation). This document details the instruction on how to download remotely sensed imagery and derive LAI data. For both approaches we provide pipelines that simply require specifying a configuration and then handle the complete process from exporting and downloading remotely sensed imagery to cloudmasking and deriving LAI estimates. Details of the invididual components of the pipelines can be found in the readme of the corresponding folders.
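The bring-your-own LAI filename and scaling conventions from the intro changes above can be sanity-checked with a short sketch. This is a hypothetical helper, not part of the VeRCYe codebase: the exact prefix layout of the filename is an assumption, while the YYYY-MM-DD date, the `.vrt`/`.tif` extension, and the 0.001 int-to-float scale come from the documentation:

```python
import re
from datetime import datetime

# Hypothetical sketch (not from the VeRCYe codebase): checks that a
# bring-your-own LAI filename carries a YYYY-MM-DD date and a .vrt/.tif
# extension, and applies the 0.001 scale used by the inbuilt imagery.
# The exact prefix layout is an assumption - see the format description above.
FILENAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>vrt|tif)$")
LAI_SCALE = 0.001  # inbuilt imagery stores LAI as int, scaled by 0.001

def parse_lai_filename(name: str) -> dict:
    """Return the filename components, or raise ValueError for a bad name."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"Filename does not satisfy the required format: {name}")
    datetime.strptime(match.group("date"), "%Y-%m-%d")  # reject impossible dates
    return match.groupdict()

def int_to_lai(value: int) -> float:
    """Convert a stored integer pixel value to a float LAI value."""
    return value * LAI_SCALE

print(parse_lai_filename("myregion_2024-05-01.tif")["date"])  # 2024-05-01
```

A check like this catches mismatched filenames before the pipeline fails on them at runtime.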
@@ -151,7 +151,7 @@ keep_imagery: false
 - `date_ranges`: Define multiple seasonal or arbitrary time windows to process (in YYY-MM-DD format).
-- `resolution`: Spatial resolution in meters. (Typically 10 or 20)
+- `resolution`: Spatial resolution in meters (typically 10 or 20). When specifying 10m, a different model for 10m resolution is used, which is not yet validated in depth!
 - `geojson-path`: Path to your regions of interest geojson. Will create a bounding box for each geometry and query the intersecting tiles.
 - `out_dir`: Output directory for all generated data.
 - `region_out_prefix`: Prefix for the output VRT filenames - typically the name of the GeoJSON region.
diff --git a/docs/docs/Vercye/apsim.md b/docs/docs/Vercye/apsim.md
index 7ef56ed..44b4920 100644
--- a/docs/docs/Vercye/apsim.md
+++ b/docs/docs/Vercye/apsim.md
@@ -1,9 +1,6 @@
-APSIM is an agricultural modelling framework that can simulate a variety of biophysical processes for differen crops.
+APSIM is an agricultural modelling framework that can simulate a variety of biophysical processes for different crops.
-VeRCYe relies on the APSIMX framework for generating various simulations in a realistic range.
-
-
-**TODO needs to be updates - currently only copied Mark instructions, but they are incomplete**
+VeRCYe relies on the APSIM Next-Gen framework for generating various simulations across a realistic range of input parameters (management practices, soils, water, etc.).
 ### Installing APSIMX
 Visit [https://www.apsim.info](https://www.apsim.info) and make sure you have the proper license.
diff --git a/docs/docs/Vercye/architecture.md b/docs/docs/Vercye/architecture.md
index 3fca332..378c5e1 100644
--- a/docs/docs/Vercye/architecture.md
+++ b/docs/docs/Vercye/architecture.md
@@ -67,6 +67,7 @@ The complete logic is defined in `vecrye_ops/snakemake/Snakefile`.
 ### 4. Meteorological Data Acquisition
 **Supported Sources**:
+
- **NASAPower**: Uses a global cache to avoid API rate limits. - First job: One-time cache fill per region for its full date range (single job per region to avoid race conditions in cache write). diff --git a/docs/docs/Vercye/metdata.md b/docs/docs/Vercye/metdata.md index af7b467..d66c741 100644 --- a/docs/docs/Vercye/metdata.md +++ b/docs/docs/Vercye/metdata.md @@ -34,8 +34,8 @@ Options: --help Show this message and exit. ``` -!Attention: This can amount to a few hundred GB of data when downloading many years of historical data. Therefore this is rather intended to be run on HPC environments. +!Attention: This can amount to 100+ GB of data when downloading many years of historical data. Therefore this is rather intended to be run on HPC environments. This will first try to download all final [CHIRPS v2.0 global daily products](https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/cogs/p05/) at 0.05 degrees resolution. For days without data available, the downloader will fallback to the [preliminary product](https://data.chc.ucsb.edu/products/CHIRPS-2.0/prelim/global_daily/tifs/p05/). -The VeRCYe pipeline will then read local regions from this global files during runtime. +The VeRCYe pipeline will then read local regions from these global files during runtime. diff --git a/docs/docs/Vercye/running.md b/docs/docs/Vercye/running.md index eb42766..fff3537 100644 --- a/docs/docs/Vercye/running.md +++ b/docs/docs/Vercye/running.md @@ -1,4 +1,4 @@ -# Running VeRCYE manuall +# Running VeRCYE manually This guide walks you through the process of setting up and running a yield study using our framework, which helps you simulate crop yields across different regions. @@ -56,7 +56,7 @@ The full configuration options are documented in the [Inputs documentation (Sect The **base directory** (your `study head dir`) organizes region-specific geometries and APSIM simulations by year, timepoint, and region [(See Details)](inputs.md). 
-Use the provided helper script (`prepare_yieldstudy.py`) to create this structure. For this, simply create an additional `setup_config.yaml` file in your base directory and fill it as described below. You can then run the setup helper with `python prepare_yieldstudy.py /path/to/basedirectory/setup_config.yaml`. For ease of use, start out with the example provided in `examples/setup_config.yaml`.
+Use the provided helper script (`prepare_yieldstudy.py`) to create this structure. For this, simply create an additional `setup_config.yaml` file in your base directory and fill it as described below. For ease of use, start out with the example provided in `examples/setup_config.yaml`.
 1. **Input shapefile & region names**
@@ -75,12 +75,16 @@ Use the provided helper script (`prepare_yieldstudy.py`) to create this structur
 3. **APSIM configuration templates**
+   VeRCYe requires an APSIM template that will be adjusted for each region. This APSIM template defines the general simulations that should be run for each region, i.e. it defines the factorials of different input parameters that should be run. All of these should be manually configured based on expertise of the regions of interest.
+
+   Additionally, a custom precipitation-based script (provided by Yuval Sadeh) that MUST be embedded in the APSIM file is currently used to estimate likely sowing dates (the sowing window). However, if the true sowing date is known, it can also be injected into the pipeline. The sowing window script still needs to be present in the code, as the start/end/force sowing dates are overwritten with the true sowing date and the factorial is disabled.
+
 Rather than manually copying and editing an APSIM file for each year/region, the helper will:
 1. Copy a template for each higher-level region (e.g. state) into every year’s folder.
 2. Auto-adjust the simulation dates.
   NOTE: This will replace the `Models.Clock` parameter in the APSIM simulation to with the value specified in the `run_config_template.yaml` under `apsim_params.time_bounds`. If you require different simulation start/end-dates for various regions during a season, you will have to configure this manually in the APSIM files in the extracted directories.
-   Configure this by setting:
+   Configure this by setting the following parameters in the `setup_config.yaml` you created in the previous step:
   - **`APSIM_TEMPLATE_PATHS_FILTER_COL_NAME`** Admin column that groups regions sharing a template (e.g. `NAME_1`).
@@ -99,13 +103,11 @@ Use the provided helper script (`prepare_yieldstudy.py`) to create this structur
 ```
-Once all parameters are defined, run the notebook. It will:
+Once all parameters are defined, run the preparation script with `python prepare_yieldstudy.py /path/to/basedirectory/setup_config.yaml`. It will:
-- Create your `year/timepoint/region` directory tree under `OUTPUT_DIR`.
+- Create your `year/timepoint/region` scaffolded directory tree under `OUTPUT_DIR`.
 - Generate a final `run_config.yaml` that merges your Snakemake settings with the selected regions.
-**Note**: Sometimes, you might want to add some custom conditionals or processing, that is why we have provided this code in a jupyter notebook. In that case make sure to read the [input documentation](inputs.md), to understand the required structure.
-
 ## 4. Adding Reported Validation Data
 The VeRCYE pipeline can automatically generate validation metrics (e.g., R², RMSE) if reported data is available. To enable this, you must manually add validation data for each year.
@@ -116,12 +118,12 @@ Define aggregation levels in your `config file` under `eval_params.aggregation_l
 For each year and aggregation level, create a CSV file named: `{year}/referencedata_{aggregation_name}-{year}.csv`, where aggregation_name matches the key in your config (case-sensitive!).
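The reference CSV naming scheme above can be sketched with a short snippet. This is a hypothetical illustration, not part of the VeRCYe codebase: the base directory, aggregation name, region name, and yield value are made up, while the filename pattern and the `region`/`reported_mean_yield_kg_ha` columns follow the documentation:

```python
import csv
from pathlib import Path

# Hypothetical sketch: writes a reference CSV following the
# {year}/referencedata_{aggregation_name}-{year}.csv naming scheme.
# Region names and yield values here are made up for illustration.
basedir = Path("basedirectory")
year = 2024
aggregation_name = "State"  # must match the key in eval_params.aggregation_levels (case-sensitive!)

out_path = basedir / str(year) / f"referencedata_{aggregation_name}-{year}.csv"
out_path.parent.mkdir(parents=True, exist_ok=True)

with out_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "reported_mean_yield_kg_ha"])
    writer.writeheader()
    # region must match the names used internally by VeRCYe for this aggregation level.
    writer.writerow({"region": "ExampleState", "reported_mean_yield_kg_ha": 3200})

print(out_path.as_posix())  # basedirectory/2024/referencedata_State-2024.csv
```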
-Example: For 2024 state-level data, the file should be: `basedirectory/2024/referencedata__State-2024.csv`
+Example: For 2024 state-level data, the file should be: `basedirectory/2024/referencedata_State-2024.csv`
 For simulation ROI-level data, use `primary` as the aggregation name: `basedirectory/2024/referencedata_primary-2024.csv`
-**CSV Structure**
+**CSV Structure of the validation file**
-- `region`: Name matching GeoJSON folder (for `primary aggregation level`) or matching attribute table column values for custom aggregation level (Column as specified under `eval_params.aggregation_levels` in tour `config.yaml`)
+- `region`: Name matching the GeoJSON folder name (for the `primary` aggregation level) or matching attribute table column values for custom aggregation levels (column as specified under `eval_params.aggregation_levels` in your `config.yaml`)
 - `reported_mean_yield_kg_ha`: Mean yield in kg/ha If unavailable, provide `reported_production_kg` instead. The mean yield will then be calculated using cropmask area (note: subject to cropmask accuracy).If you do not have validation data for certain regions, simply do not include these in your CSV.
 - If your reference data contains area, it is recommended to also include this under `reported_area_ha` even though this is not yet used in the evaluation pipeline.
@@ -155,12 +157,11 @@ Once your setup is complete:
 When the simulation completes, results will be available in your base directory. See the [Outputs Documentation](outputs.md) for details on interpreting the results.
-To run the pipeline over the same region(s), either use Snakemake's `-F` flag or delete the log files at `vercye_ops/snakemake/logs_*`. Runtimes are in `vercye_ops/snakemake/benchmarks`.
-
 ## Troubleshooting and re-running
 If your pipeline fails, you have a few options to re-run:
 - If you want to force the re-execution of all rules, even if they have already completed successfully, you can add the `-F` flag to the run command above.
 This will invalidate all outputs and require rerunning them.
+- You can delete output files; the deleted files and all downstream affected rules will then be rerun.
 - Recommend: If you have fixed the section of your code that caused the problems, you can simply rerun with the normal run command and only the rules that have failed and their downstream dependencies will be run.
-Check out the troubleshooting page.
+Check out the [troubleshooting page](troubleshooting.md) for common errors.
diff --git a/docs/docs/Vercye/troubleshooting.md b/docs/docs/Vercye/troubleshooting.md
index d061b85..498bd0c 100644
--- a/docs/docs/Vercye/troubleshooting.md
+++ b/docs/docs/Vercye/troubleshooting.md
@@ -2,8 +2,10 @@ This section contains a few tips on what to do if you are encountering errors du
 - `Missing input files for rule xyz`: Check the error output under `affected files`. This outlines the files that snakemake expects to be present, however they are do not exist. You can manually check the directory if they exist. Typically this points to an error in the configuration, as for example when a `region.geojson` is supposed to be missing, this points to the basedirectory being incorrectly setup / the wrong path being provided to the base directory somewhere in the config.
-- `Error in rule LAI_analysis`: An error related to not enough points or something similar typicallt indicates that in all of your LAI data there are not sufficient dates that meet the required minimum pixels without clouds for the specific region.
+- `Error in rule LAI_analysis`: An error related to not enough points or something similar typically indicates that in all of your LAI data there are not sufficient dates that meet the required minimum pixels without clouds for the specific region. However, this should rarely be the case when running with LAI data of multiple months (a typical season). Typically, this rather indicates that the `LAI parameters` were incorrectly set in the config.
 Check that the `lai_region`, `lai_resolution`, `lai_dir` and `file_ext` are correctly set.
 - `Error in rule match_sim_real: KeyError: None`: Typically indicates that the APSIM simulation was externally interrupted or unexpectedly failed. In such a case you will have to find the `--db_path` option in the `shell` section in the tracelog and manually delete the `.db` file.
+
+- `Errors related to evaluation`: Typically related to names in the validation data `.csv` file not matching those used internally by VeRCYe. For validation data at the same level as the simulation (`primary`), the names must match the cleaned names from VeRCYe (the folder names).
diff --git a/docs/docs/Vercye/webapp.md b/docs/docs/Vercye/webapp.md
index 68e49b7..eab0d12 100644
--- a/docs/docs/Vercye/webapp.md
+++ b/docs/docs/Vercye/webapp.md
@@ -2,11 +2,11 @@ The VeRCYe webapp is an interface to core functionality, wrapping the CLI utilit
 ### Setup
-1. Ensure you have installed the VeRCYe core library as described in [](../index.md#vercye-library-setup).
+1. Ensure you have installed the VeRCYe core library as described in the [setup instructions](../index.md#vercye-library-setup).
 2. The webapp requires you to set a number of default folders, for example for the storage of cached outputs, the path to the APSIM installation and others. For this set the environmental variables by copying `vercye_ops/.env_examples` to `vercye_ops/.env` and setting the actual values.
 3. Navigate to `vercye_ops/vercye_webapp/`: `cd vercye_ops/vercye_webapp`.
 4. Install the additional requirements for the webapp: Ensure you have loaded your environment from step 1 and run `pip install -r requirements.txt`.
-5. To queue incoming jobs and allow workers to fetch jobs independantly, `redis` is used. Install redis for your system by following the [official instructions]().
+5. To queue incoming jobs and allow workers to fetch jobs independently, `redis` is used.
 Install redis for your system by following the [official instructions](https://redis.io/docs/latest/operate/oss_and_stack/install/archive/install-redis/).
 4. You will now have to specify a few more environmental variables for the webapp. For this copy `vercye_webapp/.env_example` to `vercye_webapp/.env` and set the values.
diff --git a/docs/docs/index.md b/docs/docs/index.md
index 1591600..efd0c4b 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -9,7 +9,7 @@ The **VeRCYe Repository** contains a number of components:
 - **The VerCYe Library**: Contains all steps to run the VeRCYe algorithm as individual python scripts. The scripts are orchestrated into a pipeline using `Snakemake`. In general the library is split into two components:
   - **LAI Generation**: Downloads remotely sensed imagery and predicst Leaf Area Index (LAI) values per pixel.
   - **Yield Simulation and Prediction**: Simulate numerous likely configurations using APSIM and identify the best-matching simulations with the LAI data. This step also includes evaluation and reporting tools.
-- **The VeRCYe Webapp**: Provides a webapp wrapper around the core library. Runs a backend service and a frontendservice that facilitates using VeRCYe operationally.
+- **The VeRCYe Webapp**: Provides a webapp wrapper around the core library. Runs a backend and a frontend service that facilitate using VeRCYe operationally.
 ---
@@ -57,10 +57,10 @@ pip install -e .
 #### 4. Install APSIMX
-There are two options for running APSIM:
+The simulations that produce yield predictions and phenology are run using the process-based APSIM NextGen model. There are two options for running APSIM:
-- **Using Docker**: Simply set a parameter during configuration of your yield study. The Docker container will build automatically. (Ensure `docker` is installed.)
-- **Building the APSIM-NextGen binary manually**: See instructions in the [APSIM Section](Vercye/apsim.md).
+- **A: Using Docker**: Simply set a parameter during [configuration of your yield study](Vercye/running.md). The Docker container will build automatically. (Ensure `docker` is installed.) This option is **NOT** available on UMD systems.
+- **B: Building the APSIM-NextGen binary manually**: See instructions in the [APSIM Section](Vercye/apsim.md).
 > **Note**: If running on UMD systems, APSIM is pre-installed at:
 >
@@ -144,7 +144,7 @@ vercye run --name your-study-name --dir /path/to/study/store/profile/config.yaml
 **Running VeRCYe manually**
-While the CLI provides a conveniert way to run a yield study, for larger experiments with different configurations, you might want more freedom. For this the general process is as follows:
+While the CLI provides a convenient way to run a yield study, for larger experiments with different configurations, you might want more freedom. For this the general process is as follows:
 1. You will first have to generate **LAI** data from remotely sensed imagery. Refer to the [LAI Creation Guide](LAI/running.md) for details.
@@ -152,11 +152,11 @@ While the CLI provides a conveniert way to run a yield study, for larger experim
 ### VeRCYe Webapp Setup
-On information for setting up and running the webapp, visit the [Webapp Section]().
+For information on setting up and running the webapp, visit the [Webapp Section](Vercye/webapp.md).
 ### Technical Details
-![VeRCYe Architecture Diagram](vercye_highlevel.png)
+![VeRCYe Architecture Diagram](Vercye/vercye_highlevel.png)
 - **Library Details**: The technical implementation details of the vercye library are outlined in the [VeRCYe Architecture Section](Vercye/architecture.md). Fore more details check out the code in `vercye_ops`.
 - **Webapp Details**: The details on architectural decisions of the webapp are documented under [VeRCYe Webapp](Vercye/webapp.md).
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 841d598..187013e 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -8,18 +8,17 @@ theme:
 nav:
   - 'Home': 'index.md'
-  - 'LAI':
-    - 'Introduction': 'LAI/intro.md'
-    - 'Generating LAI Data': 'LAI/running.md'
-  - 'Vercye':
-    - 'Introduction': 'Vercye/intro.md'
+  - 'Vercye Pipeline':
     - 'Running VeRCYe':
       - 'CLI': 'Vercye/cli.md'
       - 'Manual execution': 'Vercye/running.md'
       - 'Troubleshooting': 'Vercye/troubleshooting.md'
     - 'Inputs': 'Vercye/inputs.md'
     - 'Outputs': 'Vercye/outputs.md'
-  - 'VeRCYe Webapp': 'Vercye/webapp.md'
     - 'Meteorological Data': 'Vercye/metdata.md'
     - 'Vercye Library Architecture': 'Vercye/architecture.md'
     - 'APSIMX': 'Vercye/apsim.md'
+  - 'LAI Pipeline':
+    - 'Introduction': 'LAI/intro.md'
+    - 'Generating LAI Data': 'LAI/running.md'
+  - 'Vercye Webapp': 'Vercye/webapp.md'