From e07e3a8bcb11c4cdd2e3b679037f4dc671eb84a7 Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Mon, 30 Jun 2025 17:12:31 -0400
Subject: [PATCH 01/70] added exploration demo of different approaches to aggregating multiple PFTs
---
 docs/mixed_system_prototype.qmd | 832 ++++++++++----------------
 1 file changed, 242 insertions(+), 590 deletions(-)

diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd
index 240d968..f3e8338 100644
--- a/docs/mixed_system_prototype.qmd
+++ b/docs/mixed_system_prototype.qmd
@@ -1,670 +1,322 @@
---- -title: "Mixed-System Prototype (Two-PFT Aggregation)" -author: "David LeBauer" -date: "`r Sys.Date()`" -quarto-cache: true -format: - html: - self-contained: true -execute: - cache: true - echo: false - warning: false - message: false ----

## Background

-This document prototypes a workflow for modeling croplands with multiple crops ("mixed cropping systems").

This document prototypes a workflow for modeling croplands with multiple crops at a single site. The proposed strategy, based on [issue #137](https://github.com/ccmmf/organization/issues/137), is to:

-## Challenge

1. Run the model (e.g., SIPNET) separately for each PFT at a given site.
2. Combine the model outputs from the individual PFT runs to generate a single, site-level prediction.

-CARB identifies multiple practices from the Healthy Soils Program (https://www.cdfa.ca.gov/oars/healthysoils/) that rely on mixed cropping systems—such as hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover-crop mixtures (CARB, 2022 and Appendix @tbl-hsp). -It is important to be able to represent scenarios that contain multiple crops at the same time. The challenge is that SIPNET simulates only one Plant Functional Type (PFT) - a standard approach used in many land surface, biogeochemistry, and crop models.

Table 1: Practices to be implemented in the inventory and projection workflows, in order of priority.
We expect data availability to be approximately proportional to priority, where priority reflects a combination of effect size and current or potential adoption.

-## Approach - -### Context - -Here we consider methods of simulating cropping systems where two _ecophysiologically distinct_ PFTs are present at the same time. These include woody buffers and orchards with ground cover. Here, ecophysiologically distinct means that the two PFTs have different growth forms (woody vs herbaceous) and/or phenology (perennial vs annual). - -We do not consider mixtures of plant species that are similar enough to represent as a single PFT, including cover crop mixtures and agroforestry. We note that the approach described below could be applied to these practices, and this will be considered if the additional complexity can be justified by potential reduction in uncertainty. - -We also do not consider practices in which different crops are planted in sequence. These include crop rotations, cover crops, and land use change. These scenarios can be represented as sequential model runs that change PFTs while maintaining a continuously evolving system state. - -### Algorithm - -The general approach for combining multiple PFTs into an aggregate estimate of carbon pool size (and later, GHG fluxes) is to: - -1. Run SIPNET separately for each PFT. -2. Combine the model outputs from the individual PFT. - -Technical details of these steps are described in more detail below. - -#### Initial Conditions Generation (IC)

| Agronomic Practice | Model Representation | Expected GHG Impact on State GHG Inventory | Expected Adoption (CARB 2022 Scoping Plan) | Priority |
| :---- | :---- | :---- | :---- | :---- |
| Cover Crops | Same planting and harvest as commodity crops. Crop or mixture specific parameterizations. | High | High | High |
| Non-Crop Carbon Amendments such as manure or compost | Increase surface carbon and organic nitrogen, analogous to low C:N litter. | High | High | High |
| N Fertilization | Increase size of soil inorganic nitrogen pools. | High | High | High |
| Transition to Certified Organic Practices | This will be represented as specific combinations of other practices based on actual or expected rates of adoption associated with the transition. | High | High | High |
| No-till, reduced till | Tillage will increase litter and SOM decomposition rate. Tillage frequency and effect on decomposition rate will be reduced to represent no till and reduced till. | High | Medium | High |
| Polycultures - crop mixtures including orchard ground cover and cover crop mixtures | Run PFTs independently; combine in proportion to relative composition. | Medium | Medium | Medium |
| (Water management placeholder) | Increase soil water content. | High | | |
| Annual → Perennial | Change crop type, living biomass may be partially removed during harvest but not killed. | High | Low | Low |
| Agroforestry and Woody Buffers | Run woody PFTs independently and aggregate to county and state. | High | Low | Low |

-1. Run once for each PFT at each site. -2. Meteorology and soil initial conditions: - - Use the same meteorological driver data (ERA5 ensemble). - - Use the same soil texture and soil organic carbon. -3. Settings File: - - For each mixed site, create separate a separate configuration file for each PFT; append `-PFT` to the site ID. -4. For herbaceous crops: - - Annual herbaceous: assume start with bare ground (zero biomass). - - Perennial herbaceous: start as bare ground run; consider methods for estimating initial biomass in later iterations. -5. For woody crops: - - Generate initial conditions.

Accurately modeling mixed cropping systems is crucial for understanding ecosystem functions such as carbon sequestration, soil health, and productivity. Mixed systems, which include combinations of woody perennials and herbaceous plants, pose challenges due to differing plant functional traits, growth dynamics, and interactions between plant types.
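The run-each-PFT-then-combine strategy proposed above can be sketched with a toy example (values and column names are made up for illustration; this is not pipeline code):

```{r}
# Illustrative sketch: each PFT is simulated independently, then outputs
# are combined in proportion to fractional cover. Values are invented.
library(dplyr)

pft_outputs <- tibble::tribble(
  ~pft,         ~fraction, ~soc, ~agb,
  "woody",       0.75,      5.2,  3.1,
  "herbaceous",  0.25,      4.8,  0.6
)

site_level <- pft_outputs |>
  summarise(
    soc = sum(soc * fraction),  # area-weighted average (fractions sum to 1)
    agb = sum(agb * fraction)
  )
```

When the fractions sum to 1, this reduces to an area-weighted average of the per-PFT outputs, which is the simplest form of the combination step.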
-#### Model Execution

### Mixed Vegetation Management

-- Execute PEcAn SIPNET model independently for each PFT and site combination. - - Use ensemble methods to propagate uncertainty and variance in model parameters, meteorological drivers and initial conditions. -- Following PEcAn convention, netCDFs will be in `out/ENS-{ensemble_number}-{site_id}-{pft}`.

A number of Healthy Soils Practices (HSPs, https://www.cdfa.ca.gov/oars/healthysoils/) involve mixed cropping systems. These include hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover crop mixtures.

-#### Post-processing Outputs -- Combine outputs: For each mixed site, combine model outputs using one of the approaches defined below. -- Standardize Results: Outputs are formatted into ensemble files (`combined_ensemble_output.csv`) for downstream analysis and visualization.

It is important to be able to represent scenarios that contain multiple crops at the same time.
The challenge is that SIPNET (ref) simulates only one Plant Functional Type (PFT).
This is a common approach in many land surface and biogeochemistry models, where each PFT represents a distinct vegetation type with specific functional traits and dynamics.
Examples include DayCent, DNDC, and RothC (TK cite, check that this is correct, mention others).

-#### Downscaling and Analysis

There are a few approaches to modeling mixed cropping systems:

-1. **Downscaling:** - - Apply machine learning models trained separately for each PFT to downscale outputs. - - Combine downscaled outputs using one of the methods described below. -2. **Analysis:** - - Calibrate using site level mixtures. - - Validate against downscaled outputs. -

1. One approach, used by land surface models whose grid cells contain multiple ecosystems, is to simulate each PFT separately at each point (or grid cell) and then combine their outputs weighted by fractional coverage.
2. 
DayCent (TK cite) uses another approach, represents mixtures as a new PFT that is parameterized to reflect a mixture of the component PFTs. +3. More advanced models like ED2 and FATES (TK cite) simulate how multiple PFTs grow together in real time - representing the vertical structure of the ecosystem and representing competition for light, space, water, and nutrients. +4. Others worth mentioning? -### Mixture Methods {#sec-mixture-methods} +All of these approaches have limitations. And simplicity is a key design principle of SIPNET - so the full ED2 or FATES approach is not feasible. -#### Two Cases for Mixed Cropping Systems: Discrete and Overlap +The proportional weighting approach is straightforward and allows for flexibility in representing different mixtures of PFTs. However, +- SOC represents a history of land cover, not the current state. But the change in SOC over time could be weighted based on the fractional coverage of each PFT. +- At the same time, imagine an orchard with partial cover. Adding ground cover to an orchard does not immediately change the SOC or AGB of the woody PFT, but it does add to the SOC and AGB of the herbaceous PFT. -In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percent contribution of a crop or plant functional type to ecosystem dynamics. This contribution is not "canopy cover" since that changes over time. Think of this as "percent of a monoculture". This method will build on SIPNET's single PFT design - the outputs from two separate single PFT runs simulate will be combined. +We have identified a few options for representing mixed cropping systems in SIPNET: -| Example System | Scenario | Method | Eqn. | -| :-------------- | :-------- | :----- | :------: | -| Annual Crops + Woody Hedgerow | Discrete | Weighted Average | [-@eq-discrete] | -| Orchard + Ground Cover | Overlap | Incremental | [-@eq-overlap] | +1. Come up with a new PFT that represents the mixed system. This approach is used e.g. 
by DayCent. +2. Simulate each PFT separately and then combine outputs based on their fractional areas within a management parcel. +3. Starting with orchards, assume that adding ground cover will only increase the SOC and AGB of the system relative to the baseline. -: Methods for combining two plant functional types (PFTs) in mixed cropping systems. {#tbl-methods} +## Proposed Approach -**Notation** +Mixed-vegetation scenarios include hedgerows, orchard ground cover, windbreaks, riparian buffers and cover crop mixtures that contain multiple PFTs. These will be represented by running a separate simulation for each vegetation type and then combining outputs based on the fractional area of each crop type within a field. -We define the following values: +For mixed-vegetation scenarios that combine woody buffers with crops such as hedgerows, windbreaks, and riparian buffers, we will combine crop PFTs with existing SIPNET PFTs that have been developed for North American evergreen and deciduous forests. -- $X$ is the carbon stock (AGB or SOC). -- For a finite time interval $\Delta t$ starting at $t_0$ (the start of the simulation) the change is $\Delta X_{\Delta t} = X(t_0 + \Delta t) - X(t_0)$. -- $f_{woody}$ and $f_{annual}$ are the fractional contributions of each PFT to ecosystem dynamics. In the case of the discrete method, these represent cover and sum to 1. In the overlap case, $f_{woody} = 1$ and $f_{annual} < 1$. +### Initial Conditions Generation (IC) -#### Scenario 1: Discrete (Weighted) +1. Run as normal, but for each PFT at each site. +2. Settings File: + - For each mixed site, create separate configuration entries for each PFT; append `-PFT` to the site ID. +3. For herbaceous crops: + - annual herbaceous: assume start with bare ground (zero biomass). + - perennial herbaceous (MVP): start as bare ground run; consider methods for estimating initial biomass in later iterations. +4. Run PEcAn SIPNET pipeline to generate initial conditions for each PFT. +5. 
Soil Conditions: + - Assign whole-site values for soil organic carbon (SOC), soil water content, and other soil parameters to each PFT. -For the Discrete case, we use a _Weighted Average_ approach (@eq-discrete). +### Model Execution -The Discrete case refers to mixed systems where the two different crops do not overlap. This covers two key scenarios: 1) each PFT occupies a distinct zones such as an annual crop with woody hedgerows or 2) a mixture of annuals. In this annual mixture case, the approach assumes that the plants do not overlap - even though this is not the case, it is a reasonable simplification. In the **discrete case**, the field scale ecosystem dynamics are approximated by weighting SIPNET outputs by the fractional area of each crop type within a field: -$$ -X_{\textrm{site}} = f_{\textrm{woody}}\cdot X_{\textrm{woody}} + f_{\textrm{annual}}\cdot X_{\textrm{annual}} -$$ {#eq-discrete} +- Execute PEcAn SIPNET model independently for each PFT and site combination. +- Following PEcAn convention, netCDFs will be in `out/ENS-{ensemble_number}-{site_id}-{pft}`. -#### Scenario 2: Overlap +### Post-processing Outputs -For the Overlap case, we use an _Incremental_ approach (@eq-overlap). +1. **Output Combination:** + - For each mixed site, combine model outputs by applying weights based on fractional cover. + - Biomass pools for herbaceous crops are scaled by fractional cover; woody biomass is summed directly. + - Soil variables (e.g., SOC) are calculated as weighted averages across PFTs. -This scenario applies when a ground cover is added to an existing orchard orchard. We set $f_\textrm{woody} = 1$, reflecting the assumption that the addition of ground cover does not reduce the contribution of the woody species to ecosystem dynamics. 
Lowering this fraction below 1 at the moment of planting would represent an unrealistic scenario in which adding ground cover also reduced standing orchard biomass and associated impact of the woody species on ecosystem function. When woody biomass removal occurs, it is represented as a distinct harvest event.

2. **Standardized Results:**
   - Outputs are formatted into ensemble files (`efi_ensemble.csv`) for downstream analysis and visualization.

-The contribution of the ground cover is added _incrementally_ to the system. The parameter $f_\text{annual}$ represents the "ecosystem impact factor" - the fraction of a monoculture's influence on ecosystem functioning that the ground cover contributes to this mixed system.

### Downscaling and Analysis

-This factor is always less than 1 ($f_\textrm{annual} < 1$), as it accounts for both the percent of area covered by the ground cover and the reduction in its ecosystem impact due to competition for light and other resources.

1. **Downscaling:**
   - Apply machine learning models trained separately for each PFT to downscale outputs.
   - Combine downscaled outputs using fractional cover to produce integrated maps and analyses.

-Thus, when ground cover is added, it is treated as an addition to the ecosystem, and the orchard (woody monoculture) is assumed to remain unchanged. $$ X_\textrm{site} = X_\textrm{woody} + f_\textrm{annual}\cdot\Delta X_\textrm{annual} $$ {#eq-overlap}

## Demonstration

-Here we illustrate this approach using the orchard + ground cover and annual crop + hedgerows scenarios. - -We consider a set of mixtures used in the examples and figures below (@tbl-mixture-scenarios); the percent columns indicate the fractional contribution of each Plant Functional Type (PFT).

Here we illustrate the approach using an orchard with different ratios of woody:herbaceous ground cover (e.g., 0:100, 25:75, 50:50, 75:25, 100:0).
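For reference, the cover-ratio grid used in this demonstration can be laid out as a small table (a sketch; the `wp`/`ac` column names are illustrative, not part of the pipeline):

```{r}
# Illustrative layout of the woody:herbaceous cover-ratio grid,
# expressed as fractions rather than percentages.
library(tibble)

mix_grid <- tibble(
  wp = c(0, 0.25, 0.5, 0.75, 1),  # woody perennial cover
  ac = 1 - wp                     # annual (herbaceous) cover
)
```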
+Two methods of aggregation are demonstrated: -| Scenario | Woody (%) | Annual (%) | Method | -| :-------------------------------------------- | --------: | ---------: | :------- | -| 100% woody perennial | 100 | 0 | monoculture | -| 100% annual herbaceous | 0 | 100 | monoculture | -| Orchard + 25% herbaceous ground cover | 100 | 25 | overlap | -| Orchard + 50% herbaceous ground cover | 100 | 50 | overlap | -| Annual crop + 25% woody hedgerows | 25 | 75 | discrete | -| Annual crop + 50% woody hedgerows | 50 | 50 | discrete | +1. **Weighted Average**: Combines model outputs proportionally based on PFT fractional cover. +2. **Incremental Addition**: Uses Aboveground Biomass (AGB) and Soil Organic Carbon (SOC) directly from woody PFT, adding only incremental increases in AGB and SOC from herbaceous cover proportional to its fractional area. -: Set of mixture scenarios used for demonstration. {#tbl-mixture-scenarios} - -## Illustrative Examples - -### Run Configuration - -These runs use SIPNET grass and temperate deciduous PFT parameterizations to represent annual and perennial woody crops, respectively. - -These simulations were run from 2016 through 2024 across 10 sites, with an ensemble size of 20. - -```{r setup} +```{r} +source("000-config.R") library(tidyverse) -library(lubridate) -library(here) -library(stringr) # for label wrapping -library(grid) # grid::unit() for panel spacing - -source(here::here("000-config.R")) -if (!dir.exists(model_outdir)) { - model_outdir <- "data" -} +library(ggplot2) ``` +The combination of outputs depends on the variable: -```{r plot-helpers} -# helper: replace " + " with newline so labels use two lines -mix_label_fn <- function(x) stringr::str_replace_all(x, " \\+ ", "\n+ ") +- **Soil variables** (e.g., `TotSoilCarb`): Calculated as a weighted average of the PFT-specific outputs, with weights determined by the fractional cover of each PFT. 
This assumes that each PFT contributes to the total soil carbon in proportion to the area it covers. Since simulations of both PFTs have the same initial conditions, this weighted average will reflect the weighted averages of $\delta$ SOC. +- **Aboveground Biomass (AGB)**: Calculated as a sum of the PFT-specific outputs. + - For herbaceous PFTs, their AGB is scaled by fractional cover. + - Woody PFT AGB is taken as is. This approach implicitly assumes that adding herbaceous cover doesnt significantly impact the biomass of established woody plants. -# labeller that applies the above to mix_description (fallback to a normal wrap for other strips) -mix_labeller <- labeller( - mix_description = mix_label_fn, - .default = label_wrap_gen(width = 12) -) +The following code demonstrates this output combination process. -pal_mix <- c( - monoculture = "#1b9e77", - discrete = "#d95f02", - overlap = "#7570b3" -) +### Prepare example data -base_theme <- theme_minimal(base_size = 10) + - theme( - legend.position = "top", - strip.text = element_text(size = 9, lineheight = 0.95, - margin = margin(2, 2, 2, 2)), - axis.text.x = element_text(angle = 0, hjust = 0.5), - plot.margin = margin(5, 5, 5, 5), - plot.title.position = "plot", - # center title and subtitle - plot.title = element_text(hjust = 0.5), - plot.subtitle = element_text(hjust = 0.5) - ) - -wrap_labeller <- labeller(.default = label_wrap_gen(width = 12)) -year_scale_dt <- scale_x_datetime(date_breaks = "1 year", - date_labels = "%Y", - expand = expansion(mult = c(0.01, 0.02))) -``` +To demonstrate this approach we will construct a hypothetical mixed cropping system - an orchard with and without herbaceous cover. We will also consider an annual herbaceous monoculture, as well as different mixtures of woody and herbaceous crops. -## Mixed-system dataset +Inputs are from woody perennial simulations in phase 1b and herbaceous annual simulations in phase 2a. 
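The two aggregation rules applied in the Combine Outputs section below can be expressed as small helper functions (a sketch; function names are illustrative and not part of the pipeline):

```{r}
# Sketch of the two aggregation rules (illustrative helpers, not pipeline code).
# `ac` and `wp` are the fractional covers of the annual crop and woody perennial.

# Weighted average: both PFTs contribute in proportion to fractional cover.
agg_weighted <- function(x_ac, x_wp, ac, wp) {
  x_ac * ac + x_wp * wp
}

# Incremental addition: the woody PFT is the baseline; the annual crop
# contributes only a cover-scaled increment.
agg_incremental_agb <- function(agb_ac, agb_wp, ac) {
  agb_wp + agb_ac * ac
}
agg_incremental_soc <- function(soc_ac, soc_wp, ac) {
  soc_wp + (soc_ac - soc_wp) * ac
}
```

For example, with 25% ground cover (`ac = 0.25`), `agg_incremental_soc(4.8, 5.2, 0.25)` returns 5.1: SOC moves a quarter of the way from the woody value toward the herbaceous value.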
-The `combined_ensemble_output.csv` file contains the combined outputs for different scenarios (discrete, overlap). +Because we have not already run the two PFTs at the same site, we combine outputs from two sites that are less than 1km apart into a single site for illustration. -```{r data-prep} -# ---- Mixed-system dataset ---------------------------------------------------- -# created in 031_aggregate_sipnet_output.R -combined_output_csv <- here::here( - file.path(model_outdir, "combined_ensemble_output.csv") -) -combined_output <- readr::read_csv( - combined_output_csv, - show_col_types = FALSE -) |> - mutate( - variable = recode(variable, "TotSoilCarb" = "SOC"), - mix_description = factor( - mix_description, - levels = c( - "100% woody", - "100% annual", - "100% woody + 25% annual", - "100% woody + 50% annual", - "75% annual + 25% woody", - "50% annual + 50% woody" - ) - ), - mix_type = case_when( - mix_description %in% c("100% annual", "100% woody") ~ "monoculture", - scenario == "discrete" ~ "discrete", - scenario == "overlap" ~ "overlap", - TRUE ~ "other" - ) - ) |> - arrange(mix_description) - -# variables present (used by time-series sections below) -vars <- unique(combined_output$variable) +We will consider the final years of the simulation (2023-2024). -``` +```{r example-data} +ac_ensemble_output <- readr::read_csv(file.path(model_outdir, "ensemble_output.csv")) |> + dplyr::mutate(year = lubridate::year(datetime)) +wp_ensemble_output_csv <- file.path( + ccmmf_dir, + "modelout", + "ccmmf_phase_1b_98sites_20reps_20250312", + "out", + "efi_std_ens_results.csv" # previous name +) -**Selecting Representative Sites** - -Next we select representative sites based on their productivity. We use AGB in the 100% woody scenario as an indicator of productivity. We select sites that are near the 15th, 50th, and 85th percentiles of AGB and categorize them as low, medium, and high productivity. 
- -```{r select-representative-sites} -agb_site <- combined_output |> - filter(mix_description == "100% woody", variable == "AGB") |> - group_by(site_id) |> - summarise(agb_mean = mean(value_combined), .groups = "drop") - -if (nrow(agb_site) >= 3) { - qs_agb <- quantile( - agb_site$agb_mean, - probs = c(0.15, 0.50, 0.85) - ) - analysis_sites <- purrr::map_chr(qs_agb, function(qv) { - agb_site |> - mutate(diff = abs(agb_mean - qv)) |> - slice_min(diff, n = 1, with_ties = FALSE) |> - pull(site_id) - }) |> unique() - - if (length(analysis_sites) != 3) { - PEcAn.logger::logger.error("Representative sites not found for all quantiles") - } - site_map <- tibble( - site_id = analysis_sites, - agb_class = factor(c("low_agb", "med_agb", "high_agb"), - levels = c("low_agb", "med_agb", "high_agb")) - ) -} else { - PEcAn.logger::logger.severe("Looks like something went wrong", - "There are only", nrow(agb_site), " sites with AGB data. ") -} - -# join and relabel classes for plotting (use human-friendly labels) -output_representative_sites <- site_map |> - left_join(combined_output, by = "site_id") |> - mutate( - agb_class = factor( - agb_class, - levels = c("low_agb", "med_agb", "high_agb"), - labels = c("low AGB site", "med AGB site", "high AGB site") +wp_ensemble_output <- readr::read_csv(wp_ensemble_output_csv) |> + dplyr::rename( + site_id = site, + ensemble = parameter + ) |> + dplyr::mutate( + pft = "woody perennial crop", + year = lubridate::year(datetime) ) - ) -``` - -### Summary Statistics - - - -```{r summary-table} -summary_stats <- combined_output |> - filter(year(datetime) == max(year(datetime))) |> - group_by(variable, mix_description, mix_type) |> - summarise( - mean = mean(value_combined), - sd = sd(value_combined), - .groups = "drop" - ) |> - mutate( - # format as "mean (sd)" with one decimal place - val = sprintf("%.1f (%.1f)", mean, sd) +mixed_system <- bind_rows( + ac_ensemble_output, + wp_ensemble_output ) |> - select(-mean, -sd) |> - tidyr::pivot_wider( - 
id_cols = c(mix_description, mix_type), - names_from = variable, - values_from = val - ) - -# TODO future iterations, don't assume columns; -# use glue to construct dynamically -knitr::kable( - summary_stats, - col.names = c("Mixture", "Type", "AGB", "SOC"), - caption = "Summary statistics (mean and standard deviation) for carbon pools by mixture type." -) -``` - -### Carbon Pool Sizes For Different Mixture Types - -What we expect to see here: - -- Discrete mixtures: Carbon pools scale approximately as the fractional (area) weighted average of the two monocultures. -- Overlap mixtures: Pools are dominated by the woody monoculture plus a modest incremental contribution from the annual component. - - -```{r carbon-pools} -# Ensemble-level end-of-simulation medians (per ensemble) for absolute pool sizes -last_dt <- max(combined_output$datetime) -last_year <- lubridate::year(last_dt) - -# Final-time values per (site x ensemble) -combined_output_final_site_ens <- combined_output |> - filter(datetime == last_dt) |> - group_by(variable, mix_description, mix_type, site_id, ensemble_id) |> - summarise( - value = median(value_combined), - .groups = "drop" - ) - -# Points: site-level means across ensembles at final time -combined_output_site_mean <- combined_output_final_site_ens |> - group_by(variable, mix_description, mix_type, site_id) |> - summarise( - site_mean = mean(value), - .groups = "drop" - ) - -# Error bars + median dot: summarise across sites (using site-level means) -combined_output_summaries <- combined_output_site_mean |> - group_by(variable, mix_description, mix_type) |> - summarise( - q25 = quantile(site_mean, 0.25), - q50 = quantile(site_mean, 0.5), - q75 = quantile(site_mean, 0.75), - .groups = "drop" - ) - - # Plot: ensemble points (jitter) + IQR + median; facet by variable - ggplot() + - geom_jitter( - data = combined_output_site_mean, - aes(x = mix_description, y = site_mean, color = mix_type), - width = 0.15, alpha = 0.5, size = 1.2 - ) + - 
geom_errorbar( - data = combined_output_summaries, - aes(x = mix_description, ymin = q25, ymax = q75, color = mix_type), - width = 0.4, linewidth = 0.6 - ) + - geom_point( - data = combined_output_summaries, - aes(x = mix_description, y = q50, color = mix_type), - size = 2 - ) + - facet_wrap(~ variable, scales = "free_y", nrow = 1, labeller = mix_labeller) + # use mix_labeller - scale_x_discrete(labels = mix_label_fn) + # wrap x labels into two lines - scale_color_manual(values = pal_mix) + - labs( - title = paste0("End-Year Carbon Pool Sizes (", last_year, ")"), - subtitle = "Points = site means; bars = IQR; dot = median", - x = "Mixture", - y = expression("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), - color = "Type" - ) + - base_theme + - theme(axis.text.x = element_text(angle = 35, hjust = 1)) - -``` - -### Effect Size (Final Year AGB and SOC Compared to Annual Monoculture) - -These plots show the change in carbon pool sizes for each mixture compared to the annual monoculture baselines. Because we are considering the healthy soils practice of adding a woody component like a hedgerow to an annual cropping system, we use annual monoculture as the point of comparison. 
- -```{r effect-size-probabilistic} -# Compute ensemble-level annual medians (per ensemble) so deltas are calculated within each ensemble -annual_stats_ens <- combined_output_final_site_ens - -# Mixture ensemble results (exclude pure monocultures) -mix_ens <- combined_output_final_site_ens |> - filter(!(mix_description %in% c("100% annual", "100% woody"))) |> - select(variable, mix_description, mix_type, site_id, ensemble_id, value) - -# Baseline per ensemble for monocultures (annual & woody) -base_ens <- annual_stats_ens |> - filter(mix_description %in% c("100% annual", "100% woody")) |> - select(variable, mix_description, site_id, ensemble_id, baseline = value) - -# Join ensemble-specific baselines to mixtures so deltas are ensemble-specific -delta_ens <- mix_ens |> - left_join( - base_ens |> - filter(mix_description == "100% annual") |> - select(variable, site_id, ensemble_id, annual_baseline = baseline), - by = c("variable", "site_id", "ensemble_id") - ) |> - left_join( - base_ens |> - filter(mix_description == "100% woody") |> - select(variable, site_id, ensemble_id, woody_baseline = baseline), - by = c("variable", "site_id", "ensemble_id") - ) |> - mutate( - delta_vs_annual = value - annual_baseline, - delta_vs_woody = value - woody_baseline + dplyr::filter( + site_id %in% c("922c6a68452fc8ca", "57ea98e608333b76"), + year %in% 2022:2023, # only have two yrs for woody + ensemble %in% 1:5, + variable %in% outputs_to_extract ) |> - pivot_longer( - cols = starts_with("delta_vs"), - names_to = "baseline_ref", - values_to = "delta" + dplyr::mutate( + # currently no sites have both annual and perennial crops, + # these sites are close, will make one up + # AND append PFT acronym to site_id + lat = mean(lat, na.rm = TRUE), # haven't joined wp w/ lat,lon yet + lon = mean(lon, na.rm = TRUE), + site_id = paste0( + "1234abcd5678efgh", "-", + stringr::str_replace_all(pft, " ", "_") + ) ) |> - mutate( - baseline_ref = factor( - baseline_ref, - levels = c("delta_vs_annual", 
"delta_vs_woody"), - labels = c("Δ vs Annual monoculture", "Δ vs Woody monoculture") + dplyr::mutate( + pft = case_when( + pft == "woody perennial crop" ~ "wp", + pft == "annual crop" ~ "ac", + TRUE ~ pft + ), + variable = case_when( + variable == "TotSoilCarb" ~ "soc", + variable == "AGB" ~ "agb", + TRUE ~ variable ) + ) |> + tidyr::pivot_wider( + names_from = variable, + values_from = prediction + ) |> + tidyr::pivot_wider( + id_cols = c(year, ensemble), + names_from = pft, + values_from = c(agb, soc) ) - -# Points: site-level medians (across ensembles) -delta_site_median <- delta_ens |> - group_by(variable, mix_description, baseline_ref, site_id, mix_type) |> - summarise(site_median = median(delta), .groups = "drop") - -# Error bars + median dot: across-site summaries -summary_delta <- delta_site_median |> - group_by(variable, mix_description, baseline_ref, mix_type) |> - summarise( - q25 = quantile(site_median, 0.25), - q50 = median(site_median, 0.5), - q75 = quantile(site_median, 0.75), - .groups = "drop" - ) - -# Plot only relative to annual monoculture, reflecting HSP -summary_delta_annual <- summary_delta |> - filter(baseline_ref == "Δ vs Annual monoculture") - -delta_site_median_annual <- delta_site_median |> - filter(baseline_ref == "Δ vs Annual monoculture") - -# Jittered ensemble points + IQR bars + median points -ggplot() + - geom_hline(yintercept = 0, color = "grey60", linewidth = .4) + - geom_jitter( - data = delta_site_median_annual, - aes(x = mix_description, y = site_median, color = mix_type), - width = 0.15, height = 0, alpha = 0.5, size = 1.2 - ) + - geom_errorbar( - data = summary_delta_annual, - aes(x = mix_description, ymin = q25, ymax = q75, color = mix_type), - width = 0.4, linewidth = 0.5 - ) + - geom_point( - data = summary_delta_annual, - aes(x = mix_description, y = q50, color = mix_type), - size = 0.9 - ) + - facet_wrap(variable ~ baseline_ref, scales = "free_y", nrow = 1, labeller = mix_labeller) + - scale_x_discrete(labels = 
mix_label_fn) + - scale_color_manual(values = pal_mix) + - labs( - title = paste0("Effect Size in ", last_year), - x = "Mixture", - y = expression(Delta ~ "(Mixture - Baseline) (" * kg ~ C ~ m^-2 * ")"), - color = "Type" - ) + - base_theme + - theme( - axis.text.x = element_text(angle = 35, hjust = 1), - panel.spacing.y = unit(1.5, "lines") # <-- increase vertical spacing between facet rows - ) +write_csv(mixed_system, "data/mixed_system.csv") ``` +### Combine Outputs -## Other Approaches Considered - -We evaluated several alternative approaches for representing mixed cropping systems: - -1. Fractional coverage approach (mosaic / tiling) - Each PFT is simulated separately and then combined by fractional area weighting (Li & Arora, 2011). This is the "Discrete" method described above (@eq-discrete). -2. Composite PFT approach - A new parameter set is defined to represent the mixture as a single “composite” PFT (e.g., DayCent: Parton et al., 1998). -3. Vegetation demography approach - Individual or cohort based models explicitly represent competition for light and resources among PFTs, as well as disturbance and successional dynamics (Fisher et al., 2018). - -Limitations: - -- Composite PFTs: By lumping mixture components into a single parameterization prevents explicit evaluation of the incremental contribution of individual PFTs - it would be harder to compare an orchard monoculture to an orchard+ground cover system. It can also bias estimates of pools and fluxes relative to the weighted mixture model (Li & Arora, 2011). -- Fractional weighting: While more realistic, this approach requires careful treatment of long-lived pools (e.g., soil organic carbon, woody biomass). If the PFTs share a common soil, raw stocks cannot be summed without double-counting; instead, $\Delta$ stocks relative to a common baseline (or shared-pool formulations) should be aggregated - as is done in the "Overlap" method above (@eq-overlap). 
-- Demography models provide mechanistic representation of competition but impose higher computational costs and require additional data and parameterizations (Fisher et al., 2018; Fisher & Koven, 2020). +**Aggregation Scenarios** +We use two approaches to aggregate Aboveground Biomass (AGB) and Soil Organic Carbon (SOC) for mixed systems: -## References +1. **Weighted Average Method** + Each output is combined in proportion to the fractional cover of annual crops (`ac`) and woody perennials (`wp`): -CARB (2022). Draft 2022 Scoping Plan for Achieving Carbon Neutrality. https://ww2.arb.ca.gov/sites/default/files/2022-09/2022-draft-sp-appendix-b-draft-ea-recirc.pdf + $$ + \text{AGB}_{agg} = \text{AGB}_{ac} \times ac + \text{AGB}_{wp} \times wp + $$ + $$ + \text{SOC}_{agg} = \text{SOC}_{ac} \times ac + \text{SOC}_{wp} \times wp + $$ -Fisher and Koven, 2020. Perspectives on the future of land surface models and the challenges of representing complex terrestrial systems. JAMES 12, e2018MS001453. https://doi.org/10.1029/2018MS001453 + This assumes both PFTs contribute to the total in proportion to their area. -Fisher et al., 2018. Vegetation demographics in Earth system models: a review of progress and priorities. Global Change Biol. 24, 35–54. https://doi.org/10.1111/gcb.13910 +2. **Incremental Addition Method** + The woody perennial output is used as a base, and only the incremental increase from the annual crop is added, scaled by its fractional cover: -Li, R., Arora, V.K., 2011. Effect of mosaic representation of vegetation in land surface schemes on simulated energy and carbon balances. 
https://doi.org/10.5194/bgd-8-5849-2011 + $$ + \text{AGB}_{add} = \text{AGB}_{wp} + \text{AGB}_{ac} \times ac + $$ + $$ + \text{SOC}_{add} = \text{SOC}_{wp} + (\text{SOC}_{ac} - \text{SOC}_{wp}) \times ac + $$ - +mp + + geom_boxplot( + #width narrow + width = 0.2, + color = 'grey' + ) + + geom_line(aes(group = ensemble), alpha = 0.25) + +# + +#geom_point(alpha = 0.25) + +# -## Appendix -### Healthy Soils Practices and Model Representation -| Agronomic Practice | Model Representation | Expected GHG Impact on State GHG Inventory | Expected Adoption (CARB 2022 Scoping Plan) | Priority | -| :---- | :---- | :---- | :---- | :---- | -| Cover Crops | Planting/harvest like commodity; PFT/mixture params. | High | High | High | -| Non-Crop Carbon Amendments (manure/compost) | Add surface C & organic N (low C:N litter analog). | High | High | High | -| N Fertilization | Increase soil inorganic N pools. | High | High | High | -| Transition to Certified Organic | Composite of shifts (fertility, amendments, tillage). | High | High | High | -| No-till / Reduced till | Lower decomposition pulses (freq & magnitude). | High | Medium | High | -| Polycultures / Mixtures | Independent PFT runs; combine by fractional contribution. | Medium | Medium | Medium | -| (Water management placeholder) | Increase soil water content. | High | | | -| Annual → Perennial | Change PFT; partial biomass retained. | High | Low | Low | -| Agroforestry / Woody Buffers | Independent woody runs; spatial aggregation. | High | Low | Low | - -: Healthy Soils Program practices, model representation, and priority for implementation in SIPNET. 
{#tbl-hsp} \ No newline at end of file +``` \ No newline at end of file From 5c8d46eb8d884cf8605254964544467e9ef148c1 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Thu, 7 Aug 2025 16:07:18 -0400 Subject: [PATCH 02/70] add points to plot --- docs/mixed_system_prototype.qmd | 6 ------ 1 file changed, 6 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index f3e8338..dd14c3f 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -313,10 +313,4 @@ mp + ) + geom_line(aes(group = ensemble), alpha = 0.25) -# + -#geom_point(alpha = 0.25) + -# - - - ``` \ No newline at end of file From 156063026c9a7e99f6624f5c03380d73f56c9a11 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 18 Aug 2025 19:09:49 -0400 Subject: [PATCH 03/70] update config - change location of model outputs, deprecate use of .future.R (set in config); use all outputs in production to aid in development of multi-crop workflow --- 000-config.R | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/000-config.R b/000-config.R index d9d729b..3009497 100644 --- a/000-config.R +++ b/000-config.R @@ -31,7 +31,7 @@ pecan_outdir <- file.path(ccmmf_dir, "modelout", "ccmmf_phase_2b_mixed_pfts_2025 # # Global switch to toggle between fast, small scale runs for development and testing # and full-scale production runs. Works by subsetting various data objects. 
-PRODUCTION <- TRUE +PRODUCTION <- FALSE # **Variables to extract** # see docs/workflow_documentation.qmd for complete list of outputs @@ -41,9 +41,7 @@ outputs_to_extract <- c( ) if(!PRODUCTION) { - # can subset for testing - # depending on what part of the workflow you are testing - # outputs_to_extract <- outputs_to_extract[1] + outputs_to_extract <- outputs_to_extract } ### Configuration Settings that can be set to default ### From ddbde3759212aca4fb196b17a6db1fa35cf73972 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 18 Aug 2025 19:20:49 -0400 Subject: [PATCH 04/70] =?UTF-8?q?Add=20nearest=E2=80=90polygon=20matching?= =?UTF-8?q?=20for=20unmatched=20sites?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- scripts/009_update_landiq.R | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/scripts/009_update_landiq.R b/scripts/009_update_landiq.R index 1f19a51..7804e7f 100644 --- a/scripts/009_update_landiq.R +++ b/scripts/009_update_landiq.R @@ -236,6 +236,25 @@ if (nrow(unmatched) > 0) { nearest_data <- tibble( site_id = unmatched$site_id, point_index = seq_len(nrow(unmatched)) + ) |> + mutate( + nearest_info = purrr::map(point_index, function(i) { + distances <- terra::distance(unmatched_vect[i], dwr_2018) + min_idx <- which.min(distances) + nearest_poly <- dwr_2018[min_idx] + point_wgs84 <- terra::project(unmatched_vect[i], "epsg:4326") + coords <- terra::crds(point_wgs84) + tibble( + UniqueID = nearest_poly$UniqueID, + distance = min(distances), + lon = round(coords[1], 5), + lat = round(coords[2], 5) + ) + }) + ) |> + tidyr::unnest(nearest_info) |> + select(-point_index) |> + mutate(match_type = "nearest") ) |> mutate( nearest_info = purrr::map(point_index, function(i) { From 696a933a84859eefc59465618616da555259f353 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 18 Aug 2025 19:22:19 -0400 Subject: [PATCH 05/70] add note in docs that workflows will be split into site selection and 
downscaling --- docs/workflow_documentation.md | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 0be98d0..c515cb1 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -39,7 +39,13 @@ The workflows are ### Configuration -Workflow settings are configured in `000-config.R`. +Workflow settings are configured in `000-config.R`, except that the CCMMF_DIR is set in `.Renviron`. + The configuration script reads the CCMMF directory from the environment variable `CCMMF_DIR` (set in .Renviron), and uses it to define paths for inputs and outputs. @@ -80,10 +86,23 @@ git clone git@github.com:ccmmf/downscaling _these shouldn't need to be changed unless you want to change the default behavior of the workflow_ - `renv.lock` is used for package management with `renv`. - - See [project renv setup docs](docs/renv_setup.md) for instructions about using `renv` for these workflows. - - See [renv package documentation](https://rstudio.github.io/renv/articles/renv.html) for more details. +See [project renv setup docs](renv_setup.md) for instructions about using `renv` for these workflows. +See [renv package documentation](https://rstudio.github.io/renv/articles/renv.html) for more details. -# + ### 1. 
Data Preparation From ff23d5e23d14b872033f0af2b9ce6ab6169cb39e Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 18 Aug 2025 19:31:19 -0400 Subject: [PATCH 06/70] removed duplicate chunk from 009 --- scripts/009_update_landiq.R | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/scripts/009_update_landiq.R b/scripts/009_update_landiq.R index 7804e7f..1f19a51 100644 --- a/scripts/009_update_landiq.R +++ b/scripts/009_update_landiq.R @@ -236,25 +236,6 @@ if (nrow(unmatched) > 0) { nearest_data <- tibble( site_id = unmatched$site_id, point_index = seq_len(nrow(unmatched)) - ) |> - mutate( - nearest_info = purrr::map(point_index, function(i) { - distances <- terra::distance(unmatched_vect[i], dwr_2018) - min_idx <- which.min(distances) - nearest_poly <- dwr_2018[min_idx] - point_wgs84 <- terra::project(unmatched_vect[i], "epsg:4326") - coords <- terra::crds(point_wgs84) - tibble( - UniqueID = nearest_poly$UniqueID, - distance = min(distances), - lon = round(coords[1], 5), - lat = round(coords[2], 5) - ) - }) - ) |> - tidyr::unnest(nearest_info) |> - select(-point_index) |> - mutate(match_type = "nearest") ) |> mutate( nearest_info = purrr::map(point_index, function(i) { From 15e631601f6308fa650dacf793fd5961d9428c08 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 18 Aug 2025 20:13:12 -0400 Subject: [PATCH 07/70] Refactor SIPNET output extraction script for clarity and efficiency; add change since start as delta_ to standard output; aggregate output to monthly means --- scripts/030_extract_sipnet_output.R | 54 +++++++++-------------------- 1 file changed, 16 insertions(+), 38 deletions(-) diff --git a/scripts/030_extract_sipnet_output.R b/scripts/030_extract_sipnet_output.R index 5b818ca..b12192b 100644 --- a/scripts/030_extract_sipnet_output.R +++ b/scripts/030_extract_sipnet_output.R @@ -2,7 +2,7 @@ # A long format CSV (time, site, ensemble, variable) # that follows the Ecological Forecasting Initiative (EFI) forecast standard -# Helper 
functions in R/efi_long_to_arrays.R will convert this to
+# Helper functions in R/efi_long_to_arrays.R will convert this to:
 # 1. A 4-D array (time, site, ensemble, variable)
 # 2. A NetCDF file (time, site, ensemble, variable)
 # TODO: write out EML metadata in order to be fully EFI compliant
@@ -89,8 +89,9 @@ if (!PRODUCTION) {
   start_year <- end_year - 1
 }
 
-## Create dataframe of directories to process, filtered to include only
-## existing directories (ens_dirs) that have site_ids and ensemble IDs
+## Create dataframe of directories to process,
+## limited to directories that actually exist (ens_dirs)
+## and that have site_ids and ensemble IDs
 ens_ids_str <- PEcAn.utils::left.pad.zeros(ens_ids)
 dirs_to_process <- ens_dirs |>
   dplyr::filter(
@@ -150,42 +151,19 @@ ens_results <- ens_results_raw |>
   ) |>
   dplyr::select(-site_id) |>
   dplyr::rename(site_id = base_site_id) |>
-  dplyr::select(datetime, site_id, lat, lon, pft, parameter, variable, prediction)
-
-# Classify variables as 'pool' (state) vs 'flux' (rate) using PEcAn standard_vars. 
-std_vars <- PEcAn.utils::standard_vars - -pool_vars <- std_vars |> - dplyr::filter(stringr::str_detect(tolower(Category), "pool")) |> - dplyr::pull(Variable.Name) - -flux_vars <- std_vars |> - dplyr::filter(stringr::str_detect(tolower(Category), "flux")) |> - dplyr::pull(Variable.Name) - -ens_results <- ens_results |> - dplyr::mutate( - datetime = lubridate::floor_date(datetime, unit = "month"), - variable_type = dplyr::case_when( - variable %in% pool_vars ~ "pool", - variable %in% flux_vars ~ "flux", - TRUE ~ "unknown" - ) - ) |> - dplyr::group_by(datetime, site_id, lat, lon, pft, parameter, variable, variable_type) |> + dplyr::select(datetime, site_id, lat, lon, pft, parameter, variable, prediction) |> + dplyr::mutate(datetime = lubridate::floor_date(datetime, unit = "month")) |> + dplyr::group_by(datetime, site_id, lat, lon, pft, parameter, variable) |> dplyr::summarise( - prediction = mean(prediction, na.rm = TRUE), - .groups = "drop" - ) - -# Warn if flux variables are present because users may need to treat them differently. -if (any(ens_results$variable_type == "flux")) { - PEcAn.logger::logger.severe( - "Flux variables detected in ensemble output. Note: averaging flux (rate) variables", - "across ensembles/sites or over time can be misleading. 
Consider computing cumulative", - "fluxes over simulation period", + prediction = mean(prediction, na.rm = TRUE), .groups = "drop" ) -} +# Append with change since start date +delta_df <- ens_results |> + dplyr::group_by(site_id, lat, lon, pft, parameter, variable) |> + dplyr::mutate(prediction = as.numeric(difftime(datetime, min(datetime), units = "days"))) |> + dplyr::ungroup() |> + dplyr::mutate(variable = paste0("delta_", variable)) +ens <- dplyr::bind_rows(ens_results, delta_df) # After extraction, ens_results$prediction is in kg C m-2 for both AGB and TotSoilCarb @@ -193,7 +171,7 @@ if (any(ens_results$variable_type == "flux")) { logger_level <- PEcAn.logger::logger.setLevel(logger_level) ensemble_output_csv <- file.path(model_outdir, "ensemble_output.csv") -readr::write_csv(ens_results, ensemble_output_csv) +readr::write_csv(ens, ensemble_output_csv) PEcAn.logger::logger.info( "\nEnsemble output extraction complete.", "\nResults saved to ", ensemble_output_csv From 12f4a438a856e9dd7a03c21c7ffe0a0ef5c04e5e Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 18 Aug 2025 20:52:06 -0400 Subject: [PATCH 08/70] clarify documentation; disable renv auto activation in .Rprofile --- .Rprofile | 58 +--------------------------------- docs/workflow_documentation.md | 29 +++-------------- 2 files changed, 5 insertions(+), 82 deletions(-) diff --git a/.Rprofile b/.Rprofile index 9454c40..4b314ba 100644 --- a/.Rprofile +++ b/.Rprofile @@ -5,60 +5,4 @@ options(repos = c( CRAN = "https://cloud.r-project.org" )) -# Sys.setenv(R_LIBS_USER = file.path( -# Sys.getenv("RENV_PATHS_LIBRARY"), -# renv:::renv_platform_prefix() -# )) -options(renv.config.autoloader.enabled = FALSE) -if (requireNamespace("renv", quietly = TRUE)) { - renv::activate() -} - -options(bitmapType = "cairo") - - -### Get plotting to work in VS Code on HPC -## Always prefer cairo bitmaps (avoids X11) -options(bitmapType = "cairo") - -## Interactive: try httpgd, else fall back to headless pdf -if 
(interactive()) { - # Prefer httpgd (works in VS Code without X11). If missing, use OS GUI. - if (requireNamespace("httpgd", quietly = TRUE)) { - # Make httpgd the default device for *all* interactive plots - options(device = function(...) httpgd::hgd(silent = TRUE, ...)) - # Start a viewer immediately so first plot appears - try(httpgd::hgd(silent = TRUE), silent = TRUE) - } else if (.Platform$OS.type == "windows") { - options(device = "windows") - } else if (capabilities("aqua")) { # macOS - options(device = "quartz") - } else { - # Last resort: warn instead of silently discarding plots - packageStartupMessage( - "No interactive device. Install 'httpgd' for X11-free plotting in VS Code: ", - "install.packages('httpgd') or remotes::install_github('r-lib/httpgd')" - ) - } -} - -## Non-interactive (e.g. quarto render): set knitr defaults once -if (!interactive() && requireNamespace("knitr", quietly = TRUE)) { - dev_choice <- if (requireNamespace("ragg", quietly = TRUE)) { - "ragg_png" - } else if (requireNamespace("svglite", quietly = TRUE)) { - "svglite" - } else if (requireNamespace("Cairo", quietly = TRUE)) { - "cairo_png" - } else { - "png" - } - - knitr::opts_chunk$set( - dev = dev_choice, - fig.width = 7, - fig.height = 4, - dpi = 144 - ) -} -cat("global Rprofile loaded\n") +# source("renv/activate.R") diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index c515cb1..a43e365 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -39,13 +39,7 @@ The workflows are ### Configuration -Workflow settings are configured in `000-config.R`, except that the CCMMF_DIR is set in `.Renviron`. - +Workflow settings are configured in `000-config.R`. The configuration script reads the CCMMF directory from the environment variable `CCMMF_DIR` (set in .Renviron), and uses it to define paths for inputs and outputs. 
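A minimal sketch of that configuration pattern, assuming only that `CCMMF_DIR` is set in `.Renviron` (the `modelout` subdirectory name follows the paths used in `000-config.R`):

```r
# Sketch: resolve project paths from the CCMMF_DIR environment variable,
# as described above.
ccmmf_dir <- Sys.getenv("CCMMF_DIR")
if (!nzchar(ccmmf_dir)) {
  stop("CCMMF_DIR is not set; add it to .Renviron")
}

# Downstream paths are then built relative to the CCMMF directory,
# e.g. the model output directory used by these workflows.
pecan_outdir <- file.path(ccmmf_dir, "modelout")
```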
@@ -70,8 +64,6 @@ git clone git@github.com:ccmmf/downscaling - `RENV_PATHS_CACHE` and `RENV_PATHS_LIBRARY` store the `renv` cache and library in the CCMMF directory. These are in a subdirectory of the CCMMF directory in order to make them available across all users (and because on some computers, they exceed allocated space in the home directory). - - `R_LIBS_USER` must point to the platform and R version specific subdirectory inside `RENV_PATHS_LIBRARY`. - Example: `/projectnb/dietzelab/ccmmf/renv-library/linux-almalinux-8.10/R-4.4/x86_64-pc-linux-gnu` - `.Rprofile` - sets repositories from which R packages are installed - runs `renv/activate.R` @@ -86,23 +78,10 @@ git clone git@github.com:ccmmf/downscaling _these shouldn't need to be changed unless you want to change the default behavior of the workflow_ - `renv.lock` is used for package management with `renv`. -See [project renv setup docs](renv_setup.md) for instructions about using `renv` for these workflows. -See [renv package documentation](https://rstudio.github.io/renv/articles/renv.html) for more details. + - See [project renv setup docs](docs/renv_setup.md) for instructions about using `renv` for these workflows. + - See [renv package documentation](https://rstudio.github.io/renv/articles/renv.html) for more details. - +# ### 1. 
Data Preparation From d0cd427052456bed638acffd30eda3be30cbfbde Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Tue, 26 Aug 2025 01:34:05 -0400 Subject: [PATCH 09/70] major refactoring of mixed_system_prototype.qmd no longer load renv in .Rprofile; hard code path to renv libraries in .Renviron --- .Renviron | 2 - .Rprofile | 9 +- docs/mixed_system_prototype.qmd | 789 +++++++++++++++++++--------- docs/workflow_documentation.md | 2 + scripts/030_extract_sipnet_output.R | 14 +- 5 files changed, 561 insertions(+), 255 deletions(-) diff --git a/.Renviron b/.Renviron index aab4f46..1eecd28 100644 --- a/.Renviron +++ b/.Renviron @@ -12,5 +12,3 @@ R_LIBS_USER="${CCMMF_DIR}/renv-library/linux-almalinux-8.10/R-4.4/x86_64-pc-linu UDUNITS2_LIBS="-L/share/pkg.8/udunits/2.2.28/install/lib -ludunits2" UDUNITS2_INCLUDE="-I/share/pkg.8/udunits/2.2.28/install/include" -# Default graphics device -R_DEFAULT_DEVICE=png diff --git a/.Rprofile b/.Rprofile index 4b314ba..7aabf92 100644 --- a/.Rprofile +++ b/.Rprofile @@ -5,4 +5,11 @@ options(repos = c( CRAN = "https://cloud.r-project.org" )) -# source("renv/activate.R") +Sys.setenv(R_LIBS_USER = file.path( + Sys.getenv("RENV_PATHS_LIBRARY"), + renv:::renv_platform_prefix() +)) +options(renv.config.autoloader.enabled = FALSE) +if (requireNamespace("renv", quietly = TRUE)) { + renv::activate() +} \ No newline at end of file diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index dd14c3f..4dd0153 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -1,63 +1,55 @@ -## Background +--- +title: "Mixed-System Prototype (Two-PFT Aggregation)" +#quarto-cache: true +format: + html: + self-contained: true +execute: +# cache: true + echo: false + warning: false + message: false +--- -This document prototypes a workflow for modeling croplands with multiple crops at a single site. 
The proposed strategy, based on [issue #137](https://github.com/ccmmf/organization/issues/137), is to: +This document prototypes a workflow for modeling croplands with multiple crops ("mixed cropping systems"). -1. Run the model (e.g., SIPNET) separately for each PFT at a given site. -2. Combine the model outputs from the individual PFT runs to generate a single, site-level prediction. +## Challenge +A number of Healthy Soils Practices (HSPs, ) involve mixed cropping systems. These include hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover crop mixtures (Appendix). -Table 1: Practices to be implemented in the inventory and projection workflows, in order of priority. -We expect data availability to be approximately proportional to priority, where priority reflects a combination of effect size and current or potential adoption. - -| Agronomic Practice | Model Representation | Expected GHG Impact on State GHG Inventory | Expected Adoption (CARB 2022 Scoping Plan) | Priority | -| :---- | :---- | :---- | :---- | :---- | -| Cover Crops | Same planting and harvest as commodity crops. Crop or mixture specific parameterizations. | High | High | High | -| Non-Crop Carbon Amendments such as manure or compost | Increase surface carbon and organic nitrogen, analogous to low C:N litter. | High | High | High | -| N Fertilization | Increase size of soil inorganic nitrogen pools. | High | High | High | -| Transition to Certified Organic Practices | This will be represented as specific combinations of other practices based on actual or expected rates of adoption associated with the transition. | High | High | High | -| No-till, reduced till | Tillage will increase litter and SOM decomposition rate. Tillage frequency and effect on decomposition rate will be reduced to represent no till and reduced till. 
| High | Medium | High | -| Polycultures \- crop mixtures including orchard ground cover and cover crop mixtures | Run PFTs independently; combine in proportion to relative composition. | Medium | Medium | Medium | -| | Increase soil water content. | High | | | -| Annual → Perennial | Change crop type, living biomass may be partially removed during harvest but not killed. | High | Low | Low | -| Agroforestry and Woody Buffers | Run woody PFTs independently and aggregate to county and state. | High | Low | Low | - -Accurately modeling mixed cropping systems is crucial for understanding ecosystem functions such as carbon sequestration, soil health, and productivity. Mixed systems, which include combinations of woody perennials and herbaceous plants, pose challenges due to differing plant functional traits, growth dynamics, and interactions between plant types. - -### Mixed Vegetation Management +It is important to be able to represent scenarios that contain multiple crops at the same time. +The challenge is that SIPNET simulates only one Plant Functional Type (PFT) - a standard approach used in many land surface, biogeochemistry, and crop models. -A number of Healthy Soils Practices (HSPs, ) involve mixed cropping systems. These include hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover crop mixtures. +## Approach -It is important to be able to represent scenarios that contain multiple crops at the same time. -The challenge is that a SIPNET (ref) simulates only one Plant Functional Type (PFT). -This is a common approach in many land surface and biogeochemistry models, where each PFT represents a distinct vegetation type with specific functional traits and dynamics -Examples include DayCent, DNDC, RothC, (TK cite, check that this is correct, mention others). +### Context -There are a few approaches to modeling mixed cropping systems: +Here we only consider cropping systems where two PFTs are present at the same time. -1. 
Land surface models that simulate a grid cell containing multiple ecosystems is to simulate multiple PFTs separately at each point (or grid cell) and then combine their outputs weighted based on their fractional coverage.
-2. DayCent (TK cite) uses another approach, represents mixtures as a new PFT that is parameterized to reflect a mixture of the component PFTs.
-3. More advanced models like ED2 and FATES (TK cite) simulate how multiple PFTs grow together in real time - representing the vertical structure of the ecosystem and representing competition for light, space, water, and nutrients.
-4. Others worth mentioning?
+This does not include:

-All of these approaches have limitations. And simplicity is a key design principle of SIPNET - so the full ED2 or FATES approach is not feasible.
+- Cases where the different crops are separated in time:
+  - cover crops
+  - multiple crops in sequence
+  - annual to perennial crop (or vice-versa) transitions.
+- Plant mixtures that can be represented as a single PFT.

-The proportional weighting approach is straightforward and allows for flexibility in representing different mixtures of PFTs. However,
-- SOC represents a history of land cover, not the current state. But the change in SOC over time could be weighted based on the fractional coverage of each PFT.
-- At the same time, imagine an orchard with partial cover. Adding ground cover to an orchard does not immediately change the SOC or AGB of the woody PFT, but it does add to the SOC and AGB of the herbaceous PFT.
+### Algorithm

-We have identified a few options for representing mixed cropping systems in SIPNET:
+The general approach for combining runs at each site is to:

-1. Come up with a new PFT that represents the mixed system. This approach is used e.g. by DayCent.
-2. Simulate each PFT separately and then combine outputs based on their fractional areas within a management parcel.
+1. Run SIPNET separately for each PFT.
+2. Combine the model outputs from the individual PFT runs.

-3. 
Starting with orchards, assume that adding ground cover will only increase the SOC and AGB of the system relative to the baseline. +### Algorithm -## Proposed Approach +The general approach for combining runs at each site is to: -Mixed-vegetation scenarios include hedgerows, orchard ground cover, windbreaks, riparian buffers and cover crop mixtures that contain multiple PFTs. These will be represented by running a separate simulation for each vegetation type and then combining outputs based on the fractional area of each crop type within a field. +1. Run SIPNET separately for each PFT +2. Combine the model outputs from the individual PFT. -For mixed-vegetation scenarios that combine woody buffers with crops such as hedgerows, windbreaks, and riparian buffers, we will combine crop PFTs with existing SIPNET PFTs that have been developed for North American evergreen and deciduous forests. -### Initial Conditions Generation (IC) +#### Initial Conditions Generation (IC) 1. Run as normal, but for each PFT at each site. 2. Settings File: @@ -65,252 +57,567 @@ For mixed-vegetation scenarios that combine woody buffers with crops such as hed 3. For herbaceous crops: - annual herbaceous: assume start with bare ground (zero biomass). - perennial herbaceous (MVP): start as bare ground run; consider methods for estimating initial biomass in later iterations. -4. Run PEcAn SIPNET pipeline to generate initial conditions for each PFT. -5. Soil Conditions: +4. For woody crops: + - Use existing approach to generate initial conditions. +5. Run PEcAn SIPNET pipeline to generate initial conditions for each PFT. +6. Soil Conditions: - Assign whole-site values for soil organic carbon (SOC), soil water content, and other soil parameters to each PFT. -### Model Execution +#### Model Execution - Execute PEcAn SIPNET model independently for each PFT and site combination. - Following PEcAn convention, netCDFs will be in `out/ENS-{ensemble_number}-{site_id}-{pft}`. 
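The run layout described above can be sketched as an enumeration of per-PFT, per-site, per-ensemble output directories. This is an illustrative sketch only: `site_ids`, `pfts`, `n_ens`, and the zero-padding width are hypothetical placeholders, not values from the workflow.

```r
# Sketch: enumerate output directories following the PEcAn convention
# out/ENS-{ensemble_number}-{site_id}-{pft} described above.
library(dplyr)
library(tidyr)

site_ids <- c("site_001", "site_002")            # hypothetical
pfts     <- c("woody_perennial", "annual_crop")  # hypothetical
n_ens    <- 3                                    # hypothetical

run_dirs <- expand_grid(
  ens     = sprintf("%05d", seq_len(n_ens)),     # padding width assumed
  site_id = site_ids,
  pft     = pfts
) |>
  mutate(outdir = file.path("out", paste("ENS", ens, site_id, pft, sep = "-")))
```

Each PFT run then writes its netCDF output into its own `outdir`, and the post-processing step reads them back for combination.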
-### Post-processing Outputs +#### Post-processing Outputs -1. **Output Combination:** - - For each mixed site, combine model outputs by applying weights based on fractional cover. - - Biomass pools for herbaceous crops are scaled by fractional cover; woody biomass is summed directly. - - Soil variables (e.g., SOC) are calculated as weighted averages across PFTs. +- Combine outputs: For each mixed site, combine model outputs using the appropriate approach (Discrete / Weighted or Overlap / Incremental). Combination methods are described in the following section. +- Standardize Results: Outputs are formatted into ensemble files (`combined_ensemble_output.csv`) for downstream analysis and visualization. -2. **Standardized Results:** - - Outputs are formatted into ensemble files (`efi_ensemble.csv`) for downstream analysis and visualization. - -### Downscaling and Analysis +#### Downscaling and Analysis 1. **Downscaling:** + - [open question] Build downscaling model from merged outputs, or downscale each PFT separately and then merge downscaled outputs? - Apply machine learning models trained separately for each PFT to downscale outputs. - Combine downscaled outputs using fractional cover to produce integrated maps and analyses. +### Mixture Methods -## Demonstration +#### Two Cases for Mixed Cropping Systems: Discrete and Overlap -Here we illustrate the approach using an orchard with different ratios of woody:herbacous ground cover (e.g., 0:100, 25:75, 50:50, 75:25, 100:0). -Two methods of aggregation are demonstrated: +In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percent contribution of a crop or plant functional type to ecosystem dynamics. This contribution is not "canopy cover" since that changes over time. Think of this as "percent of a monoculture". This method will build on SIPNET's single PFT design - the outputs from two separate single PFT runs simulate will be combined. -1. 
**Weighted Average**: Combines model outputs proportionally based on PFT fractional cover. -2. **Incremental Addition**: Uses Aboveground Biomass (AGB) and Soil Organic Carbon (SOC) directly from woody PFT, adding only incremental increases in AGB and SOC from herbaceous cover proportional to its fractional area. +The two methods of mixing described below are "Discrete" and "Overlap". -```{r} -source("000-config.R") -library(tidyverse) -library(ggplot2) -``` +We define $X$ as the carbon stock. For a finite time interval $\Delta t$ starting at $t_0$ (the start of the simulation) the change is $\Delta X_{\Delta t} = X(t_0 + \Delta t) - X(t_0)$. + +#### Scenario 1: Discrete (Weighted) + +For the Discrete case, we use a _Weighted Average_ approach. -The combination of outputs depends on the variable: +The Discrete case refers to mixed systems where the two different crops do not overlap. This covers two key scenarios: 1) each PFT occupies a distinct zones such as an annual crop with woody hedgerows or 2) a mixture of annuals. In this annual mixture case, the approach assumes that the plants do not overlap - even though this is not the case, it is a reasonable simplification. In the **discrete case**, we define $f_{woody} + f_{annual} = 1$ and the field scale ecosystem dynamics are approximated by weighting SIPNET outputs by the fractional area of each crop type within a field: +$$ +X_{\textrm{site}} = f_{\textrm{woody}}\cdot X_{\textrm{woody}} + f_{\textrm{annual}}\cdot X_{\textrm{annual}} +\label{eq:discrete}\tag{1} +$$ -- **Soil variables** (e.g., `TotSoilCarb`): Calculated as a weighted average of the PFT-specific outputs, with weights determined by the fractional cover of each PFT. This assumes that each PFT contributes to the total soil carbon in proportion to the area it covers. Since simulations of both PFTs have the same initial conditions, this weighted average will reflect the weighted averages of $\delta$ SOC. 
-- **Aboveground Biomass (AGB)**: Calculated as a sum of the PFT-specific outputs. - - For herbaceous PFTs, their AGB is scaled by fractional cover. - - Woody PFT AGB is taken as is. This approach implicitly assumes that adding herbaceous cover doesnt significantly impact the biomass of established woody plants. +#### Scenario 2: **Overlap** -The following code demonstrates this output combination process. +For the Overlap case, we use an _Incremental_ approach. -### Prepare example data +The Overlap case will apply when a ground cover is added to an orchard. When a ground cover is added to an orchard, it is not expected to reduce the contribution of the orchard species to ecosystem dynamics to a fraction. Rather, we keep $f_\textrm{woody} = 1$ and set $f_\textrm{annual} <1$ to represent the contribution of the ground cover to the system. Reducing $f_\textrm{woody}$ would be equivalent to removing standing orchard biomass at the moment the ground cover is planted. In this case, the ground cover effect is added proportional to its cover and biomass relative to a monoculture in full sun. -To demonstrate this approach we will construct a hypothetical mixed cropping system - an orchard with and without herbaceous cover. We will also consider an annual herbaceous monoculture, as well as different mixtures of woody and herbaceous crops. +We start by assuming that all orchards, both with and without ground cover, represent the same biomass as a monoculture, and set $f_{woody} = 1$. When ground cover is added, it is treated as addition to the ecosystem, and the value of $f_{annual}<1$ reflects the fraction of a monoculture that it represents - accounting for both percent of area that it covers as well as the fact that competition for light reduces its effect on ecosystem functioning. -Inputs are from woody perennial simulations in phase 1b and herbaceous annual simulations in phase 2a. 
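One way to carry the scenario table above into code is a small lookup tibble of fractional contributions; the column names here are illustrative, not taken from the workflow scripts.

```r
# Sketch: encode the mixture scenarios above as fractional contributions.
# f_woody / f_annual follow the Woody (%) and Annual (%) columns; note that
# the overlap scenarios keep f_woody = 1, as described in the Overlap method.
library(tibble)

scenarios <- tribble(
  ~scenario,                               ~f_woody, ~f_annual, ~method,
  "100% woody perennial",                      1.00,      0.00, "discrete",
  "100% annual herbaceous",                    0.00,      1.00, "discrete",
  "Orchard + 25% herbaceous ground cover",     1.00,      0.25, "overlap",
  "Orchard + 50% herbaceous ground cover",     1.00,      0.50, "overlap",
  "Annual crop + 25% woody hedgerows",         0.25,      0.75, "discrete",
  "Annual crop + 50% woody hedgerows",         0.50,      0.50, "discrete"
)
```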
+$$ +X_\textrm{site} = X_\textrm{woody} + f_\textrm{annual}\cdot\Delta X_\textrm{annual} +\label{eq:overlap}\tag{2} +$$ -Because we have not already run the two PFTs at the same site, we combine outputs from two sites that are less than 1km apart into a single site for illustration. -We will consider the final years of the simulation (2023-2024). +#### Summary Table + +| Example System | Scenario | Method | Eqn. | +| :-------------- | :-------- | :----- | :------: | +| Annual Crops + Woody Hedgerow | Discrete | Weighted Average | \eqref{eq:discrete} | +| Orchard + Ground Cover | Overlap | Incremental | \eqref{eq:overlap} | + + +## Demonstration -```{r example-data} -ac_ensemble_output <- readr::read_csv(file.path(model_outdir, "ensemble_output.csv")) |> - dplyr::mutate(year = lubridate::year(datetime)) -wp_ensemble_output_csv <- file.path( - ccmmf_dir, - "modelout", - "ccmmf_phase_1b_98sites_20reps_20250312", - "out", - "efi_std_ens_results.csv" # previous name +Here we illustrate this approach using the orchard + ground cover and annual crop + hedgerows scenarios. + +We consider a set of mixtures that will be used in the examples and figures below; the percent columns indicate the fractional contribution of each Plant Functional Type (PFT) used for simple area-weighted aggregation. + +| Scenario | Woody (%) | Annual (%) | Method | +| :-------------------------------------------- | --------: | ---------: | :------- | +| 100% woody perennial (orchard / monoculture) | 100 | 0 | discrete | +| 100% annual herbaceous (monoculture) | 0 | 100 | discrete | +| Orchard + 25% herbaceous ground cover | 100 | 25 | overlap | +| Orchard + 50% herbaceous ground cover | 100 | 50 | overlap | +| Annual crop + 25% woody hedgerows | 25 | 75 | discrete | +| Annual crop + 50% woody hedgerows | 50 | 50 | discrete | + +These scenarios can be referenced by their descriptive names (e.g., "Orchard + 25% cover") in the downstream code and figures. 
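The two combination rules behind these scenarios can be checked with a short base-R sketch. The stock values and helper names (`combine_discrete`, `combine_overlap`) below are invented for illustration and are not part of the pipeline code.

```r
# Eqn. 1 (Discrete / Weighted Average): f_woody + f_annual = 1
combine_discrete <- function(x_woody, x_annual, f_woody) {
  f_woody * x_woody + (1 - f_woody) * x_annual
}

# Eqn. 2 (Overlap / Incremental): keep the full woody stock and add only
# the scaled *change* in the annual pool since t0
combine_overlap <- function(x_woody, dx_annual, f_annual) {
  x_woody + f_annual * dx_annual
}

# Hypothetical end-of-run stocks (kg C m-2)
x_woody <- 8.0 # orchard AGB
x_annual <- 1.2 # annual monoculture AGB
dx_annual <- 0.9 # change in annual AGB since the start of the run

combine_discrete(x_woody, x_annual, f_woody = 0.25) # annual crop + 25% hedgerow -> 2.9
combine_overlap(x_woody, dx_annual, f_annual = 0.25) # orchard + 25% ground cover -> 8.225
```

Note that the discrete rule interpolates between the two monocultures, while the overlap rule only shifts the site total away from the woody monoculture by the scaled annual increment.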
## Illustrative Examples

### Run Configuration

These runs use SIPNET grass and temperate deciduous PFT parameterizations to represent annual and perennial woody crops, respectively.

These were run at 10 sites from 2016 through 2024 with an ensemble size of 20.

```{r setup}
library(tidyverse)
library(lubridate)
library(here)
library(stringr) # for label wrapping
library(grid) # grid::unit() for panel spacing

source(here::here("000-config.R"))
if (!dir.exists(model_outdir)) {
  model_outdir <- "data"
}

# helper: replace " + " with newline so labels use two lines
mix_label_fn <- function(x) stringr::str_replace_all(x, " \\+ ", "\n+ ")

# labeller that applies the above to mix_order (fallback to a normal wrap for other strips)
mix_labeller <- labeller(
  mix_order = mix_label_fn,
  .default = label_wrap_gen(width = 12)
)

pal_mix <- c(
  monoculture = "#1b9e77",
  discrete = "#d95f02",
  overlap = "#7570b3"
)

base_theme <- theme_minimal(base_size = 10) +
  theme(
    legend.position = "top",
    strip.text = element_text(size = 9, lineheight = 0.95,
                              margin = margin(2, 2, 2, 2)),
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    plot.margin = margin(5, 5, 5, 5),
    plot.title.position = "plot",
    # center title and subtitle
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )

wrap_labeller <- labeller(.default = label_wrap_gen(width = 12))
year_scale_dt <- scale_x_datetime(date_breaks = "1 year",
                                  date_labels = "%Y",
                                  expand = expansion(mult = c(0.01, 0.02)))
```

-mixed_system <- bind_rows(
-  ac_ensemble_output,
-  wp_ensemble_output

## Mixed-system dataset

The `combined_output` tibble contains the combined outputs for different scenarios (discrete, overlap).
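For readers without access to the CSV, a toy stand-in with the columns the downstream code reads (`site_id`, `datetime`, `ensemble_id`, `variable`, `scenario`, `mix_description`, `value_combined`) can be sketched as follows; all values are invented.

```r
# Minimal stand-in for combined_ensemble_output.csv (invented values)
combined_output_demo <- data.frame(
  site_id         = "site_001",
  datetime        = as.POSIXct(c("2023-01-01", "2024-01-01"), tz = "UTC"),
  ensemble_id     = 1L,
  variable        = "AGB",                # "AGB" or "SOC" after recoding
  scenario        = "overlap",            # "discrete" or "overlap"
  mix_description = "woody + 25% annual",
  value_combined  = c(7.9, 8.2)           # site-level aggregate, kg C m-2
)
str(combined_output_demo)
```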
+ +```{r data-prep} +# ---- Mixed-system dataset ---------------------------------------------------- +# created in 031_aggregate_sipnet_output.R +combined_output_csv <- here::here( + file.path(model_outdir, "combined_ensemble_output.csv") +) +combined_output <- readr::read_csv( + combined_output_csv, + show_col_types = FALSE +) |> + # map legacy variable name to short name used in plots/tables + mutate(variable = recode(variable, "TotSoilCarb" = "SOC")) |> + mutate( + # keep mix_description as the canonical value but display explicit labels via mix_order + # TODO: move recoding to 031_aggregate_sipnet_output.R + mix_order = factor( + mix_description, + levels = c( + "annual", "annual + 25% woody", "annual + 50% woody", + "woody", "woody + 25% annual", "woody + 50% annual" + ), + labels = c( + "100% annual", "100% annual + 25% woody", "100% annual + 50% woody", + "100% woody", "100% woody + 25% annual", "100% woody + 50% annual" + ) + ), + mix_type = case_when( + mix_description %in% c("annual", "woody") ~ "monoculture", + scenario == "discrete" ~ "discrete", + scenario == "overlap" ~ "overlap", + TRUE ~ "other" + ) ) |> - dplyr::filter( - site_id %in% c("922c6a68452fc8ca", "57ea98e608333b76"), - year %in% 2022:2023, # only have two yrs for woody - ensemble %in% 1:5, - variable %in% outputs_to_extract - ) |> - dplyr::mutate( - # currently no sites have both annual and perennial crops, - # these sites are close, will make one up - # AND append PFT acronym to site_id - lat = mean(lat, na.rm = TRUE), # haven't joined wp w/ lat,lon yet - lon = mean(lon, na.rm = TRUE), - site_id = paste0( - "1234abcd5678efgh", "-", - stringr::str_replace_all(pft, " ", "_") - ) - ) |> - dplyr::mutate( - pft = case_when( - pft == "woody perennial crop" ~ "wp", - pft == "annual crop" ~ "ac", - TRUE ~ pft - ), - variable = case_when( - variable == "TotSoilCarb" ~ "soc", - variable == "AGB" ~ "agb", - TRUE ~ variable - ) - ) |> - tidyr::pivot_wider( - names_from = variable, - values_from = 
prediction - ) |> - tidyr::pivot_wider( - id_cols = c(year, ensemble), - names_from = pft, - values_from = c(agb, soc) + arrange(across(any_of(c("site_id", "mix_description", "scenario", + "ensemble_id", "datetime")))) |> + mutate(line_group = interaction( + across(all_of(c("ensemble_id", "mix_description", "scenario", "site_id"))), + drop = TRUE + )) + +vars <- unique(combined_output$variable) +``` + +```{r select-representative-sites} +agb_site <- combined_output |> + filter(mix_description == "woody", variable == "AGB") |> + group_by(site_id) |> + summarise(agb_mean = mean(value_combined, na.rm = TRUE), .groups = "drop") + +if (nrow(agb_site) >= 3) { + qs_agb <- quantile( + agb_site$agb_mean, + probs = c(0.15, 0.50, 0.85), na.rm = TRUE + ) + analysis_sites <- purrr::map_chr(qs_agb, function(qv) { + agb_site |> + mutate(diff = abs(agb_mean - qv)) |> + slice_min(diff, n = 1, with_ties = FALSE) |> + pull(site_id) + }) |> unique() + + if (length(analysis_sites) != 3) { + PEcAn.logger::logger.error("Representative sites not found for all quantiles") + } + site_map <- tibble( + site_id = analysis_sites, + agb_class = factor(c("low agb", "med_agb", "high_agb"), + levels = c("low agb", "med_agb", "high_agb")) + ) +} + +# join and relabel classes for plotting (use human-friendly labels) +output_representative_sites <- site_map |> + left_join(combined_output, by = "site_id") |> + mutate( + agb_class = factor( + agb_class, + levels = c("low agb", "med_agb", "high_agb"), + labels = c("low AGB site", "med AGB site", "high AGB site") ) -write_csv(mixed_system, "data/mixed_system.csv") + ) ``` -### Combine Outputs -**Aggregation Scenarios** +### Summary Statistics -We use two approaches to aggregate Aboveground Biomass (AGB) and Soil Organic Carbon (SOC) for mixed systems: +```{r summary-table} +summary_stats <- combined_output |> + group_by(variable, mix_order, mix_type) |> + summarise( + mean = mean(value_combined, na.rm = TRUE), + sd = sd(value_combined, na.rm = TRUE), + 
.groups = "drop" + ) |> + tidyr::pivot_wider( + id_cols = c(mix_order, mix_type), + names_from = variable, + values_from = c(mean, sd), + names_glue = "{tolower(variable)}_{.value}" + ) + +knitr::kable(summary_stats, + digits = 2, + col.names = c("Mixture", "Type", "AGB (mean)", "AGB (sd)", "SOC (mean)", "SOC (sd)") +) +``` -1. **Weighted Average Method** - Each output is combined in proportion to the fractional cover of annual crops (`ac`) and woody perennials (`wp`): +### Carbon Pool Sizes For Different Mixture Types + +What we expect to see here: + +- The discrete mixtures represent weighted averages of the component PFTs (so annual + 25% woody). +- The overlap mixtures will represent the woody perennial, with a small contribution from the annual crop. + +```{r carbon-pools} +# Ensemble-level annual medians (per ensemble) for absolute pool sizes +annual_stats_ens <- combined_output |> + mutate(year = year(datetime)) |> + group_by(variable, mix_order, mix_type, year, ensemble_id) |> + summarise(q50 = median(value_combined, na.rm = TRUE), .groups = "drop") + +# ensure we have a year column and data +if (nrow(annual_stats_ens) > 0) { + last_year <- max(annual_stats_ens$year, na.rm = TRUE) + + # end-year ensemble values (absolute magnitudes) + endyr_ens <- annual_stats_ens |> + filter(year == last_year) + + if (nrow(endyr_ens) > 0) { + # Summarize across ensembles to get IQR + median per mix (for error bars) + summary_pools <- endyr_ens |> + group_by(variable, mix_order, mix_type) |> + summarise( + q25 = quantile(q50, 0.25, na.rm = TRUE), + q50 = median(q50, na.rm = TRUE), + q75 = quantile(q50, 0.75, na.rm = TRUE), + .groups = "drop" + ) + + # Plot: ensemble points (jitter) + IQR + median; facet by variable + ggplot() + + geom_jitter( + data = endyr_ens, + aes(x = mix_order, y = q50, color = mix_type), + width = 0.15, alpha = 0.5, size = 1.2 + ) + + geom_errorbar( + data = summary_pools, + aes(x = mix_order, ymin = q25, ymax = q75), + width = 0.4, color = "black", 
linewidth = 0.6 + ) + + geom_point( + data = summary_pools, + aes(x = mix_order, y = q50), + color = "black", size = 2 + ) + + facet_wrap(~ variable, scales = "free_y", nrow = 1, labeller = mix_labeller) + # use mix_labeller + scale_x_discrete(labels = mix_label_fn) + # wrap x labels into two lines + scale_color_manual(values = pal_mix) + + labs( + title = paste0("End-Year Carbon Pool Sizes (", last_year, ")"), + subtitle = "Points = ensemble members; bars = IQR; black dot = median", + x = "Mixture", + y = expression("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), + color = "Type" + ) + + base_theme + + theme(axis.text.x = element_text(angle = 35, hjust = 1)) + } else { + message("No end-year ensemble data available for carbon pool sizes") + } +} else { + message("No ensemble annual stats available to compute carbon pool sizes") +} +``` - $$ - \text{AGB}_{agg} = \text{AGB}_{ac} \times ac + \text{AGB}_{wp} \times wp - $$ - $$ - \text{SOC}_{agg} = \text{SOC}_{ac} \times ac + \text{SOC}_{wp} \times wp - $$ +### Effect Size (Final Year Δ vs Monocultures) + +These plots show the change in carbon pool sizes for each mixture compared to the monoculture baselines. Left panels are relative to annual monocultures; right panels are relative to woody monocultures. 
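The key detail in the delta calculation is that each mixture is differenced against the monoculture run from the *same* ensemble member, so shared parameter and driver uncertainty cancels within each pair. A base-R sketch with invented numbers (the pipeline itself uses dplyr joins):

```r
# Mixture and baseline end-year medians for three ensemble members (invented)
mix_df  <- data.frame(ensemble_id = 1:3, q50 = c(3.1, 2.8, 3.4))
base_df <- data.frame(ensemble_id = 1:3, baseline = c(1.0, 1.1, 0.9))

# Join on ensemble_id so each delta pairs matching parameter draws
paired <- merge(mix_df, base_df, by = "ensemble_id")
paired$delta_vs_annual <- paired$q50 - paired$baseline
paired$delta_vs_annual # one effect size per ensemble member
```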
+```{r effect-size-probabilistic} +# Compute ensemble-level annual medians (per ensemble) so deltas are calculated within each ensemble +annual_stats_ens <- combined_output |> + group_by(variable, mix_description, mix_order, mix_type, year, ensemble_id) |> + summarise( + q50 = median(value_combined, na.rm = TRUE), + .groups = "drop" + ) + +last_year <- max(annual_stats_ens$year, na.rm = TRUE) + +# Mixture ensemble results (exclude pure monocultures) +mix_ens <- annual_stats_ens |> + filter(year == last_year, !(mix_description %in% c("annual", "woody"))) |> + select(variable, mix_description, mix_order, mix_type, ensemble_id, q50) + +# Baseline per ensemble for monocultures (annual & woody) +base_ens <- annual_stats_ens |> + filter(year == last_year, mix_description %in% c("annual", "woody")) |> + select(variable, mix_description, mix_order, ensemble_id, baseline = q50) + +# Join ensemble-specific baselines to mixtures so deltas are ensemble-specific +delta_ens <- mix_ens |> + left_join( + base_ens |> + filter(mix_description == "annual") |> + select(variable, ensemble_id, annual_baseline = baseline), + by = c("variable", "ensemble_id") + ) |> + left_join( + base_ens |> + filter(mix_description == "woody") |> + select(variable, ensemble_id, woody_baseline = baseline), + by = c("variable", "ensemble_id") + ) |> + mutate( + delta_vs_annual = q50 - annual_baseline, + delta_vs_woody = q50 - woody_baseline + ) |> + pivot_longer( + cols = starts_with("delta_vs"), + names_to = "baseline_ref", + values_to = "delta" + ) |> + mutate( + baseline_ref = factor( + baseline_ref, + levels = c("delta_vs_annual", "delta_vs_woody"), + labels = c("Δ vs Annual monoculture", "Δ vs Woody monoculture") + ) + ) + +summary_delta <- delta_ens |> + group_by(variable, mix_order, baseline_ref) |> + summarise( + q25 = quantile(delta, 0.25, na.rm = TRUE), + q50 = median(delta, na.rm = TRUE), + q75 = quantile(delta, 0.75, na.rm = TRUE), + .groups = "drop" + ) + +# Jittered ensemble points + IQR 
bars + median points +ggplot() + + geom_hline(yintercept = 0, color = "grey60", linewidth = .4) + + geom_jitter( + data = delta_ens, + aes(x = mix_order, y = delta, color = mix_type), + width = 0.15, height = 0, alpha = 0.5, size = 1.2 + ) + + geom_errorbar( + data = summary_delta, + aes(x = mix_order, ymin = q25, ymax = q75), + width = 0.4, color = "black", linewidth = 0.5 + ) + + geom_point( + data = summary_delta, + aes(x = mix_order, y = q50), + color = "black", size = 0.75 + ) + + facet_grid(variable ~ baseline_ref, scales = "free_y", labeller = mix_labeller) + + scale_x_discrete(labels = mix_label_fn) + + scale_color_manual(values = pal_mix) + + labs( + title = paste0("Effect Size in ", last_year), + x = "Mixture", + y = expression(Delta ~ "(Mixture - Baseline) (" * kg ~ C ~ m^-2 * ")"), + color = "Type" + ) + + base_theme + + theme( + axis.text.x = element_text(angle = 35, hjust = 1), + panel.spacing.y = unit(1.5, "lines") # <-- increase vertical spacing between facet rows + ) +``` - This assumes both PFTs contribute to the total in proportion to their area. +--- -2. **Incremental Addition Method** - The woody perennial output is used as a base, and only the incremental increase from the annual crop is added, scaled by its fractional cover: +# Below is Draft Material - $$ - \text{AGB}_{add} = \text{AGB}_{wp} + \text{AGB}_{ac} \times ac - $$ - $$ - \text{SOC}_{add} = \text{SOC}_{wp} + (\text{SOC}_{ac} - \text{SOC}_{wp}) \times ac - $$ +--- - This reflects the assumption that adding ground cover increases system AGB and SOC only by the additional contribution from the annual crop, without reducing the woody perennial pools. +## Nothing to see here -```{r combine-outputs, message=FALSE} -# Define fractional cover for each PFT. -# This will eventially be specified by monitoring framework. 
-mixed_system <- readr::read_csv("data/mixed_system.csv") -# Define fractional cover scenarios -cover_scenarios <- tibble::tribble( - ~scenario, ~wp, ~ac, - "0:100", 0, 1.0, - "25:75", 0.25, 0.75, - "50:50", 0.50, 0.50, - "75:25", 0.75, 0.25, - "100:0", 1.0, 0 -) +--- -# Aggregated estimates for each scenario -mixed_system_scenarios <- mixed_system |> - cross_join(cover_scenarios) |> - mutate( - agb_agg = agb_ac * ac + agb_wp * wp, - soc_agg = soc_ac * ac + soc_wp * wp, - wp_present = ifelse(wp > 0, TRUE, FALSE), - agb_add = agb_wp * wp_present + agb_ac * ac, - soc_add = soc_wp + (soc_ac - soc_wp) * ac, - pct_ac = ac * 100, - pct_wp = wp * 100 - ) |> - select(-wp_present, -wp, -ac) +## These aren't as interesting. -``` +Except that the lines are wiggly +### Time Series (All Sites) -## Compare Different Methods and Scenarios Used Represent Mixed Systems - -The following code visualizes both AGB and SOC for all cover scenarios, using both aggregation methods. - -```{r visualize-pft-mixtures, message=FALSE, warning=FALSE} -# Long format for visualization -mixed_long <- mixed_system_scenarios |> - pivot_longer( - cols = c(agb_agg, soc_agg, agb_add, soc_add), - names_to = c("variable", "method"), - names_pattern = "(agb|soc)_(agg|add)", - values_to = "value" - ) |> - mutate( - date = as.Date(paste0(year, "-01-01")), - scenario = factor( - scenario, - levels = c("0:100", "25:75", "50:50", "75:25", "100:0") - ), - variable = factor( - variable, - levels = c("agb", "soc") - ), - method = factor( - method, - levels = c("agg", "add") +```{r} +for (v in vars) { + # create a stable plotting group so ensemble lines stay continuous + dv <- combined_output |> + filter(variable == v) |> + mutate(plot_group = interaction(ensemble_id, mix_description, scenario, site_id, drop = TRUE)) + + qv <- dv |> + group_by(variable, mix_order, datetime) |> + summarise( + q25 = quantile(value_combined, .25, na.rm = TRUE), + q50 = median(value_combined, na.rm = TRUE), + q75 = 
quantile(value_combined, .75, na.rm = TRUE), + .groups = "drop" ) - ) |> select(-ends_with("wp"), -ends_with("ac")) + + p <- ggplot() + + geom_line( + data = dv, + aes(datetime, value_combined, group = plot_group, color = mix_type), + alpha = .25, linewidth = .35 + ) + + geom_ribbon( + data = qv, + aes(datetime, ymin = q25, ymax = q75, group = mix_order), + inherit.aes = FALSE, alpha = 0.25, fill = "black", + linetype = "dashed" + ) + + geom_line( + data = qv, + aes(datetime, q50, group = mix_order), + color = "black", linewidth = .5 + ) + + facet_wrap(~mix_order, nrow = 2, labeller = mix_labeller, scales = "free_y") + + year_scale_dt + + scale_color_manual(values = pal_mix) + + labs( + title = paste("Time Series (Median & IQR):", v), + subtitle = "Lines=ensembles; black=median; ribbon=IQR", + x = NULL, y = bquote("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), + color = "Category" + ) + + base_theme + print(p) +} ``` +### Time Series (Representative Sites) + +Here we focus on representative sites for each mixture. + +We selected 3 sites that represent low, medium, and high productivity (AGB) defined as the 15th, 50th, and 85th percentiles of AGB. 
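The selection rule (nearest site to each target AGB quantile) can be sketched in base R; the site IDs and AGB values here are invented.

```r
# Pick the site whose mean AGB is nearest each target quantile (invented data)
agb_site <- data.frame(
  site_id  = paste0("s", 1:7),
  agb_mean = c(1.2, 2.5, 3.1, 4.0, 4.8, 6.0, 7.3)
)
targets <- quantile(agb_site$agb_mean, probs = c(0.15, 0.50, 0.85))
nearest <- vapply(
  targets,
  function(q) agb_site$site_id[which.min(abs(agb_site$agb_mean - q))],
  character(1)
)
unname(nearest) # low, medium, and high AGB representatives: "s2" "s4" "s6"
```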
+ ```{r} -mp <- mixed_long |> - filter(year == 2023) |> - ggplot(aes(x = scenario, y = value)) + - facet_grid( - variable ~ method, - scales = "free_y", - labeller = labeller( - .rows = c(agb = "AGB", soc = "SOC"), - .cols = c( - agg = "Weighted Average", - add = "Incremental Addition" - ) +for (v in vars) { + dv <- output_representative_sites |> + filter(variable == v) |> + # ensure ensemble lines are connected within each (site x mix) panel + mutate(plot_group = interaction(ensemble_id, mix_description, site_id, drop = TRUE)) + + qv <- dv |> + group_by(variable, agb_class, mix_order, datetime) |> + summarise( + q25 = quantile(value_combined, .25, na.rm = TRUE), + q50 = median(value_combined, na.rm = TRUE), + q75 = quantile(value_combined, .75, na.rm = TRUE), + .groups = "drop" ) - ) + - labs( - title = "Mixed System Aggregated Outputs by Scenario and Method", - x = "Mixture Scenario (Woody:Herbaceous)", - y = "Value" - ) + - theme_minimal() - - -mp + - geom_boxplot( - #width narrow - width = 0.2, - color = 'grey' - ) + - geom_line(aes(group = ensemble), alpha = 0.25) - -``` \ No newline at end of file + + p <- ggplot() + + geom_line( + data = dv, + aes(datetime, value_combined, + group = plot_group, color = mix_type + ), + alpha = .25, linewidth = .35 + ) + + geom_ribbon( + data = qv, + aes(datetime, ymin = q25, ymax = q75, group = interaction(agb_class, mix_order)), + inherit.aes = FALSE, alpha = 0.25, fill = "black", + linetype = "dashed" + ) + + geom_line( + data = qv, + aes(datetime, q50, group = interaction(agb_class, mix_order)), + color = "black", linewidth = .6 + ) + + # Layout: one row per productivity class (low/med/high AGB), one column per mix_order + facet_grid(agb_class ~ mix_order, labeller = mix_labeller, scales = "free_y") + + year_scale_dt + + scale_color_manual(values = pal_mix) + + labs( + title = paste("Representative Sites:", v), + subtitle = "Lines = ensemble members; black = median; ribbon = IQR (by AGB class)", + x = NULL, y = "Carbon 
Pool Size (kg C m-2)", color = "Category" + ) + + base_theme + print(p) +} +``` + + +## Appendix + +### Survey of Approaches to Simulating Mixed Vegetation with SIPNET + +We have considered a few approaches to modeling mixed cropping systems: + +1. Fractional coverage approach: Independent PFT runs aggregated by fractional coverage. [TK-Cite] +2. Composite PFT approach: Define a new parameter set representing the mixture (e.g., DayCent; Parton et al. 1998). +3. Vegetation demography approach: Cohort/size-structured (ED2; Moorcroft et al. 2001; FATES; Fisher et al. 2015; Koven et al. 2020) with explicit competition. + +Limitations: + +- Composite obscures marginal effects of adding/removing a component. +- Demography models are computationally heavier. +- Simple fractional weighting must treat legacy pools (SOC) carefully (use Δ rather than raw stocks when appropriate). + +### Healthy Soils Practices + +| Agronomic Practice | Model Representation | Expected GHG Impact on State GHG Inventory | Expected Adoption (CARB 2022 Scoping Plan) | Priority | +| :---- | :---- | :---- | :---- | :---- | +| Cover Crops | Planting/harvest like commodity; PFT/mixture params. | High | High | High | +| Non-Crop Carbon Amendments (manure/compost) | Add surface C & organic N (low C:N litter analog). | High | High | High | +| N Fertilization | Increase soil inorganic N pools. | High | High | High | +| Transition to Certified Organic | Composite of shifts (fertility, amendments, tillage). | High | High | High | +| No-till / Reduced till | Lower decomposition pulses (freq & magnitude). | High | Medium | High | +| Polycultures / Mixtures | Independent PFT runs; combine by fractional contribution. | Medium | Medium | Medium | +| (Water management placeholder) | Increase soil water content. | High | | | +| Annual → Perennial | Change PFT; partial biomass retained. | High | Low | Low | +| Agroforestry / Woody Buffers | Independent woody runs; spatial aggregation. 
| High | Low | Low | diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index a43e365..0be98d0 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -64,6 +64,8 @@ git clone git@github.com:ccmmf/downscaling - `RENV_PATHS_CACHE` and `RENV_PATHS_LIBRARY` store the `renv` cache and library in the CCMMF directory. These are in a subdirectory of the CCMMF directory in order to make them available across all users (and because on some computers, they exceed allocated space in the home directory). + - `R_LIBS_USER` must point to the platform and R version specific subdirectory inside `RENV_PATHS_LIBRARY`. + Example: `/projectnb/dietzelab/ccmmf/renv-library/linux-almalinux-8.10/R-4.4/x86_64-pc-linux-gnu` - `.Rprofile` - sets repositories from which R packages are installed - runs `renv/activate.R` diff --git a/scripts/030_extract_sipnet_output.R b/scripts/030_extract_sipnet_output.R index b12192b..e92ff29 100644 --- a/scripts/030_extract_sipnet_output.R +++ b/scripts/030_extract_sipnet_output.R @@ -2,7 +2,7 @@ # A long format CSV (time, site, ensemble, variable) # that follows the Ecological Forecasting Initiative (EFI) forecast standard -# Helper functions in R/efi_long_to_arrays.R will +# Helper functions in R/efi_long_to_arrays.R will convert this to # 1. A 4-D array (time, site, ensemble, variable) # 2. 
A NetCDF file (time, site, ensemble, variable) # TODO: write out EML metadata in order to be fully EFI compliant @@ -89,9 +89,8 @@ if (!PRODUCTION) { start_year <- end_year - 1 } -## Create of directories to process -## limited to directories that actually exist (ens_dirs) -## and that have site_ids and ensemble IDs +## Create dataframe of directories to process, filtered to include only +## existing directories (ens_dirs) that have site_ids and ensemble IDs ens_ids_str <- PEcAn.utils::left.pad.zeros(ens_ids) dirs_to_process <- ens_dirs |> dplyr::filter( @@ -157,13 +156,6 @@ ens_results <- ens_results_raw |> dplyr::summarise( prediction = mean(prediction, na.rm = TRUE), .groups = "drop" ) -# Append with change since start date -delta_df <- ens_results |> - dplyr::group_by(site_id, lat, lon, pft, parameter, variable) |> - dplyr::mutate(prediction = as.numeric(difftime(datetime, min(datetime), units = "days"))) |> - dplyr::ungroup() |> - dplyr::mutate(variable = paste0("delta_", variable)) -ens <- dplyr::bind_rows(ens_results, delta_df) # After extraction, ens_results$prediction is in kg C m-2 for both AGB and TotSoilCarb From 68ef94692435479b3b87f405c6d76e5f1cf408f3 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Thu, 4 Sep 2025 19:28:58 -0400 Subject: [PATCH 10/70] Added 031_agggregate_sipnet_output.R to handle PFT mixture logic Revise mixed_system_prototype.qmd for clarity and to fix incorrect labels. 
--- docs/mixed_system_prototype.qmd | 549 +++++++++++++++++--------------- 1 file changed, 291 insertions(+), 258 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index 4dd0153..92a4fad 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -1,6 +1,8 @@ --- title: "Mixed-System Prototype (Two-PFT Aggregation)" -#quarto-cache: true +author: "David LeBauer" +date: "`r Sys.Date()`" +quarto-cache: true format: html: self-contained: true @@ -15,73 +17,60 @@ This document prototypes a workflow for modeling croplands with multiple crops ( ## Challenge -A number of Healthy Soils Practices (HSPs, ) involve mixed cropping systems. These include hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover crop mixtures (Appendix). +CARB identifies multiple practices that rely on mixed cropping systems—such as hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover-crop mixtures (CARB, 2022 and Appendix Table 1). -It is important to be able to represent scenarios that contain multiple crops at the same time. -The challenge is that SIPNET simulates only one Plant Functional Type (PFT) - a standard approach used in many land surface, biogeochemistry, and crop models. +It is important to be able to represent scenarios that contain multiple crops at the same time. The challenge is that SIPNET simulates only one Plant Functional Type (PFT) - a standard approach used in many land surface, biogeochemistry, and crop models. ## Approach ### Context -Here we only consider cropping systems where two PFTs are present at the same time. +Here we consider methods of simulating cropping systems where two _ecophysiologically distinct_ PFTs are present at the same time. These include woody buffers and orchards with ground cover. 
Here, ecophysiologically distinct means that the two PFTs have different growth forms (woody vs herbaceous) and/or phenology (perennial vs annual). -This includes: +We do not consider mixtures of plant species that are similar enough to represent as a single PFT, including cover crop mixtures and agroforestry. We note that the approach described below could be applied to these practices, and this will be considered if the additional complexity can be justified by potential reduction in uncertainty. -- Cover crop mixtures -- Agroforestry -- Woody Buffers -- Orchards with ground cover - -This does not include: - -- Cases where where the different crops separated in time: - - cover crops - - multiple crops in sequence - - annual to perennial crop (or vice-versa) transitions. -- Plant mixtures that can be represented as a single PFT. +We also do not consider practices in which different crops are planted in sequence. These include crop rotations, cover crops, and land use change. These scenarios can be represented as sequential model runs that change PFTs while maintaining a continuously evolving system state. ### Algorithm -The general approach for combining runs at each site is to: +The general approach for combining multiple PFTs into an aggregate estimate of carbon pool size (and later, GHG fluxes) is to: -1. Run SIPNET separately for each PFT -2. Combine the model outputs from the individual PFT. +1. Run SIPNET separately for each PFT. +2. Combine the model outputs from the individual PFT. +Technical details of these steps are described in more detail below. #### Initial Conditions Generation (IC) -1. Run as normal, but for each PFT at each site. -2. Settings File: - - For each mixed site, create separate configuration entries for each PFT; append `-PFT` to the site ID. -3. For herbaceous crops: - - annual herbaceous: assume start with bare ground (zero biomass). 
- - perennial herbaceous (MVP): start as bare ground run; consider methods for estimating initial biomass in later iterations. -4. For woody crops: - - Use existing approach to generate initial conditions. -5. Run PEcAn SIPNET pipeline to generate initial conditions for each PFT. -6. Soil Conditions: - - Assign whole-site values for soil organic carbon (SOC), soil water content, and other soil parameters to each PFT. +1. Run once for each PFT at each site. +2. Meteorology and soil initial conditions: + - Use the same meteorological driver data (ERA5 ensemble). + - Assign whole-site values for soil texture and soil organic carbon. +3. Settings File: + - For each mixed site, create separate a separate configuration file for each PFT; append `-PFT` to the site ID. +4. For herbaceous crops: + - Annual herbaceous: assume start with bare ground (zero biomass). + - Perennial herbaceous: start as bare ground run; consider methods for estimating initial biomass in later iterations. +5. For woody crops: + - Generate initial conditions. #### Model Execution - Execute PEcAn SIPNET model independently for each PFT and site combination. + - Use ensemble methods to propagate uncertainty and variance in model parameters, meteorological drivers and initial conditions. - Following PEcAn convention, netCDFs will be in `out/ENS-{ensemble_number}-{site_id}-{pft}`. #### Post-processing Outputs - -- Combine outputs: For each mixed site, combine model outputs using the appropriate approach (Discrete / Weighted or Overlap / Incremental). Combination methods are described in the following section. +- Combine outputs: For each mixed site, combine model outputs using one of the approaches defined in @sec-mixture-methods, below. - Standardize Results: Outputs are formatted into ensemble files (`combined_ensemble_output.csv`) for downstream analysis and visualization. #### Downscaling and Analysis 1. 
**Downscaling:** - - [open question] Build downscaling model from merged outputs, or downscale each PFT separately and then merge downscaled outputs? - Apply machine learning models trained separately for each PFT to downscale outputs. - - Combine downscaled outputs using fractional cover to produce integrated maps and analyses. + - Combine downscaled outputs using one of the methods described below. - -### Mixture Methods +### Mixture Methods {#sec-mixture-methods} #### Two Cases for Mixed Cropping Systems: Discrete and Overlap @@ -89,45 +78,51 @@ In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percen The two methods of mixing described below are "Discrete" and "Overlap". -We define $X$ as the carbon stock. For a finite time interval $\Delta t$ starting at $t_0$ (the start of the simulation) the change is $\Delta X_{\Delta t} = X(t_0 + \Delta t) - X(t_0)$. + +| Example System | Scenario | Method | Eqn. | +| :-------------- | :-------- | :----- | :------: | +| Annual Crops + Woody Hedgerow | Discrete | Weighted Average | [-@eq-discrete] | +| Orchard + Ground Cover | Overlap | Incremental | [-@eq-overlap] | + +: Methods for combining two plant functional types (PFTs) in mixed cropping systems. {#tbl-methods} + +**Notation** + +We define the following values: +- $X$ is the carbon stock (AGB or SOC). +- For a finite time interval $\Delta t$ starting at $t_0$ (the start of the simulation) the change is $\Delta X_{\Delta t} = X(t_0 + \Delta t) - X(t_0)$. +- $f_{woody}$ and $f_{annual}$ are the fractional contributions of each PFT to ecosystem dynamics. In the case of the discrete method, these represent cover and sum to 1. In the overlap case, $f_{woody} = 1$ and $f_{annual} < 1$. #### Scenario 1: Discrete (Weighted) -For the Discrete case, we use a _Weighted Average_ approach. +For the Discrete case, we use a _Weighted Average_ approach (@eq-discrete). -The Discrete case refers to mixed systems where the two different crops do not overlap. 
This covers two key scenarios: 1) each PFT occupies a distinct zones such as an annual crop with woody hedgerows or 2) a mixture of annuals. In this annual mixture case, the approach assumes that the plants do not overlap - even though this is not the case, it is a reasonable simplification. In the **discrete case**, we define $f_{woody} + f_{annual} = 1$ and the field scale ecosystem dynamics are approximated by weighting SIPNET outputs by the fractional area of each crop type within a field:
+The Discrete case refers to mixed systems where the two different crops do not overlap. This covers two key scenarios: 1) each PFT occupies a distinct zone, such as an annual crop with woody hedgerows, or 2) a mixture of annuals. In the annual mixture case, the approach assumes that the plants do not overlap - although some overlap occurs in practice, this is a reasonable simplification. In the **discrete case**, the field-scale ecosystem dynamics are approximated by weighting SIPNET outputs by the fractional area of each crop type within a field:
 
 $$
 X_{\textrm{site}} = f_{\textrm{woody}}\cdot X_{\textrm{woody}} + f_{\textrm{annual}}\cdot X_{\textrm{annual}}
-\label{eq:discrete}\tag{1}
-$$
+$$ {#eq-discrete}
 
-#### Scenario 2: **Overlap**
+#### Scenario 2: Overlap
 
-For the Overlap case, we use an _Incremental_ approach.
+For the Overlap case, we use an _Incremental_ approach (@eq-overlap).
 
-The Overlap case will apply when a ground cover is added to an orchard. When a ground cover is added to an orchard, it is not expected to reduce the contribution of the orchard species to ecosystem dynamics to a fraction. Rather, we keep $f_\textrm{woody} = 1$ and set $f_\textrm{annual} <1$ to represent the contribution of the ground cover to the system. Reducing $f_\textrm{woody}$ would be equivalent to removing standing orchard biomass at the moment the ground cover is planted. 
In this case, the ground cover effect is added proportional to its cover and biomass relative to a monoculture in full sun.
+This scenario applies when a ground cover is added to an existing orchard. We set $f_\textrm{woody} = 1$, reflecting the assumption that the addition of ground cover does not reduce the contribution of the woody species to ecosystem dynamics. Lowering this fraction below 1 at the moment of planting would represent an unrealistic scenario in which adding ground cover also reduced standing orchard biomass and the associated impact of the woody species on ecosystem function. When woody biomass removal occurs, it is represented as a distinct harvest event.
 
-We start by assuming that all orchards, both with and without ground cover, represent the same biomass as a monoculture, and set $f_{woody} = 1$. When ground cover is added, it is treated as addition to the ecosystem, and the value of $f_{annual}<1$ reflects the fraction of a monoculture that it represents - accounting for both percent of area that it covers as well as the fact that competition for light reduces its effect on ecosystem functioning.
+The contribution of the ground cover is added _incrementally_ to the system. The parameter $f_\text{annual}$ represents the "ecosystem impact factor" - the fraction of a monoculture's influence on ecosystem functioning that the ground cover contributes to this mixed system.
 
-$$
-X_\textrm{site} = X_\textrm{woody} + f_\textrm{annual}\cdot\Delta X_\textrm{annual}
-\label{eq:overlap}\tag{2}
-$$
-
-
-#### Summary Table
+This factor is always less than 1 ($f_\textrm{annual} < 1$), as it accounts for both the percent of area covered by the ground cover and the reduction in its ecosystem impact due to competition for light and other resources. 
| -| :-------------- | :-------- | :----- | :------: | -| Annual Crops + Woody Hedgerow | Discrete | Weighted Average | \eqref{eq:discrete} | -| Orchard + Ground Cover | Overlap | Incremental | \eqref{eq:overlap} | +Thus, when ground cover is added, it is treated as an addition to the ecosystem, and the orchard (woody monoculture) is assumed to remain unchanged. +$$ +X_\textrm{site} = X_\textrm{woody} + f_\textrm{annual}\cdot\Delta X_\textrm{annual} +$$ {#eq-overlap} ## Demonstration Here we illustrate this approach using the orchard + ground cover and annual crop + hedgerows scenarios. -We consider a set of mixtures that will be used in the examples and figures below; the percent columns indicate the fractional contribution of each Plant Functional Type (PFT) used for simple area-weighted aggregation. +We consider a set of mixtures used in the examples and figures below (@tbl-mixture-scenarios); the percent columns indicate the fractional contribution of each Plant Functional Type (PFT). | Scenario | Woody (%) | Annual (%) | Method | | :-------------------------------------------- | --------: | ---------: | :------- | @@ -138,7 +133,7 @@ We consider a set of mixtures that will be used in the examples and figures belo | Annual crop + 25% woody hedgerows | 25 | 75 | discrete | | Annual crop + 50% woody hedgerows | 50 | 50 | discrete | -These scenarios can be referenced by their descriptive names (e.g., "Orchard + 25% cover") in the downstream code and figures. +: Set of mixture scenarios used for demonstration. {#tbl-mixture-scenarios} ## Illustrative Examples @@ -146,7 +141,7 @@ These scenarios can be referenced by their descriptive names (e.g., "Orchard + 2 These runs use SIPNET grass and temperate deciduous PFT parameterizations to represent annual and perennial woody crops, respectively. -These were run at 10 from 2016 through 2024 with an ensemble size of 20. +These simulations were run from 2016 through 2024 across 10 sites, with an ensemble size of 20. 
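For intuition, the two combination rules (@eq-discrete and @eq-overlap) can be applied to made-up numbers before looking at full model output. This is a standalone sketch, not part of the pipeline; every stock value and fraction below is invented for illustration:

```r
# Hypothetical carbon stocks (kg C m^-2); t0 denotes the simulation start
X_woody     <- 6.0  # woody monoculture at time t
X_annual    <- 1.2  # annual monoculture at time t
X_annual_t0 <- 0.2  # annual monoculture at t0

# Discrete (weighted average): fractions are areal cover and sum to 1,
# e.g. an annual crop with a 25% woody hedgerow
f_woody  <- 0.25
f_annual <- 0.75
X_site_discrete <- f_woody * X_woody + f_annual * X_annual  # 2.4

# Overlap (incremental): keep f_woody = 1 and add only the ground cover's
# *change* in stock since t0, scaled by its ecosystem impact factor
f_cover <- 0.25
X_site_overlap <- X_woody + f_cover * (X_annual - X_annual_t0)  # 6.25
```

Adding only the change in the annual stock since $t_0$, rather than the raw stock, is what keeps the overlap rule from double-counting pools (such as SOC) that the two PFTs share.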
```{r setup} library(tidyverse) @@ -159,13 +154,16 @@ source(here::here("000-config.R")) if (!dir.exists(model_outdir)) { model_outdir <- "data" } +``` + +```{r plot-helpers} # helper: replace " + " with newline so labels use two lines mix_label_fn <- function(x) stringr::str_replace_all(x, " \\+ ", "\n+ ") -# labeller that applies the above to mix_order (fallback to a normal wrap for other strips) +# labeller that applies the above to mix_description (fallback to a normal wrap for other strips) mix_labeller <- labeller( - mix_order = mix_label_fn, + mix_description = mix_label_fn, .default = label_wrap_gen(width = 12) ) @@ -196,61 +194,59 @@ year_scale_dt <- scale_x_datetime(date_breaks = "1 year", ## Mixed-system dataset -The `combined_output` tibble contains the combined outputs for different scenarios (discrete, overlap). +The `combined_output` dataset contains the combined outputs for different scenarios (discrete, overlap). ```{r data-prep} # ---- Mixed-system dataset ---------------------------------------------------- # created in 031_aggregate_sipnet_output.R combined_output_csv <- here::here( - file.path(model_outdir, "combined_ensemble_output.csv") + file.path(model_outdir, "combined_ensemble_output.csv") ) combined_output <- readr::read_csv( - combined_output_csv, - show_col_types = FALSE + combined_output_csv, + show_col_types = FALSE ) |> - # map legacy variable name to short name used in plots/tables - mutate(variable = recode(variable, "TotSoilCarb" = "SOC")) |> - mutate( - # keep mix_description as the canonical value but display explicit labels via mix_order - # TODO: move recoding to 031_aggregate_sipnet_output.R - mix_order = factor( - mix_description, - levels = c( - "annual", "annual + 25% woody", "annual + 50% woody", - "woody", "woody + 25% annual", "woody + 50% annual" - ), - labels = c( - "100% annual", "100% annual + 25% woody", "100% annual + 50% woody", - "100% woody", "100% woody + 25% annual", "100% woody + 50% annual" - ) - ), - 
mix_type = case_when( - mix_description %in% c("annual", "woody") ~ "monoculture", - scenario == "discrete" ~ "discrete", - scenario == "overlap" ~ "overlap", - TRUE ~ "other" - ) - ) |> - arrange(across(any_of(c("site_id", "mix_description", "scenario", - "ensemble_id", "datetime")))) |> - mutate(line_group = interaction( - across(all_of(c("ensemble_id", "mix_description", "scenario", "site_id"))), - drop = TRUE - )) - + mutate( + variable = recode(variable, "TotSoilCarb" = "SOC"), + mix_description = factor( + mix_description, + levels = c( + "100% woody", + "100% annual", + "100% woody + 25% annual", + "100% woody + 50% annual", + "75% annual + 25% woody", + "50% annual + 50% woody" + ) + ), + mix_type = case_when( + mix_description %in% c("100% annual", "100% woody") ~ "monoculture", + scenario == "discrete" ~ "discrete", + scenario == "overlap" ~ "overlap", + TRUE ~ "other" + ) + ) |> + arrange(mix_description) + +# variables present (used by time-series sections below) vars <- unique(combined_output$variable) + ``` +**Selecting Representative Sites** + +Next we select representative sites based on their productivity. We use AGB in the 100% woody scenario as an indicator of productivity. We select sites that are near the 15th, 50th, and 85th percentiles of AGB and categorize them as low, medium, and high productivity. 
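The nearest-to-percentile matching can be sketched on its own. The site IDs and AGB values below are invented for illustration; the chunk that follows performs the equivalent selection on `combined_output`:

```r
# Toy site-level mean AGB table (values invented for illustration)
agb_demo <- data.frame(
  site_id  = paste0("S", 1:10),
  agb_mean = c(1.1, 2.3, 0.8, 3.9, 2.0, 4.5, 1.7, 3.1, 2.8, 0.5)
)

# Target quantiles of mean AGB
qs <- quantile(agb_demo$agb_mean, probs = c(0.15, 0.50, 0.85))

# For each target quantile, pick the site whose mean AGB is closest
sites_demo <- vapply(qs, function(qv) {
  agb_demo$site_id[which.min(abs(agb_demo$agb_mean - qv))]
}, character(1))
sites_demo  # named by the "15%", "50%", "85%" quantile labels
```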
```{r select-representative-sites}
agb_site <- combined_output |>
-  filter(mix_description == "woody", variable == "AGB") |>
+  filter(mix_description == "100% woody", variable == "AGB") |>
   group_by(site_id) |>
-  summarise(agb_mean = mean(value_combined, na.rm = TRUE), .groups = "drop")
+  summarise(agb_mean = mean(value_combined), .groups = "drop")
 
 if (nrow(agb_site) >= 3) {
   qs_agb <- quantile(
     agb_site$agb_mean,
-    probs = c(0.15, 0.50, 0.85), na.rm = TRUE
+    probs = c(0.15, 0.50, 0.85)
   )
   analysis_sites <- purrr::map_chr(qs_agb, function(qv) {
     agb_site |>
@@ -267,6 +263,9 @@ if (nrow(agb_site) >= 3) {
     agb_class = factor(c("low agb", "med_agb", "high_agb"),
       levels = c("low agb", "med_agb", "high_agb"))
   )
+} else {
+  PEcAn.logger::logger.severe("Representative site selection requires at least 3 sites,",
+    "but only", nrow(agb_site), "sites have AGB data.")
 }
 
 # join and relabel classes for plotting (use human-friendly labels)
@@ -286,22 +285,28 @@ output_representative_sites <- site_map |>
```{r summary-table}
 summary_stats <- combined_output |>
-  group_by(variable, mix_order, mix_type) |>
+  filter(year(datetime) == max(year(datetime))) |>
+  group_by(variable, mix_description, mix_type) |>
   summarise(
-    mean = mean(value_combined, na.rm = TRUE),
-    sd = sd(value_combined, na.rm = TRUE),
+    mean = mean(value_combined),
+    sd = sd(value_combined),
     .groups = "drop"
   ) |>
+  mutate(
+    # format as "mean (sd)" with one decimal place
+    val = sprintf("%.1f (%.1f)", mean, sd)
+  ) |>
+  select(-mean, -sd) |>
   tidyr::pivot_wider(
-    id_cols = c(mix_order, mix_type),
+    id_cols = c(mix_description, mix_type),
     names_from = variable,
-    values_from = c(mean, sd),
-    names_glue = "{tolower(variable)}_{.value}"
+    values_from = val
   )
 
-knitr::kable(summary_stats,
-  digits = 2,
-  col.names = c("Mixture", "Type", "AGB (mean)", "AGB (sd)", "SOC (mean)", "SOC (sd)")
+knitr::kable(
+  summary_stats,
+  col.names = c("Mixture", "Type", "AGB", "SOC"),
+  caption = "Summary statistics (mean and standard deviation) for carbon 
pools by mixture type." ) ``` @@ -313,166 +318,204 @@ What we expect to see here: - The overlap mixtures will represent the woody perennial, with a small contribution from the annual crop. ```{r carbon-pools} -# Ensemble-level annual medians (per ensemble) for absolute pool sizes -annual_stats_ens <- combined_output |> - mutate(year = year(datetime)) |> - group_by(variable, mix_order, mix_type, year, ensemble_id) |> - summarise(q50 = median(value_combined, na.rm = TRUE), .groups = "drop") - -# ensure we have a year column and data -if (nrow(annual_stats_ens) > 0) { - last_year <- max(annual_stats_ens$year, na.rm = TRUE) - - # end-year ensemble values (absolute magnitudes) - endyr_ens <- annual_stats_ens |> - filter(year == last_year) - - if (nrow(endyr_ens) > 0) { - # Summarize across ensembles to get IQR + median per mix (for error bars) - summary_pools <- endyr_ens |> - group_by(variable, mix_order, mix_type) |> - summarise( - q25 = quantile(q50, 0.25, na.rm = TRUE), - q50 = median(q50, na.rm = TRUE), - q75 = quantile(q50, 0.75, na.rm = TRUE), - .groups = "drop" - ) +# Ensemble-level end-of-simulation medians (per ensemble) for absolute pool sizes +last_dt <- max(combined_output$datetime) +last_year <- lubridate::year(last_dt) + +# Final-time values per (site x ensemble) +combined_output_final_site_ens <- combined_output |> + filter(datetime == last_dt) |> + group_by(variable, mix_description, mix_type, site_id, ensemble_id) |> + summarise( + value = median(value_combined), + .groups = "drop" + ) + +# Points: site-level means across ensembles at final time +combined_output_site_mean <- combined_output_final_site_ens |> + group_by(variable, mix_description, mix_type, site_id) |> + summarise( + site_mean = mean(value), + .groups = "drop" + ) + +# Error bars + median dot: summarise across sites (using site-level means) +combined_output_summaries <- combined_output_site_mean |> + group_by(variable, mix_description, mix_type) |> + summarise( + q25 = 
quantile(site_mean, 0.25), + q50 = quantile(site_mean, 0.5), + q75 = quantile(site_mean, 0.75), + .groups = "drop" + ) # Plot: ensemble points (jitter) + IQR + median; facet by variable ggplot() + geom_jitter( - data = endyr_ens, - aes(x = mix_order, y = q50, color = mix_type), + data = combined_output_site_mean, + aes(x = mix_description, y = site_mean, color = mix_type), width = 0.15, alpha = 0.5, size = 1.2 ) + geom_errorbar( - data = summary_pools, - aes(x = mix_order, ymin = q25, ymax = q75), - width = 0.4, color = "black", linewidth = 0.6 + data = combined_output_summaries, + aes(x = mix_description, ymin = q25, ymax = q75, color = mix_type), + width = 0.4, linewidth = 0.6 ) + geom_point( - data = summary_pools, - aes(x = mix_order, y = q50), - color = "black", size = 2 + data = combined_output_summaries, + aes(x = mix_description, y = q50, color = mix_type), + size = 2 ) + facet_wrap(~ variable, scales = "free_y", nrow = 1, labeller = mix_labeller) + # use mix_labeller scale_x_discrete(labels = mix_label_fn) + # wrap x labels into two lines scale_color_manual(values = pal_mix) + labs( title = paste0("End-Year Carbon Pool Sizes (", last_year, ")"), - subtitle = "Points = ensemble members; bars = IQR; black dot = median", + subtitle = "Points = site means; bars = IQR; dot = median", x = "Mixture", y = expression("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), color = "Type" ) + base_theme + theme(axis.text.x = element_text(angle = 35, hjust = 1)) - } else { - message("No end-year ensemble data available for carbon pool sizes") - } -} else { - message("No ensemble annual stats available to compute carbon pool sizes") -} + ``` -### Effect Size (Final Year Δ vs Monocultures) +### Effect Size (Final Year AGB and SOC Compared to Annual Monoculture) + +These plots show the change in carbon pool sizes for each mixture compared to the annual monoculture baselines. 
Here we are only comparing to the annual monoculture, as this reflects the healthy soils practice of adding a woody component like a hedgerow to an annual cropping system.
 
-These plots show the change in carbon pool sizes for each mixture compared to the monoculture baselines. Left panels are relative to annual monocultures; right panels are relative to woody monocultures.
 
```{r effect-size-probabilistic}
-# Compute ensemble-level annual medians (per ensemble) so deltas are calculated within each ensemble
+# Use final-time, per-site, per-ensemble values so deltas are calculated within each ensemble
-annual_stats_ens <- combined_output |>
-  group_by(variable, mix_description, mix_order, mix_type, year, ensemble_id) |>
-  summarise(
-    q50 = median(value_combined, na.rm = TRUE),
-    .groups = "drop"
-  )
-
-last_year <- max(annual_stats_ens$year, na.rm = TRUE)
+annual_stats_ens <- combined_output_final_site_ens
 
 # Mixture ensemble results (exclude pure monocultures)
-mix_ens <- annual_stats_ens |>
-  filter(year == last_year, !(mix_description %in% c("annual", "woody"))) |>
-  select(variable, mix_description, mix_order, mix_type, ensemble_id, q50)
+mix_ens <- combined_output_final_site_ens |>
+  filter(!(mix_description %in% c("100% annual", "100% woody"))) |>
+  select(variable, mix_description, mix_type, site_id, ensemble_id, value)
 
 # Baseline per ensemble for monocultures (annual & woody)
 base_ens <- annual_stats_ens |>
-  filter(year == last_year, mix_description %in% c("annual", "woody")) |>
-  select(variable, mix_description, mix_order, ensemble_id, baseline = q50)
+  filter(mix_description %in% c("100% annual", "100% woody")) |>
+  select(variable, mix_description, site_id, ensemble_id, baseline = value)
 
 # Join ensemble-specific baselines to mixtures so deltas are ensemble-specific
 delta_ens <- mix_ens |>
-  left_join(
-    base_ens |>
-      filter(mix_description == "annual") |>
-      select(variable, ensemble_id, annual_baseline = baseline),
-    by = c("variable", "ensemble_id")
-  ) |>
-  left_join(
-    base_ens |>
-      filter(mix_description == "woody") |>
-      select(variable, 
ensemble_id, woody_baseline = baseline),
-    by = c("variable", "ensemble_id")
-  ) |>
-  mutate(
-    delta_vs_annual = q50 - annual_baseline,
-    delta_vs_woody = q50 - woody_baseline
-  ) |>
-  pivot_longer(
-    cols = starts_with("delta_vs"),
-    names_to = "baseline_ref",
-    values_to = "delta"
-  ) |>
-  mutate(
-    baseline_ref = factor(
-      baseline_ref,
-      levels = c("delta_vs_annual", "delta_vs_woody"),
-      labels = c("Δ vs Annual monoculture", "Δ vs Woody monoculture")
+  left_join(
+    base_ens |>
+      filter(mix_description == "100% annual") |>
+      select(variable, site_id, ensemble_id, annual_baseline = baseline),
+    by = c("variable", "site_id", "ensemble_id")
+  ) |>
+  left_join(
+    base_ens |>
+      filter(mix_description == "100% woody") |>
+      select(variable, site_id, ensemble_id, woody_baseline = baseline),
+    by = c("variable", "site_id", "ensemble_id")
+  ) |>
+  mutate(
+    delta_vs_annual = value - annual_baseline,
+    delta_vs_woody = value - woody_baseline
+  ) |>
+  pivot_longer(
+    cols = starts_with("delta_vs"),
+    names_to = "baseline_ref",
+    values_to = "delta"
+  ) |>
+  mutate(
+    baseline_ref = factor(
+      baseline_ref,
+      levels = c("delta_vs_annual", "delta_vs_woody"),
+      labels = c("Δ vs Annual monoculture", "Δ vs Woody monoculture")
+    )
     )
-  )
 
-summary_delta <- delta_ens |>
-  group_by(variable, mix_order, baseline_ref) |>
-  summarise(
-    q25 = quantile(delta, 0.25, na.rm = TRUE),
-    q50 = median(delta, na.rm = TRUE),
-    q75 = quantile(delta, 0.75, na.rm = TRUE),
-    .groups = "drop"
-  )
+# Points: site-level medians (across ensembles)
+delta_site_median <- delta_ens |>
+  group_by(variable, mix_description, baseline_ref, site_id, mix_type) |>
+  summarise(site_median = median(delta), .groups = "drop")
+
+# Error bars + median dot: across-site summaries
+summary_delta <- delta_site_median |>
+  group_by(variable, mix_description, baseline_ref, mix_type) |>
+  summarise(
+    q25 = quantile(site_median, 0.25),
+    q50 = median(site_median),
+    q75 = quantile(site_median, 0.75),
+    .groups = "drop"
+ 
) + +# Plot only relative to annual monoculture, reflecting HSP +summary_delta_annual <- summary_delta |> + filter(baseline_ref == "Δ vs Annual monoculture") + +delta_site_median_annual <- delta_site_median |> + filter(baseline_ref == "Δ vs Annual monoculture") # Jittered ensemble points + IQR bars + median points ggplot() + - geom_hline(yintercept = 0, color = "grey60", linewidth = .4) + - geom_jitter( - data = delta_ens, - aes(x = mix_order, y = delta, color = mix_type), - width = 0.15, height = 0, alpha = 0.5, size = 1.2 - ) + - geom_errorbar( - data = summary_delta, - aes(x = mix_order, ymin = q25, ymax = q75), - width = 0.4, color = "black", linewidth = 0.5 - ) + - geom_point( - data = summary_delta, - aes(x = mix_order, y = q50), - color = "black", size = 0.75 - ) + - facet_grid(variable ~ baseline_ref, scales = "free_y", labeller = mix_labeller) + - scale_x_discrete(labels = mix_label_fn) + - scale_color_manual(values = pal_mix) + - labs( - title = paste0("Effect Size in ", last_year), - x = "Mixture", - y = expression(Delta ~ "(Mixture - Baseline) (" * kg ~ C ~ m^-2 * ")"), - color = "Type" - ) + - base_theme + - theme( - axis.text.x = element_text(angle = 35, hjust = 1), - panel.spacing.y = unit(1.5, "lines") # <-- increase vertical spacing between facet rows - ) + geom_hline(yintercept = 0, color = "grey60", linewidth = .4) + + geom_jitter( + data = delta_site_median_annual, + aes(x = mix_description, y = site_median, color = mix_type), + width = 0.15, height = 0, alpha = 0.5, size = 1.2 + ) + + geom_errorbar( + data = summary_delta_annual, + aes(x = mix_description, ymin = q25, ymax = q75, color = mix_type), + width = 0.4, linewidth = 0.5 + ) + + geom_point( + data = summary_delta_annual, + aes(x = mix_description, y = q50, color = mix_type), + size = 0.9 + ) + + facet_wrap(variable ~ baseline_ref, scales = "free_y", nrow = 1, labeller = mix_labeller) + + scale_x_discrete(labels = mix_label_fn) + + scale_color_manual(values = pal_mix) + + labs( + title = 
paste0("Effect Size in ", last_year),
+    x = "Mixture",
+    y = expression(Delta ~ "(Mixture - Baseline) (" * kg ~ C ~ m^-2 * ")"),
+    color = "Type"
+  ) +
+  base_theme +
+  theme(
+    axis.text.x = element_text(angle = 35, hjust = 1),
+    panel.spacing.y = unit(1.5, "lines") # <-- increase vertical spacing between facet rows
+  )
 ```
 
+
+## Other Approaches Considered
+
+We evaluated several alternative approaches for representing mixed cropping systems:
+
+1. Fractional coverage approach (mosaic / tiling)
+   Each PFT is simulated separately and then combined by fractional area weighting (Li & Arora, 2011). This is the "Discrete" method described above (@eq-discrete).
+2. Composite PFT approach
+   A new parameter set is defined to represent the mixture as a single "composite" PFT (e.g., DayCent: Parton et al., 1998).
+3. Vegetation demography approach
+   Individual- or cohort-based models explicitly represent competition for light and resources among PFTs, as well as disturbance and successional dynamics (Fisher et al., 2018).
+
+Limitations:
+
+- Composite PFTs: Lumping mixture components into a single parameterization prevents explicit evaluation of the incremental contribution of individual PFTs - it would be harder to compare an orchard monoculture to an orchard + ground cover system. It can also bias estimates of pools and fluxes relative to the weighted mixture model (Li & Arora, 2011).
+- Fractional weighting: While more realistic, this approach requires careful treatment of long-lived pools (e.g., soil organic carbon, woody biomass). If the PFTs share a common soil, raw stocks cannot be summed without double-counting; instead, $\Delta$ stocks relative to a common baseline (or shared-pool formulations) should be aggregated - as is done in the "Overlap" method above (@eq-overlap). 
+- Demography models provide mechanistic representation of competition but impose higher computational costs and require additional data and parameterizations (Fisher et al., 2018; Fisher & Koven, 2020).
+
+
+## References
+
+CARB, 2022. Draft 2022 Scoping Plan for Achieving Carbon Neutrality. https://ww2.arb.ca.gov/sites/default/files/2022-09/2022-draft-sp-appendix-b-draft-ea-recirc.pdf
+
+Fisher, R.A., et al., 2018. Vegetation demographics in Earth system models: a review of progress and priorities. Global Change Biol. 24, 35–54. https://doi.org/10.1111/gcb.13910
+
+Fisher, R.A., Koven, C.D., 2020. Perspectives on the future of land surface models and the challenges of representing complex terrestrial systems. JAMES 12, e2018MS001453. https://doi.org/10.1029/2018MS001453
+
+Li, R., Arora, V.K., 2011. Effect of mosaic representation of vegetation in land surface schemes on simulated energy and carbon balances. https://doi.org/10.5194/bgd-8-5849-2011
+
+Parton, W.J., Hartman, M., Ojima, D., Schimel, D., 1998. DAYCENT and its land surface submodel: description and testing. Global and Planetary Change 19, 35–48.
+

From 258ef31cd6ca2849a6e1c24fc44f0a2b7259d1c0 Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Thu, 4 Sep 2025 20:47:05 -0700
Subject: [PATCH 11/70] Update docs/mixed_system_prototype.qmd

Co-authored-by: Chris Black
---
 docs/mixed_system_prototype.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd
index 92a4fad..155f91a 100644
--- a/docs/mixed_system_prototype.qmd
+++ b/docs/mixed_system_prototype.qmd
@@ -74,7 +74,7 @@ Technical details of these steps are described in more detail below.
 
 #### Two Cases for Mixed Cropping Systems: Discrete and Overlap
 
-In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percent contribution of a crop or plant functional type to ecosystem dynamics. This contribution is not "canopy cover" since that changes over time. Think of this as "percent of a monoculture". This method will build on SIPNET's single PFT design - the outputs from two separate single PFT runs simulate will be combined. 
+In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percent contribution of a crop or plant functional type to ecosystem dynamics. This contribution is not "canopy cover" since that changes over time. Think of this as "percent of a monoculture". This method builds on SIPNET's single-PFT design - the outputs from two separate single-PFT runs will be combined.
 
 The two methods of mixing described below are "Discrete" and "Overlap".
 

From 07ea16791fdd0e3f2dac4fb932c93557cc2516b2 Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Thu, 4 Sep 2025 21:32:28 -0400
Subject: [PATCH 12/70] add warning that we need new method for flux variables

---
 scripts/030_extract_sipnet_output.R | 40 +++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/scripts/030_extract_sipnet_output.R b/scripts/030_extract_sipnet_output.R
index e92ff29..5b818ca 100644
--- a/scripts/030_extract_sipnet_output.R
+++ b/scripts/030_extract_sipnet_output.R
@@ -150,20 +150,50 @@ ens_results <- ens_results_raw |>
   ) |>
   dplyr::select(-site_id) |>
   dplyr::rename(site_id = base_site_id) |>
-  dplyr::select(datetime, site_id, lat, lon, pft, parameter, variable, prediction) |>
-  dplyr::mutate(datetime = lubridate::floor_date(datetime, unit = "month")) |>
-  dplyr::group_by(datetime, site_id, lat, lon, pft, parameter, variable) |>
+  dplyr::select(datetime, site_id, lat, lon, pft, parameter, variable, prediction)
+
+# Classify variables as 'pool' (state) vs 'flux' (rate) using PEcAn standard_vars. 
+std_vars <- PEcAn.utils::standard_vars
+
+pool_vars <- std_vars |>
+  dplyr::filter(stringr::str_detect(tolower(Category), "pool")) |>
+  dplyr::pull(Variable.Name)
+
+flux_vars <- std_vars |>
+  dplyr::filter(stringr::str_detect(tolower(Category), "flux")) |>
+  dplyr::pull(Variable.Name)
+
+ens_results <- ens_results |>
+  dplyr::mutate(
+    datetime = lubridate::floor_date(datetime, unit = "month"),
+    variable_type = dplyr::case_when(
+      variable %in% pool_vars ~ "pool",
+      variable %in% flux_vars ~ "flux",
+      TRUE ~ "unknown"
+    )
+  ) |>
+  dplyr::group_by(datetime, site_id, lat, lon, pft, parameter, variable, variable_type) |>
+  dplyr::summarise(
+    prediction = mean(prediction, na.rm = TRUE),
+    .groups = "drop"
+  )
 
+# Warn if flux variables are present because users may need to treat them differently.
+if (any(ens_results$variable_type == "flux")) {
+  PEcAn.logger::logger.warn(
+    "Flux variables detected in ensemble output. Note: averaging flux (rate) variables",
+    "across ensembles/sites or over time can be misleading. 
Consider computing cumulative",
+    "fluxes over the simulation period."
+  )
+}
+
 # After extraction, ens_results$prediction is in kg C m-2 for both AGB and TotSoilCarb
 
 # restore logging
 logger_level <- PEcAn.logger::logger.setLevel(logger_level)
 
 ensemble_output_csv <- file.path(model_outdir, "ensemble_output.csv")
-readr::write_csv(ens, ensemble_output_csv)
+readr::write_csv(ens_results, ensemble_output_csv)
 PEcAn.logger::logger.info(
   "\nEnsemble output extraction complete.",
   "\nResults saved to ", ensemble_output_csv

From 2758e83fe029b8850ade1e3efd9471c232a002cb Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Thu, 4 Sep 2025 23:58:04 -0400
Subject: [PATCH 13/70] addressing pr suggestions

---
 .Renviron                       |  2 ++
 .Rprofile                       | 59 ++++++++++++++++++++++++++++++---
 000-config.R                    |  6 ++--
 docs/mixed_system_prototype.qmd |  8 +++--
 4 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/.Renviron b/.Renviron
index 1eecd28..aab4f46 100644
--- a/.Renviron
+++ b/.Renviron
@@ -12,3 +12,5 @@ R_LIBS_USER="${CCMMF_DIR}/renv-library/linux-almalinux-8.10/R-4.4/x86_64-pc-linu
 
 UDUNITS2_LIBS="-L/share/pkg.8/udunits/2.2.28/install/lib -ludunits2"
 UDUNITS2_INCLUDE="-I/share/pkg.8/udunits/2.2.28/install/include"
+# Default graphics device
+R_DEFAULT_DEVICE=png
diff --git a/.Rprofile b/.Rprofile
index 7aabf92..9454c40 100644
--- a/.Rprofile
+++ b/.Rprofile
@@ -5,11 +5,60 @@ options(repos = c(
   CRAN = "https://cloud.r-project.org"
 ))
 
-Sys.setenv(R_LIBS_USER = file.path(
-  Sys.getenv("RENV_PATHS_LIBRARY"),
-  renv:::renv_platform_prefix()
-))
+# Sys.setenv(R_LIBS_USER = file.path(
+#   Sys.getenv("RENV_PATHS_LIBRARY"),
+#   renv:::renv_platform_prefix()
+# ))
 
 options(renv.config.autoloader.enabled = FALSE)
 if (requireNamespace("renv", quietly = TRUE)) {
   renv::activate()
-}
\ No newline at end of file
+}
+
+### Get plotting to work in VS Code on HPC
+## Always prefer cairo bitmaps (avoids X11)
+options(bitmapType = "cairo")
+
+## Interactive: try httpgd, else 
fall back to headless pdf +if (interactive()) { + # Prefer httpgd (works in VS Code without X11). If missing, use OS GUI. + if (requireNamespace("httpgd", quietly = TRUE)) { + # Make httpgd the default device for *all* interactive plots + options(device = function(...) httpgd::hgd(silent = TRUE, ...)) + # Start a viewer immediately so first plot appears + try(httpgd::hgd(silent = TRUE), silent = TRUE) + } else if (.Platform$OS.type == "windows") { + options(device = "windows") + } else if (capabilities("aqua")) { # macOS + options(device = "quartz") + } else { + # Last resort: warn instead of silently discarding plots + packageStartupMessage( + "No interactive device. Install 'httpgd' for X11-free plotting in VS Code: ", + "install.packages('httpgd') or remotes::install_github('r-lib/httpgd')" + ) + } +} + +## Non-interactive (e.g. quarto render): set knitr defaults once +if (!interactive() && requireNamespace("knitr", quietly = TRUE)) { + dev_choice <- if (requireNamespace("ragg", quietly = TRUE)) { + "ragg_png" + } else if (requireNamespace("svglite", quietly = TRUE)) { + "svglite" + } else if (requireNamespace("Cairo", quietly = TRUE)) { + "cairo_png" + } else { + "png" + } + + knitr::opts_chunk$set( + dev = dev_choice, + fig.width = 7, + fig.height = 4, + dpi = 144 + ) +} +cat("global Rprofile loaded\n") diff --git a/000-config.R b/000-config.R index 3009497..d9d729b 100644 --- a/000-config.R +++ b/000-config.R @@ -31,7 +31,7 @@ pecan_outdir <- file.path(ccmmf_dir, "modelout", "ccmmf_phase_2b_mixed_pfts_2025 # # Global switch to toggle between fast, small scale runs for development and testing # and full-scale production runs. Works by subsetting various data objects. 
-PRODUCTION <- FALSE
+PRODUCTION <- TRUE
 
 # **Variables to extract**
 # see docs/workflow_documentation.qmd for complete list of outputs
@@ -41,7 +41,9 @@ outputs_to_extract <- c(
 )
 
 if(!PRODUCTION) {
-  outputs_to_extract <- outputs_to_extract
+  # can subset for testing
+  # depending on what part of the workflow you are testing
+  # outputs_to_extract <- outputs_to_extract[1]
 }
 
 ### Configuration Settings that can be set to default ###
diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd
index 155f91a..d73c268 100644
--- a/docs/mixed_system_prototype.qmd
+++ b/docs/mixed_system_prototype.qmd
@@ -17,7 +17,7 @@ This document prototypes a workflow for modeling croplands with multiple crops (
 
 ## Challenge
 
-CARB identifies multiple practices that rely on mixed cropping systems—such as hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover-crop mixtures (CARB, 2022 and Appendix Table 1).
+CARB identifies multiple practices from the Healthy Soils Program (https://www.cdfa.ca.gov/oars/healthysoils/) that rely on mixed cropping systems—such as hedgerows, orchard ground cover, windbreaks, riparian buffers, intercropping, and cover-crop mixtures (CARB, 2022 and Appendix @tbl-hsp).
 
 It is important to be able to represent scenarios that contain multiple crops at the same time. The challenge is that SIPNET simulates only one Plant Functional Type (PFT) - a standard approach used in many land surface, biogeochemistry, and crop models.
 
@@ -45,7 +45,7 @@ Technical details of these steps are described in more detail below.
 1. Run once for each PFT at each site.
 2. Meteorology and soil initial conditions:
    - Use the same meteorological driver data (ERA5 ensemble).
-   - Assign whole-site values for soil texture and soil organic carbon.
+   - Use the same soil texture and soil organic carbon.
 3. Settings File:
    - For each mixed site, create a separate configuration file for each PFT; append `-PFT` to the site ID.
 4. 
For herbaceous crops: @@ -637,6 +637,7 @@ for (v in vars) { } ``` +--> ## Appendix @@ -653,4 +654,5 @@ for (v in vars) { | (Water management placeholder) | Increase soil water content. | High | | | | Annual → Perennial | Change PFT; partial biomass retained. | High | Low | Low | | Agroforestry / Woody Buffers | Independent woody runs; spatial aggregation. | High | Low | Low | ---> + +: Healthy Soils Program practices, model representation, and priority for implementation in SIPNET. {#tbl-hsp} \ No newline at end of file From 7e4c034f7c635c22975563cff7912542da4f34f6 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 5 Sep 2025 00:17:57 -0400 Subject: [PATCH 14/70] small wording changes to explain figs --- docs/mixed_system_prototype.qmd | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index d73c268..44eb8f4 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -312,10 +312,14 @@ knitr::kable( ### Carbon Pool Sizes For Different Mixture Types -What we expect to see here: +What we expect to see here: -- The discrete mixtures represent weighted averages of the component PFTs (so annual + 25% woody). -- The overlap mixtures will represent the woody perennial, with a small contribution from the annual crop. +- Discrete mixtures: Carbon pools scale approximately as the fractional (area) weighted average of the two monocultures. +- Overlap mixtures: Pools are dominated by the woody monoculture plus a modest incremental contribution from the annual component. + ```{r carbon-pools} # Ensemble-level end-of-simulation medians (per ensemble) for absolute pool sizes @@ -383,7 +387,7 @@ combined_output_summaries <- combined_output_site_mean |> ### Effect Size (Final Year AGB and SOC Compared to Annual Monoculture) -These plots show the change in carbon pool sizes for each mixture compared to the annual monoculture baselines. 
Here we are only comparing to the annual monoculture, as this reflects the healthy soils practice of adding a woody components like a hedgerow to an annual cropping system. +These plots show the change in carbon pool sizes for each mixture compared to the annual monoculture baselines. Because we are considering the healthy soils practice of adding a woody component like a hedgerow to an annual cropping system, we use annual monoculture as the point of comparison. ```{r effect-size-probabilistic} # Compute ensemble-level annual medians (per ensemble) so deltas are calculated within each ensemble From da406d582b1fec4627bc3284fc2105648b76c6a1 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 5 Sep 2025 00:42:57 -0400 Subject: [PATCH 15/70] few more minor changes --- docs/mixed_system_prototype.qmd | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index 44eb8f4..da01422 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -260,8 +260,8 @@ if (nrow(agb_site) >= 3) { } site_map <- tibble( site_id = analysis_sites, - agb_class = factor(c("low agb", "med_agb", "high_agb"), - levels = c("low agb", "med_agb", "high_agb")) + agb_class = factor(c("low_agb", "med_agb", "high_agb"), + levels = c("low_agb", "med_agb", "high_agb")) ) } else { PEcAn.logger::logger.severe("Looks like something went wrong", @@ -274,7 +274,7 @@ output_representative_sites <- site_map |> mutate( agb_class = factor( agb_class, - levels = c("low agb", "med_agb", "high_agb"), + levels = c("low_agb", "med_agb", "high_agb"), labels = c("low AGB site", "med AGB site", "high AGB site") ) ) @@ -303,6 +303,8 @@ summary_stats <- combined_output |> values_from = val ) +# TODO future iterations, don't assume columns; +# use glue to construct dynamically knitr::kable( summary_stats, col.names = c("Mixture", "Type", "AGB", "SOC"), From 110191696c5fd992956d863828085a28a1bb9ac8 Mon Sep 17 
00:00:00 2001 From: David LeBauer Date: Fri, 5 Sep 2025 01:44:09 -0400 Subject: [PATCH 16/70] few more tweaks --- docs/mixed_system_prototype.qmd | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index da01422..240d968 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -7,7 +7,7 @@ format: html: self-contained: true execute: -# cache: true + cache: true echo: false warning: false message: false @@ -61,14 +61,18 @@ Technical details of these steps are described in more detail below. - Following PEcAn convention, netCDFs will be in `out/ENS-{ensemble_number}-{site_id}-{pft}`. #### Post-processing Outputs -- Combine outputs: For each mixed site, combine model outputs using one of the approaches defined in @sec-mixture-methods, below. +- Combine outputs: For each mixed site, combine model outputs using one of the approaches defined below. - Standardize Results: Outputs are formatted into ensemble files (`combined_ensemble_output.csv`) for downstream analysis and visualization. #### Downscaling and Analysis 1. **Downscaling:** - - Apply machine learning models trained separately for each PFT to downscale outputs. - - Combine downscaled outputs using one of the methods described below. + - Apply machine learning models trained separately for each PFT to downscale outputs. + - Combine downscaled outputs using one of the methods described below. +2. **Analysis:** + - Calibrate using site level mixtures. + - Validate against downscaled outputs. + ### Mixture Methods {#sec-mixture-methods} @@ -76,9 +80,6 @@ Technical details of these steps are described in more detail below. In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percent contribution of a crop or plant functional type to ecosystem dynamics. This contribution is not "canopy cover" since that changes over time. 
Think of this as "percent of a monoculture". This method builds on SIPNET's single-PFT design: the outputs from two separate single-PFT runs will be combined.

-The two methods of mixing described below are "Discrete" and "Overlap".
-
-
| Example System | Scenario | Method | Eqn. |
| :-------------- | :-------- | :----- | :------: |
| Annual Crops + Woody Hedgerow | Discrete | Weighted Average | [-@eq-discrete] |
@@ -89,6 +90,7 @@ The two methods of mixing described below are "Discrete" and "Overlap".
**Notation**

We define the following values:
+
- $X$ is the carbon stock (AGB or SOC).
- For a finite time interval $\Delta t$ starting at $t_0$ (the start of the simulation) the change is $\Delta X_{\Delta t} = X(t_0 + \Delta t) - X(t_0)$.
- $f_{woody}$ and $f_{annual}$ are the fractional contributions of each PFT to ecosystem dynamics. In the case of the discrete method, these represent cover and sum to 1. In the overlap case, $f_{woody} = 1$ and $f_{annual} < 1$.
@@ -126,8 +128,8 @@ We consider a set of mixtures used in the examples and figures below (@tbl-mixtu

| Scenario | Woody (%) | Annual (%) | Method |
| :-------------------------------------------- | --------: | ---------: | :------- |
-| 100% woody perennial (orchard / monoculture) | 100 | 0 | discrete |
-| 100% annual herbaceous (monoculture) | 0 | 100 | discrete |
+| 100% woody perennial | 100 | 0 | monoculture |
+| 100% annual herbaceous | 0 | 100 | monoculture |
| Orchard + 25% herbaceous ground cover | 100 | 25 | overlap |
| Orchard + 50% herbaceous ground cover | 100 | 50 | overlap |
| Annual crop + 25% woody hedgerows | 25 | 75 | discrete |
@@ -194,7 +196,7 @@ year_scale_dt <- scale_x_datetime(date_breaks = "1 year",

## Mixed-system dataset

-The `combined_output` dataset contains the combined outputs for different scenarios (discrete, overlap).
+The `combined_ensemble_output.csv` file contains the combined outputs for different scenarios (discrete, overlap).
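The discrete and overlap combination rules defined above can be sketched as a small helper. This is an illustrative sketch under the notation in this section, not the repository's canonical `R/mixed_aggregation.R` implementation; the function and argument names are assumptions:

```r
# Sketch of the two mixture rules for combining single-PFT runs.
# The canonical version lives in R/mixed_aggregation.R; names and
# signatures here are illustrative only.
combine_two_pfts <- function(x_woody, x_annual, f_woody, f_annual,
                             method = c("discrete", "overlap")) {
  method <- match.arg(method)
  if (method == "discrete") {
    # area-weighted average of the two monoculture trajectories
    stopifnot(isTRUE(all.equal(f_woody + f_annual, 1)))
    f_woody * x_woody + f_annual * x_annual
  } else {
    # full woody stand plus the annual component's increment since t0,
    # so the understory contributes growth without double-counting
    # initial stocks
    x_woody + f_annual * (x_annual - x_annual[1])
  }
}

# toy carbon-stock trajectories (kg C m^-2) for one ensemble member
x_woody  <- c(2.0, 2.5, 3.0)
x_annual <- c(0.4, 0.6, 0.8)

combine_two_pfts(x_woody, x_annual, 0.25, 0.75, "discrete") # 0.800 1.075 1.350
combine_two_pfts(x_woody, x_annual, 1, 0.5, "overlap")      # 2.0 2.6 3.2
```

Applied per variable, per site, and per ensemble member, this keeps the combination inside each ensemble member so uncertainty propagates correctly.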
```{r data-prep} # ---- Mixed-system dataset ---------------------------------------------------- @@ -283,6 +285,10 @@ output_representative_sites <- site_map |> ### Summary Statistics + + ```{r summary-table} summary_stats <- combined_output |> filter(year(datetime) == max(year(datetime))) |> From c5175988f51072ef049431a3a3eaecdb8f36cb3b Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 5 Sep 2025 12:36:42 -0400 Subject: [PATCH 17/70] add mixed aggregation helper function --- R/mixed_aggregation.R | 1 - 1 file changed, 1 deletion(-) diff --git a/R/mixed_aggregation.R b/R/mixed_aggregation.R index 39b8ec3..1041c59 100644 --- a/R/mixed_aggregation.R +++ b/R/mixed_aggregation.R @@ -1,4 +1,3 @@ -# TODO: rename function #' Combine two-PFT outputs to represent mixed cropping systems #' #' Canonical rules for combining woody and annual PFT outputs. The function From 36a30541f853c7c90c884a262d9bb9d9bb79fbd0 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 5 Sep 2025 20:22:51 -0400 Subject: [PATCH 18/70] standardize naming of ensemble_data object --- scripts/031_aggregate_sipnet_output.R | 12 ++++++------ scripts/040_downscale_and_aggregate.R | 11 +---------- 2 files changed, 7 insertions(+), 16 deletions(-) diff --git a/scripts/031_aggregate_sipnet_output.R b/scripts/031_aggregate_sipnet_output.R index 16554e2..8a2835d 100644 --- a/scripts/031_aggregate_sipnet_output.R +++ b/scripts/031_aggregate_sipnet_output.R @@ -21,7 +21,7 @@ source(here::here("R", "mixed_aggregation.R")) # ---- Load ensemble output ---------------------------------------------------- ensemble_output_csv <- file.path(model_outdir, "ensemble_output.csv") -ens_output <- readr::read_csv(ensemble_output_csv) |> +ensemble_data <- readr::read_csv(ensemble_output_csv) |> # rename EFI std names for clarity # efi name | new name # parameter | ensemble_id @@ -38,7 +38,7 @@ ens_output <- readr::read_csv(ensemble_output_csv) |> # cover_fractions_csv <- file.path(model_outdir, 
"site_cover_fractions.csv") # distinct site-year combinations -distinct_site_year <- ens_output |> +distinct_site_year <- ensemble_data |> dplyr::mutate(year = lubridate::year(datetime)) |> dplyr::distinct(site_id, year) @@ -71,10 +71,10 @@ annual_pft <- "annual crop" mixed_overlap_pft <- "woody_annual_overlap_100_50" # new synthetic PFT label # ensemble members -ensemble_ids <- unique(ens_output$ensemble_id) +ensemble_ids <- unique(ensemble_data$ensemble_id) # annual_init values for each site x ensemble: value at the earliest datetime -annual_init <- ens_output |> +annual_init <- ensemble_data |> dplyr::filter(pft == "annual crop") |> dplyr::group_by(site_id, variable, ensemble_id) |> dplyr::slice_min(order_by = datetime, n = 1, with_ties = FALSE) |> @@ -92,7 +92,7 @@ annual_init <- ens_output |> dplyr::select(site_id, ensemble_id, variable, annual_init) # ---- Reshape ensemble output (wide by PFT) ----------------------------------- -.ens_wide <- ens_output |> +.ens_wide <- ensemble_data |> dplyr::mutate(year = lubridate::year(datetime)) |> dplyr::select( datetime, year, site_id, lat, lon, @@ -173,7 +173,7 @@ PEcAn.logger::logger.info("Wrote aggregated output: ", ens_combined_csv) # ---- Create EFI-compliant ensemble file with mixed overlap PFT ---- # Original EFI-style rows (restore EFI column names) -efi_original <- ens_output |> +efi_original <- ensemble_data |> dplyr::rename(parameter = ensemble_id, prediction = value) # Extract the specific overlap scenario: 100% woody, 50% annual (orchard + 50% ground cover) diff --git a/scripts/040_downscale_and_aggregate.R b/scripts/040_downscale_and_aggregate.R index 2a31459..92002b6 100644 --- a/scripts/040_downscale_and_aggregate.R +++ b/scripts/040_downscale_and_aggregate.R @@ -16,16 +16,7 @@ PEcAn.logger::logger.info("***Starting Downscaling and Aggregation***") # library(furrr) library(patchwork) # for combining plots -ensemble_file_base <- file.path(model_outdir, "ensemble_output.csv") -ensemble_file_mixed <- 
file.path(model_outdir, "ensemble_output_with_mixed.csv") - -if (file.exists(ensemble_file_mixed)) { - PEcAn.logger::logger.info("Using mixed ensemble file (includes overlap synthetic PFT): ", ensemble_file_mixed) - ensemble_file <- ensemble_file_mixed -} else { - PEcAn.logger::logger.info("Mixed ensemble file not found; using base ensemble file: ", ensemble_file_base) - ensemble_file <- ensemble_file_base -} +ensemble_file <- file.path(model_outdir, "ensemble_output.csv") ensemble_data <- readr::read_csv(ensemble_file) |> dplyr::rename( From 412ffd5d2cb747ad7cbc03789c5ff731ea66fd51 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Tue, 16 Sep 2025 01:08:44 -0400 Subject: [PATCH 19/70] in the middle of refactoring to handle multiple PFTs in the downscaling --- .Rprofile | 8 +- 000-config.R | 70 +++-- scripts/040_downscale_and_aggregate.R | 382 ++++++++++++++++++++------ scripts/041_downscale_analysis.R | 141 +++++----- 4 files changed, 423 insertions(+), 178 deletions(-) diff --git a/.Rprofile b/.Rprofile index 9454c40..652962d 100644 --- a/.Rprofile +++ b/.Rprofile @@ -1,8 +1,8 @@ ## --- Custom R repositories (CRAN + R-universe) ------------------------ options(repos = c( - pecanproject = "https://pecanproject.r-universe.dev", - ajlyons = "https://ajlyons.r-universe.dev", # caladaptr - CRAN = "https://cloud.r-project.org" + pecanproject = "https://pecanproject.r-universe.dev", + ajlyons = "https://ajlyons.r-universe.dev", # caladaptr + CRAN = "https://cloud.r-project.org" )) # Sys.setenv(R_LIBS_USER = file.path( @@ -61,4 +61,4 @@ if (!interactive() && requireNamespace("knitr", quietly = TRUE)) { dpi = 144 ) } -cat("global Rprofile loaded\n") +cat("repository Rprofile loaded\n") diff --git a/000-config.R b/000-config.R index d9d729b..12805cd 100644 --- a/000-config.R +++ b/000-config.R @@ -1,7 +1,7 @@ ### Workflow Configuration Settings ### # Check that we are in the correct working directory -if(!basename(here::here(getwd())) == 'downscaling') { +if 
(!basename(here::here(getwd())) == "downscaling") { PEcAn.logger::logger.error("Please run this script from the 'downscaling' directory") } @@ -11,7 +11,8 @@ options( # Print all tibble columns tibble.width = Inf, # Suppress readr::read_csv messages - readr.show_col_types = FALSE + readr.show_col_types = FALSE, + vroom.show_col_types = FALSE ) ## Set parallel processing options @@ -27,32 +28,38 @@ if (ccmmf_dir == "") { } pecan_outdir <- file.path(ccmmf_dir, "modelout", "ccmmf_phase_2b_mixed_pfts_20250701") # **Is this a test or production run?** -# Set to FALSE during testing and development +# Accepts command line argument '--production=false' to set testing mode # -# Global switch to toggle between fast, small scale runs for development and testing -# and full-scale production runs. Works by subsetting various data objects. -PRODUCTION <- TRUE +# Global switch to toggle between fast, small scale runs for development and testing +# and full-scale production runs. Defaults to production mode. 
+args <- commandArgs(trailingOnly = TRUE) +prod_arg <- grep("^--production=", args, value = TRUE) +if (length(prod_arg) > 0) { + PRODUCTION <- tolower(sub("^--production=", "", prod_arg)) != "false" +} else { + PRODUCTION <- TRUE +} # **Variables to extract** # see docs/workflow_documentation.qmd for complete list of outputs -outputs_to_extract <- c( - "TotSoilCarb", - "AGB" -) - -if(!PRODUCTION) { +# outputs_to_extract <- c( +# "TotSoilCarb", +# "AGB" +# ) +outputs_to_extract <- "TotSoilCarb" +if (!PRODUCTION) { # can subset for testing # depending on what part of the workflow you are testing - # outputs_to_extract <- outputs_to_extract[1] + outputs_to_extract <- outputs_to_extract[1] } ### Configuration Settings that can be set to default ### # Assume consistent directory structure for other directories -data_dir <- file.path(ccmmf_dir, "data") +data_dir <- file.path(ccmmf_dir, "data") raw_data_dir <- file.path(ccmmf_dir, "data_raw") cache_dir <- file.path(ccmmf_dir, "cache") -model_outdir <- file.path(pecan_outdir, "out") +model_outdir <- file.path(pecan_outdir, "out") # Misc set.seed(42) @@ -60,19 +67,22 @@ ca_albers_crs <- 3310 #### Messagees #### PEcAn.logger::logger.info("\n\n", - "##### SETTINGS SUMMARY #####\n\n", - "Running in", ifelse(PRODUCTION, "**production**", "**development**"), "mode\n\n", - "### Directory Settings ###\n", - "- CCMMF directory:", ccmmf_dir, "\n", - "- data_dir :", data_dir, "\n", - "- cache_dir :", cache_dir, "\n", - "- raw_data_dir. :", raw_data_dir, "\n", - "- pecan_outdir. :", pecan_outdir, "\n", - "- model_outdir. 
:", model_outdir, "\n\n", - "### Other Settings ###\n", - "- will extract variables:", paste(outputs_to_extract, collapse = ", "), "\n", - "- ca_albers_crs :", ca_albers_crs, - ifelse(ca_albers_crs == 3310, ", which is NAD83 / California Albers", ""), "\n", - wrap = FALSE - ) + "##### SETTINGS SUMMARY #####\n\n", + "Running in", ifelse(PRODUCTION, "**production**", "**development**"), "mode\n\n", + "### Directory Settings ###\n", + "- CCMMF directory:", ccmmf_dir, "\n", + "- data_dir :", data_dir, "\n", + "- cache_dir :", cache_dir, "\n", + "- raw_data_dir. :", raw_data_dir, "\n", + "- pecan_outdir. :", pecan_outdir, "\n", + "- model_outdir. :", model_outdir, "\n\n", + "### Other Settings ###\n", + "- will extract variables:", paste(outputs_to_extract, collapse = ", "), "\n", + "- ca_albers_crs :", ca_albers_crs, + ifelse(ca_albers_crs == 3310, ", which is NAD83 / California Albers", ""), "\n", + wrap = FALSE +) +# Source all R scripts in the R/ directory +r_scripts <- list.files(file.path(here::here(), "R"), pattern = "\\.R$", full.names = TRUE) +lapply(r_scripts, source) diff --git a/scripts/040_downscale_and_aggregate.R b/scripts/040_downscale_and_aggregate.R index 92002b6..5db4705 100644 --- a/scripts/040_downscale_and_aggregate.R +++ b/scripts/040_downscale_and_aggregate.R @@ -6,45 +6,56 @@ # - Write out a table with predicted biomass and SOC to maintain ensemble structure, ensuring correct error propagation and spatial covariance. 
# - Aggregates County-level biomass and SOC inventories # -## ----setup-------------------------------------------------------------------- +## ----debugging-------------------------------------------------------------------- +# debugonce(PEcAnAssimSequential::ensemble_downscale) +# PEcAn.logger::logger.setQuitOnSevere(TRUE) +# ----setup-------------------------------------------------------------------- + source("000-config.R") PEcAn.logger::logger.info("***Starting Downscaling and Aggregation***") -# library(tidyverse) -# library(sf) -# library(terra) -# library(furrr) -library(patchwork) # for combining plots - -ensemble_file <- file.path(model_outdir, "ensemble_output.csv") -ensemble_data <- readr::read_csv(ensemble_file) |> +#----- load ensemble data ---------------------------------------------------- +ensemble_csv <- file.path(model_outdir, "ensemble_output.csv") +ensemble_data <- readr::read_csv(ensemble_csv) |> dplyr::rename( ensemble = parameter # parameter is EFI std name for ensemble - # should decide if we want to change downstream code to use parameter - # including PECAnAssimSequential::subset_ensemble ) -# Optional: log available PFTs -PEcAn.logger::logger.info("PFTs in ensemble input: ", paste(sort(unique(ensemble_data$pft)), collapse = ", ")) - -x <- readr::read_csv(file.path(model_outdir, "combined_ensemble_output.csv")) - - ensemble_ids <- ensemble_data |> dplyr::pull(ensemble) |> unique() -start_date <- format(as.Date(min(ensemble_data$datetime)), "%Y-%m-%d") -end_date <- format(as.Date(max(ensemble_data$datetime)), "%Y-%m-%d") +start_date <- lubridate::as_date(min(ensemble_data$datetime)) +end_date <- lubridate::as_date(max(ensemble_data$datetime)) + +#--- load ca_fields ------------------------------------------------ +# this is a convenience time saver for development +if (!exists("ca_fields_full")) { + ca_fields_full <- sf::read_sf(file.path(data_dir, "ca_fields.gpkg")) +} + +ca_fields <- ca_fields_full |> + dplyr::select(site_id, county, 
area_ha) + +ca_field_attributes <- readr::read_csv(file.path(data_dir, "ca_field_attributes.csv")) + +# Determine PFTs and map ensemble keys (e.g. 'woody') to field labels +ensemble_pfts <- sort(unique(ensemble_data$pft)) +field_pfts <- sort(unique(ca_field_attributes$pft)) -### Random Forest using PEcAn downscale workflow -## ----------------------------------------------------------------------------- -design_points <- ensemble_data |> - dplyr::select(site_id, lat, lon, pft) |> - dplyr::distinct() +# map each ensemble key to itself (each key acts as its own label) +pfts <- intersect(ensemble_pfts, field_pfts) -covariates_csv <- file.path(data_dir, "site_covariates.csv") +if (length(pfts) == 0) { + PEcAn.logger::logger.error("No overlapping PFTs between ensemble data and field attributes") +} else { + PEcAn.logger::logger.info("Downscaling will be performed for these PFTs:", paste(pfts, collapse = ", ")) +} + +#--- load site covariates +covariates_csv <- file.path(data_dir, "site_covariates_plus_dp_patch.csv") +# covariates_csv <- file.path(data_dir, "site_covariates.csv") covariates <- readr::read_csv(covariates_csv) |> dplyr::select( site_id, where(is.numeric), @@ -54,66 +65,271 @@ covariates <- readr::read_csv(covariates_csv) |> covariate_names <- names(covariates |> dplyr::select(where(is.numeric))) -design_covariates <- design_points |> +PEcAn.logger::logger.info( + "Downscaling will use these covariates:\n\n", + paste(covariate_names, collapse = ", ") +) + +# ----define design points based on ensemble data------------------------------- +# TODO: move this sanitization upstream to when ens data is created (030_extract_sipnet_output.R) +# or better ... figure out why we so often run into mis-match!!! 
+# at least this time the missing site_ids all had matches within a few micrometers (10^-6 m) + +# Check for missing design points in covariates and match by proximity if needed +# required_dp_cols <- c("site_id", "lat", "lon", "pft") +# .design_points <- ensemble_data |> +# dplyr::select(dplyr::any_of(required_dp_cols)) |> +# dplyr::distinct() + +# design_points <- update_design_point_site_ids( +# .design_points, +# ca_field_attributes +# ) + +# TODO: Need to put a canonical design_points CSV in repository +### FOR NOW, just use hard coded design points +design_points <- structure(list(site_id = c( + "3a84c0268e1655a3", "3a84c0268e1655a3", + "d523652b399a8f6e", "d523652b399a8f6e", "275102c035b15f5e", "275102c035b15f5e", + "26ff9e8246f7c8f4", "26ff9e8246f7c8f4", "47cd11223bb49112", "47cd11223bb49112", + "9a4c7e47fc0297bb", "9a4c7e47fc0297bb", "e5bb4dca46bd5041", "e5bb4dca46bd5041", + "abd5a71d492e92e1", "abd5a71d492e92e1", "7fe5bb855fb36cdb", "7fe5bb855fb36cdb", + "7bb77bae6ac3c147", "7bb77bae6ac3c147" +), lat = c( + 34.91295, 34.91295, + 34.38596, 34.38596, 34.47244, 34.47244, 33.86884, 33.86884, 34.29708, + 34.29708, 33.96727, 33.96727, 33.35306, 33.35306, 34.37258, 34.37258, + 33.90119, 33.90119, 33.57847, 33.57847 +), lon = c( + -120.40345, + -120.40345, -118.81446, -118.81446, -119.22015, -119.22015, -117.40838, + -117.40838, -119.06014, -119.06014, -117.34049, -117.34049, -117.19182, + -117.19182, -119.03318, -119.03318, -117.40624, -117.40624, -116.03157, + -116.03157 +), pft = c( + "woody perennial crop", "annual crop", "woody perennial crop", "annual crop", "woody perennial crop", + "annual crop", "woody perennial crop", "annual crop", "woody perennial crop", "annual crop", "woody perennial crop", "annual crop", + "woody perennial crop", "annual crop", "woody perennial crop", "annual crop", "woody perennial crop", "annual crop", "woody perennial crop", + "annual crop" +)), row.names = c(NA, -20L), class = c( + "tbl_df", "tbl", + "data.frame" +)) + 
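The TODO above notes that design-point site_ids sometimes fail to match the covariate table even though the coordinates agree to within micrometers. A nearest-neighbour repair could look like the following; `match_site_ids_by_proximity` is a hypothetical helper (the repository's actual entry point is `update_design_point_site_ids()`):

```r
# Illustrative nearest-neighbour repair for design-point site_ids whose
# coordinates differ from the canonical table only by rounding error.
# tol_deg = 1e-5 degrees is roughly 1 m; adjust to taste.
match_site_ids_by_proximity <- function(points, canonical, tol_deg = 1e-5) {
  idx <- vapply(seq_len(nrow(points)), function(i) {
    # squared planar distance is fine at these tiny offsets
    d2 <- (canonical$lat - points$lat[i])^2 + (canonical$lon - points$lon[i])^2
    j <- which.min(d2)
    if (sqrt(d2[j]) > tol_deg) NA_integer_ else j
  }, integer(1))
  # keep the original id when no canonical point is within tolerance
  points$site_id <- ifelse(is.na(idx), points$site_id, canonical$site_id[idx])
  points
}

canonical <- data.frame(
  site_id = c("a", "b"),
  lat = c(34.1, 33.9),
  lon = c(-120.4, -117.4)
)
pts <- data.frame(site_id = "unknown", lat = 34.1 + 1e-7, lon = -120.4)
match_site_ids_by_proximity(pts, canonical)$site_id # "a"
```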
+stopifnot(all(design_points$site_id %in% covariates$site_id)) + +# should we ever get here? +if (!all(design_points$site_id %in% ensemble_data$site_id)) { + ensemble_data2 <- ensemble_data |> + dplyr::left_join(design_points, by = c("lat", "lon", "pft"), suffix = c("", ".dp")) |> + dplyr::mutate(site_id = site_id.dp) |> + dplyr::select(-site_id.dp) + n_missing <- setdiff(design_points$site_id, ensemble_data2$site_id) |> + length() + + if (n_missing > 0) { + PEcAn.logger::logger.error( + n_missing, "design points still missing from ensemble data after matching", + "this is already a hack, time to sort it out upstream!" + ) + } + ensemble_data <- ensemble_data2 +} +stopifnot(any(design_points$site_id %in% ensemble_data$site_id)) + + +# Scaled numeric design covariates for model diagnostics/plots +# Keep an unscaled copy of design covariates for prediction inputs +design_covariates_unscaled <- design_points |> dplyr::left_join(covariates, by = "site_id") |> dplyr::select(site_id, dplyr::all_of(covariate_names)) |> + as.data.frame() + +# Scaled numeric design covariates for model diagnostics/plots +design_covariates <- design_covariates_unscaled |> # randomForest pkg requires data frame as.data.frame() |> # scale covariates as for consistency with model dplyr::mutate(dplyr::across(dplyr::all_of(covariate_names), scale)) - -if (length(setdiff(design_points$site_id, unique(covariates$site_id))) > 0) { - PEcAn.logger::logger.error("Design points not in covariates:", length(not_in_covariates)) +# Check again to ensure we've resolved the issue +n_not_in_covariates_after <- setdiff(design_points$site_id, covariates$site_id) |> + length() +if (n_not_in_covariates_after > 0) { + PEcAn.logger::logger.error(n_not_in_covariates_after, "design points still missing covariate data after matching") } -# Subset data for testing / speed purposes +all(design_points$site_id %in% covariates$site_id) + +# Keep full covariates and perform per-PFT sampling later (dev mode) +covariates_full 
<- covariates if (!PRODUCTION) { if (!exists(".Random.seed")) set.seed(123) - covariates <- covariates |> - dplyr::anti_join(design_points, by = "site_id") |> - dplyr::slice_sample(n = 10000) |> - dplyr::bind_rows(design_covariates) # limit to 10k sites with reproducible sampling + PEcAn.logger::logger.info("Development mode: will sample up to 10k prediction sites per PFT") + # keep ca_field_attributes consistent with available covariates + ca_field_attributes <- ca_field_attributes |> + dplyr::filter(site_id %in% covariates_full$site_id) } -## TODO migrate to PEcAnAssimSequential -downscale_model_output <- function(date, model_output) { +# Build list of site_ids per PFT from field attributes +pft_site_ids <- ca_field_attributes |> + dplyr::filter(pft %in% pfts) |> + dplyr::distinct(site_id, pft) + +sites_info <- lapply(pft_site_ids, length) +PEcAn.logger::logger.info("Sites per PFT:", paste(sites_info, collapse = ", ")) + +#### Target sites: per-PFT site lists built above (pft_site_ids) + +## Wrapper to downscale a single carbon pool with explicit training set and target sites +# TODO refactor to to downscale_ensemble_output() +downscale_model_output <- function(date, + model_output, + train_ensemble_data, + train_site_coords = design_points, + pred_covariates = covariates) { + # Ensure training site coords only include sites present in the ensemble slice + # Restrict training coordinates to those present in the ensemble data + ens_sites <- unique(train_ensemble_data$site_id) + train_site_coords <- train_site_coords[train_site_coords$site_id %in% ens_sites, , drop = FALSE] + + if (nrow(train_site_coords) == 0) { + PEcAn.logger::logger.warn("No overlapping training sites between site_coords and ensemble data for pool", model_output) + return(NULL) + } + filtered_ens_data <- PEcAnAssimSequential::subset_ensemble( - ensemble_data = ensemble_data, - site_coords = design_points, + ensemble_data = train_ensemble_data, + site_coords = train_site_coords, date = date, 
carbon_pool = model_output ) + if (is.null(filtered_ens_data) || nrow(filtered_ens_data) == 0) { + PEcAn.logger::logger.warn("Filtered ensemble data is empty for pool", model_output) + return(NULL) + } + ## BEGIN HACK FOR SMALL-N + # n_unique_sites <- dplyr::n_distinct(train_site_coords$site_id) + # if (n_unique_sites <= 12) { + # source(here::here("R", "temporary_hack_fit_predict_small_n.R")) + # PEcAn.logger::logger.info("Small-N mode: training per-ensemble RF with nodesize=1 and no test split") + # downscale_output <- fit_predict_small_n( + # filtered_ens = filtered_ens_data, + # pred_covariates = pred_covariates, + # covariate_names = covariate_names, # already defined earlier in 040 + # nodesize = 1, + # ntree = 1000, + # seed = 123 + # ) + # } else { + ## END HACK FOR SMALL-N # Downscale the data - downscale_output <- PEcAnAssimSequential::ensemble_downscale( - ensemble_data = filtered_ens_data, - site_coords = design_points, - covariates = covariates, - seed = 123 - ) + downscale_output <- + PEcAnAssimSequential::ensemble_downscale( + ensemble_data = filtered_ens_data, + site_coords = train_site_coords, + covariates = pred_covariates, + seed = 123 + ) + # } ## REMOVE WITH HACK FOR SMALL-N + if (is.null(downscale_output)) { + return(NULL) + } + + # Attach the site_ids used for prediction to keep mapping explicit + downscale_output$site_ids <- pred_covariates$site_id return(downscale_output) } # not using furrr b/c it is used inside downscale -downscale_output_list <- purrr::map( - outputs_to_extract, - ~ { - PEcAn.logger::logger.info("Starting downscaling for", .x) - result <- downscale_model_output(date = end_date, model_output = .x) - PEcAn.logger::logger.info("Downscaling complete for", .x) - result +# We downscale each carbon pool for both woody and annual PFTs, +# predicting to the same target set for that PFT +downscale_output_list <- list() +for (pool in outputs_to_extract) { + for (pft_i in pfts) { + PEcAn.logger::logger.info("Starting downscaling 
for", pool, "(", pft_i, ")") + + # train_ens: ensemble data filtered by PFT + train_ens <- ensemble_data |> + dplyr::filter(pft == pft_i & variable == pool) + # train_pts: design points filtered by PFT + # should be the same as + train_pts <- train_ens |> + dplyr::select(site_id, lat, lon, pft) |> + dplyr::distinct() + # train_pts <- design_points |> + # dplyr::filter(pft == pft_i) + + # Quick diagnostic: overlapping site counts + n_train_ens_sites <- length(unique(train_ens$site_id)) + n_train_pts <- nrow(train_pts) + PEcAn.logger::logger.info("Training sites: ensemble has", n_train_ens_sites, "site_ids; using", n_train_pts, "coords") + + # NOTE: We skip strict pre-check; wrapper will attempt subset_ensemble then manual fallback. + + # prediction covariates: either full set for that PFT or sampled subset (dev) + if (PRODUCTION) { + pred_covs <- covariates_full |> + dplyr::filter(site_id %in% (pft_site_ids |> dplyr::filter(pft == pft_i) |> dplyr::pull(site_id))) + } else { + dp_pft <- design_covariates_unscaled |> + dplyr::filter(site_id %in% (design_points |> + dplyr::filter(pft == pft_i) |> + dplyr::pull(site_id))) + + sample_pool <- covariates_full |> + dplyr::filter(site_id %in% pft_site_ids[[pft_i]]) |> + dplyr::anti_join(dp_pft, by = "site_id") + + n_sample <- min(10000, nrow(sample_pool)) + sampled <- if (n_sample > 0) { + sample_pool |> + dplyr::slice_sample(n = n_sample) + } else { + sample_pool + } + + pred_covs <- dplyr::bind_rows(sampled, dp_pft) + + # ensure prediction covariates have at least one site + if (nrow(pred_covs) == 0) { + PEcAn.logger::logger.warn("No prediction covariates for PFT:", pft_i, "pool:", pool) + next + } + } + + result <- downscale_model_output( + date = end_date, + model_output = pool, + train_ensemble_data = train_ens, + train_site_coords = train_pts, + pred_covariates = pred_covs + ) + + PEcAn.logger::logger.info("Downscaling complete for", pool, "(", pft_i, ")") + + # store using pft::pool names (e.g. "woody::AGB"). 
+ downscale_output_list[[paste0(pft_i, "::", pool)]] <- result } -) |> - purrr::set_names(outputs_to_extract) +} +if (length(downscale_output_list) == 0) { + PEcAn.logger::logger.severe("No downscale outputs produced") +} PEcAn.logger::logger.info("Downscaling complete for all model outputs") ## Save to make it easier to restart #### ---Create checkpoint for downstream analysis---#### checkpoint_file <- file.path(cache_dir, "downscaling_output.RData") start_end <- system.time( - save(downscale_output_list, covariates, design_points, design_covariates, ensemble_ids, + save( + downscale_output_list, + covariates, + design_points, + design_covariates, + ensemble_ids, + pft_site_ids, file = checkpoint_file, compress = FALSE ) @@ -124,15 +340,15 @@ PEcAn.logger::logger.info( ) PEcAn.logger::logger.info( - "🌟🌟🌟Finished downscaling🌟🌟🌟", + paste0(" Finished downscaling "), "\n\nCongratulations! You are almost there!\n\n", - rep("🚀", 10) + rep("", 10) ) -###--- Print Metrics for Each Ensemble Member ---#### +### --- Print Metrics for Each Ensemble Member ---#### PEcAn.logger::logger.info("Downscaling model results for each ensemble member:") -metrics <- lapply(downscale_output_list, PEcAnAssimSequential::downscale_metrics) +metrics <- lapply(downscale_output_list, PEcAnAssimSequential::downscale_metrics) # nolint median_metrics <- purrr::map(metrics, function(m) { m |> @@ -155,31 +371,35 @@ dplyr::bind_rows(median_metrics, .id = "model_output") |> PEcAn.logger::logger.info("Aggregating to County Level") -ca_fields_full <- sf::read_sf(file.path(data_dir, "ca_fields.gpkg")) -ca_fields <- ca_fields_full |> - dplyr::select(site_id, county, area_ha) - -if(!PRODUCTION) { +if (!PRODUCTION) { # For testing, use a subset of fields # could be even faster if we queried from gpkg: # sf::read_sf(..., sql = "SELECT * FROM ca_fields WHERE site_id IN (...)") ca_fields <- ca_fields |> - dplyr::right_join(covariates, by = "site_id") + dplyr::right_join(covariates, by = "site_id") } # Convert 
list to table with predictions and site identifier -get_downscale_preds <- function(downscale_output_list) { +# Helper: Convert a single downscale object to tidy predictions table +get_downscale_preds <- function(downscale_obj) { purrr::map( - downscale_output_list$predictions, - ~ tibble::tibble(site_id = covariates$site_id, prediction = .x) + downscale_obj$predictions, + ~ tibble::tibble(site_id = downscale_obj$site_ids, prediction = .x) ) |> dplyr::bind_rows(.id = "ensemble") |> - dplyr::left_join(ca_fields, by = "site_id") + dplyr::left_join(ca_fields, by = "site_id") } +# Assemble predictions; carry PFT label by parsing element name: "{pft}::{pool}" downscale_preds <- purrr::map(downscale_output_list, get_downscale_preds) |> - dplyr::bind_rows(.id = "model_output") |> + dplyr::bind_rows(.id = "spec") |> + tidyr::separate( + col = "spec", + into = c("pft", "model_output"), + sep = "::", + remove = TRUE + ) |> # Convert kg/m2 to Mg/ha using PEcAn.utils::ud_convert dplyr::mutate(c_density_Mg_ha = PEcAn.utils::ud_convert(prediction, "kg/m2", "Mg/ha")) |> # Calculate total Mg per field: c_density_Mg_ha * area_ha @@ -194,7 +414,7 @@ na_summary <- downscale_preds |> if (nrow(na_summary) > 0) { ## concise log message PEcAn.logger::logger.warn( - "YOU NEED TO DEBUG THIS!!!\n\n", + "YOU NEED TO DEBUG THIS!!!\n\n", "NA values detected in `downscale_preds`:\n" ) knitr::kable(na_summary, format = "simple") @@ -204,7 +424,7 @@ if (nrow(na_summary) > 0) { ens_county_preds <- downscale_preds |> # Now aggregate to get county level totals for each pool x ensemble - dplyr::group_by(model_output, county, ensemble) |> + dplyr::group_by(model_output, pft, county, ensemble) |> dplyr::summarize( n = dplyr::n(), total_c_Mg = sum(total_c_Mg), # total Mg C per county @@ -218,19 +438,19 @@ ens_county_preds <- downscale_preds |> total_c_Tg = PEcAn.utils::ud_convert(total_c_Mg, "Mg", "Tg"), c_density_Mg_ha = total_c_Mg / total_ha ) |> - dplyr::arrange(model_output, county, ensemble) + 
dplyr::arrange(model_output, pft, county, ensemble) ens_members_by_county <- ens_county_preds |> - dplyr::group_by(model_output, county) |> + dplyr::group_by(model_output, pft, county) |> dplyr::summarize(n_vals = dplyr::n_distinct(total_c_Mg), .groups = "drop") -if(all(ens_members_by_county$n_vals == length(ensemble_ids))) { +if (all(ens_members_by_county$n_vals == length(ensemble_ids))) { PEcAn.logger::logger.info("All counties have the correct number of ensemble members: (", length(ensemble_ids), ")") } else { - z <- ens_members_by_county |> - dplyr::group_by(county) |> - dplyr::summarise(n = mean(n_vals)) - PEcAn.logger::logger.error( + z <- ens_members_by_county |> + dplyr::group_by(county) |> + dplyr::summarise(n = mean(n_vals)) + PEcAn.logger::logger.error( sum(z$n != length(ensemble_ids)) / length(z$n), "counties have the wrong number of ensemble members after downscaling.", "Check ens_county_preds object." @@ -238,7 +458,7 @@ if(all(ens_members_by_county$n_vals == length(ensemble_ids))) { } county_summaries <- ens_county_preds |> - dplyr::group_by(model_output, county) |> + dplyr::group_by(model_output, pft, county) |> dplyr::summarize( # Number of fields in county should be same for each ensemble member n = max(n), @@ -263,7 +483,7 @@ readr::write_csv( PEcAn.logger::logger.info("County summaries written to", file.path(model_outdir, "county_summaries.csv")) PEcAn.logger::logger.info( - rep("🌟🌟🌟 ", 5), "\n\n", + rep(" ", 5), "\n\n", "Finished aggregation to County level", "\n\n", - rep("🌟🌟🌟 ", 5) + rep(" ", 5) ) diff --git a/scripts/041_downscale_analysis.R b/scripts/041_downscale_analysis.R index f446877..167d357 100644 --- a/scripts/041_downscale_analysis.R +++ b/scripts/041_downscale_analysis.R @@ -3,56 +3,76 @@ source("000-config.R") checkpoint_file <- file.path(cache_dir, "downscaling_output.RData") checkpoint_objects <- load(checkpoint_file) PEcAn.logger::logger.info("Loaded checkpoint objects:", paste(checkpoint_objects, collapse = ",")) -# 
downscale_output_list -# covariates -# design_points -# design_covariates -# ensemble_ids - -##### Variable Importance Analysis ##### -importance_summary <- purrr::map_dfr(outputs_to_extract, ~ { - # Extract the importance for each ensemble model in the carbon pool +# Objects expected: +# - downscale_output_list (named like "woody::AGB", "annual::AGB", ...) +# - covariates, design_points, design_covariates, ensemble_ids + +# Identify available PFT+pool specs from names +spec_table <- tibble::tibble(spec = names(downscale_output_list)) |> + tidyr::separate(spec, into = c("pft_key", "model_output"), sep = "::", remove = FALSE) |> + dplyr::mutate( + pft = dplyr::case_when( + pft_key == "woody" ~ "woody perennial crop", + pft_key == "annual" ~ "annual crop", + TRUE ~ pft_key + ) + ) + +##### Variable Importance Analysis (by PFT and pool) ##### +importance_summary <- purrr::map_dfr(spec_table$spec, function(sp) { + obj <- downscale_output_list[[sp]] importances <- purrr::map(ensemble_ids, function(i) { - model <- downscale_output_list[[.x]][["model"]][[i]] - randomForest::importance(model)[, "%IncMSE"] + model <- obj[["model"]][[i]] + vi <- randomForest::importance(model) + # Prefer %IncMSE if present; otherwise use IncNodePurity + if ("%IncMSE" %in% colnames(vi)) vi[, "%IncMSE"] else vi[, 1] }) - - # Turn the list of importance vectors into a data frame - importance_df <- purrr::map_dfr(importances, ~ tibble::tibble(importance = .x), .id = "ensemble") |> + predictors <- rownames(randomForest::importance(obj[["model"]][[1]])) + imp_df <- purrr::map_dfr(importances, ~ tibble::tibble(importance = .x), .id = "ensemble") |> dplyr::group_by(ensemble) |> - dplyr::mutate(predictor = names(importances[[1]])) |> + dplyr::mutate(predictor = predictors) |> dplyr::ungroup() - - # Now summarize median and IQR for each predictor across ensembles - summary_df <- importance_df |> + spec_row <- dplyr::filter(spec_table, spec == sp) + imp_df |> dplyr::group_by(predictor) |> 
dplyr::summarize( median_importance = median(importance, na.rm = TRUE), - lcl_importance = quantile(importance, 0.25, na.rm = TRUE), - ucl_importance = quantile(importance, 0.75, na.rm = TRUE), + lcl_importance = stats::quantile(importance, 0.25, na.rm = TRUE), + ucl_importance = stats::quantile(importance, 0.75, na.rm = TRUE), .groups = "drop" ) |> - dplyr::mutate(model_output = .x) - return(summary_df) + dplyr::mutate( + model_output = spec_row$model_output, + pft_key = spec_row$pft_key, + pft = spec_row$pft + ) }) -for (output in outputs_to_extract) { - # Top 2 predictors for this carbon pool +for (i in seq_len(nrow(spec_table))) { + sp <- spec_table$spec[i] + pft_label <- spec_table$pft[i] + pool <- spec_table$model_output[i] + obj <- downscale_output_list[[sp]] + model <- obj[["model"]][[1]] + + # Top 2 predictors for this PFT+pool top_predictors <- importance_summary |> - dplyr::filter(model_output == output) |> + dplyr::filter(model_output == pool, pft == pft_label) |> dplyr::arrange(dplyr::desc(median_importance)) |> dplyr::slice_head(n = 2) |> dplyr::pull(predictor) - # Prepare model and subset of covariates for plotting - model <- downscale_output_list[[output]][["model"]][[1]] + if (length(top_predictors) < 2) { + PEcAn.logger::logger.warn("Not enough predictors for partial plots:", sp) + next + } # Set up PNG for three panel plot - PEcAn.logger::logger.info("Creating importance and partial plots for", output) + PEcAn.logger::logger.info("Creating importance and partial plots for", sp) importance_partial_plot_fig <- here::here( "figures", - paste0(output, "_importance_partial_plots.png") - ) # Ensure the directory exists + paste0(gsub("::", "_", sp), "_importance_partial_plots.png") + ) png( filename = importance_partial_plot_fig, @@ -62,14 +82,14 @@ for (output in outputs_to_extract) { # Panel 1: Variable importance plot output_importance <- importance_summary |> - dplyr::filter(model_output == output) + dplyr::filter(model_output == pool, pft == 
pft_label) par(mar = c(5, 10, 4, 2)) with( output_importance, dotchart(median_importance, labels = reorder(predictor, median_importance), xlab = "Median Increase MSE (SD)", - main = paste("Importance -", output), + main = paste("Importance -", pool, "-", pft_label), pch = 19, col = "steelblue", cex = 1.2 ) ) @@ -90,7 +110,7 @@ for (output in outputs_to_extract) { x.var = top_predictors[1], main = paste("Partial Dependence -", top_predictors[1]), xlab = top_predictors[1], - ylab = paste("Predicted", output), + ylab = paste("Predicted", pool, "-", pft_label), col = "steelblue", lwd = 2 ) @@ -101,15 +121,14 @@ for (output in outputs_to_extract) { x.var = top_predictors[2], main = paste("Partial Dependence -", top_predictors[2]), xlab = top_predictors[2], - ylab = paste("Predicted", output), + ylab = paste("Predicted", pool, "-", pft_label), col = "steelblue", lwd = 2 ) - dev.off() # Save combined figure + dev.off() PEcAn.logger::logger.info( "Saved importance and partial plots for", - output, " to ", - importance_partial_plot_fig + sp, " to ", importance_partial_plot_fig ) } @@ -119,13 +138,14 @@ for (output in outputs_to_extract) { # library(iml) PEcAn.logger::logger.info("***Starting ALE plots***") - -for (output in outputs_to_extract) { - model <- downscale_output_list[[output]][["model"]][[1]] - # Use design points covariates instead of all covariates +for (i in seq_len(nrow(spec_table))) { + sp <- spec_table$spec[i] + pft_label <- spec_table$pft[i] + pool <- spec_table$model_output[i] + model <- downscale_output_list[[sp]][["model"]][[1]] top_predictors <- importance_summary |> - dplyr::filter(model_output == output) |> + dplyr::filter(model_output == pool, pft == pft_label) |> dplyr::arrange(dplyr::desc(median_importance)) |> dplyr::slice_head(n = 2) |> dplyr::pull(predictor) @@ -134,19 +154,16 @@ for (output in outputs_to_extract) { model = model, data = design_covariates, y = NULL, - predict.function = function(m, newdata) predict(m, newdata) + 
predict.function = function(m, newdata) stats::predict(m, newdata) ) - for (i in seq_along(top_predictors)) { - pred_var_name <- top_predictors[i] - - PEcAn.logger::logger.info("Starting ALE calculation for predictor:", pred_var_name) + for (j in seq_along(top_predictors)) { + pred_var_name <- top_predictors[j] + PEcAn.logger::logger.info("Starting ALE calculation for", sp, "predictor:", pred_var_name) ale <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ale") - - PEcAn.logger::logger.info("Saving ALE plot for predictor:", pred_var_name) ggplot2::ggsave( - filename = here::here("figures", paste0(output, "_ALE_predictor", i, ".png")), - plot = plot(ale) + ggplot2::ggtitle(paste("ALE for", pred_var_name, "on", output)), + filename = here::here("figures", paste0(gsub("::", "_", sp), "_ALE_predictor", j, ".png")), + plot = plot(ale) + ggplot2::ggtitle(paste("ALE for", pred_var_name, "on", pool, "-", pft_label)), width = 6, height = 4, units = "in", dpi = 300 ) } @@ -154,14 +171,14 @@ for (output in outputs_to_extract) { ## ICE Plots PEcAn.logger::logger.info("Creating ICE plots for top predictors") - -for (output in outputs_to_extract) { - model <- downscale_output_list[[output]][["model"]][[1]] - # Use design points covariates instead of all covariates - +for (i in seq_len(nrow(spec_table))) { + sp <- spec_table$spec[i] + pft_label <- spec_table$pft[i] + pool <- spec_table$model_output[i] + model <- downscale_output_list[[sp]][["model"]][[1]] top_predictors <- importance_summary |> - dplyr::filter(model_output == output) |> + dplyr::filter(model_output == pool, pft == pft_label) |> dplyr::arrange(dplyr::desc(median_importance)) |> dplyr::slice_head(n = 2) |> dplyr::pull(predictor) @@ -173,14 +190,12 @@ for (output in outputs_to_extract) { predict.function = function(m, newdata) stats::predict(m, newdata) ) - for (i in seq_along(top_predictors)) { - pred_var_name <- top_predictors[i] + for (j in seq_along(top_predictors)) { + pred_var_name <- 
top_predictors[j] ice <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ice") - - # Save plot ggplot2::ggsave( - filename = here::here("figures", paste0(output, "_ICE_predictor", i, ".png")), - plot = plot(ice) + ggplot2::ggtitle(paste("ICE for", pred_var_name, "on", output)), + filename = here::here("figures", paste0(gsub("::", "_", sp), "_ICE_predictor", j, ".png")), + plot = plot(ice) + ggplot2::ggtitle(paste("ICE for", pred_var_name, "on", pool, "-", pft_label)), width = 6, height = 4, units = "in", dpi = 300 ) } From 08e153233d6f13208171d11aa717b6d97de75335 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Wed, 17 Sep 2025 20:06:29 -0400 Subject: [PATCH 20/70] Rename downscaling scripts, split downscale+aggregate into separate scripts --- ...nscale_and_aggregate.R => 040_downscale.R} | 148 ++++++---------- scripts/041_aggregate_to_county.R | 158 ++++++++++++++++++ ...le_analysis.R => 042_downscale_analysis.R} | 2 +- ...level_plots.R => 043_county_level_plots.R} | 0 4 files changed, 214 insertions(+), 94 deletions(-) rename scripts/{040_downscale_and_aggregate.R => 040_downscale.R} (82%) create mode 100644 scripts/041_aggregate_to_county.R rename scripts/{041_downscale_analysis.R => 042_downscale_analysis.R} (99%) rename scripts/{042_county_level_plots.R => 043_county_level_plots.R} (100%) diff --git a/scripts/040_downscale_and_aggregate.R b/scripts/040_downscale.R similarity index 82% rename from scripts/040_downscale_and_aggregate.R rename to scripts/040_downscale.R index 5db4705..783b13c 100644 --- a/scripts/040_downscale_and_aggregate.R +++ b/scripts/040_downscale.R @@ -219,8 +219,7 @@ downscale_model_output <- function(date, # pred_covariates = pred_covariates, # covariate_names = covariate_names, # already defined earlier in 040 # nodesize = 1, - # ntree = 1000, - # seed = 123 + # ntree = 1000 # ) # } else { ## END HACK FOR SMALL-N @@ -229,8 +228,7 @@ downscale_model_output <- function(date, 
PEcAnAssimSequential::ensemble_downscale( ensemble_data = filtered_ens_data, site_coords = train_site_coords, - covariates = pred_covariates, - seed = 123 + covariates = pred_covariates ) # } ## REMOVE WITH HACK FOR SMALL-N if (is.null(downscale_output)) { @@ -253,6 +251,28 @@ for (pool in outputs_to_extract) { # train_ens: ensemble data filtered by PFT train_ens <- ensemble_data |> dplyr::filter(pft == pft_i & variable == pool) + + # Skip empty slices early + if (nrow(train_ens) == 0) { + PEcAn.logger::logger.warn("No ensemble rows for ", pft_i, "::", pool, " skipping") + next + } + + # Determine per-slice end date and warn if ensembles disagree + slice_end_date <- as.Date(max(train_ens$datetime)) + end_by_ens <- train_ens |> + dplyr::group_by(ensemble) |> + dplyr::summarise(last_date = max(lubridate::as_date(datetime)), .groups = "drop") + if (dplyr::n_distinct(end_by_ens$last_date) > 1) { + PEcAn.logger::logger.warn( + "End dates vary across ensembles for ", pft_i, "::", pool, + "; using slice_end_date=", as.character(slice_end_date) + ) + } else { + PEcAn.logger::logger.info( + "Using slice_end_date=", as.character(slice_end_date), " for ", pft_i, "::", pool + ) + } # train_pts: design points filtered by PFT # should be the same as train_pts <- train_ens |> @@ -270,8 +290,16 @@ for (pool in outputs_to_extract) { # prediction covariates: either full set for that PFT or sampled subset (dev) if (PRODUCTION) { + # Ensure design point covariates are included for training join + dp_pft <- design_covariates_unscaled |> + dplyr::filter(site_id %in% (design_points |> + dplyr::filter(pft == pft_i) |> + dplyr::pull(site_id))) + pred_covs <- covariates_full |> - dplyr::filter(site_id %in% (pft_site_ids |> dplyr::filter(pft == pft_i) |> dplyr::pull(site_id))) + dplyr::filter(site_id %in% (pft_site_ids |> dplyr::filter(pft == pft_i) |> dplyr::pull(site_id))) |> + dplyr::bind_rows(dp_pft) |> + dplyr::distinct(site_id, .keep_all = TRUE) } else { dp_pft <- 
design_covariates_unscaled |> dplyr::filter(site_id %in% (design_points |> @@ -292,15 +320,31 @@ for (pool in outputs_to_extract) { pred_covs <- dplyr::bind_rows(sampled, dp_pft) - # ensure prediction covariates have at least one site + # ensure prediction covariates have at least one site (development mode) if (nrow(pred_covs) == 0) { - PEcAn.logger::logger.warn("No prediction covariates for PFT:", pft_i, "pool:", pool) + PEcAn.logger::logger.warn("No prediction covariates for PFT:", pft_i, " pool:", pool, " skipping") next } } + # Guard for empty prediction covariates (both development mode and production) + if (nrow(pred_covs) == 0) { + PEcAn.logger::logger.warn("No prediction covariates for PFT:", pft_i, " pool:", pool, " skipping") + next + } + + # Sanity: ensure all training site_ids exist in prediction covariates + missing_train_cov <- setdiff(unique(train_pts$site_id), pred_covs$site_id) + if (length(missing_train_cov) > 0) { + PEcAn.logger::logger.error( + "Missing covariates for training site_ids (", length(missing_train_cov), ") in PFT ", pft_i, + ": ", paste(utils::head(missing_train_cov, 10), collapse = ", "), + if (length(missing_train_cov) > 10) " ..." else "" + ) + } + result <- downscale_model_output( - date = end_date, + date = slice_end_date, model_output = pool, train_ensemble_data = train_ens, train_site_coords = train_pts, @@ -366,12 +410,6 @@ PEcAn.logger::logger.info("Median downscaling model metrics:") dplyr::bind_rows(median_metrics, .id = "model_output") |> knitr::kable() -#### ---- Aggregate to County Level ---- #### -#### TODO Split into separate script? 
- -PEcAn.logger::logger.info("Aggregating to County Level") - - if (!PRODUCTION) { # For testing, use a subset of fields # could be even faster if we queried from gpkg: @@ -405,85 +443,9 @@ downscale_preds <- purrr::map(downscale_output_list, get_downscale_preds) |> # Calculate total Mg per field: c_density_Mg_ha * area_ha dplyr::mutate(total_c_Mg = c_density_Mg_ha * area_ha) -### TODO Debug and catch if it appears again -na_summary <- downscale_preds |> - dplyr::summarise(dplyr::across(dplyr::everything(), ~ sum(is.na(.x)))) |> - tidyr::pivot_longer(dplyr::everything(), names_to = "column", values_to = "n_na") |> - dplyr::filter(n_na > 0) - -if (nrow(na_summary) > 0) { - ## concise log message - PEcAn.logger::logger.warn( - "YOU NEED TO DEBUG THIS!!!\n\n", - "NA values detected in `downscale_preds`:\n" - ) - knitr::kable(na_summary, format = "simple") - # remove all rows with NA values - downscale_preds <- tidyr::drop_na(downscale_preds) -} - -ens_county_preds <- downscale_preds |> - # Now aggregate to get county level totals for each pool x ensemble - dplyr::group_by(model_output, pft, county, ensemble) |> - dplyr::summarize( - n = dplyr::n(), - total_c_Mg = sum(total_c_Mg), # total Mg C per county - total_ha = sum(area_ha), - .groups = "drop_last" - ) |> - dplyr::ungroup() |> - # counties with no fields will result in NA below - dplyr::filter(total_ha > 0) |> - dplyr::mutate( - total_c_Tg = PEcAn.utils::ud_convert(total_c_Mg, "Mg", "Tg"), - c_density_Mg_ha = total_c_Mg / total_ha - ) |> - dplyr::arrange(model_output, pft, county, ensemble) - -ens_members_by_county <- ens_county_preds |> - dplyr::group_by(model_output, pft, county) |> - dplyr::summarize(n_vals = dplyr::n_distinct(total_c_Mg), .groups = "drop") - -if (all(ens_members_by_county$n_vals == length(ensemble_ids))) { - PEcAn.logger::logger.info("All counties have the correct number of ensemble members: (", length(ensemble_ids), ")") -} else { - z <- ens_members_by_county |> - dplyr::group_by(county) |> 
- dplyr::summarise(n = mean(n_vals)) - PEcAn.logger::logger.error( - sum(z$n != length(ensemble_ids)) / length(z$n), - "counties have the wrong number of ensemble members after downscaling.", - "Check ens_county_preds object." - ) -} - -county_summaries <- ens_county_preds |> - dplyr::group_by(model_output, pft, county) |> - dplyr::summarize( - # Number of fields in county should be same for each ensemble member - n = max(n), - mean_total_c_Tg = mean(total_c_Tg), - sd_total_c_Tg = sd(total_c_Tg), - mean_c_density_Mg_ha = mean(c_density_Mg_ha), - sd_c_density_Mg_ha = sd(c_density_Mg_ha), - .groups = "drop" - ) |> - dplyr::mutate( - # Only save 3 significant digits - dplyr::across( - .cols = c(mean_total_c_Tg, sd_total_c_Tg, mean_c_density_Mg_ha, sd_c_density_Mg_ha), - .fns = ~ signif(.x, 3) - ) - ) +## Write out downscaled predictions readr::write_csv( - county_summaries, - file.path(model_outdir, "county_summaries.csv") -) -PEcAn.logger::logger.info("County summaries written to", file.path(model_outdir, "county_summaries.csv")) - -PEcAn.logger::logger.info( - rep(" ", 5), "\n\n", - "Finished aggregation to County level", "\n\n", - rep(" ", 5) + downscale_preds, + file.path(model_outdir, "downscaled_preds.csv") ) diff --git a/scripts/041_aggregate_to_county.R b/scripts/041_aggregate_to_county.R new file mode 100644 index 0000000..50f4b81 --- /dev/null +++ b/scripts/041_aggregate_to_county.R @@ -0,0 +1,158 @@ +# Aggregation to County Level +# Loads downscaling outputs saved by 040_downscale.R and aggregates to counties + +# Load configuration and paths +source("000-config.R") + +ca_attributes_csv <- file.path(data_dir, "ca_field_attributes.csv") +ca_attributes <- readr::read_csv(ca_attributes_csv) + + + +# Load downscaling checkpoint created by 040_downscale.R +checkpoint_file <- file.path(cache_dir, "downscaling_output.RData") +load(checkpoint_file) + +# Ensure required objects are present +required_objs <- c("downscale_output_list", "ensemble_ids", "covariates") 
+missing_objs <- required_objs[!vapply(required_objs, exists, logical(1))] +if (length(missing_objs) > 0) { + stop(paste("Missing required objects in checkpoint:", paste(missing_objs, collapse = ", "))) +} + +# Load field geometry/attributes needed for aggregation +if (!exists("ca_fields")) { + ca_fields_full <- sf::read_sf(file.path(data_dir, "ca_fields.gpkg")) + ca_fields <- ca_fields_full |> + dplyr::select(site_id, county, area_ha) +} + +PEcAn.logger::logger.info("***Starting Aggregation to County Level***") +#### ---- Aggregate to County Level ---- #### +#### TODO Split into separate script? + +PEcAn.logger::logger.info("Aggregating to County Level") + + +if (!PRODUCTION) { + # For testing, use a subset of fields + # could be even faster if we queried from gpkg: + # sf::read_sf(..., sql = "SELECT * FROM ca_fields WHERE site_id IN (...)") + ca_fields <- ca_fields |> + dplyr::right_join(covariates, by = "site_id") +} + +# Convert list to table with predictions and site identifier +# Helper: Convert a single downscale object to tidy predictions table +get_downscale_preds <- function(downscale_obj) { + purrr::map( + downscale_obj$predictions, + ~ tibble::tibble(site_id = downscale_obj$site_ids, prediction = .x) + ) |> + dplyr::bind_rows(.id = "ensemble") |> + dplyr::left_join(ca_fields, by = "site_id") +} + +# Assemble predictions; carry PFT label by parsing element name: "{pft}::{pool}" +downscale_preds <- purrr::map(downscale_output_list, get_downscale_preds) |> + dplyr::bind_rows(.id = "spec") |> + tidyr::separate( + col = "spec", + into = c("pft", "model_output"), + sep = "::", + remove = TRUE + ) |> + # Convert kg/m2 to Mg/ha using PEcAn.utils::ud_convert + dplyr::mutate(c_density_Mg_ha = PEcAn.utils::ud_convert(prediction, "kg/m2", "Mg/ha")) |> + # Calculate total Mg per field: c_density_Mg_ha * area_ha + dplyr::mutate(total_c_Mg = c_density_Mg_ha * area_ha) + +## Write out downscaled predictions + +readr::write_csv( + downscale_preds, + 
file.path(model_outdir, "downscaled_preds.csv") +) + +### TODO Debug and catch if it appears again +na_summary <- downscale_preds |> + dplyr::summarise(dplyr::across(dplyr::everything(), ~ sum(is.na(.x)))) |> + tidyr::pivot_longer(dplyr::everything(), names_to = "column", values_to = "n_na") |> + dplyr::filter(n_na > 0) + +if (nrow(na_summary) > 0) { + ## concise log message + PEcAn.logger::logger.warn( + "YOU NEED TO DEBUG THIS!!!\n\n", + "NA values detected in `downscale_preds`:\n" + ) + knitr::kable(na_summary, format = "simple") + # remove all rows with NA values + downscale_preds <- tidyr::drop_na(downscale_preds) +} + +ens_county_preds <- downscale_preds |> + # Now aggregate to get county level totals for each pool x ensemble + dplyr::group_by(model_output, pft, county, ensemble) |> + dplyr::summarize( + n = dplyr::n(), + total_c_Mg = sum(total_c_Mg), # total Mg C per county + total_ha = sum(area_ha), + .groups = "drop_last" + ) |> + dplyr::ungroup() |> + # counties with no fields will result in NA below + dplyr::filter(total_ha > 0) |> + dplyr::mutate( + total_c_Tg = PEcAn.utils::ud_convert(total_c_Mg, "Mg", "Tg"), + c_density_Mg_ha = total_c_Mg / total_ha + ) |> + dplyr::arrange(model_output, pft, county, ensemble) + +ens_members_by_county <- ens_county_preds |> + dplyr::group_by(model_output, pft, county) |> + dplyr::summarize(n_vals = dplyr::n_distinct(total_c_Mg), .groups = "drop") + +if (all(ens_members_by_county$n_vals == length(ensemble_ids))) { + PEcAn.logger::logger.info("All counties have the correct number of ensemble members: (", length(ensemble_ids), ")") +} else { + z <- ens_members_by_county |> + dplyr::group_by(county) |> + dplyr::summarise(n = mean(n_vals)) + PEcAn.logger::logger.error( + sum(z$n != length(ensemble_ids)) / length(z$n), + "counties have the wrong number of ensemble members after downscaling.", + "Check ens_county_preds object." 
+ ) +} + +county_summaries <- ens_county_preds |> + dplyr::group_by(model_output, pft, county) |> + dplyr::summarize( + # Number of fields in county should be same for each ensemble member + n = max(n), + mean_total_c_Tg = mean(total_c_Tg), + sd_total_c_Tg = sd(total_c_Tg), + mean_c_density_Mg_ha = mean(c_density_Mg_ha), + sd_c_density_Mg_ha = sd(c_density_Mg_ha), + .groups = "drop" + ) |> + dplyr::mutate( + # Only save 3 significant digits + dplyr::across( + .cols = c(mean_total_c_Tg, sd_total_c_Tg, mean_c_density_Mg_ha, sd_c_density_Mg_ha), + .fns = ~ signif(.x, 3) + ) + ) + +readr::write_csv( + county_summaries, + file.path(model_outdir, "county_summaries.csv") +) +PEcAn.logger::logger.info("County summaries written to", file.path(model_outdir, "county_summaries.csv")) + +PEcAn.logger::logger.info( + rep(" ", 5), "\n\n", + "Finished aggregation to County level", "\n\n", + rep(" ", 5) +) diff --git a/scripts/041_downscale_analysis.R b/scripts/042_downscale_analysis.R similarity index 99% rename from scripts/041_downscale_analysis.R rename to scripts/042_downscale_analysis.R index 167d357..4cba288 100644 --- a/scripts/041_downscale_analysis.R +++ b/scripts/042_downscale_analysis.R @@ -12,7 +12,7 @@ spec_table <- tibble::tibble(spec = names(downscale_output_list)) |> tidyr::separate(spec, into = c("pft_key", "model_output"), sep = "::", remove = FALSE) |> dplyr::mutate( pft = dplyr::case_when( - pft_key == "woody" ~ "woody perennial crop", + pft_key == "woody" ~ "woody perennial crop", pft_key == "annual" ~ "annual crop", TRUE ~ pft_key ) diff --git a/scripts/042_county_level_plots.R b/scripts/043_county_level_plots.R similarity index 100% rename from scripts/042_county_level_plots.R rename to scripts/043_county_level_plots.R From 295ec2e22a4d8b429afeff7eab02a9944fc92fca Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 00:47:03 -0400 Subject: [PATCH 21/70] Set up quarto publish gh-pages: add _quarto.yml, update README, etc --- .gitignore | 43 
+++++++++++++++++----------- 000-config.R | 81 ++++++++++++++++++++++++++-------------------------- README.md | 70 +++++++++++++++++++++++++++++++++++++++------ _quarto.yml | 56 ++++++++++++++++++++++++++++++++++++ index.qmd | 7 +++++ 5 files changed, 192 insertions(+), 65 deletions(-) create mode 100644 _quarto.yml create mode 100644 index.qmd diff --git a/.gitignore b/.gitignore index 954c56a..4d30745 100644 --- a/.gitignore +++ b/.gitignore @@ -1,33 +1,44 @@ -# R-specific files +########## Operating system / Editor ########## +.DS_Store +Thumbs.db +.vscode/ +.idea/ + +########## R / RStudio ########## .Rhistory .RData .Ruserdata .Rproj.user/ *.Rproj +Rplots.pdf -# Quarto directories -/.quarto/ +########## Quarto build outputs ########## /_site/ /_book/ -*/docs/ -!/docs/** /_freeze/ +/.quarto/ /_files/ +# Keep only top-level docs; ignore nested docs directories elsewhere +*/docs/ +!/docs/** -# VScode settings -.vscode/ - -# Knitr + R Markdown cache +########## Knitr / R Markdown caches ########## *_cache/ /cache/ *_files/ -*html - -# System files -.DS_Store -Thumbs.db +*.utf8.md +*.knit.md +*.html -# Temporary files -*.tmp +########## Logs / temp ########## *.log +*.tmp +nohup.out + +########## Deployment / misc ########## +rsconnect/ + +########## Generated figures ########## +# Do not track any rendered images; site publishing copies from _site +figures/ diff --git a/000-config.R b/000-config.R index 12805cd..dbb3930 100644 --- a/000-config.R +++ b/000-config.R @@ -1,25 +1,25 @@ ### Workflow Configuration Settings ### -# Check that we are in the correct working directory -if (!basename(here::here(getwd())) == "downscaling") { - PEcAn.logger::logger.error("Please run this script from the 'downscaling' directory") -} - -## Global configuration settings for logging +parser <- argparse::ArgumentParser() -options( - # Print all tibble columns - tibble.width = Inf, - # Suppress readr::read_csv messages - readr.show_col_types = FALSE, - vroom.show_col_types = FALSE 
+## Set development vs production mode ## +# Dev mode speeds up workflows by subsetting data for testing and debugging +parser$add_argument("--production", + type = "logical", default = FALSE, + help = "Set to true for production mode, false for faster development (default: FALSE)" ) +args <- parser$parse_args() +PRODUCTION <- args$production + +# Manual override for interactive sessions +if (rlang::is_interactive()) { + PRODUCTION <- TRUE +} ## Set parallel processing options no_cores <- max(future::availableCores() - 1, 1) future::plan(future::multicore, workers = no_cores) - # **set ccmmf_dir and pecan_outdir** # Define the CCMMF directory from environment variable ccmmf_dir <- Sys.getenv("CCMMF_DIR") @@ -27,18 +27,6 @@ if (ccmmf_dir == "") { ccmmf_dir <- "/projectnb2/dietzelab/ccmmf" } pecan_outdir <- file.path(ccmmf_dir, "modelout", "ccmmf_phase_2b_mixed_pfts_20250701") -# **Is this a test or production run?** -# Accepts command line argument '--production=false' to set testing mode -# -# Global switch to toggle between fast, small scale runs for development and testing -# and full-scale production runs. Defaults to production mode. 
-args <- commandArgs(trailingOnly = TRUE) -prod_arg <- grep("^--production=", args, value = TRUE) -if (length(prod_arg) > 0) { - PRODUCTION <- tolower(sub("^--production=", "", prod_arg)) != "false" -} else { - PRODUCTION <- TRUE -} # **Variables to extract** # see docs/workflow_documentation.qmd for complete list of outputs @@ -46,7 +34,7 @@ if (length(prod_arg) > 0) { # "TotSoilCarb", # "AGB" # ) -outputs_to_extract <- "TotSoilCarb" +outputs_to_extract <- c("TotSoilCarb", "AGB") if (!PRODUCTION) { # can subset for testing # depending on what part of the workflow you are testing @@ -65,24 +53,37 @@ model_outdir <- file.path(pecan_outdir, "out") set.seed(42) ca_albers_crs <- 3310 -#### Messagees #### -PEcAn.logger::logger.info("\n\n", +# Plot masking threshold for county-level maps (share of statewide total carbon) +# Example: 0.01 = mask counties below 1% of statewide total carbon for that PFT/pool +MASK_THRESHOLD <- 0.01 + +#### Messages #### +msg <- glue::glue( + "\n\n", "##### SETTINGS SUMMARY #####\n\n", - "Running in", ifelse(PRODUCTION, "**production**", "**development**"), "mode\n\n", + "Running in {ifelse(PRODUCTION, '**PRODUCTION**', '**DEVELOPMENT**')} mode\n\n", "### Directory Settings ###\n", - "- CCMMF directory:", ccmmf_dir, "\n", - "- data_dir :", data_dir, "\n", - "- cache_dir :", cache_dir, "\n", - "- raw_data_dir. :", raw_data_dir, "\n", - "- pecan_outdir. :", pecan_outdir, "\n", - "- model_outdir. :", model_outdir, "\n\n", + "- CCMMF directory: {ccmmf_dir}\n", + "- data_dir : {data_dir}\n", + "- cache_dir : {cache_dir}\n", + "- raw_data_dir. : {raw_data_dir}\n", + "- pecan_outdir. : {pecan_outdir}\n", + "- model_outdir. 
: {model_outdir}\n\n", "### Other Settings ###\n", - "- will extract variables:", paste(outputs_to_extract, collapse = ", "), "\n", - "- ca_albers_crs :", ca_albers_crs, - ifelse(ca_albers_crs == 3310, ", which is NAD83 / California Albers", ""), "\n", - wrap = FALSE + "- will extract variables: {paste(outputs_to_extract, collapse = ', ')}\n", + "- ca_albers_crs : {ca_albers_crs}{if(ca_albers_crs == 3310) ', which is NAD83 / California Albers' else ''}\n" +) +PEcAn.logger::logger.info(msg, wrap = FALSE) + +## Global configuration settings for logging + +options( + # Print all tibble columns + tibble.width = Inf, + # Suppress readr::read_csv messages + readr.show_col_types = FALSE ) -# Source all R scripts in the R/ directory +## Source all R scripts in the R/ directory r_scripts <- list.files(file.path(here::here(), "R"), pattern = "\\.R$", full.names = TRUE) lapply(r_scripts, source) diff --git a/README.md b/README.md index 67296fb..b07e3bf 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,5 @@ +[![](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip) + # Downscaling Workflow ## Overview @@ -15,18 +17,10 @@ The downscaling workflow predicts carbon pools (SOC and AGB) for California crop - **Anchor sites:** Fields with ground truth validation data (e.g., Ameriflux sites) - **Design points:** Representative fields selected for SIPNET simulation based on environmental clustering -- **Downscaling:** Process of extending predictions from design points to all California crop fields - - - -## Documentation - -- [Detailed Workflow Documentation](docs/workflow_documentation.qmd) - Step-by-step description of methods -- [Results and Analysis](reports/downscaling_results.qmd) - Visualizations and interpretation +- **Downscaling:** Process of extending predictions from design points to all California crop fields. 
## Environment and Setup - ### Key Configuration Files - `.future.R` - Sets up parallel processing @@ -102,3 +96,61 @@ qsub \ -e logs/03.err \ -b y Rscript downscale/999_workflow_step.R ``` + +### Building Documentation Site with Quarto + +**Preview** + +```bash +quarto preview +``` + +**Build** + +```bash +quarto render +``` + +#### Publish to GitHub Pages (gh-pages) + +These steps will publish to https://ccmmf.github.io/downscaling. + +This does not commit compiled HTML to `main`. Instead, Quarto pushes the built site to a separate `gh-pages` branch. + +One-time setup in GitHub: +1. Settings → Pages → Set Source to “Deploy from a branch”. +2. Select Branch: `gh-pages` and Folder: `/ (root)`. + +Publish from your machine after rendering: + +```bash +# Build site locally (runs R code and embeds results) +quarto render + +# Publish the built _site/ to gh-pages +quarto publish gh-pages +``` + +### Adding Content + +- Add any `.qmd` or `.md` file +- Add links in `_quarto.yml` + - under `project.render` + - in the appropriate section under `website.navbar.left`. + + +#### Notes on Quarto Configuration + +- **Global HTML Settings:** + - `self-contained: true`, `embed-resources: true`, `df-print: paged`, `toc: true`. + +- **Execution:** + - Uses `freeze: auto` to cache outputs; re-executes only when inputs change. + - Some pages (e.g., `reports/downscaling_results.qmd`) depend on paths in `000-config.R`. Build locally where paths exist. + +- **Configuration:** + - Quarto settings are in `_quarto.yml` (generates standalone HTML). + - Does not use GitHub Actions due to reliance on large local datasets. + +- **Home Page:** + - Defined in `index.qmd` (includes this `README.md`). 
diff --git a/_quarto.yml b/_quarto.yml new file mode 100644 index 0000000..984f189 --- /dev/null +++ b/_quarto.yml @@ -0,0 +1,56 @@ +project: + type: website + output-dir: _site + render: + # Explicit pages to render + - index.qmd + - README.md + - reports/downscaling_results.qmd + - reports/design_points_analysis.qmd + - reports/variable_importance.qmd + - docs/workflow_documentation.md + - docs/references.md + - docs/mixed_system_prototype.qmd + +website: + title: "Downscaling" + search: true + navbar: + left: + - href: index.qmd + text: Home + - text: Reports + menu: + - href: reports/downscaling_results.qmd + text: Downscaling Results + - href: reports/design_points_analysis.qmd + text: Design Points Analysis + - href: reports/variable_importance.qmd + text: Variable Importance + - href: docs/references.md + text: References + - text: Docs + menu: + - href: README.md + text: README + - href: docs/workflow_documentation.md + text: Workflow Documentation + - text: Examples + menu: + - href: docs/mixed_system_prototype.qmd + text: Mixed System Prototype + +format: + html: + theme: cosmo + toc: true + self-contained: false + df-print: paged + +# Freeze results so CI can publish without re-executing heavy code. 
+execute: + echo: false + freeze: auto + +author: "David LeBauer" +date: today diff --git a/index.qmd b/index.qmd new file mode 100644 index 0000000..7d3dbf8 --- /dev/null +++ b/index.qmd @@ -0,0 +1,7 @@ +--- +title: "Downscaling" +description: "Project overview and quick links" +--- + +{{< include README.md >}} + From 7de14183ead886dc5ca598f17cbc5d31a9a8a0d6 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 01:50:55 -0400 Subject: [PATCH 22/70] Add ggsave_optimized function and update plot saving in scripts --- R/ggsave_optimized.R | 75 ++++++++++++++++++++++++++++ scripts/011_prepare_anchor_sites.R | 2 +- scripts/021_clustering_diagnostics.R | 11 ++-- 3 files changed, 82 insertions(+), 6 deletions(-) create mode 100644 R/ggsave_optimized.R diff --git a/R/ggsave_optimized.R b/R/ggsave_optimized.R new file mode 100644 index 0000000..3bc4185 --- /dev/null +++ b/R/ggsave_optimized.R @@ -0,0 +1,75 @@ +#' Save Optimized Image Formats +#' +#' Saves a ggplot object to a file, supporting both vector and raster formats. +#' Raster formats are processed with `magick` for additional optimizations. +#' +#' @param filename Character. Path to save the plot, including file extension. +#' @param plot ggplot object. The plot to save. Defaults to the last plot. +#' @param width Numeric. Width of the plot in specified units. Default is 7. +#' @param height Numeric. Height of the plot in specified units. Default is 5. +#' @param units Character. Units for width and height ("in", "cm", "mm"). Default is "in". +#' @param dpi Numeric. Resolution for raster formats. Default is 96. +#' @param quality Numeric. JPEG/WebP quality (1-100). Default is 80. +#' @param bg Character. Background color. Default is "white". +#' @param ... Additional arguments passed to `ggplot2::ggsave`. +#' +#' @return Invisibly returns the filename of the saved plot. 
+#' @examples
+#' \dontrun{
+#' ggsave_optimized("plot.png", plot = my_plot, width = 8, height = 6)
+#' }
+#' @export
+ggsave_optimized <- function(
+    filename,
+    plot = ggplot2::last_plot(),
+    width = 7,
+    height = 5,
+    units = "in",
+    dpi = 96,
+    quality = 80,
+    bg = "white",
+    ...) {
+  ext <- tolower(tools::file_ext(filename))
+
+  # Use ggsave for vector formats and raster formats other than WebP
+  if (ext %in% c("svg", "pdf", "eps", "png", "jpg", "jpeg")) {
+    ggplot2::ggsave(
+      filename = filename,
+      plot = plot,
+      width = width,
+      height = height,
+      units = units,
+      dpi = dpi,
+      bg = bg,
+      ...
+    )
+    # Optimize PNGs with magick (quantize to 256 colors to shrink file size)
+    if (ext == "png") {
+      img <- magick::image_read(filename)
+      img <- magick::image_quantize(img, max = 256)
+      magick::image_write(img, path = filename, format = "png")
+    }
+    return(invisible(filename))
+  }
+
+  # Use magick for WebP (not natively supported by ggsave)
+  if (ext == "webp") {
+    tmp <- withr::local_tempfile(fileext = ".png")
+    ggplot2::ggsave(
+      filename = tmp,
+      plot = plot,
+      width = width,
+      height = height,
+      units = units,
+      dpi = dpi,
+      bg = bg,
+      ...
+ ) + img <- magick::image_read(tmp) + magick::image_write(img, path = filename, format = "webp", quality = quality) + unlink(tmp) + return(invisible(filename)) + } + + stop("Unsupported file extension: ", ext) +} diff --git a/scripts/011_prepare_anchor_sites.R b/scripts/011_prepare_anchor_sites.R index 2929117..602aeec 100644 --- a/scripts/011_prepare_anchor_sites.R +++ b/scripts/011_prepare_anchor_sites.R @@ -23,7 +23,7 @@ p <- anchor_sites_pts |> scale_color_brewer(palette = "Dark2") + labs(color = "PFT") + theme_minimal() -ggsave(p, filename = "figures/anchor_sites.png", dpi = 300, bg = "white") +ggsave_optimized("figures/anchor_sites.webp", plot = p, dpi = 96, bg = "white") #' Match anchor sites to LandIQ fields #' diff --git a/scripts/021_clustering_diagnostics.R b/scripts/021_clustering_diagnostics.R index 120a9eb..985f4dc 100644 --- a/scripts/021_clustering_diagnostics.R +++ b/scripts/021_clustering_diagnostics.R @@ -26,9 +26,10 @@ ggpairs_plot <- sites_clustered |> mapping = aes(color = as.factor(cluster), alpha = 0.8) ) + theme_minimal() -ggsave(ggpairs_plot, - filename = "figures/cluster_pairs.png", - dpi = 300, width = 10, height = 10, units = "in" +ggsave_optimized( + "figures/cluster_pairs.webp", + plot = ggpairs_plot, + width = 10, height = 10, units = "in", dpi = 96 ) # scale and reshape to long for plotting @@ -52,7 +53,7 @@ cluster_plot <- ggplot( labs(x = "Variable", y = "Normalized Value") + theme_minimal() -ggsave(cluster_plot, filename = "figures/cluster_plot.png", dpi = 300, bg = "white") +ggsave_optimized("figures/cluster_plot.svg", plot = cluster_plot) #' #' #### Stratification by Crops and Climate Regions @@ -130,4 +131,4 @@ design_pt_plot <- ggplot() + size = 2, stat = "sf_coordinates" ) -ggsave(design_pt_plot, filename = "figures/design_points.png", dpi = 300, bg = "white") +ggsave_optimized("figures/design_points.webp", plot = design_pt_plot, width = 10, height = 6, units = "in", dpi = 96, bg = "white") From 
61afd269b0f2c41e227a0995e76bc859bbf8d8d0 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 17:53:21 -0400 Subject: [PATCH 23/70] rename, vectorize, and test combine_value --> combine_mixed_crops. --- 000-config.R | 3 + R/combine_mixed_crops.R | 123 ++++++++++++++++++++++ R/mixed_aggregation.R | 69 ------------ tests/testthat/test-combine_mixed_crops.R | 77 ++++++++++++++ 4 files changed, 203 insertions(+), 69 deletions(-) create mode 100644 R/combine_mixed_crops.R delete mode 100644 R/mixed_aggregation.R create mode 100644 tests/testthat/test-combine_mixed_crops.R diff --git a/000-config.R b/000-config.R index dbb3930..6630885 100644 --- a/000-config.R +++ b/000-config.R @@ -87,3 +87,6 @@ options( ## Source all R scripts in the R/ directory r_scripts <- list.files(file.path(here::here(), "R"), pattern = "\\.R$", full.names = TRUE) lapply(r_scripts, source) + +## Check that all tests pass before proceeding +testthat::test_dir(here::here("tests/testthat")) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R new file mode 100644 index 0000000..677455a --- /dev/null +++ b/R/combine_mixed_crops.R @@ -0,0 +1,123 @@ +#' Combine two-PFT outputs to represent mixed cropping systems +#' +#' Rules for combining woody and annual PFT outputs to represent a +#' mixed cropping system. Supports two methods: +#' - "weighted": PFTs partition area (woody_cover + annual_cover = 1) and the +#' output is a weighted average: `woody_cover * woody_value + annual_cover * annual_value`. +#' - "incremental": preserves the full-area woody baseline (requires `woody_cover == 1`) +#' and treats annual as an increment relative to an annual initial baseline: +#' `annual_delta = annual_value - annual_init`; `result = woody_value + annual_cover * annual_delta`. +#' +#' Vectorization & recycling are handled via `vctrs::vec_recycle_common()`. All scalar +#' inputs are broadcast to the common length. 
Length mismatches other than 1 or the +#' common size trigger a severe error. +#' +#' Validation rules (severe errors unless otherwise noted): +#' * No input values may be NA (including covers, pool sizes, annual_init if required) +#' * Covers must lie within [0,1] +#' * Method "incremental": `woody_cover` must be 1 (within 0.1%); if not, severe error +#' * Method "incremental": `annual_init` required +#' * Method "weighted": rows whose `woody_cover + annual_cover` differ from 1 by more than tolerance +#' are set to NA in the result; a single aggregated warning is emitted listing the count +#' +#' @param woody_value numeric. Pool size for the woody PFT (kg/m2). +#' @param annual_value numeric. Pool size for the annual PFT (kg/m2). +#' @param annual_init numeric, required for method = "incremental"; the initial annual pool. +#' @param annual_cover numeric. Fractional cover of the annual PFT (f_annual, 0-1). +#' @param woody_cover numeric. Fractional cover of the woody PFT (f_woody, 0-1). Must be 1 for incremental. +#' @param method character. One of "weighted" or "incremental". +#' +#' @return numeric vector of combined values. 
+#' +#' @examples +#' # Discrete mixing (weights sum to 1) +#' combine_mixed_crops( +#' woody_value = 100, annual_value = 50, +#' annual_cover = 0.2, woody_cover = 0.8, method = "weighted" +#' ) +#' +#' # Overlap: preserve woody baseline (woody_cover==1), add annual increment scaled by cover +#' combine_mixed_crops( +#' woody_value = 200, annual_value = 220, annual_init = 200, +#' annual_cover = 0.3, woody_cover = 1.0, method = "incremental" +#' ) +combine_mixed_crops <- function(woody_value, + annual_value, + annual_init = NULL, + annual_cover, + woody_cover, + method = c("weighted", "incremental")) { + method <- match.arg(method) + + + # Accept scalars for cover and vectors for values + # Collect inputs for recycling (add annual_init only if provided) + recycle_inputs <- list( + woody_value = woody_value, + annual_value = annual_value, + annual_cover = annual_cover, + woody_cover = woody_cover + ) + if (!is.null(annual_init)) recycle_inputs$annual_init <- annual_init + + recycled <- vctrs::vec_recycle_common( + !!!recycle_inputs, + .size = vctrs::vec_size_common(!!!recycle_inputs) + ) + + woody_value <- recycled$woody_value + annual_value <- recycled$annual_value + annual_cover <- recycled$annual_cover + woody_cover <- recycled$woody_cover + if (!is.null(annual_init)) annual_init <- recycled$annual_init + # Internal tolerance for floating point comparisons + tol <- 1e-3 + + # NA checks (annual_init only if required for incremental) + na_checks <- list( + woody_value = woody_value, + annual_value = annual_value, + annual_cover = annual_cover, + woody_cover = woody_cover + ) + if (method == "incremental") { + if (is.null(annual_init)) { + PEcAn.logger::logger.severe("incremental: annual_init is required but missing.") + } + na_checks$annual_init <- annual_init + } + na_found <- vapply(names(na_checks), function(nm) anyNA(na_checks[[nm]]), logical(1)) + if (any(na_found)) { + fields <- paste(names(na_found)[na_found], collapse = ", ") + 
PEcAn.logger::logger.severe(paste0("NA values not allowed in inputs: ", fields)) + } + + # Range checks for covers + out_of_range_annual <- (annual_cover < 0 - tol) | (annual_cover > 1 + tol) + out_of_range_woody <- (woody_cover < 0 - tol) | (woody_cover > 1 + tol) + if (any(out_of_range_annual | out_of_range_woody, na.rm = TRUE)) { + n_bad <- sum(out_of_range_annual | out_of_range_woody, na.rm = TRUE) + PEcAn.logger::logger.severe(paste0(n_bad, " rows have cover fractions outside [0,1] (tol).")) + } + + if (method == "incremental") { + not_one <- abs(woody_cover - 1) > tol + if (any(not_one, na.rm = TRUE)) { + n_bad <- sum(not_one, na.rm = TRUE) + PEcAn.logger::logger.severe(paste0("incremental: woody_cover must be 1 (+/- ", tol, "); ", n_bad, " rows violate.")) + } + res <- woody_value + annual_cover * (annual_value - annual_init) + return(as.numeric(res)) + } + + # weighted method + sum_cov <- woody_cover + annual_cover + bad_sum <- abs(sum_cov - 1) > tol | is.na(sum_cov) + res <- woody_cover * woody_value + annual_cover * annual_value + if (any(bad_sum, na.rm = TRUE)) { + n_bad <- sum(bad_sum, na.rm = TRUE) + PEcAn.logger::logger.warn(paste0("weighted: ", n_bad, " rows with cover fractions not summing to 1 (tol); results set to NA for those rows.")) + res[bad_sum] <- NA_real_ + } + return(as.numeric(res)) +} diff --git a/R/mixed_aggregation.R b/R/mixed_aggregation.R deleted file mode 100644 index 1041c59..0000000 --- a/R/mixed_aggregation.R +++ /dev/null @@ -1,69 +0,0 @@ -#' Combine two-PFT outputs to represent mixed cropping systems -#' -#' Canonical rules for combining woody and annual PFT outputs. The function -#' implements two scenarios: -#' - "discrete": the woody and annual covers are assumed to partition the area -#' (woody_cover + annual_cover must equal 1) and the output is a weighted -#' average: woody_cover * woody_value + annual_cover * annual_value. 
-#' - "overlap": the woody baseline is preserved and the annual contribution is -#' treated as an increment relative to an initial annual baseline: -#' annual_delta = annual_value - annual_init, result = woody_value + -#' annual_cover * annual_delta. In this case annual_init must be provided. -#' -#' Variable classes currently supported include carbon stocks "AGB" and "TotSoilCarb" -#' -#' @param var character. Variable name, must be one of "AGB" or "TotSoilCarb" -#' @param woody_value numeric. Pool size for the woody PFT (kg/m2) -#' @param annual_value numeric. Pool size for the annual PFT (kg/m2) -#' @param annual_init numeric, optional. Required and only used for "overlap" -#' scenario; the initial conditions used to compute the change -#' attributed to the annual crop (annual_value - annual_init). -#' @param annual_cover numeric. Fractional cover of the annual PFT (f_annual). -#' @param woody_cover numeric. Fractional cover of the woody PFT (f_woody). -#' @param scenario character. One of "discrete" (default) or "overlap". -#' -#' @details -#' - Discrete: enforces woody_cover + annual_cover == 1 and returns a simple weighted sum. -#' - Overlap: requires annual_init; computes the incremental effect of the annual crop and -#' adds the increment scaled by annual_cover to the woody baseline. -#' -#' @return Numeric. The combined value according to the selected scenario. 
-#' -#' @author David LeBauer -#' @examples -#' # Discrete mixing (weights sum to 1) -#' combine_value("AGB", woody_value = 100, annual_value = 50, -#' annual_cover = 0.2, woody_cover = 0.8, scenario = "discrete") -#' -#' # Overlap: preserve woody baseline, add annual increment scaled by cover -#' combine_value("TotSoilCarb", woody_value = 200, -#' annual_value = 220, annual_init = 200, -#' annual_cover = 0.3, woody_cover = 0.9, scenario = "overlap") -combine_value <- function(woody_value, annual_value, annual_init = NULL, - annual_cover, woody_cover, - method = c("weighted", "incremental")) { - method <- match.arg(method, choices = c("weighted", "incremental")) - - # Basic NA checks for covers - if (is.na(woody_cover) || is.na(annual_cover)) { - PEcAn.logger::logger.warn("One or both cover fractions are NA; returning NA for combine_value.") - return(NA_real_) - } - - if (method == "weighted") { - sum_cov <- woody_cover + annual_cover - # exact within tolerance -> normal weighted average - if (abs(sum_cov - 1) <= 1e-8) { - return(woody_cover * woody_value + annual_cover * annual_value) - } - } - - if (method == "incremental") { - if (is.null(annual_init) || is.na(annual_init)) { - PEcAn.logger::logger.warn("incremental: annual_init is missing; returning NA.") - return(NA_real_) - } - annual_delta <- annual_value - annual_init - return(woody_value + annual_cover * annual_delta) - } -} diff --git a/tests/testthat/test-combine_mixed_crops.R b/tests/testthat/test-combine_mixed_crops.R new file mode 100644 index 0000000..de9faae --- /dev/null +++ b/tests/testthat/test-combine_mixed_crops.R @@ -0,0 +1,77 @@ +library(testthat) + +loglevel <- PEcAn.logger::logger.setLevel("OFF") + +test_that("weighted mixing basic case", { + res <- combine_mixed_crops( + woody_value = 100, annual_value = 50, + annual_cover = 0.2, woody_cover = 0.8, + method = "weighted" + ) + expected <- 0.8 * 100 + 0.2 * 50 + expect_true(abs(res - expected) < 1e-10) +}) + +test_that("weighted cover sum 
mismatch yields NA output", { + res <- combine_mixed_crops( + woody_value = 100, annual_value = 50, + annual_cover = 0.25, woody_cover = 0.8, + method = "weighted" + ) + expect_true(is.na(res)) +}) + +test_that("incremental mixing basic case", { + res <- combine_mixed_crops( + woody_value = 200, annual_value = 220, annual_init = 200, + annual_cover = 0.3, woody_cover = 1.0, method = "incremental" + ) + expected <- 200 + 0.3 * (220 - 200) + expect_true(abs(res - expected) < 1e-10) +}) + +test_that("incremental requires annual_init", { + testthat::expect_error( + res <- combine_mixed_crops( + woody_value = 200, annual_value = 220, + annual_cover = 0.3, woody_cover = 1.0, + method = "incremental" + ), + "annual_init is required but missing" + ) +}) + +test_that("incremental requires woody_cover == 1", { + testthat::expect_error( + res <- combine_mixed_crops( + woody_value = 200, annual_value = 220, annual_init = 200, + annual_cover = 0.3, woody_cover = 0.9, + method = "incremental" + ), + "woody_cover must be 1" + ) +}) + +test_that("NA inputs rejected", { + expect_error( + res <- combine_mixed_crops( + woody_value = NA_real_, annual_value = 50, + annual_cover = 0.2, woody_cover = 0.8, + method = "weighted" + ), + "NA values not allowed in inputs" + ) +}) + +test_that("length mismatch rejected", { + expect_error( + res <- combine_mixed_crops( + woody_value = 1:2, annual_value = 1:3, + annual_cover = 0.2, woody_cover = 0.8, + method = "weighted" + ), + "Can't recycle" + ) +}) + +PEcAn.logger::logger.setLevel(loglevel) From b576d1ece951dc509b816a8312d3460f51e17d50 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 18:48:36 -0400 Subject: [PATCH 24/70] Refactor county-level plotting to include delta calculations for carbon density and stock and mask counties that cumulatively contribute <1% of total (consider this below detection limit given uncertainty) --- scripts/043_county_level_plots.R | 269 +++++++++++++++++++++++++++---- 1 file changed, 235 
insertions(+), 34 deletions(-) diff --git a/scripts/043_county_level_plots.R b/scripts/043_county_level_plots.R index 6f0a5dc..9999497 100644 --- a/scripts/043_county_level_plots.R +++ b/scripts/043_county_level_plots.R @@ -6,58 +6,259 @@ county_summaries <- readr::read_csv(file = file.path(model_outdir, "county_summa co_preds_to_plot <- county_summaries |> dplyr::right_join(county_boundaries, by = "county") |> - dplyr::arrange(county, model_output) |> + dplyr::arrange(county, model_output, pft) |> tidyr::pivot_longer( cols = c(mean_total_c_Tg, sd_total_c_Tg, mean_c_density_Mg_ha, sd_c_density_Mg_ha), - names_to = "stat", + names_to = "stat", values_to = "value" ) |> dplyr::mutate( units = dplyr::case_when( - stringr::str_detect(stat, "total_c") ~ "Carbon Stock (Tg)", + stringr::str_detect(stat, "total_c") ~ "Carbon Stock (Tg)", stringr::str_detect(stat, "c_density") ~ "Carbon Density (Mg/ha)" ), stat = dplyr::case_when( stringr::str_detect(stat, "mean") ~ "Mean", - stringr::str_detect(stat, "sd") ~ "SD" + stringr::str_detect(stat, "sd") ~ "SD" ) ) -units <- rep(unique(co_preds_to_plot$units), each = length(outputs_to_extract)) -pool_x_units <- co_preds_to_plot |> - dplyr::select(model_output, units) |> - dplyr::distinct() |> - # remove na - dplyr::filter(!is.na(model_output)) |> # why is one field in SF county NA? 
- dplyr::arrange(model_output, units) |> - dplyr::filter(!is.na(model_output)) - -p <- purrr::map2(pool_x_units$model_output, pool_x_units$units, function(pool, unit) { - .p <- ggplot2::ggplot( - dplyr::filter(co_preds_to_plot, model_output == pool & units == unit), - ggplot2::aes(geometry = geom, fill = value) - ) + - ggplot2::geom_sf(data = county_boundaries, fill = "lightgrey", color = "black") + +combos <- co_preds_to_plot |> + dplyr::filter(!is.na(model_output), !is.na(pft)) |> + dplyr::distinct(pft, model_output, units) |> + dplyr::arrange(pft, model_output, units) + +p <- purrr::pmap( + list(combos$pft, combos$model_output, combos$units), + function(.pft, pool, unit) { + # Cumulative-share mask: exclude smallest-contributing counties whose aggregate + # share of statewide total carbon is below MASK_THRESHOLD for this PFT + pool + mask_df <- county_summaries |> + dplyr::filter(pft == .pft, model_output == pool) |> + dplyr::mutate(share = mean_total_c_Tg / sum(mean_total_c_Tg, na.rm = TRUE)) |> + dplyr::arrange(share) |> + dplyr::mutate(cum_share = cumsum(dplyr::coalesce(share, 0))) |> + dplyr::mutate(mask_ok = cum_share >= get0("MASK_THRESHOLD", ifnotfound = 0.01)) |> + dplyr::select(county, mask_ok) + + dat <- dplyr::filter(co_preds_to_plot, pft == .pft, model_output == pool, units == unit) |> + dplyr::left_join(mask_df, by = "county") |> + dplyr::mutate( + .mask_ok = dplyr::coalesce(mask_ok, FALSE), + value_plot = dplyr::if_else(.mask_ok, value, NA_real_) + ) + .plt <- ggplot2::ggplot( + dat, + ggplot2::aes(geometry = geom, fill = value_plot) + ) + + ggplot2::geom_sf(data = county_boundaries, mapping = ggplot2::aes(geometry = geom), fill = "white", color = "black", inherit.aes = FALSE) + + ggplot2::geom_sf() + + ggplot2::scale_fill_viridis_c(option = "plasma", na.value = "white") + + ggplot2::theme_minimal() + + ggplot2::facet_grid(model_output ~ stat) + + ggplot2::labs( + title = paste("County-Level", pool, "-", .pft), + subtitle = paste0( + "Excluding 
smallest counties whose combined share < ", + scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), + " of statewide total carbon" + ), + fill = unit + ) + + ggplot2::guides(fill = ggplot2::guide_colorbar(title.position = "top")) + + unit_key <- ifelse(unit == "Carbon Stock (Tg)", "stock", + ifelse(unit == "Carbon Density (Mg/ha)", "density", NA) + ) + pft_key <- stringr::str_replace_all(.pft, "[^A-Za-z0-9]+", "_") + plotfile <- here::here("figures", paste0("county_", pft_key, "_", pool, "_carbon_", unit_key, ".webp")) + PEcAn.logger::logger.info("Creating county-level plot for", pool, unit, "PFT:", .pft) + ggsave_optimized( + filename = plotfile, + plot = .plt, + width = 10, height = 5, units = "in", dpi = 96, + bg = "white" + ) + return(.plt) + } +) + +## --- County-level differences from matched fields: (woody + annual) minus (woody perennial crop) --- ## +dp <- vroom::vroom( + file.path(model_outdir, "downscaled_preds.csv"), + col_types = readr::cols( + site_id = readr::col_character(), + pft = readr::col_character(), + ensemble = readr::col_double(), + c_density_Mg_ha = readr::col_double(), + total_c_Mg = readr::col_double(), + area_ha = readr::col_double(), + county = readr::col_character(), + model_output = readr::col_character() + ) +) +mix <- dp |> + dplyr::filter(pft == "woody + annual") |> + dplyr::select(site_id, ensemble, model_output, county, area_ha_mix = area_ha, total_c_Mg_mix = total_c_Mg) +wood <- dp |> + dplyr::filter(pft == "woody perennial crop") |> + dplyr::select(site_id, ensemble, model_output, county, area_ha_woody = area_ha, total_c_Mg_woody = total_c_Mg) +diff_county <- mix |> + dplyr::inner_join(wood, by = c("site_id", "ensemble", "model_output", "county")) |> + dplyr::mutate(diff_total_Mg = total_c_Mg_mix - total_c_Mg_woody, area_ha = dplyr::coalesce(area_ha_woody, area_ha_mix)) |> + dplyr::group_by(county, model_output, ensemble) |> + dplyr::summarise(diff_total_Mg = sum(diff_total_Mg), total_ha = sum(area_ha), .groups = "drop") 
|> + dplyr::mutate(diff_total_Tg = PEcAn.utils::ud_convert(diff_total_Mg, "Mg", "Tg"), diff_density_Mg_ha = diff_total_Mg / total_ha) |> + dplyr::group_by(county, model_output) |> + dplyr::summarise(mean_diff_total_Tg = mean(diff_total_Tg), mean_diff_density_Mg_ha = mean(diff_density_Mg_ha), .groups = "drop") |> + dplyr::right_join(county_boundaries, by = "county") + +for (pool in unique(stats::na.omit(diff_county$model_output))) { + dat_pool <- dplyr::filter(diff_county, model_output == pool) + + # Cumulative-share mask based on absolute change magnitudes + sum_abs <- sum(abs(dat_pool$mean_diff_total_Tg), na.rm = TRUE) + if (is.finite(sum_abs) && sum_abs > 0) { + dat_pool_mask <- dat_pool |> + dplyr::mutate(share = abs(mean_diff_total_Tg) / sum_abs) |> + dplyr::arrange(share) |> + dplyr::mutate(cum_share = cumsum(dplyr::coalesce(share, 0))) |> + dplyr::mutate(mask_ok = cum_share >= get0("MASK_THRESHOLD", ifnotfound = 0.01)) + } else { + dat_pool_mask <- dplyr::mutate(dat_pool, mask_ok = TRUE) + } + + # Density difference map (Mg/ha) + p_density <- ggplot2::ggplot(dat_pool_mask, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_diff_density_Mg_ha, NA_real_))) + + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + ggplot2::geom_sf() + - ggplot2::scale_fill_viridis_c(option = "plasma") + + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + ggplot2::theme_minimal() + - ggplot2::facet_grid(model_output ~ stat) + ggplot2::labs( - title = paste("County-Level Predictions for", pool, unit), - fill = "Value" + title = paste("Difference in Carbon Density (Mg/ha): (woody + annual) - (woody)", pool), + subtitle = paste0( + "Excluding smallest counties whose combined share < ", + scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), + " of statewide total change" + ), + fill = "Delta (Mg/ha)" ) + ggsave_optimized( + filename = here::here("figures", 
paste0("county_diff_woody_plus_annual_minus_woody_", pool, "_carbon_density.webp")), + plot = p_density, + width = 10, height = 5, units = "in", dpi = 96, bg = "white" + ) + + # Stock difference map (Tg) + p_stock <- ggplot2::ggplot(dat_pool_mask, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_diff_total_Tg, NA_real_))) + + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + + ggplot2::geom_sf() + + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + + ggplot2::theme_minimal() + + ggplot2::labs( + title = paste("Difference in Carbon Stock (Tg): (woody + annual) - (woody)", pool), + subtitle = paste0( + "Excluding smallest counties whose combined share < ", + scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), + " of statewide total change" + ), + fill = "Delta (Tg)" + ) + ggsave_optimized( + filename = here::here("figures", paste0("county_diff_woody_plus_annual_minus_woody_", pool, "_carbon_stock.webp")), + plot = p_stock, + width = 10, height = 5, units = "in", dpi = 96, bg = "white" + ) +} - unit <- ifelse(unit == "Carbon Stock (Tg)", "stock", - ifelse(unit == "Carbon Density (Mg/ha)", "density", NA) +## --- County-level deltas: start->end for woody and annual --- ## +delta_csv <- file.path(model_outdir, "downscaled_deltas.csv") +if (file.exists(delta_csv)) { + deltas <- vroom::vroom( + delta_csv, + col_types = readr::cols( + site_id = readr::col_character(), + pft = readr::col_character(), + ensemble = readr::col_double(), + delta_c_density_Mg_ha = readr::col_double(), + delta_total_c_Mg = readr::col_double(), + area_ha = readr::col_double(), + county = readr::col_character(), + model_output = readr::col_character() + ) ) - plotfile <- here::here("figures", paste0("county_", pool, "_carbon_", unit, ".png")) - PEcAn.logger::logger.info("Creating county-level plot for", pool, unit) - ggplot2::ggsave( - plot = .p, - filename = plotfile, - width = 
10, height = 5, - bg = "white" + delta_county <- deltas |> + dplyr::group_by(model_output, pft, county, ensemble) |> + dplyr::summarize(total_delta_Mg = sum(delta_total_c_Mg), total_ha = sum(area_ha), .groups = "drop") |> + dplyr::mutate( + delta_density_Mg_ha = total_delta_Mg / total_ha, + delta_total_Tg = PEcAn.utils::ud_convert(total_delta_Mg, "Mg", "Tg") + ) |> + dplyr::group_by(model_output, pft, county) |> + dplyr::summarize( + mean_delta_density_Mg_ha = mean(delta_density_Mg_ha), + mean_delta_total_Tg = mean(delta_total_Tg), + .groups = "drop" + ) |> + dplyr::right_join(county_boundaries, by = "county") + + combos_delta <- delta_county |> + dplyr::filter(!is.na(model_output), !is.na(pft)) |> + dplyr::distinct(pft, model_output) + + purrr::pwalk( + combos_delta, + function(pft, model_output) { + datp <- dplyr::filter(delta_county, pft == !!pft, model_output == !!model_output) + sum_abs <- sum(abs(datp$mean_delta_total_Tg), na.rm = TRUE) + if (is.finite(sum_abs) && sum_abs > 0) { + datp <- datp |> + dplyr::mutate(share = abs(mean_delta_total_Tg) / sum_abs) |> + dplyr::arrange(share) |> + dplyr::mutate(cum_share = cumsum(dplyr::coalesce(share, 0))) |> + dplyr::mutate(mask_ok = cum_share >= get0("MASK_THRESHOLD", ifnotfound = 0.01)) + } else { + datp <- dplyr::mutate(datp, mask_ok = TRUE) + } + pft_key <- stringr::str_replace_all(pft, "[^A-Za-z0-9]+", "_") + # density + p_den <- ggplot2::ggplot(datp, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_delta_density_Mg_ha, NA_real_))) + + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + + ggplot2::geom_sf() + + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + + ggplot2::theme_minimal() + + ggplot2::labs( + title = paste("Delta Density (start->end)", model_output, "-", pft), + subtitle = paste0( + "Excluding smallest counties whose combined share < ", + scales::percent(get0("MASK_THRESHOLD", ifnotfound 
= 0.01)), + " of statewide total change" + ), + fill = "Delta (Mg/ha)" + ) + ggsave_optimized( + filename = here::here("figures", paste0("county_delta_", pft_key, "_", model_output, "_carbon_density.webp")), + plot = p_den, width = 10, height = 5, units = "in", dpi = 96, bg = "white" + ) + # stock + p_stk <- ggplot2::ggplot(datp, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_delta_total_Tg, NA_real_))) + + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + + ggplot2::geom_sf() + + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + + ggplot2::theme_minimal() + + ggplot2::labs( + title = paste("Delta Stock (start->end)", model_output, "-", pft), + subtitle = paste0( + "Excluding smallest counties whose combined share < ", + scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), + " of statewide total change" + ), + fill = "Delta (Tg)" + ) + ggsave_optimized(filename = here::here("figures", paste0("county_delta_", pft_key, "_", model_output, "_carbon_stock.webp")), plot = p_stk, width = 10, height = 5, units = "in", dpi = 96, bg = "white") + } ) - return(.p) -}) +} else { + PEcAn.logger::logger.warn("downscaled_deltas.csv not found; skipping delta maps") +} From 176e2006b3f38d8a2ec0444b1357591224c84614 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 19:06:56 -0400 Subject: [PATCH 25/70] downscaling script: add mixed scenario calculations for carbon density and stock, save variable importance results; add logging and memory tracking --- R/helper.R | 17 ++ scripts/040_downscale.R | 488 +++++++++++++++++++++++++++++++++++----- 2 files changed, 452 insertions(+), 53 deletions(-) create mode 100644 R/helper.R diff --git a/R/helper.R b/R/helper.R new file mode 100644 index 0000000..31be91d --- /dev/null +++ b/R/helper.R @@ -0,0 +1,17 @@ +# ---- helper instrumentation utilities --------------------------------------- +ts_now <- 
function() format(Sys.time(), "%Y-%m-%d %H:%M:%S") +step_timer <- function() list(start = Sys.time()) +step_elapsed <- function(timer) as.numeric(difftime(Sys.time(), timer$start, units = "secs")) +log_mem <- function(prefix = "") { + # Best-effort simple memory usage (RSS) on Linux; fallback to NA + rss <- tryCatch( + { + pid <- Sys.getpid() + # RSS in kB + as.numeric(system(paste("ps -o rss= -p", pid), intern = TRUE)) / 1024 + }, + error = function(e) NA_real_ + ) + PEcAn.logger::logger.info(prefix, "Memory ~", ifelse(is.na(rss), "NA", paste0(round(rss, 1), " MB"))) +} +overall_timer <- step_timer() diff --git a/scripts/040_downscale.R b/scripts/040_downscale.R index 783b13c..f343a7b 100644 --- a/scripts/040_downscale.R +++ b/scripts/040_downscale.R @@ -17,10 +17,20 @@ PEcAn.logger::logger.info("***Starting Downscaling and Aggregation***") #----- load ensemble data ---------------------------------------------------- ensemble_csv <- file.path(model_outdir, "ensemble_output.csv") +timer_read_ensemble <- step_timer() ensemble_data <- readr::read_csv(ensemble_csv) |> dplyr::rename( ensemble = parameter # parameter is EFI std name for ensemble ) +PEcAn.logger::logger.info( + "Loaded ensemble data:", nrow(ensemble_data), "rows;", + dplyr::n_distinct(ensemble_data$site_id), "unique site_ids;", + dplyr::n_distinct(ensemble_data$ensemble), "ensembles;", + dplyr::n_distinct(ensemble_data$pft), "PFTs;", + dplyr::n_distinct(ensemble_data$variable), "variables (carbon pools) in file; load_time_s=", + round(step_elapsed(timer_read_ensemble), 2) +) +log_mem("After loading ensemble data :: ") ensemble_ids <- ensemble_data |> dplyr::pull(ensemble) |> @@ -54,13 +64,18 @@ if (length(pfts) == 0) { } #--- load site covariates -covariates_csv <- file.path(data_dir, "site_covariates_plus_dp_patch.csv") -# covariates_csv <- file.path(data_dir, "site_covariates.csv") +covariates_csv <- file.path(data_dir, "site_covariates.csv") +timer_read_cov <- step_timer() covariates <- 
readr::read_csv(covariates_csv) |> dplyr::select( site_id, where(is.numeric), -climregion_id ) +PEcAn.logger::logger.info( + "Loaded covariates:", nrow(covariates), "sites x", ncol(covariates) - 1, "numeric predictors; load_time_s=", + round(step_elapsed(timer_read_cov), 2) +) +log_mem("After loading covariates :: ") covariate_names <- names(covariates |> dplyr::select(where(is.numeric))) @@ -70,6 +85,72 @@ PEcAn.logger::logger.info( paste(covariate_names, collapse = ", ") ) +# ---- variable-importance helpers ------------------------------------------- +safe_sanitize <- function(x) { + gsub("[^A-Za-z0-9]+", "_", x) +} + +extract_vi <- function(model) { + # Supports randomForest and ranger models + if (inherits(model, "randomForest")) { + vi <- tryCatch(randomForest::importance(model), error = function(e) NULL) + if (is.null(vi)) { + return(NULL) + } + if ("%IncMSE" %in% colnames(vi)) as.numeric(vi[, "%IncMSE"]) else as.numeric(vi[, 1]) + } else if (inherits(model, "ranger")) { + vi <- tryCatch(model$variable.importance, error = function(e) NULL) + if (is.null(vi)) { + return(NULL) + } + as.numeric(vi) + } else { + NULL + } +} + +extract_vi_names <- function(model) { + if (inherits(model, "randomForest")) { + vi <- tryCatch(randomForest::importance(model), error = function(e) NULL) + if (is.null(vi)) { + return(NULL) + } + rownames(vi) + } else if (inherits(model, "ranger")) { + nms <- tryCatch(names(model$variable.importance), error = function(e) NULL) + nms + } else { + NULL + } +} + +extract_oob_r2 <- function(model, y_train = NULL) { + if (inherits(model, "randomForest")) { + if (!is.null(model$predicted) && !is.null(model$y)) { + y <- model$y + yhat <- model$predicted + if (length(y) == length(yhat) && stats::var(y) > 0) { + return(1 - sum((y - yhat)^2) / sum((y - mean(y))^2)) + } + } + return(NA_real_) + } + if (inherits(model, "ranger")) { + mse_oob <- tryCatch(model$prediction.error, error = function(e) NA_real_) + if (!is.na(mse_oob)) { + if 
(is.null(y_train)) { + return(NA_real_) + } + v <- stats::var(y_train) + if (!is.finite(v) || v <= 0) { + return(NA_real_) + } + return(1 - mse_oob / v) + } + } + NA_real_ +} + # ----define design points based on ensemble data------------------------------- # TODO: move this sanitization upstream to when ens data is created (030_extract_sipnet_output.R) # or better ... figure out why we so often run into mis-match!!! @@ -176,8 +257,14 @@ pft_site_ids <- ca_field_attributes |> dplyr::filter(pft %in% pfts) |> dplyr::distinct(site_id, pft) -sites_info <- lapply(pft_site_ids, length) -PEcAn.logger::logger.info("Sites per PFT:", paste(sites_info, collapse = ", ")) +sites_info <- pft_site_ids |> + dplyr::group_by(pft) |> + dplyr::summarise(n_sites = dplyr::n(), .groups = "drop") +PEcAn.logger::logger.info( + "Sites per PFT:", + paste(paste0(sites_info$pft, "=", sites_info$n_sites), collapse = "; ") +) +log_mem("After computing sites per PFT :: ") #### Target sites: per-PFT site lists built above (pft_site_ids) @@ -244,9 +331,21 @@ downscale_model_output <- function(date, # We downscale each carbon pool for both woody and annual PFTs, # predicting to the same target set for that PFT downscale_output_list <- list() +delta_output_records <- list() +training_sites_records <- list() +combo_total <- length(outputs_to_extract) * length(pfts) +combo_index <- 0L +loop_global_timer <- step_timer() for (pool in outputs_to_extract) { for (pft_i in pfts) { - PEcAn.logger::logger.info("Starting downscaling for", pool, "(", pft_i, ")") + combo_index <- combo_index + 1L + iter_timer <- step_timer() + PEcAn.logger::logger.info( + sprintf( + "[Progress %d/%d] Starting downscaling for %s (%s) at %s", + combo_index, combo_total, pool, pft_i, ts_now() + ) + ) # train_ens: ensemble data filtered by PFT train_ens <- ensemble_data |> @@ -274,38 +373,29 @@ for (pool in outputs_to_extract) { ) } # train_pts: design points filtered by PFT - # should be the same as train_pts <- train_ens |> 
dplyr::select(site_id, lat, lon, pft) |> dplyr::distinct() - # train_pts <- design_points |> - # dplyr::filter(pft == pft_i) - # Quick diagnostic: overlapping site counts + # Diagnostic: overlapping site counts n_train_ens_sites <- length(unique(train_ens$site_id)) n_train_pts <- nrow(train_pts) PEcAn.logger::logger.info("Training sites: ensemble has", n_train_ens_sites, "site_ids; using", n_train_pts, "coords") - - # NOTE: We skip strict pre-check; wrapper will attempt subset_ensemble then manual fallback. - - # prediction covariates: either full set for that PFT or sampled subset (dev) + training_sites_records[[paste0(pft_i, "::", pool)]] <- + tibble::tibble(site_id = unique(train_pts$site_id), pft = pft_i, model_output = pool) + + # Ensure design point covariates are included for training join + dp_pft <- design_covariates_unscaled |> + dplyr::filter(site_id %in% (design_points |> + dplyr::filter(pft == pft_i) |> + dplyr::pull(site_id))) + # prediction covariates: either full set for that PFT (prod) or sampled subset (dev) if (PRODUCTION) { - # Ensure design point covariates are included for training join - dp_pft <- design_covariates_unscaled |> - dplyr::filter(site_id %in% (design_points |> - dplyr::filter(pft == pft_i) |> - dplyr::pull(site_id))) - pred_covs <- covariates_full |> dplyr::filter(site_id %in% (pft_site_ids |> dplyr::filter(pft == pft_i) |> dplyr::pull(site_id))) |> dplyr::bind_rows(dp_pft) |> dplyr::distinct(site_id, .keep_all = TRUE) } else { - dp_pft <- design_covariates_unscaled |> - dplyr::filter(site_id %in% (design_points |> - dplyr::filter(pft == pft_i) |> - dplyr::pull(site_id))) - sample_pool <- covariates_full |> dplyr::filter(site_id %in% pft_site_ids[[pft_i]]) |> dplyr::anti_join(dp_pft, by = "site_id") @@ -319,12 +409,6 @@ for (pool in outputs_to_extract) { } pred_covs <- dplyr::bind_rows(sampled, dp_pft) - - # ensure prediction covariates have at least one site (development mode) - if (nrow(pred_covs) == 0) { - 
PEcAn.logger::logger.warn("No prediction covariates for PFT:", pft_i, " pool:", pool, " skipping") - next - } } # Guard for empty prediction covariates (both development mode and production) @@ -343,6 +427,7 @@ for (pool in outputs_to_extract) { ) } + call_timer <- step_timer() result <- downscale_model_output( date = slice_end_date, model_output = pool, @@ -350,38 +435,128 @@ for (pool in outputs_to_extract) { train_site_coords = train_pts, pred_covariates = pred_covs ) + PEcAn.logger::logger.info( + "Completed downscaling for", pool, "(", pft_i, ") in", + round(step_elapsed(call_timer), 2), "s; n_pred_sites=", nrow(pred_covs), + " n_train_sites=", n_train_pts + ) + log_mem(paste0("After downscaling ", pool, " (", pft_i, ") :: ")) + # Also compute start-date predictions to enable delta maps + start_obj <- downscale_model_output( + date = start_date, + model_output = pool, + train_ensemble_data = train_ens, + train_site_coords = train_pts, + pred_covariates = pred_covs + ) + + # Save models and VI metrics if available from downscale outputs + if (!is.null(result) && !is.null(result$model)) { + models_dir <- file.path(cache_dir, "models") + train_dir <- file.path(cache_dir, "training_data") + if (!dir.exists(models_dir)) dir.create(models_dir, recursive = TRUE, showWarnings = FALSE) + if (!dir.exists(train_dir)) dir.create(train_dir, recursive = TRUE, showWarnings = FALSE) + + spec_key <- paste0(safe_sanitize(pft_i), "_", safe_sanitize(pool)) + saveRDS(result$model, file = file.path(models_dir, paste0(spec_key, "_models.rds"))) - PEcAn.logger::logger.info("Downscaling complete for", pool, "(", pft_i, ")") + if (!is.null(result$data) && !is.null(result$data$training)) { + tr_path <- file.path(train_dir, paste0(spec_key, "_training.csv")) + readr::write_csv(result$data$training, tr_path) + } + + ens_labels <- names(result$predictions) + if (is.null(ens_labels)) ens_labels <- as.character(seq_along(result$predictions)) + vi_rows <- list() + for (mi in 
seq_along(result$model)) { + mdl <- result$model[[mi]] + vi_vals <- extract_vi(mdl) + vi_nms <- extract_vi_names(mdl) + if (is.null(vi_vals) || is.null(vi_nms)) next + y_train <- NULL + if (!is.null(result$data) && !is.null(result$data$training)) { + tr <- result$data$training + if ("ensemble" %in% names(tr) && "prediction" %in% names(tr)) { + y_train <- tr$prediction[tr$ensemble == ens_labels[mi]] + } + } else if (!is.null(mdl$y)) { + y_train <- mdl$y + } + r2_oob <- extract_oob_r2(mdl, y_train) + vi_rows[[length(vi_rows) + 1L]] <- tibble::tibble( + pft = pft_i, + model_output = pool, + ensemble = ens_labels[mi], + predictor = vi_nms, + importance = as.numeric(vi_vals[seq_along(vi_nms)]), + oob_r2 = r2_oob + ) + } + if (length(vi_rows) > 0) { + vi_tbl <- dplyr::bind_rows(vi_rows) + out_vi_per_ens <- file.path(model_outdir, paste0("vi_", spec_key, "_by_ensemble.csv")) + readr::write_csv(vi_tbl, out_vi_per_ens) + } + } # store using pft::pool names (e.g. "woody::AGB"). downscale_output_list[[paste0(pft_i, "::", pool)]] <- result + if (!is.null(result) && !is.null(start_obj)) { + end_df <- purrr::map( + result$predictions, + ~ tibble::tibble(site_id = result$site_ids, prediction = .x) + ) |> + dplyr::bind_rows(.id = "ensemble") |> + dplyr::rename(end_pred = prediction) + start_df <- purrr::map( + start_obj$predictions, + ~ tibble::tibble(site_id = start_obj$site_ids, prediction = .x) + ) |> + dplyr::bind_rows(.id = "ensemble") |> + dplyr::rename(start_pred = prediction) + delta_df <- end_df |> + dplyr::inner_join(start_df, by = c("site_id", "ensemble")) |> + dplyr::mutate(delta_pred = end_pred - start_pred) |> + dplyr::left_join(ca_fields, by = "site_id") |> + dplyr::mutate( + pft = pft_i, + model_output = pool, + delta_c_density_Mg_ha = PEcAn.utils::ud_convert(delta_pred, "kg/m2", "Mg/ha"), + delta_total_c_Mg = delta_c_density_Mg_ha * area_ha + ) |> + dplyr::select(site_id, pft, ensemble, delta_c_density_Mg_ha, delta_total_c_Mg, area_ha, county, model_output) + 
delta_output_records[[paste0(pft_i, "::", pool)]] <- delta_df + } + + # Incremental checkpoint (so production runs can be resumed) + if (!is.null(result)) { + tryCatch( + { + saveRDS(downscale_output_list, file = file.path(cache_dir, "downscale_partial.rds")) + }, + error = function(e) { + PEcAn.logger::logger.warn("Failed to write partial checkpoint: ", conditionMessage(e)) + } + ) + } + PEcAn.logger::logger.info( + sprintf( + "[Progress %d/%d] Finished %s (%s); iter_time_s=%.2f; elapsed_total_s=%.2f", + combo_index, combo_total, pool, pft_i, + step_elapsed(iter_timer), step_elapsed(loop_global_timer) + ) + ) } } if (length(downscale_output_list) == 0) { PEcAn.logger::logger.severe("No downscale outputs produced") } -PEcAn.logger::logger.info("Downscaling complete for all model outputs") - -## Save to make it easier to restart -#### ---Create checkpoint for downstream analysis---#### -checkpoint_file <- file.path(cache_dir, "downscaling_output.RData") -start_end <- system.time( - save( - downscale_output_list, - covariates, - design_points, - design_covariates, - ensemble_ids, - pft_site_ids, - file = checkpoint_file, - compress = FALSE - ) -) PEcAn.logger::logger.info( - "Downscaling output objects saved to", checkpoint_file, - "\nIt took", round(start_end[3] / 60, 2), "minutes" + "Downscaling loop complete; total_elapsed_s=", + round(step_elapsed(loop_global_timer), 2) ) +log_mem("Post primary downscaling loop :: ") PEcAn.logger::logger.info( paste0(" Finished downscaling "), @@ -392,7 +567,8 @@ PEcAn.logger::logger.info( ### --- Print Metrics for Each Ensemble Member ---#### PEcAn.logger::logger.info("Downscaling model results for each ensemble member:") -metrics <- lapply(downscale_output_list, PEcAnAssimSequential::downscale_metrics) # nolint +metrics_timer <- step_timer() +metrics <- lapply(downscale_output_list, PEcAnAssimSequential::downscale_metrics) median_metrics <- purrr::map(metrics, function(m) { m |> @@ -409,6 +585,9 @@ median_metrics <- 
purrr::map(metrics, function(m) { PEcAn.logger::logger.info("Median downscaling model metrics:") dplyr::bind_rows(median_metrics, .id = "model_output") |> knitr::kable() +PEcAn.logger::logger.info( + "Computed median metrics in", round(step_elapsed(metrics_timer), 2), "s" +) if (!PRODUCTION) { # For testing, use a subset of fields @@ -441,11 +620,214 @@ downscale_preds <- purrr::map(downscale_output_list, get_downscale_preds) |> # Convert kg/m2 to Mg/ha using PEcAn.utils::ud_convert dplyr::mutate(c_density_Mg_ha = PEcAn.utils::ud_convert(prediction, "kg/m2", "Mg/ha")) |> # Calculate total Mg per field: c_density_Mg_ha * area_ha - dplyr::mutate(total_c_Mg = c_density_Mg_ha * area_ha) + dplyr::mutate(total_c_Mg = c_density_Mg_ha * area_ha) |> + dplyr::select(site_id, pft, ensemble, c_density_Mg_ha, total_c_Mg, area_ha, county, model_output) + +dp <- downscale_preds |> + dplyr::select( + site_id, pft, ensemble, + c_density_Mg_ha, total_c_Mg, + area_ha, county, model_output + ) + +## --- Mixed scenario: orchard overlap with 50% grass on woody fields --- ## +# Goal: add an additional PFT record "woody + annual" computed as: +# combined = woody_value + f_annual * (annual_end - annual_start) +# where values are in kg/m2 and f_annual = 0.5.
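The incremental mixing rule in the comment above can be sketched as a small standalone function (illustrative only; in the script the mixing is delegated to the project's `combine_value()` helper, and the function and example values here are not part of the patch):

```r
# Sketch of the incremental mixing rule, assuming all carbon values are kg C/m2:
#   combined = woody + f_annual * (annual_end - annual_start)
# Only the *change* in the annual PFT is credited, scaled by its areal cover,
# so the woody baseline stock is not double counted.
combine_incremental <- function(woody_value, annual_end, annual_start,
                                annual_cover = 0.5) {
  stopifnot(annual_cover >= 0, annual_cover <= 1)
  woody_value + annual_cover * (annual_end - annual_start)
}

# Hypothetical example: a woody field at 4.0 kg/m2 whose 50% annual ground
# cover accrues 0.3 kg/m2 over the simulation period:
combine_incremental(4.0, annual_end = 0.5, annual_start = 0.2)
#> [1] 4.15
```

Because the rule is additive in the annual increment, a field with no annual growth (`annual_end == annual_start`) reduces to the woody-only prediction.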
+ +# Identify labels used for woody and annual PFTs +# TODO use lookup table to map crops --> pfts and get labels +woody_label <- pfts[grepl("woody", pfts, ignore.case = TRUE)] +annual_label <- pfts[grepl("annual", pfts, ignore.case = TRUE)] + +if (length(woody_label) == 0 || length(annual_label) == 0) { + PEcAn.logger::logger.warn("Cannot build mixed scenario: missing woody or annual PFT") +} else { + PEcAn.logger::logger.info( + "Building mixed scenario 'woody + annual' using overlap (incremental) with 50% annual cover" + ) +} + +# Helper to tidy a downscale object to site_id/ensemble/prediction (kg/m2) +tidy_downscale <- function(ds) { + purrr::map( + ds$predictions, + ~ tibble::tibble(site_id = ds$site_ids, prediction = .x) + ) |> + dplyr::bind_rows(.id = "ensemble") +} + +# Determine target woody sites that exist in current dp +# (%in% returns an empty result, rather than erroring, if woody_label is empty) +target_woody_sites <- dp |> + dplyr::filter(pft %in% woody_label) |> + dplyr::distinct(site_id) |> + dplyr::pull(site_id) + +# If no woody sites present, skip +if (length(target_woody_sites) == 0) { + PEcAn.logger::logger.warn("No woody sites found in downscaled predictions; skipping mixed scenario") +} else { + # Build covariates for predicting annual onto woody sites, ensuring + # design-point covariates for the annual PFT are available for training join + dp_annual <- design_covariates_unscaled |> + dplyr::filter(site_id %in% (design_points |> + dplyr::filter(pft %in% annual_label) |> + dplyr::pull(site_id))) + + pred_cov_mixed <- covariates_full |> + dplyr::filter(site_id %in% target_woody_sites) |> + dplyr::bind_rows(dp_annual) |> + dplyr::distinct(site_id, .keep_all = TRUE) + + mixed_records <- list() + + for (pool in outputs_to_extract) { + # Training data for annual + train_ens_annual <- ensemble_data |> + dplyr::filter(pft %in% annual_label & variable == pool) + + if (nrow(train_ens_annual) == 0) { + PEcAn.logger::logger.warn("No annual ensemble data for pool ", pool, "; skipping mixed for this pool") + next + } + + train_pts_annual <-
train_ens_annual |> + dplyr::select(site_id, lat, lon, pft) |> + dplyr::distinct() + + # Annual predictions at start and end dates on woody sites + annual_start_obj <- downscale_model_output( + date = start_date, + model_output = pool, + train_ensemble_data = train_ens_annual, + train_site_coords = train_pts_annual, + pred_covariates = pred_cov_mixed + ) + annual_end_obj <- downscale_model_output( + date = end_date, + model_output = pool, + train_ensemble_data = train_ens_annual, + train_site_coords = train_pts_annual, + pred_covariates = pred_cov_mixed + ) + + # Get woody predictions at end date from existing results + woody_key <- paste0(woody_label, "::", pool) + woody_obj <- downscale_output_list[[woody_key]] + + if (is.null(annual_start_obj) || is.null(annual_end_obj) || is.null(woody_obj)) { + PEcAn.logger::logger.warn("Missing components for mixed scenario in pool ", pool, "; skipping") + next + } + + woody_df <- tidy_downscale(woody_obj) |> + dplyr::filter(site_id %in% target_woody_sites) |> + dplyr::rename(woody_pred = prediction) + + ann_start_df <- tidy_downscale(annual_start_obj) |> + dplyr::filter(site_id %in% target_woody_sites) |> + dplyr::rename(annual_start = prediction) + + ann_end_df <- tidy_downscale(annual_end_obj) |> + dplyr::filter(site_id %in% target_woody_sites) |> + dplyr::rename(annual_end = prediction) + + # Join by site_id and ensemble to align predictions + mix_df <- woody_df |> + dplyr::inner_join(ann_end_df, by = c("site_id", "ensemble")) |> + dplyr::inner_join(ann_start_df, by = c("site_id", "ensemble")) + + if (nrow(mix_df) == 0) { + PEcAn.logger::logger.warn("No overlapping site/ensemble rows for mixed scenario in pool ", pool) + next + } + + f_annual <- 0.5 # TODO: will come from monitoring / scenario data later + mix_df <- mix_df |> + dplyr::mutate( + mixed_pred = combine_value( + woody_value = .data$woody_pred, + annual_value = .data$annual_end, + annual_init = .data$annual_start, + annual_cover = f_annual, + woody_cover = 1.0, 
+ method = "incremental" + ) + ) |> + # add area/county for totals + dplyr::left_join(ca_fields, by = "site_id") |> + dplyr::mutate( + pft = "woody + annual", + model_output = pool, + c_density_Mg_ha = PEcAn.utils::ud_convert(mixed_pred, "kg/m2", "Mg/ha"), + total_c_Mg = c_density_Mg_ha * area_ha + ) |> + dplyr::select(site_id, pft, ensemble, c_density_Mg_ha, total_c_Mg, area_ha, county, model_output) + + mixed_records[[pool]] <- mix_df + } + + # Append mixed records if any + if (length(mixed_records) > 0) { + mixed_df_all <- dplyr::bind_rows(mixed_records, .id = "pool") |> + dplyr::select(-pool) + dp <- dplyr::bind_rows(dp, mixed_df_all) + } +} + ## Write out downscaled predictions readr::write_csv( - downscale_preds, + dp, # downscale predictions with mixed scenario appended (if available) file.path(model_outdir, "downscaled_preds.csv") ) + +# Write training site IDs used for each spec (pft x pool) +if (length(training_sites_records) > 0) { + train_sites_df <- dplyr::bind_rows(training_sites_records) |> + dplyr::distinct() + readr::write_csv(train_sites_df, file.path(model_outdir, "training_sites.csv")) + PEcAn.logger::logger.info("Training site list written to", file.path(model_outdir, "training_sites.csv")) +} +metadata <- list( + title = "Downscaled SIPNET Outputs", + description = "SIPNET model outputs downscaled to field level using Random Forest", + created = Sys.time(), + ensembles = sort(unique(as.integer(ensemble_ids))), + pfts = pfts, + outputs_to_extract = outputs_to_extract, + start_date = as.character(start_date), + end_date = as.character(end_date), + mixed_cover_fraction = 0.5, + columns = list( + site_id = "Unique identifier for each field from LandIQ", + pft = "Plant functional type", + ensemble = "Ensemble member identifier", + c_density_Mg_ha = "Predicted carbon density (Mg/ha)", + total_c_Mg = "Predicted total carbon (Mg) per field", + area_ha = "Field area in hectares", + county = "California county name where the field is located", + 
model_output = "Type of SIPNET model output (e.g., AGB, TotSoilCarb)" + ) +) + +metadata |> + jsonlite::write_json( + file.path(model_outdir, "downscaled_preds_metadata.json"), + pretty = TRUE, auto_unbox = TRUE + ) + +if (length(delta_output_records) > 0) { + delta_dp <- dplyr::bind_rows(delta_output_records, .id = "spec") |> + tidyr::separate(col = "spec", into = c("pft", "model_output"), sep = "::", remove = TRUE) + readr::write_csv(delta_dp, file.path(model_outdir, "downscaled_deltas.csv")) + PEcAn.logger::logger.info("Delta predictions written to", file.path(model_outdir, "downscaled_deltas.csv")) +} + +PEcAn.logger::logger.info("Downscaled predictions written to", file.path(model_outdir, "downscaled_preds.csv")) +PEcAn.logger::logger.info( + "Total script elapsed time (s):", + round(step_elapsed(overall_timer), 2) +) +log_mem("End of script :: ") From f5a8bf2ed77f40791d849494157ba68cbf93ce3d Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 19:08:23 -0400 Subject: [PATCH 26/70] Add testthat helper for logging configuration and sourcing R functions --- R/match_site_ids_by_location.R | 209 ++++++++++++++++++++++ tests/testthat/helper.R | 13 ++ tests/testthat/test-combine_mixed_crops.R | 4 - tests/testthat/test-match_site_ids.R | 165 +++++++++++++++++ 4 files changed, 387 insertions(+), 4 deletions(-) create mode 100644 R/match_site_ids_by_location.R create mode 100644 tests/testthat/helper.R create mode 100644 tests/testthat/test-match_site_ids.R diff --git a/R/match_site_ids_by_location.R b/R/match_site_ids_by_location.R new file mode 100644 index 0000000..61f3b6a --- /dev/null +++ b/R/match_site_ids_by_location.R @@ -0,0 +1,209 @@ +#' Match site IDs by geographic location +#' +#' Matches site IDs from a target data.frame to the nearest reference site IDs based on latitude/longitude. +#' Always returns a data.frame with one row per target row containing: +#' target_site_id, matched_site_id, coords, distance_m, and proximity class. 
+#' +#' @param target_df data.frame with at least id/lat/lon columns +#' @param reference_df data.frame with at least id/lat/lon columns +#' @param target_id_col character. ID column in target_df (default "site_id") +#' @param reference_id_col character. ID column in reference_df (default "site_id") +#' @param target_lat_col character. Latitude column in target_df (default "lat") +#' @param target_lon_col character. Longitude column in target_df (default "lon") +#' @param reference_lat_col character. Latitude column in reference_df (default "lat") +#' @param reference_lon_col character. Longitude column in reference_df (default "lon") +#' @param crs character. Coordinate reference system of input points (default "EPSG:4326"). +#' For California analyses, use "EPSG:3310" (NAD83 / California Albers). +#' @param map_all logical. If TRUE, compute nearest distances for all target rows. +#' If FALSE, only compute nearest distances for IDs missing from reference; +#' matched-by-ID rows are returned with distance 0. 
+#' @return a tibble with mapping and distances (same number of rows as target_df) +match_site_ids_by_location <- function( + target_df, + reference_df, + target_id_col = "site_id", + reference_id_col = "site_id", + target_lat_col = "lat", + target_lon_col = "lon", + reference_lat_col = "lat", + reference_lon_col = "lon", + crs = "EPSG:4326", + map_all = FALSE, + max_distance = 100) { + # Validate columns + req_target <- c(target_id_col, target_lat_col, target_lon_col) + req_ref <- c(reference_id_col, reference_lat_col, reference_lon_col) + + if (!all(req_target %in% colnames(target_df))) { + PEcAn.logger::logger.error( + "target_df is missing required columns: ", + paste(setdiff(req_target, colnames(target_df)), collapse = ", ") + ) + } + if (!all(req_ref %in% colnames(reference_df))) { + PEcAn.logger::logger.error( + "reference_df is missing required columns: ", + paste(setdiff(req_ref, colnames(reference_df)), collapse = ", ") + ) + } + + # Identify matched and mismatched rows by ID + by_id <- stats::setNames(reference_id_col, target_id_col) + target_df <- target_df |> + dplyr::mutate(`..row..` = dplyr::row_number()) + + matched_id <- target_df |> + dplyr::inner_join(reference_df, by = by_id, suffix = c(".t", ".r")) + + mismatched_id <- target_df |> + dplyr::anti_join(reference_df, by = by_id) + + n_needs <- nrow(mismatched_id) + if (n_needs == 0) { + PEcAn.logger::logger.info("All target IDs found in reference by ID.") + } else { + PEcAn.logger::logger.warn( + paste(n_needs, "target sites not in reference by ID; matching by nearest location.") + ) + } + + # Compute nearest for mismatches (always) + mapping_miss <- NULL + if (nrow(mismatched_id) > 0) { + tgt_vect_miss <- terra::vect( + mismatched_id, + geom = c(target_lon_col, target_lat_col), + crs = crs + ) + ref_vect <- terra::vect( + reference_df, + geom = c(reference_lon_col, reference_lat_col), + crs = crs + ) + nearest_site <- terra::nearest(tgt_vect_miss, ref_vect) + idx <- nearest_site$to_id + dist_m <- 
nearest_site$distance + + mapping_miss <- tibble::tibble( + `..row..` = mismatched_id$`..row..`, + target_site_id = mismatched_id[[target_id_col]], + matched_site_id = reference_df[[reference_id_col]][idx], + target_lat = mismatched_id[[target_lat_col]], + target_lon = mismatched_id[[target_lon_col]], + ref_lat = reference_df[[reference_lat_col]][idx], + ref_lon = reference_df[[reference_lon_col]][idx], + distance_m = dist_m + ) + } + + # Build mapping for matched-by-ID rows + mapping_match <- NULL + if (nrow(matched_id) > 0) { + # Prepare base table + mapping_match <- tibble::tibble( + `..row..` = matched_id$`..row..`, + # join key keeps one column name (no suffix) use it for both + target_site_id = matched_id[[target_id_col]], + matched_site_id = matched_id[[target_id_col]], + target_lat = matched_id[[paste0(target_lat_col, ".t")]], + target_lon = matched_id[[paste0(target_lon_col, ".t")]], + ref_lat = matched_id[[paste0(reference_lat_col, ".r")]], + ref_lon = matched_id[[paste0(reference_lon_col, ".r")]] + ) + + # Distances for matched rows + if (isTRUE(map_all)) { + tgt_pts <- terra::vect(mapping_match, geom = c("target_lon", "target_lat"), crs = crs) + ref_pts <- terra::vect(mapping_match, geom = c("ref_lon", "ref_lat"), crs = crs) + dvec <- vapply(seq_len(nrow(mapping_match)), function(i) as.numeric(terra::distance(tgt_pts[i], ref_pts[i])), numeric(1)) + mapping_match$distance_m <- dvec + } else { + mapping_match$distance_m <- 0 + } + } + + # Combine matched and mismatched, preserve original row order + mapping <- dplyr::bind_rows(mapping_match, mapping_miss) |> + dplyr::arrange(`..row..`) |> + dplyr::mutate( + close = dplyr::case_when( + distance_m <= 10 ~ "same location", + distance_m <= 100 ~ "very close (<=100m)", + distance_m <= 500 ~ "close (100-500m)", + distance_m <= 1000 ~ "moderate (500-1000m)", + distance_m <= 5000 ~ "far (1000-5000m)", + TRUE ~ "far (>5000m)" + ) + ) |> + dplyr::select(-`..row..`) |> + dplyr::distinct() + + # Enforce maximum 
allowable distance + if (any(mapping$distance_m > max_distance, na.rm = TRUE)) { + n_exceed <- sum(mapping$distance_m > max_distance, na.rm = TRUE) + PEcAn.logger::logger.severe( + n_exceed, " matched target rows exceed max_distance of ", max_distance, " m" + ) + } + + return(mapping) +} + +#' Update site IDs in a target data.frame by nearest reference site +#' +#' If some target IDs don't exist in reference, update them to the nearest reference IDs by location. +#' If all target IDs exist in reference, returns the original target_df unchanged. +#' +#' @param target_df data.frame +#' @param reference_df data.frame +#' @inheritParams match_site_ids_by_location +#' @return data.frame with potentially updated ID column +update_site_ids_by_location <- function( + target_df, + reference_df, + id_col = "site_id", + target_lat_col = "lat", + target_lon_col = "lon", + reference_id_col = "site_id", + reference_lat_col = "lat", + reference_lon_col = "lon", + crs = "EPSG:4326", + max_distance = 100) { + mapping <- match_site_ids_by_location( + target_df = target_df, + reference_df = reference_df, + target_id_col = id_col, + reference_id_col = reference_id_col, + target_lat_col = target_lat_col, + target_lon_col = target_lon_col, + reference_lat_col = reference_lat_col, + reference_lon_col = reference_lon_col, + crs = crs, + map_all = FALSE, + max_distance = max_distance + ) + + # Replace IDs where mapping exists + tdf <- target_df + orig_id_col <- id_col + # unify id column name for join + if (orig_id_col != "site_id") { + names(tdf)[names(tdf) == orig_id_col] <- "site_id" + } + + updated <- tdf |> + dplyr::left_join( + mapping |> + dplyr::select(target_site_id, matched_site_id), + by = c("site_id" = "target_site_id") + ) |> + dplyr::mutate(site_id = dplyr::if_else(is.na(matched_site_id), site_id, matched_site_id)) |> + dplyr::select(-matched_site_id) + + # rename back to original id name if needed + if (orig_id_col != "site_id") { + names(updated)[names(updated) == 
"site_id"] <- orig_id_col + } + + return(updated) +} diff --git a/tests/testthat/helper.R b/tests/testthat/helper.R new file mode 100644 index 0000000..cf38ef2 --- /dev/null +++ b/tests/testthat/helper.R @@ -0,0 +1,13 @@ +# testthat helper: configure logging and source all R/ functions before tests + +# Silence PEcAn.logger during tests to reduce noise +level <- PEcAn.logger::logger.setLevel("OFF") +withr::defer(PEcAn.logger::logger.setLevel(level)) + +# Source project functions +r_dir <- here::here("R") +r_scripts <- list.files(r_dir, pattern = "\\.R$", full.names = TRUE) +for (f in r_scripts) { + # source quietly; re-definitions are harmless for tests + try(source(f, local = TRUE), silent = TRUE) +} diff --git a/tests/testthat/test-combine_mixed_crops.R b/tests/testthat/test-combine_mixed_crops.R index de9faae..c3635e2 100644 --- a/tests/testthat/test-combine_mixed_crops.R +++ b/tests/testthat/test-combine_mixed_crops.R @@ -1,7 +1,5 @@ library(testthat) -loglevel <- PEcAn.logger::logger.setLevel("OFF") - test_that("weighted mixing basic case", { res <- combine_mixed_crops( woody_value = 100, annual_value = 50, @@ -73,5 +71,3 @@ test_that("length mismatch rejected", { "Can't recycle" ) }) - -PEcAn.logger::logger.setLevel(loglevel) diff --git a/tests/testthat/test-match_site_ids.R b/tests/testthat/test-match_site_ids.R new file mode 100644 index 0000000..be58a51 --- /dev/null +++ b/tests/testthat/test-match_site_ids.R @@ -0,0 +1,165 @@ +context("match_site_ids_by_location") +test_that("match_site_ids_by_location returns full mapping when all IDs exist", { + target <- data.frame( + site_id = c("A", "B"), + lat = c(34.0, 35.0), + lon = c(-118.0, -119.0) + ) + reference <- data.frame( + site_id = c("A", "B"), + lat = c(34.0, 35.0), + lon = c(-118.0, -119.0) + ) + + map <- match_site_ids_by_location(target, reference) + expect_s3_class(map, "data.frame") + expect_equal(nrow(map), nrow(target)) + expect_equal(map$target_site_id, target$site_id) + 
expect_equal(map$matched_site_id, target$site_id) + expect_true(all(map$distance_m == 0)) + + updated <- update_site_ids_by_location(target, reference) + expect_equal(updated$site_id, target$site_id) +}) + +test_that("update_site_ids_by_location replaces missing IDs by nearest", { + # B is missing from reference; it's closest to Y + target <- data.frame( + site_id = c("A", "B"), + lat = c(34.0000, 35.3000), + lon = c(-118.0000, -119.4000) + ) + reference <- data.frame( + site_id = c("A", "X", "Y"), + lat = c(34.0000, 35.3000, 35.3000), + lon = c(-118.0000, -119.5000, -119.4010) # Y is ~90m from B + ) + + map <- match_site_ids_by_location(target, reference) + expect_s3_class(map, "data.frame") + expect_equal(nrow(map), 2) + # A remains A with zero distance + rowA <- map[map$target_site_id == "A", ] + expect_equal(rowA$matched_site_id, "A") + expect_true(rowA$distance_m == 0) + # B maps to Y with positive distance + rowB <- map[map$target_site_id == "B", ] + expect_equal(rowB$matched_site_id, "Y") + expect_true(is.numeric(rowB$distance_m)) + expect_gt(rowB$distance_m, 0) + + updated <- update_site_ids_by_location(target, reference) + expect_equal(updated$site_id, c("A", "Y")) +}) + +test_that("match_site_ids_by_location map_all=TRUE maps all rows and includes close class", { + target <- data.frame( + site_id = c("A", "B"), + lat = c(34.0, 35.0), + lon = c(-118.0, -119.0) + ) + reference <- data.frame( + site_id = c("A", "B"), + lat = c(34.0, 35.0), + lon = c(-118.0, -119.0) + ) + + map_all <- match_site_ids_by_location(target, reference, map_all = TRUE) + expect_s3_class(map_all, "data.frame") + expect_equal(nrow(map_all), 2) + expect_true(all(c("target_site_id", "matched_site_id", "target_lat", "target_lon", "ref_lat", "ref_lon", "distance_m", "close") %in% names(map_all))) + expect_equal(sort(map_all$target_site_id), c("A", "B")) + expect_equal(map_all$target_site_id, map_all$matched_site_id) + # identical coords should produce zero distance, even when map_all=TRUE 
+ expect_true(all(map_all$distance_m == 0)) +}) + +test_that("update_site_ids_by_location supports custom column names", { + target <- data.frame( + id = c("A", "B"), + latitude = c(34.0000, 35.3000), + longitude = c(-118.0000, -119.4000) + ) + reference <- data.frame( + id = c("A", "X", "Y"), + latitude = c(34.0000, 35.3000, 35.3000), + longitude = c(-118.0000, -119.5000, -119.4010) + ) + + updated <- update_site_ids_by_location( + target_df = target, + reference_df = reference, + id_col = "id", + target_lat_col = "latitude", + target_lon_col = "longitude", + reference_id_col = "id", + reference_lat_col = "latitude", + reference_lon_col = "longitude" + ) + expect_equal(updated$id, c("A", "Y")) +}) + +test_that("match_site_ids_by_location errors when exceeding max_distance", { + # Construct a far-away reference so distance >> 100 m + target <- data.frame( + site_id = c("A"), + lat = c(34.0000), + lon = c(-118.0000) + ) + reference <- data.frame( + site_id = c("Z"), + lat = c(35.0000), # ~111 km north + lon = c(-118.0000) + ) + + expect_error( + match_site_ids_by_location(target, reference, map_all = TRUE, max_distance = 100), + regexp = "exceed max_distance|exceed max_distance of" + ) +}) + +test_that("close classification reflects ~90m as 'very close (<=100m)'", { + testthat::skip_if_not_installed("terra") + + target <- data.frame( + site_id = c("A", "B"), + lat = c(34.0000, 35.3000), + lon = c(-118.0000, -119.4000) + ) + reference <- data.frame( + site_id = c("A", "X", "Y"), + lat = c(34.0000, 35.3000, 35.3000), + lon = c(-118.0000, -119.5000, -119.4010) + ) + + map <- match_site_ids_by_location(target, reference) + rowB <- map[map$target_site_id == "B", ] + expect_true(rowB$distance_m > 0) + expect_equal(rowB$close, "very close (<=100m)") +}) + +test_that("map_all computes distances for ID-matched rows", { + testthat::skip_if_not_installed("terra") + + target <- data.frame( + site_id = c("A"), + lat = c(34.0000), + lon = c(-118.0000) + ) + reference <- 
data.frame( + site_id = c("A"), + lat = c(34.0000), + lon = c(-118.0009) # ~82 m west at this latitude + ) + + map_default <- match_site_ids_by_location(target, reference, map_all = FALSE) + expect_equal(nrow(map_default), 1) + expect_equal(map_default$matched_site_id, "A") + expect_true(map_default$distance_m == 0) + + map_all <- match_site_ids_by_location(target, reference, map_all = TRUE) + expect_equal(nrow(map_all), 1) + expect_equal(map_all$matched_site_id, "A") + expect_true(map_all$distance_m > 0) + expect_equal(map_all$close, "very close (<=100m)") +}) From 7e4c5e6eccc62615d1c003eff69cefefeba41fe1 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 19 Sep 2025 19:11:02 -0400 Subject: [PATCH 27/70] Add references section and update workflow documentation title and format --- docs/references.md | 34 ++++++++++++++++++++++++++++++++++ docs/workflow_documentation.md | 6 ++---- 2 files changed, 36 insertions(+), 4 deletions(-) create mode 100644 docs/references.md diff --git a/docs/references.md b/docs/references.md new file mode 100644 index 0000000..4f37ed2 --- /dev/null +++ b/docs/references.md @@ -0,0 +1,34 @@ +# References + +**EFI Standards** + +Dietze, Michael C., R. Quinn Thomas, Jody Peters, Carl Boettiger, Gerbrand Koren, Alexey N. Shiklomanov, and Jaime Ashander. 2023. “A Community Convention for Ecological Forecasting: Output Files and Metadata Version 1.0.” Ecosphere 14 (11): e4686. https://doi.org/10.1002/ecs2.4686. + +**CADWR LandIQ Crop Map** + +California Department of Water Resources. (2018). Statewide Crop Mapping—California Natural Resources Agency Open Data. Retrieved “Oct 14, 2024” from https://data.cnra.ca.gov/dataset/statewide-crop-mapping. + +**CalAdapt** + +Lyons, Andrew and R Development Core Team. 2025. “Caladaptr: Tools for the Cal-Adapt API in R.” Manual. https://ucanr-igis.github.io/caladaptr. + +**SoilGrids250m** + +Hengl, T. et al. 2017. 
“SoilGrids250m: Global Gridded Soil Information Based on Machine Learning.” PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748 + +**ERA5 Climate Data** + +Hersbach, H. et al. 2020. “The ERA5 Global Reanalysis.” Quarterly Journal of the Royal Meteorological Society 146: 1999–2049. https://doi.org/10.1002/qj.3803 + +**CalAdapt Climate Zones** +CalAdapt. 2024. “Climate Zones.” Accessed October 14, 2024. https://cal-adapt.org/tools/climate-zones/. + +**SIPNET** + +Braswell, Bobby H., William J. Sacks, Ernst Linder, and David S. Schimel. 2005. “Estimating Diurnal to Annual Ecosystem Parameters by Synthesis of a Carbon Flux Model with Eddy Covariance Net Ecosystem Exchange Observations.” Global Change Biology 11 (2): 335–55. https://doi.org/10.1111/j.1365-2486.2005.00897.x. + +Sacks, William J., David S. Schimel, Russell K. Monson, and Bobby H. Braswell. 2006. “Model‐data Synthesis of Diurnal and Seasonal CO2 Fluxes at Niwot Ridge, Colorado.” Global Change Biology 12 (2): 240–59. https://doi.org/10.1111/j.1365-2486.2005.01059.x. + +**Random Forest** + +Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 0be98d0..9170937 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -1,7 +1,5 @@ --- -title: "Downscaling Workflow Documentation" -author: "David LeBauer" -date: "`r Sys.Date()`" +title: "Downscaling Workflow Technical Documentation" format: html: self-contained: true @@ -11,7 +9,7 @@ execute: echo: false --- -# Workflow Overview +# Technical Documentation {#sec-workflow-overview} The downscaling workflow predicts carbon pools (Soil Organic Carbon and Aboveground Biomass) for cropland fields in California and then aggregates these predictions to the county scale. 
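+The downscale-and-aggregate step can be sketched per ensemble member (illustration only; the covariate formula and helper name are assumptions, while the `parameter`, `prediction`, and `site_id` columns follow the EFI-style long format used in this repository):
+
+```r
+library(randomForest)
+library(dplyr)
+
+# Fit one Random Forest per ensemble member so that parameter and
+# meteorological uncertainty propagate into the county summaries.
+downscale_one <- function(ens_id, preds, covars) {
+  train <- preds |>
+    filter(parameter == ens_id) |>
+    inner_join(covars, by = "site_id")
+  rf <- randomForest(prediction ~ temp + precip + clay + ocd, data = train)
+  covars |> mutate(ensemble = ens_id, pred = predict(rf, covars))
+}
+```
+
+County means and standard deviations are then computed across the per-ensemble predictions.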
From 2fd85c9456bf755c73e17f4bfcffcf3f18f5a3d8 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Tue, 23 Sep 2025 00:36:48 -0400 Subject: [PATCH 28/70] - Separate out `variable_importance.qmd` and `deisng_points_analysis.qmd` from `downscaling_report.qmd` - Updated `040_downscale.R` to improve naming conventions and ensure training covariate matrices are saved correctly. - Refactor `041_aggregate_to_county.R` to streamline the aggregation process and improve readability. - Enhance `042_downscale_analysis.R` to load predictions and covariates, compute variable importance, and generate plots for partial dependence, ALE, and ICE effects using the `iml` package. - Improved logging. --- 000-config.R | 7 +- _quarto.yml | 11 +- docs/references.md | 2 +- docs/workflow_documentation.md | 111 +++++---- reports/design_points_analysis.qmd | 57 +++++ reports/downscaling_results.qmd | 373 ++++++++++++++++++----------- reports/variable_importance.qmd | 115 +++++++++ scripts/040_downscale.R | 14 +- scripts/041_aggregate_to_county.R | 98 +++----- scripts/042_downscale_analysis.R | 305 ++++++++++++----------- 10 files changed, 686 insertions(+), 407 deletions(-) create mode 100644 reports/design_points_analysis.qmd create mode 100644 reports/variable_importance.qmd diff --git a/000-config.R b/000-config.R index 6630885..3eeb874 100644 --- a/000-config.R +++ b/000-config.R @@ -73,7 +73,9 @@ msg <- glue::glue( "- will extract variables: {paste(outputs_to_extract, collapse = ', ')}\n", "- ca_albers_crs : {ca_albers_crs}{if(ca_albers_crs == 3310) ', which is NAD83 / California Albers' else ''}\n" ) -PEcAn.logger::logger.info(msg, wrap = FALSE) +if (!isTRUE(getOption("ccmmf.quiet_banner", FALSE))) { + PEcAn.logger::logger.info(msg, wrap = FALSE) +} ## Global configuration settings for logging @@ -87,6 +89,3 @@ options( ## Source all R scripts in the R/ directory r_scripts <- list.files(file.path(here::here(), "R"), pattern = "\\.R$", full.names = TRUE) lapply(r_scripts, source) - -## 
Check that all tests pass before proceeding -testthat::test_dir(here::here("tests/testthat")) diff --git a/_quarto.yml b/_quarto.yml index 984f189..bf00e34 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -3,8 +3,7 @@ project: output-dir: _site render: # Explicit pages to render - - index.qmd - - README.md + - index.qmd # renders README - reports/downscaling_results.qmd - reports/design_points_analysis.qmd - reports/variable_importance.qmd @@ -27,18 +26,14 @@ website: text: Design Points Analysis - href: reports/variable_importance.qmd text: Variable Importance - - href: docs/references.md - text: References - text: Docs menu: - - href: README.md - text: README - href: docs/workflow_documentation.md text: Workflow Documentation - - text: Examples - menu: - href: docs/mixed_system_prototype.qmd text: Mixed System Prototype + - href: docs/references.md + text: References format: html: diff --git a/docs/references.md b/docs/references.md index 4f37ed2..4655129 100644 --- a/docs/references.md +++ b/docs/references.md @@ -31,4 +31,4 @@ Sacks, William J., David S. Schimel, Russell K. Monson, and Bobby H. Braswell. 2 **Random Forest** -Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. +Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. \ No newline at end of file diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 9170937..2ca1702 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -29,9 +29,25 @@ It uses an ensemble-based approach to uncertainty propagation and analysis, main The workflows are -1. **Site Selection**: uses environmental variables (later also management layers) to create clusters and then select representative sites. The design_points.csv are then passed to the ensemble workflow -2. 
Ensemble in ccmmf/workflows repository, generates ensemble outputs -3. **Downscaling**: uses ensemble outputs to make predictions for each field in CA then aggregate to county level summaries. +1. **Site Selection**: uses environmental variables (in the future we will add agronomic practice changes) to create clusters and then select representative sites. The `design_points.csv` are then passed to the ensemble workflow. +2. **Ensemble Runs**: generate ensemble outputs. These are done in the [ccmmf/workflows](https://github.com/ccmmf/workflows) repository. +3. **Downscaling**: uses ensemble outputs to make predictions for each field in CA, then aggregates to county level summaries and produces figures. + +Overview: + +```r +# Site Selection +Rscript scripts/009_update_landiq.R --production true +Rscript scripts/010_prepare_covariates.R --production true +Rscript scripts/011_prepare_anchor_sites.R --production true +Rscript scripts/020_cluster_and_select_design_points.R --production true +# Downscaling and Aggregation +Rscript scripts/030_extract_sipnet_output.R --production true +Rscript scripts/040_downscale.R --production true +Rscript scripts/041_aggregate_to_county.R --production true +Rscript scripts/042_downscale_analysis.R --production true +Rscript scripts/043_county_level_plots.R --production true +``` ## Workflow Steps @@ -68,11 +84,13 @@ git clone git@github.com:ccmmf/downscaling - sets repositories from which R packages are installed - runs `renv/activate.R` - `000-config.R` - - set `pecan_outdir` based on the CCMMF_DIR. - - confirm that relative paths (`data_raw`, `data`, `cache`) are correct. - - detect and use resources for parallel processing (with future package); default is `available cores - 1` - - PRODUCTION mode setting. For testing, set `PRODUCTION` to `FALSE`. This is _much_ faster and requires fewer computing resources because it subsets large datasets. Once a test run is successful, set `PRODUCTION` to `TRUE` to run the full workflow. 
+ - Parses a `--production` flag via `argparse` to toggle development vs production mode. In non-interactive runs, default is development (`--production FALSE`); interactive sessions override to production. Example: `Rscript scripts/040_downscale.R --production true`. + - Sets directories for inputs and outputs used by the pipelines. + - Sets parallelization via `future::plan(multicore, workers = availableCores() - 1)`. + - Controls variables to extract via `outputs_to_extract` default: `c("TotSoilCarb", "AGB")`. + - Auto-sources all helper functions in `R/` and runs unit tests under `tests/testthat` at startup. + ### 1. Data Preparation @@ -214,63 +232,72 @@ Rscript scripts/030_extract_sipnet_output.R Extracts and formats SIPNET outputs for downscaling: -- Extract output variables (AGB, TotSoilCarb) from SIPNET simulations -- Aggregate site-level ensemble outputs into long and 4D array formats -- Save CSV and NetCDF files following EFI standards +- Extract output variables (AGB, TotSoilCarb) from SIPNET simulations (configurable via `outputs_to_extract`). +- Aggregate site-level ensemble outputs into a long EFI-style format: `datetime, site_id, lat, lon, pft, parameter, variable, prediction`. **Inputs:** - `out/ENS--/YYYY.nc` **Outputs:** -- `out/ensemble_output.csv`: Long format data +- `out/ensemble_output.csv`: Long format data for selected variables -### 5. Downscale and Aggregate SIPNET Output +### 5. 
Mixed Cropping Systems ```sh -Rscript scripts/040_downscale_and_aggregate.R -Rscript scripts/041_downscale_analysis.R +Rscript scripts/031_aggregate_sipnet_output.R ``` -Builds Random Forest models to predict carbon pools for all fields: - -- Train models on SIPNET ensemble runs at design points -- Use environmental covariates to downscale predictions to all fields -- Aggregate to county-level estimates -- Output maps and statistics of carbon density and totals - -**Inputs:** -- `out/ensemble_output.csv`: SIPNET outputs -- `data/site_covariates.csv`: Environmental covariates +Simulates mixed-cropping scenarios by combining outputs across two PFTs using the `combine_mixed_crops()` function. This is described in more detail in [Mixed System Prototype](../docs/mixed_system_prototype.qmd). -**Outputs:** -- `out/county_total_AGB.png`: County-level AGB predictions -- `out/county_total_TotSoilCarb.png`: County-level SOC predictions -- `out/county_summaries.csv`: County statistics +There are two methods for combining outputs: +- `weighted`: area-partitioned mix where `woody_cover + annual_cover = 1` +- `incremental`: preserve woody baseline (`woody_cover = 1`) and add the annual delta scaled by `annual_cover`. -## Technical Reference +The current analysis uses the weighted method to represent ground cover in orchards and vineyards. -### Ensemble Structure +Outputs include `multi_pft_ensemble_output.csv`, `combined_ensemble_output.csv`, and `ensemble_output_with_mixed.csv` with a synthetic mixed PFT. -Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. +### 6. 
Downscale, Aggregate to County, and Plot
+
+```sh
+Rscript scripts/040_downscale.R --production true
+Rscript scripts/041_aggregate_to_county.R --production true
+Rscript scripts/042_downscale_analysis.R --production true
+Rscript scripts/043_county_level_plots.R --production true
+```
+
+Builds Random Forest models to predict carbon pools for all fields; aggregates to county-level; summarizes variable importance; and produces maps:
 
-# References
+- Train models on SIPNET ensemble runs at design points
+- Use environmental covariates to downscale predictions to all fields
+- Aggregate to county-level estimates
+- Output maps and statistics of carbon density and stocks.
 
-**EFI Standards**
+**Inputs:**
+- `out/ensemble_output.csv`: SIPNET outputs
 
-Dietze, Michael C., R. Quinn Thomas, Jody Peters, Carl Boettiger, Gerbrand Koren, Alexey N. Shiklomanov, and Jaime Ashander. 2023. "A Community Convention for Ecological Forecasting: Output Files and Metadata Version 1.0." Ecosphere 14 (11): e4686. https://doi.org/10.1002/ecs2.4686.
+- `data/site_covariates.csv`: Environmental covariates
 
-**Data Sources**
+Outputs from `040_downscale.R`:
+- `out/downscaled_preds.csv`: Per-field predictions with ensemble, area, and county
+- `out/downscaled_preds_metadata.json`: Metadata documenting ensembles, PFTs, variables, and date range
+- `out/training_sites.csv`: Training site IDs per PFT × pool
+- `out/vi___by_ensemble.csv`: Variable importance per ensemble
+- `out/downscaled_deltas.csv`: Start-to-end deltas used for delta maps
 
-Land IQ, LLC. California Crop Mapping (2014). California Department of Water Resources, 2017. https://data.cnra.ca.gov/dataset/statewide-crop-mapping.
+Outputs from `041_aggregate_to_county.R`:
+- `out/county_summaries.csv`: County statistics (means/SDs across ensembles for stocks and densities)
 
+Outputs from `043_county_level_plots.R` (saved in `figures/`):
+- County-level maps of carbon stock and density by PFT and pool. 
+- Difference maps (mixed − woody) and delta maps (start→end). -Hengl, T. et al. 2017. "SoilGrids250m: Global Gridded Soil Information Based on Machine Learning." PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748 +## Technical Notes -Hersbach, H. et al. 2020. "The ERA5 Global Reanalysis." Quarterly Journal of the Royal Meteorological Society 146: 1999–2049. https://doi.org/10.1002/qj.3803 +**Ensemble Structure** -**Models** +Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. -Braswell, Bobby H., William J. Sacks, Ernst Linder, and David S. Schimel. 2005. "Estimating Diurnal to Annual Ecosystem Parameters by Synthesis of a Carbon Flux Model with Eddy Covariance Net Ecosystem Exchange Observations." Global Change Biology 11 (2): 335–55. https://doi.org/10.1111/j.1365-2486.2005.00897.x. +**Implementation notes:** -Liaw, Andy, and Matthew Wiener. 2002. "Classification and Regression by randomForest." R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. +- EFI-standard column `parameter` (ensemble ID) is renamed to `ensemble` in some steps for clarity. +- Pool variables are in `kg/m²` and are converted to `Mg/ha` for densities via `PEcAn.utils::ud_convert()`; field-level stocks are computed as `density × area`. 
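+The unit handling in the notes above reduces to a factor of 10 plus a multiplication by area. A minimal sketch (values are illustrative):
+
+```r
+density_kg_m2 <- 5.2                                   # SIPNET pool size, kg/m^2
+density_Mg_ha <- PEcAn.utils::ud_convert(density_kg_m2, "kg m-2", "Mg ha-1")
+# equivalent to density_kg_m2 * 10, because 1 kg/m^2 = 10 Mg/ha
+area_ha <- 40                                          # field area from LandIQ
+stock_Mg <- density_Mg_ha * area_ha                    # field-level stock
+stock_Tg <- stock_Mg / 1e6                             # county totals are reported in Tg
+```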
diff --git a/reports/design_points_analysis.qmd b/reports/design_points_analysis.qmd new file mode 100644 index 0000000..77e0b15 --- /dev/null +++ b/reports/design_points_analysis.qmd @@ -0,0 +1,57 @@ +--- +title: "Design Points Analysis" +format: + html: + self-contained: true + embed-resources: true + df-print: paged + toc: true +--- + +# Design Point Distribution + +Design points are representative locations selected to capture the range of environmental conditions across California croplands from CADWR (2018). +These are the locations where SIPNET is run to produce outputs later used in downscaling to all ~600k crop fields. + +We used clustering to select design points that represent the range of environmental conditions across California croplands. +The following figures illustrate the distribution of these design points, which were clustered based on environmental covariates. + +The environmental covariates used for clustering are: + +| Variable | Description | Source | Units | +|----------|-------------|--------|-------| +| temp | Mean annual temperature | ERA5 | °C | +| precip | Mean annual precipitation | ERA5 | mm/year | +| srad | Solar radiation | ERA5 | W/m² | +| vapr | Vapor pressure deficit | ERA5 | kPa | +| clay | Clay content | SoilGrids | % | +| ocd | Organic carbon density | SoilGrids | g/kg | +| twi | Topographic wetness index | SRTM-derived | - | + +## Map of Selected Design Points + +Here we check the geographic distribution of design points relative to the distribution of all croplands. +Grey areas are the fields in the CADWR (2018) land use dataset. +Boundaries are Climate Zones from CalAdapt (Lyons, 2025). + +The design points should be well distributed across California croplands and Climate Zones. + +![Clustered Design Points](../figures/design_points.webp) + +## Relationships Among Environmental Covariates + +This pairs plot shows the relationships between covariates used for clustering, with colors indicating cluster membership. 
+
+The clusters should show distinct groupings based on the environmental covariates.
+
+![Clustered Pairs Plot](../figures/cluster_pairs.webp)
+
+## Environmental Characteristics of each Cluster
+
+These plots present the normalized mean values of environmental covariates for each cluster.
+
+This summary highlights the characteristics of each cluster based on the environmental covariates used for clustering.
+
+Here we expect to see clusters with distinct environmental signatures, reflecting the unique multivariate profiles of each cluster.
+
+![Cluster Summary](../figures/cluster_plot.svg)
diff --git a/reports/downscaling_results.qmd b/reports/downscaling_results.qmd
index f3d5031..3afb0fb 100644
--- a/reports/downscaling_results.qmd
+++ b/reports/downscaling_results.qmd
@@ -1,12 +1,11 @@
 ---
 title: "Downscaling Results and Analysis"
 author: "David LeBauer"
-date: "`r Sys.Date()`"
+date: today
 format:
   html:
-    self-contained: true
-    embed-resources: true
-    df-print: paged
+    self-contained: false
+    df-print: default
     toc: true
 execute:
   echo: false
@@ -16,185 +15,285 @@
 
 # Overview
 
-This document presents the results of downscaling workflow: soil carbon stocks and aboveground biomass in California cropland fields, aggregated to the county level.
+Here we present county-level estimates of carbon stocks and densities, aggregated from field-scale pool sizes, for soil carbon to 30 cm depth and aboveground biomass.
 
-The Results section presents the county-level carbon stocks and densities for soil carbon and aboveground biomass. The Analysis section explores results from the downscaling and aggregation analysis steps.
+**Not validated, for illustrative purposes only.**
 
-For detailed information about methods and workflow, please see the Workflow Documentation in [`docs/workflow_documentation.md`](https://github.com/ccmmf/downscaling/blob/update_site_selection/docs/workflow_documentation.md). 
+For detailed information about methods and workflow, see the [Workflow Documentation](../docs/workflow_documentation.md).
 
-# Results
-
-## County-Level Carbon Stocks and Densities
-
-The following maps illustrate the spatial variation and uncertainty (mean and standard deviation) of the predicted carbon pools at the county level.
-
-### Soil Carbon (TotSoilCarb)
-
-#### Carbon Stock by County
-
-![](../figures/county_TotSoilCarb_carbon_stock.png)
-
-#### Carbon Density by County
-
-![](../figures/county_TotSoilCarb_carbon_density.png)
-
-### Aboveground Biomass (AGB)
-
-#### Carbon Stock by County
-
-![](../figures/county_AGB_carbon_stock.png)
-
-#### Carbon Density by County
-
-![](../figures/county_AGB_carbon_density.png)
-
-
-### Table of Carbon Stocks and Density by County
-
-The table below provides a searchable summary of the county-level carbon stocks and densities.
-
-```{r}
-source(here::here("000-config.R"))
-
-# Load county summaries data
-county_summaries <- readr::read_csv(file.path(model_outdir, "county_summaries.csv"),
-  show_col_types = FALSE)
-#colnames(county_summaries)
-# Combine mean and SD into a single column for carbon density
-county_summaries_table <- county_summaries |>
-  dplyr::mutate(
-    `Mean Total C (Tg/county)` = paste0(
-      signif(mean_total_c_Tg, 2),
-      " (", signif(sd_total_c_Tg, 2), ")"
-    ),
-    `Mean C Density (Mg/ha)` = paste0(
-      signif(mean_c_density_Mg_ha, 2),
-      " (", signif(sd_c_density_Mg_ha, 2), ")"
-    )
-  ) |>
-  dplyr::rename(
-    `Carbon Pool` = model_output,
-    `County` = county,
-    `# Fields` = n
-  ) |>
-  dplyr::select(`Carbon Pool`, `County`, `# Fields`, `Mean Total C (Tg/county)`, `Mean C Density (Mg/ha)`)
-
-# Create Table
-# TODO
-# - Fix point w/ missing county
-
-htmlwidgets::setWidgetIdSeed(123) # required to embed table self-contained in html
-options(htmlwidgets.TEMP_DIR = "htmlwidgets")
+Estimates for woody and mixed woody + annual systems are done for all fields in the LandIQ dataset with tree and vineyard crops; annual systems are done for all fields with herbaceous annual crops.
 
-DT::datatable(
-  county_summaries_table,
-  options = list(
-    pageLength = 10,
-    searchHighlight = TRUE
-  ),
-  rownames = FALSE,
-  escape = FALSE
-)
-```
+In the mixed scenario, ground cover is planted in all orchards. The method for simulating the addition of ground cover to orchards is described in the [Mixed System Prototype](../docs/mixed_system_prototype.qmd). Briefly, it combines downscaled model outputs from the woody perennial crop and annual crop PFTs to represent the effect of ground cover on carbon pools.
 
-# Analysis
-
-## Design Point Distribution
+# Results
 
-Design points are representative locations selected to capture the range of environmental conditions across California croplands from CADWR (2018).
-These are the locations where SIPNET is run to produce outputs later used in downscaling to all ~600k crop fields.
+## County-Level Carbon Stocks and Densities
 
-We used clustering to select design points that represent the range of environmental conditions across California croplands.
-The following figures illustrate the distribution of these design points, which were clustered based on environmental covariates.
+The following maps illustrate the predicted carbon pools at the county level, for each PFT. We show both total county carbon (Tg) and area‑normalized density (Mg/ha). 
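+The combination rules used for the mixed scenario reduce to simple arithmetic. A minimal sketch mirroring `combine_mixed_crops()` (the cover-argument names are assumptions; only `woody_value` and `annual_value` appear in this repository's tests):
+
+```r
+# Weighted: area-partitioned mix, where woody_cover + annual_cover = 1
+mix_weighted <- function(woody_value, annual_value, annual_cover) {
+  (1 - annual_cover) * woody_value + annual_cover * annual_value
+}
+
+# Incremental: keep the woody baseline (cover = 1), add the scaled annual delta
+mix_incremental <- function(woody_value, annual_delta, annual_cover) {
+  woody_value + annual_cover * annual_delta
+}
+
+mix_weighted(woody_value = 100, annual_value = 50, annual_cover = 0.5)  # 75
+```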
-The environmental covariates used for clustering are: +### Soil Carbon (TotSoilCarb) by PFT +#### Annual Crop -| Variable | Description | Source | Units | -|----------|-------------|--------|-------| -| temp | Mean annual temperature | ERA5 | °C | -| precip | Mean annual precipitation | ERA5 | mm/year | -| srad | Solar radiation | ERA5 | W/m² | -| vapr | Vapor pressure deficit | ERA5 | kPa | -| clay | Clay content | SoilGrids | % | -| ocd | Organic carbon density | SoilGrids | g/kg | -| twi | Topographic wetness index | SRTM-derived | - | +- Stock: ![](../figures/county_annual_crop_TotSoilCarb_carbon_stock.webp) +- Density: ![](../figures/county_annual_crop_TotSoilCarb_carbon_density.webp) +#### Woody Perennial Crop (Orchards & Vineyards) -### Map of Selected Design Points +This is the scenario where there is no ground cover in orchards and vineyards. -Here we check the geographic distribution of design points relative to the distribution of all croplands. -Grey areas are the fields in the CADWR (2018) land use dataset. -Boundaries are Climate Zones from CalAdapt (Lyons, 2025). +- Stock: ![](../figures/county_woody_perennial_crop_TotSoilCarb_carbon_stock.webp) +- Density: ![](../figures/county_woody_perennial_crop_TotSoilCarb_carbon_density.webp) -The design points should be well distributed across California croplands and Climate Zones. + -This pairs plot shows the relationships between covariates used for clustering, with colors indicating cluster membership. +### Difference: Woody + Annual minus Woody -The clusters should show distinct groupings based on the environmental covariates. +These maps show the impact of adding 50% annual ground cover in orchards. Positive values indicate an increase relative to woody-only. 
-![Clustered Pairs Plot](../figures/cluster_pairs.png) +- Density (Mg/ha): ![](../figures/county_diff_woody_plus_annual_minus_woody_TotSoilCarb_carbon_density.webp) +- Stock (Tg): ![](../figures/county_diff_woody_plus_annual_minus_woody_TotSoilCarb_carbon_stock.webp) -### Environmental Characteristics of each Cluster +### Aboveground Biomass (AGB) by PFT -These plots present This the normalized mean values of environmental covariates for each cluster. +#### Annual Crop -This summary highlights the characteristics of each cluster based on the environmental covariates used for clustering. +- Stock: ![](../figures/county_annual_crop_AGB_carbon_stock.webp) +- Density: ![](../figures/county_annual_crop_AGB_carbon_density.webp) -Here we expect to see clusters with distinct environmental signatures, reflecting the unique multivariate profiles of each cluster. +#### Woody Perennial Crop -![Cluster Summary](../figures/cluster_plot.png) +- Stock: ![](../figures/county_woody_perennial_crop_AGB_carbon_stock.webp) +- Density: ![](../figures/county_woody_perennial_crop_AGB_carbon_density.webp) -## Variable Importance and Partial Dependence + +### Difference: Woody + Annual minus Woody (AGB) -Variable importance quantifies how useful each covariate is in predicting the carbon stock. Partial dependence plots show the marginal effect of individual predictors on model response after averaging over the other predictors. +- Density (Mg/ha): ![](../figures/county_diff_woody_plus_annual_minus_woody_AGB_carbon_density.webp) +- Stock (Tg): ![](../figures/county_diff_woody_plus_annual_minus_woody_AGB_carbon_stock.webp) -### Total Soil Carbon -![](../figures/TotSoilCarb_importance_partial_plots.png) +### Tables: County Stocks and Density (by PFT) and PFT Comparison -### Aboveground Biomass (AGB) Carbon +The first table summarizes county-level metrics for each PFT. The second table pivots TotSoilCarb density by PFT to simplify cross-PFT comparisons. 
-![](../figures/AGB_importance_partial_plots.png) +```{r, include=FALSE} +options(ccmmf.quiet_banner = TRUE) +source(here::here("000-config.R")) +``` +```{r, message=FALSE, warning=FALSE} -# References +# Load county summaries data +county_summaries <- readr::read_csv( + file.path(model_outdir, "county_summaries.csv"), + show_col_types = FALSE +) -**EFI Standards** +# Long table with PFT + area +county_summaries_table <- county_summaries |> + dplyr::mutate( + `Mean Total C (Tg/county)` = paste0(signif(mean_total_c_Tg, 2), " (", signif(sd_total_c_Tg, 2), ")"), + `Mean C Density (Mg/ha)` = paste0(signif(mean_c_density_Mg_ha, 2), " (", signif(sd_c_density_Mg_ha, 2), ")"), + `Area (ha)` = signif(mean_total_ha, 3) + ) |> + dplyr::rename( + `Carbon Pool` = model_output, + `PFT` = pft, + `County` = county, + `# Fields` = n + ) |> + dplyr::select(`Carbon Pool`, `PFT`, `County`, `# Fields`, `Area (ha)`, `Mean Total C (Tg/county)`, `Mean C Density (Mg/ha)`) -Dietze, Michael C., R. Quinn Thomas, Jody Peters, Carl Boettiger, Gerbrand Koren, Alexey N. Shiklomanov, and Jaime Ashander. 2023. “A Community Convention for Ecological Forecasting: Output Files and Metadata Version 1.0.” Ecosphere 14 (11): e4686. https://doi.org/10.1002/ecs2.4686. +htmlwidgets::setWidgetIdSeed(123) -**CADWR LandIQ Crop Map** +DT::datatable( + county_summaries_table, + extensions = c('Scroller','SearchPanes','Select'), + options = list( + dom = 'Plfrtip', + pageLength = 10, + searchHighlight = TRUE, + deferRender = TRUE, + scroller = TRUE, + scrollY = "60vh", + searchPanes = list(cascadePanes = TRUE, initCollapsed = TRUE), + columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1,2))) + ), + class = "stripe hover compact", + rownames = FALSE, escape = FALSE +) -California Department of Water Resources. (2018). Statewide Crop Mapping—California Natural Resources Agency Open Data. Retrieved “Oct 14, 2024” from https://data.cnra.ca.gov/dataset/statewide-crop-mapping. 
+# Wide comparison: TotSoilCarb density by PFT +density_wide <- county_summaries |> + dplyr::select(County = county, `Carbon Pool` = model_output, PFT = pft, `C Density (Mg/ha)` = mean_c_density_Mg_ha) |> + tidyr::pivot_wider(names_from = PFT, values_from = `C Density (Mg/ha)`) |> + dplyr::arrange(`Carbon Pool`, County) + +# Drop rows where the row-wise max value is < MASK_THRESHOLD * colsum of the column where that max occurs, +# evaluated within each Carbon Pool separately +if (exists("MASK_THRESHOLD")) { + mask_group <- function(df) { + vals <- as.matrix(dplyr::select(df, -County, -`Carbon Pool`)) + vals2 <- vals; vals2[is.na(vals2)] <- -Inf + col_sums <- colSums(vals, na.rm = TRUE) + if (all(!is.finite(col_sums)) || sum(col_sums, na.rm = TRUE) <= 0) return(df) + row_max_val <- apply(vals2, 1, max) + row_max_col <- max.col(vals2, ties.method = "first") + thresholds <- col_sums[row_max_col] * get0("MASK_THRESHOLD", ifnotfound = 0.01) + keep <- is.finite(row_max_val) & row_max_val >= thresholds + df[keep, , drop = FALSE] + } + density_wide <- density_wide |> + dplyr::group_split(`Carbon Pool`, .keep = TRUE) |> + purrr::map(mask_group) |> + dplyr::bind_rows() +} -**CalAdapt** +DT::datatable( + density_wide, + extensions = c('Scroller','SearchPanes','Select'), + options = list( + dom = 'Plfrtip', + pageLength = 10, + searchHighlight = TRUE, + deferRender = TRUE, + scroller = TRUE, + scrollY = "60vh", + searchPanes = list(cascadePanes = TRUE, initCollapsed = TRUE), + columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1))) + ), + class = "stripe hover compact", + rownames = FALSE, escape = FALSE, + filter = 'top' +) +``` -Lyons, Andrew and R Development Core Team. 2025. “Caladaptr: Tools for the Cal-Adapt API in R.” Manual. https://ucanr-igis.github.io/caladaptr. +## Start-to-End Delta by PFT -**SoilGrids250m** +The following maps show county-level changes (start-to-end) in carbon density and stock for each PFT and pool. 
Positive values indicate an increase over the modeled period. -Hengl, T. et al. 2017. “SoilGrids250m: Global Gridded Soil Information Based on Machine Learning.” PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748 +### Annual Crop -**ERA5 Climate Data** +- TotSoilCarb: density: ![](../figures/county_delta_annual_crop_TotSoilCarb_carbon_density.webp), stock: ![](../figures/county_delta_annual_crop_TotSoilCarb_carbon_stock.webp) +- AGB: density: ![](../figures/county_delta_annual_crop_AGB_carbon_density.webp), stock: ![](../figures/county_delta_annual_crop_AGB_carbon_stock.webp) -Hersbach, H. et al. 2020. “The ERA5 Global Reanalysis.” Quarterly Journal of the Royal Meteorological Society 146: 1999–2049. https://doi.org/10.1002/qj.3803 +### Woody Perennial Crop -**CalAdapt Climate Zones** -CalAdapt. 2024. “Climate Zones.” Accessed October 14, 2024. https://cal-adapt.org/tools/climate-zones/. +- TotSoilCarb: density: ![](../figures/county_delta_woody_perennial_crop_TotSoilCarb_carbon_density.webp), stock: ![](../figures/county_delta_woody_perennial_crop_TotSoilCarb_carbon_stock.webp) +- AGB: density: ![](../figures/county_delta_woody_perennial_crop_AGB_carbon_density.webp), stock: ![](../figures/county_delta_woody_perennial_crop_AGB_carbon_stock.webp) -**SIPNET** +### Per-county Mixed-Woody Differences (Tables) -Braswell, Bobby H., William J. Sacks, Ernst Linder, and David S. Schimel. 2005. “Estimating Diurnal to Annual Ecosystem Parameters by Synthesis of a Carbon Flux Model with Eddy Covariance Net Ecosystem Exchange Observations.” Global Change Biology 11 (2): 335–55. https://doi.org/10.1111/j.1365-2486.2005.00897.x. 
+```{r} +# Compute Mixed – Woody differences from matched fields for both pools +dp <- vroom::vroom( + file.path(model_outdir, "downscaled_preds.csv"), + col_types = readr::cols( + site_id = readr::col_character(), + pft = readr::col_character(), + ensemble = readr::col_double(), + c_density_Mg_ha = readr::col_double(), + total_c_Mg = readr::col_double(), + area_ha = readr::col_double(), + county = readr::col_character(), + model_output = readr::col_character() + ) +) +mix <- dp |> dplyr::filter(pft == "woody + annual") |> + dplyr::select(site_id, ensemble, model_output, county, total_c_Mg_mix = total_c_Mg) +wood <- dp |> dplyr::filter(pft == "woody perennial crop") |> + dplyr::select(site_id, ensemble, model_output, county, total_c_Mg_woody = total_c_Mg) +diff_tbl <- mix |> + dplyr::inner_join(wood, by = c("site_id","ensemble","model_output","county")) |> + dplyr::mutate(diff_total_Mg = total_c_Mg_mix - total_c_Mg_woody) |> + dplyr::group_by(model_output, county, ensemble) |> + dplyr::summarise(diff_total_Mg = sum(diff_total_Mg), .groups = "drop") |> + dplyr::group_by(model_output, county) |> + dplyr::summarise(delta_Tg = PEcAn.utils::ud_convert(mean(diff_total_Mg), "Mg", "Tg"), .groups = "drop") |> + dplyr::mutate(`Delta Total C (Tg)` = ifelse(abs(delta_Tg) < 0.005, 0, round(delta_Tg, 3))) |> + dplyr::select(model_output, county, `Delta Total C (Tg)`) |> + dplyr::arrange(model_output, county) -Sacks, William J., David S. Schimel, Russell K. Monson, and Bobby H. Braswell. 2006. “Model‐data Synthesis of Diurnal and Seasonal CO2 Fluxes at Niwot Ridge, Colorado.” Global Change Biology 12 (2): 240–59. https://doi.org/10.1111/j.1365-2486.2005.01059.x. 
+DT::datatable( + diff_tbl, + extensions = c('Scroller','SearchPanes','Select'), + options = list( + dom = 'Plfrtip', + pageLength = 10, + searchHighlight = TRUE, + deferRender = TRUE, + scroller = TRUE, + scrollY = "60vh", + searchPanes = list(cascadePanes = TRUE, initCollapsed = TRUE), + columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1))) + ), + class = "stripe hover compact", + rownames = FALSE, escape = FALSE, + filter = 'top' +) +``` -**Random Forest** +### Per-county Start-to-End Deltas (Tables) -Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. \ No newline at end of file +```{r} +delta_csv <- file.path(model_outdir, "downscaled_deltas.csv") +if (file.exists(delta_csv)) { + deltas <- vroom::vroom( + delta_csv, + col_types = readr::cols( + site_id = readr::col_character(), + pft = readr::col_character(), + ensemble = readr::col_double(), + delta_c_density_Mg_ha = readr::col_double(), + delta_total_c_Mg = readr::col_double(), + area_ha = readr::col_double(), + county = readr::col_character(), + model_output = readr::col_character() + ) + ) + delta_tbl <- deltas |> + dplyr::group_by(model_output, pft, county, ensemble) |> + dplyr::summarise(total_delta_Mg = sum(delta_total_c_Mg), total_ha = sum(area_ha), .groups = "drop") |> + dplyr::mutate(`Delta Density (Mg/ha)` = total_delta_Mg / total_ha, + `Delta Total (Tg)` = PEcAn.utils::ud_convert(total_delta_Mg, "Mg", "Tg")) |> + dplyr::group_by(model_output, pft, county) |> + dplyr::summarise(`Delta Density (Mg/ha)` = mean(`Delta Density (Mg/ha)`), + `Delta Total (Tg)` = mean(`Delta Total (Tg)`), .groups = "drop") |> + dplyr::mutate(`Delta Density (Mg/ha)` = round(`Delta Density (Mg/ha)`, 4), + `Delta Total (Tg)` = round(`Delta Total (Tg)`, 3)) |> + dplyr::rename(`Carbon Pool` = model_output, `PFT` = pft, County = county) |> + dplyr::arrange(`Carbon Pool`, `PFT`, County) + + DT::datatable( + 
delta_tbl, + extensions = c('Scroller','SearchPanes','Select'), + options = list( + dom = 'Plfrtip', + pageLength = 10, + searchHighlight = TRUE, + deferRender = TRUE, + scroller = TRUE, + scrollY = "60vh", + searchPanes = list(cascadePanes = TRUE, initCollapsed = TRUE), + columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1,2))) + ), + class = "stripe hover compact", + rownames = FALSE, escape = FALSE, + filter = 'top' + ) +} else { + cat("Delta file not found.") +} +``` diff --git a/reports/variable_importance.qmd b/reports/variable_importance.qmd new file mode 100644 index 0000000..fe3c486 --- /dev/null +++ b/reports/variable_importance.qmd @@ -0,0 +1,115 @@ +--- +title: "Variable Importance and Effects" +format: + html: + self-contained: true + embed-resources: true + df-print: paged + toc: true +--- + +# Overview + +Diagnostics reflect exactly what scripts/040_downscale.R and scripts/042_downscale_analysis.R produce. + +- Variable importance: per-ensemble importance computed during downscaling (040) and summarized by median and IQR across ensembles (042). +- Partial dependence (PDP): top two predictors (by median importance) using the saved RF model and training data from 040. +- ALE (Accumulated Local Effects) and ICE: optional if the iml package is available; computed from the same saved model and training data. + +Notes + +- Importance units depend on the RF backend: %IncMSE or an equivalent importance metric; higher values indicate stronger influence. +- PDP assumes feature independence; may mislead under strong collinearity. ALE is preferred in that case and is centered at 0. +- The mixed scenario (woody + annual) has no standalone model; therefore VI/PDP/ALE/ICE are not generated for that PFT. +- PDPs are generated from a single saved RF model (first ensemble) for illustration, while importance ranks are aggregated across ensembles. 
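The PDP-vs-ALE caveat in the notes above can be illustrated on simulated data. This is a standalone sketch, not part of the pipeline; it uses the same `iml` calls as `scripts/042_downscale_analysis.R`, with a deliberately collinear predictor pair:

```r
library(randomForest)
library(iml)

set.seed(42)
n  <- 500
x1 <- runif(n)
x2 <- 0.9 * x1 + 0.1 * runif(n)   # strongly collinear with x1
y  <- 2 * x1 + rnorm(n, sd = 0.1)
dat <- data.frame(x1, x2, y)

rf <- randomForest(y ~ x1 + x2, data = dat, ntree = 200)
pred <- iml::Predictor$new(
  model = rf, data = dat[, c("x1", "x2")], y = dat$y,
  predict.function = function(m, newdata) stats::predict(m, newdata)
)

# PDP averages predictions over the marginal distribution of the other
# predictors, so it can query the model at unrealistic (x1, x2) pairs;
# ALE uses local differences within the observed data and is centered at 0.
pdp <- iml::FeatureEffect$new(pred, feature = "x1", method = "pdp")
ale <- iml::FeatureEffect$new(pred, feature = "x1", method = "ale")
plot(pdp)
plot(ale)
```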
+
+# Importance + Partial Dependence
+
+```{r, include=FALSE}
+options(ccmmf.quiet_banner = TRUE)
+source(here::here("000-config.R"))
+```
+
+```{r, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
+
+fig_dir <- here::here("figures")
+imp_pngs <- list.files(fig_dir, pattern = "_importance_partial_plots\\.png$", full.names = TRUE)
+if (length(imp_pngs) == 0) {
+  cat("No importance/partial plots found. Run scripts/042_downscale_analysis.R to generate them.")
+} else {
+  # Parse filenames into PFT and pool: {pft}_{pool}_importance_partial_plots.png
+  meta <- tibble::tibble(file = imp_pngs) |>
+    dplyr::mutate(name = basename(file)) |>
+    # capture any pool token (last underscore-delimited token before suffix)
+    tidyr::extract(name, into = c("pft", "pool"),
+                   regex = "^(.*)_([^_]+)_importance_partial_plots\\.png$",
+                   remove = FALSE) |>
+    dplyr::arrange(pool, pft)
+
+  pools <- unique(meta$pool)
+  for (po in pools) {
+    cat("\n## ", po, " by PFT\n\n", sep = "")
+    mf <- dplyr::filter(meta, pool == po)
+    for (i in seq_len(nrow(mf))) {
+      cat("### ", gsub("_", " ", mf$pft[i]), "\n\n", sep = "")
+      knitr::include_graphics(mf$file[i])
+      cat("\n")
+      cat("- Left: variable importance (median across ensembles) with interquartile ranges.\n")
+      cat("- Middle/Right: PDP for the top two predictors.\n\n")
+    }
+  }
+}
+```
+
+# ALE and ICE Effects
+
+```{r, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
+ale_svgs <- list.files(fig_dir, pattern = "_ALE_predictor[0-9]+\\.svg$", full.names = TRUE)
+ice_svgs <- list.files(fig_dir, pattern = "_ICE_predictor[0-9]+\\.svg$", full.names = TRUE)
+
+if (length(ale_svgs) == 0 && length(ice_svgs) == 0) {
+  cat("ALE/ICE plots not found (package 'iml' may be missing during generation).\n")
+} else {
+  parse_fx <- function(files, kind) {
+    tibble::tibble(file = files) |>
+      dplyr::mutate(name = basename(file)) |>
+      # capture any pool token (last underscore-delimited token before kind)
+      tidyr::extract(
+        name,
+        into = c("pft", "pool", "pred_idx"), 
+ regex = paste0("^(.*)_([^_]+)_", kind, "_predictor([0-9]+)\\.svg$"), + remove = FALSE + ) |> + dplyr::mutate(pred_idx = as.integer(pred_idx)) + } + + ale_meta <- parse_fx(ale_svgs, "ALE") + ice_meta <- parse_fx(ice_svgs, "ICE") + + # Merge ALE and ICE by pft/pool/pred_idx + meta <- dplyr::full_join( + dplyr::rename(ale_meta, ale_file = file), + dplyr::rename(ice_meta, ice_file = file), + by = c("pft", "pool", "pred_idx") + ) |> + dplyr::arrange(pool, pft, pred_idx) + + pools <- unique(meta$pool) + for (po in pools) { + cat("\n## ", po, " – ALE and ICE\n\n", sep = "") + mf <- dplyr::filter(meta, pool == po) + pf <- unique(mf$pft) + for (pf_i in pf) { + cat("### ", gsub("_", " ", pf_i), "\n\n", sep = "") + for (k in unique(mf$pred_idx[mf$pft == pf_i])) { + row <- dplyr::filter(mf, pft == pf_i, pred_idx == k)[1, ] + cat("#### Top predictor ", k, "\n\n", sep = "") + if (!is.na(row$ale_file)) knitr::include_graphics(row$ale_file) + if (!is.na(row$ice_file)) knitr::include_graphics(row$ice_file) + cat("\n- ALE: centered effect; more robust under collinearity.\n", + "- ICE: per-observation curves; heterogeneity and interactions appear as diverse or crossing trajectories.\n\n") + } + } + } +} +``` diff --git a/scripts/040_downscale.R b/scripts/040_downscale.R index f343a7b..6e98c44 100644 --- a/scripts/040_downscale.R +++ b/scripts/040_downscale.R @@ -457,13 +457,15 @@ for (pool in outputs_to_extract) { if (!dir.exists(models_dir)) dir.create(models_dir, recursive = TRUE, showWarnings = FALSE) if (!dir.exists(train_dir)) dir.create(train_dir, recursive = TRUE, showWarnings = FALSE) - spec_key <- paste0(safe_sanitize(pft_i), "_", safe_sanitize(pool)) + spec_key <- paste0(janitor::make_clean_names(pft_i), "_", janitor::make_clean_names(pool)) saveRDS(result$model, file = file.path(models_dir, paste0(spec_key, "_models.rds"))) - if (!is.null(result$data) && !is.null(result$data$training)) { - tr_path <- file.path(train_dir, paste0(spec_key, "_training.csv")) - 
readr::write_csv(result$data$training, tr_path) - } + # Write the explicit training covariate matrix for the sites used to fit the model + tr_covs <- dp_pft |> + dplyr::semi_join(train_pts, by = "site_id") |> + dplyr::select(site_id, dplyr::all_of(covariate_names)) + tr_path <- file.path(train_dir, paste0(spec_key, "_training.csv")) + readr::write_csv(tr_covs, tr_path) ens_labels <- names(result$predictions) if (is.null(ens_labels)) ens_labels <- as.character(seq_along(result$predictions)) @@ -745,7 +747,7 @@ if (length(target_woody_sites) == 0) { f_annual <- 0.5 # TODO: will come from monitoring / scenario data later mix_df <- mix_df |> dplyr::mutate( - mixed_pred = combine_value( + mixed_pred = combine_mixed_crops( woody_value = .data$woody_pred, annual_value = .data$annual_end, annual_init = .data$annual_start, diff --git a/scripts/041_aggregate_to_county.R b/scripts/041_aggregate_to_county.R index 50f4b81..2656b1a 100644 --- a/scripts/041_aggregate_to_county.R +++ b/scripts/041_aggregate_to_county.R @@ -1,80 +1,48 @@ -# Aggregation to County Level -# Loads downscaling outputs saved by 040_downscale.R and aggregates to counties - -# Load configuration and paths -source("000-config.R") - -ca_attributes_csv <- file.path(data_dir, "ca_field_attributes.csv") -ca_attributes <- readr::read_csv(ca_attributes_csv) +#### ---- Aggregate to County Level ---- #### +# Inputs: downscaling outputs saved by 040_downscale.R +# Outputs: county_summaries.csv +PEcAn.logger::logger.info("***Starting Aggregation to County Level***") -# Load downscaling checkpoint created by 040_downscale.R -checkpoint_file <- file.path(cache_dir, "downscaling_output.RData") -load(checkpoint_file) -# Ensure required objects are present -required_objs <- c("downscale_output_list", "ensemble_ids", "covariates") -missing_objs <- required_objs[!vapply(required_objs, exists, logical(1))] -if (length(missing_objs) > 0) { - stop(paste("Missing required objects in checkpoint:", paste(missing_objs, collapse = 
", "))) -} -# Load field geometry/attributes needed for aggregation -if (!exists("ca_fields")) { - ca_fields_full <- sf::read_sf(file.path(data_dir, "ca_fields.gpkg")) - ca_fields <- ca_fields_full |> - dplyr::select(site_id, county, area_ha) -} +# Load configuration and paths +source("000-config.R") -PEcAn.logger::logger.info("***Starting Aggregation to County Level***") -#### ---- Aggregate to County Level ---- #### -#### TODO Split into separate script? +downscale_preds_csv <- file.path(model_outdir, "downscaled_preds.csv") +library(readr) +downscale_preds <- vroom::vroom( + downscale_preds_csv, + col_types = readr::cols( + pft = readr::col_character(), + model_output = readr::col_character(), + ensemble = readr::col_double(), + site_id = readr::col_character(), + county = readr::col_character(), + area_ha = readr::col_double(), + c_density_Mg_ha = readr::col_double(), + total_c_Mg = readr::col_double() + ) +) -PEcAn.logger::logger.info("Aggregating to County Level") +ensemble_ids <- unique(downscale_preds$ensemble) +# For testing, sample predictions evenly across counties and pfts if (!PRODUCTION) { - # For testing, use a subset of fields - # could be even faster if we queried from gpkg: - # sf::read_sf(..., sql = "SELECT * FROM ca_fields WHERE site_id IN (...)") - ca_fields <- ca_fields |> - dplyr::right_join(covariates, by = "site_id") -} + # Sample the same site_ids per county across all PFTs + site_sample <- downscale_preds |> + dplyr::distinct(county, site_id) |> + dplyr::group_by(county) |> + dplyr::slice_sample(n = pmin(10L, dplyr::n())) |> + dplyr::ungroup() -# Convert list to table with predictions and site identifier -# Helper: Convert a single downscale object to tidy predictions table -get_downscale_preds <- function(downscale_obj) { - purrr::map( - downscale_obj$predictions, - ~ tibble::tibble(site_id = downscale_obj$site_ids, prediction = .x) - ) |> - dplyr::bind_rows(.id = "ensemble") |> - dplyr::left_join(ca_fields, by = "site_id") + 
downscale_preds <- downscale_preds |>
+    dplyr::inner_join(site_sample, by = c("county", "site_id"))
 }
 
-# Assemble predictions; carry PFT label by parsing element name: "{pft}::{pool}"
-downscale_preds <- purrr::map(downscale_output_list, get_downscale_preds) |>
-  dplyr::bind_rows(.id = "spec") |>
-  tidyr::separate(
-    col = "spec",
-    into = c("pft", "model_output"),
-    sep = "::",
-    remove = TRUE
-  ) |>
-  # Convert kg/m2 to Mg/ha using PEcAn.utils::ud_convert
-  dplyr::mutate(c_density_Mg_ha = PEcAn.utils::ud_convert(prediction, "kg/m2", "Mg/ha")) |>
-  # Calculate total Mg per field: c_density_Mg_ha * area_ha
-  dplyr::mutate(total_c_Mg = c_density_Mg_ha * area_ha)
-
-## Write out downscaled predictions
-
-readr::write_csv(
-  downscale_preds,
-  file.path(model_outdir, "downscaled_preds.csv")
-)
-
-### TODO Debug and catch if it appears again
+### TODO Debug and catch if NAs appear again
 na_summary <- downscale_preds |>
   dplyr::summarise(dplyr::across(dplyr::everything(), ~ sum(is.na(.x)))) |>
   tidyr::pivot_longer(dplyr::everything(), names_to = "column", values_to = "n_na") |>
@@ -135,6 +103,8 @@ county_summaries <- ens_county_preds |>
     sd_total_c_Tg = sd(total_c_Tg),
     mean_c_density_Mg_ha = mean(c_density_Mg_ha),
     sd_c_density_Mg_ha = sd(c_density_Mg_ha),
+    mean_total_ha = mean(total_ha),
+    sd_total_ha = sd(total_ha),
     .groups = "drop"
   ) |>
   dplyr::mutate(
diff --git a/scripts/042_downscale_analysis.R b/scripts/042_downscale_analysis.R
index 4cba288..a76a5b7 100644
--- a/scripts/042_downscale_analysis.R
+++ b/scripts/042_downscale_analysis.R
@@ -1,135 +1,170 @@
-# Load required libraries and data
+## Downscale analysis using predictions and covariates
 source("000-config.R")
-checkpoint_file <- file.path(cache_dir, "downscaling_output.RData")
-checkpoint_objects <- load(checkpoint_file)
-PEcAn.logger::logger.info("Loaded checkpoint objects:", paste(checkpoint_objects, collapse = ","))
-# Objects expected:
-# - downscale_output_list (named like "woody::AGB", "annual::AGB", 
...) -# - covariates, design_points, design_covariates, ensemble_ids - -# Identify available PFT+pool specs from names -spec_table <- tibble::tibble(spec = names(downscale_output_list)) |> - tidyr::separate(spec, into = c("pft_key", "model_output"), sep = "::", remove = FALSE) |> - dplyr::mutate( - pft = dplyr::case_when( - pft_key == "woody" ~ "woody perennial crop", - pft_key == "annual" ~ "annual crop", - TRUE ~ pft_key - ) - ) -##### Variable Importance Analysis (by PFT and pool) ##### -importance_summary <- purrr::map_dfr(spec_table$spec, function(sp) { - obj <- downscale_output_list[[sp]] - importances <- purrr::map(ensemble_ids, function(i) { - model <- obj[["model"]][[i]] - vi <- randomForest::importance(model) - # Prefer %IncMSE if present; otherwise use IncNodePurity - if ("%IncMSE" %in% colnames(vi)) vi[, "%IncMSE"] else vi[, 1] - }) - predictors <- rownames(randomForest::importance(obj[["model"]][[1]])) - imp_df <- purrr::map_dfr(importances, ~ tibble::tibble(importance = .x), .id = "ensemble") |> - dplyr::group_by(ensemble) |> - dplyr::mutate(predictor = predictors) |> - dplyr::ungroup() - spec_row <- dplyr::filter(spec_table, spec == sp) - imp_df |> - dplyr::group_by(predictor) |> - dplyr::summarize( - median_importance = median(importance, na.rm = TRUE), - lcl_importance = stats::quantile(importance, 0.25, na.rm = TRUE), - ucl_importance = stats::quantile(importance, 0.75, na.rm = TRUE), - .groups = "drop" - ) |> - dplyr::mutate( - model_output = spec_row$model_output, - pft_key = spec_row$pft_key, - pft = spec_row$pft - ) -}) +# Variable Importance (VI) + +# Lighter defaults in development to avoid memory/timeouts +rf_ntree <- if (PRODUCTION) 500 else 300 +rf_sample_n <- 20000 # per-ensemble sampling for importance + +preds_csv <- file.path(model_outdir, "downscaled_preds.csv") +meta_json <- file.path(model_outdir, "downscaled_preds_metadata.json") + +downscale_preds <- vroom::vroom( + preds_csv, + col_types = readr::cols( + site_id = 
readr::col_character(), + pft = readr::col_character(), + ensemble = readr::col_double(), + c_density_Mg_ha = readr::col_double(), + total_c_Mg = readr::col_double(), + area_ha = readr::col_double(), + county = readr::col_character(), + model_output = readr::col_character() + ) +) + +meta <- tryCatch(jsonlite::read_json(meta_json, simplifyVector = TRUE), error = function(e) list()) +ensemble_ids <- if (!is.null(meta$ensembles)) meta$ensembles else sort(unique(downscale_preds$ensemble)) + +covariates_csv <- file.path(data_dir, "site_covariates.csv") + +covariates <- readr::read_csv(covariates_csv) |> + dplyr::select(site_id, where(is.numeric), -climregion_id) +covariate_names <- names(dplyr::select(covariates, where(is.numeric))) +PEcAn.logger::logger.info("Loaded predictions, metadata, and covariates") +PEcAn.logger::logger.info("Rows in predictions:", nrow(downscale_preds), "; unique sites:", dplyr::n_distinct(downscale_preds$site_id)) +PEcAn.logger::logger.info("Ensembles detected:", paste(head(ensemble_ids, 10), collapse = ", "), if (length(ensemble_ids) > 10) "..." 
else "")
+PEcAn.logger::logger.info("Number of numeric predictors:", length(covariate_names), "; sample:", paste(utils::head(covariate_names, 6), collapse = ", "))
+
+preds_join <- downscale_preds |>
+  dplyr::left_join(covariates, by = "site_id") |>
+  tidyr::drop_na(c_density_Mg_ha)
+
+# Optional: load training site metadata, if present
+train_sites_csv <- file.path(model_outdir, "training_sites.csv")
+train_sites <- NULL
+if (file.exists(train_sites_csv)) {
+  train_sites <- readr::read_csv(train_sites_csv, show_col_types = FALSE)
+  PEcAn.logger::logger.info("Training site metadata found:", train_sites_csv, "; rows:", nrow(train_sites))
+  PEcAn.logger::logger.info("Training site columns:", paste(colnames(train_sites), collapse = ", "))
+} else {
+  PEcAn.logger::logger.info("Training site metadata not found at", train_sites_csv)
+}
+
+# Build spec table from predictions (includes mixed pft)
+spec_table <- preds_join |>
+  dplyr::distinct(pft, model_output) |>
+  dplyr::arrange(pft, model_output)
+PEcAn.logger::logger.info("Data prep complete")
+
+## No surrogate VI: read per-ensemble VI saved by 040 and summarize here
+vi_path_for_spec <- function(pft_i, pool) {
+  spec_key <- paste0(janitor::make_clean_names(pft_i), "_", janitor::make_clean_names(pool))
+  file.path(model_outdir, paste0("vi_", spec_key, "_by_ensemble.csv"))
+}
 
-for (i in seq_len(nrow(spec_table))) {
-  sp <- spec_table$spec[i]
-  pft_label <- spec_table$pft[i]
-  pool <- spec_table$model_output[i]
-  obj <- downscale_output_list[[sp]]
-  model <- obj[["model"]][[1]]
+importance_summary <- purrr::pmap_dfr(
+  list(spec_table$pft, spec_table$model_output),
+  function(pft_i, pool) {
+    vi_file <- vi_path_for_spec(pft_i, pool)
+    if (!file.exists(vi_file)) {
+      PEcAn.logger::logger.warn("VI per-ensemble CSV not found for ", pft_i, "::", pool)
+      return(NULL)
+    }
+    vi_tbl <- readr::read_csv(vi_file, show_col_types = FALSE)
+    vi_tbl |>
+      dplyr::group_by(pft, model_output, predictor) |>
+      dplyr::summarize(
+        median_importance = stats::median(importance, na.rm = TRUE),
+        lcl_importance = stats::quantile(importance, 0.25, na.rm = TRUE),
+        ucl_importance = 
stats::quantile(importance, 0.75, na.rm = TRUE), + n_ensembles = dplyr::n(), + .groups = "drop" + ) + } +) + +for (row in seq_len(nrow(spec_table))) { + pft_i <- spec_table$pft[row] + pool <- spec_table$model_output[row] + PEcAn.logger::logger.info("Processing plots for spec:", paste(pft_i, pool, sep = "::")) + + spec_key <- paste0(janitor::make_clean_names(pft_i), "_", janitor::make_clean_names(pool)) + mdl_path <- file.path(cache_dir, "models", paste0(spec_key, "_models.rds")) + trn_path <- file.path(cache_dir, "training_data", paste0(spec_key, "_training.csv")) + rf <- NULL + x <- NULL + if (file.exists(mdl_path) && file.exists(trn_path)) { + models <- readRDS(mdl_path) + df_spec <- readr::read_csv(trn_path, show_col_types = FALSE) + rf <- models[[1]] + if (inherits(rf, "randomForest")) { + x <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame() + } + } + if (is.null(rf)) { + PEcAn.logger::logger.warn("Saved model not found for ", pft_i, "::", pool, "; skipping PDP") + next + } - # Top 2 predictors for this PFT+pool top_predictors <- importance_summary |> - dplyr::filter(model_output == pool, pft == pft_label) |> + dplyr::filter(model_output == pool, pft == pft_i) |> dplyr::arrange(dplyr::desc(median_importance)) |> dplyr::slice_head(n = 2) |> dplyr::pull(predictor) if (length(top_predictors) < 2) { - PEcAn.logger::logger.warn("Not enough predictors for partial plots:", sp) + PEcAn.logger::logger.warn("Not enough predictors for partial plots:", paste(pft_i, pool, sep = "::")) next } - # Set up PNG for three panel plot - PEcAn.logger::logger.info("Creating importance and partial plots for", sp) + PEcAn.logger::logger.info("Creating importance and partial plots for", paste(pft_i, pool, sep = "::")) importance_partial_plot_fig <- here::here( "figures", - paste0(gsub("::", "_", sp), "_importance_partial_plots.png") + paste0(gsub(" ", "_", pft_i), "_", pool, "_importance_partial_plots.png") ) - png( - filename = importance_partial_plot_fig, - 
width = 14, height = 6, units = "in", res = 300, bg = "white" - ) + png(filename = importance_partial_plot_fig, width = 14, height = 6, units = "in", res = 300, bg = "white") par(mfrow = c(1, 3)) # Panel 1: Variable importance plot output_importance <- importance_summary |> - dplyr::filter(model_output == pool, pft == pft_label) + dplyr::filter(model_output == pool, pft == pft_i) par(mar = c(5, 10, 4, 2)) with( output_importance, dotchart(median_importance, labels = reorder(predictor, median_importance), xlab = "Median Increase MSE (SD)", - main = paste("Importance -", pool, "-", pft_label), + main = paste("Importance -", pool, "-", pft_i), pch = 19, col = "steelblue", cex = 1.2 ) ) with( output_importance, - segments(lcl_importance, - seq_along(predictor), - ucl_importance, - seq_along(predictor), - col = "gray50" - ) + segments(lcl_importance, seq_along(predictor), ucl_importance, seq_along(predictor), col = "gray50") ) - # Panel 2: Partial plot for top predictor + # Panel 2 and 3: Partial plots for top predictors par(mar = c(5, 5, 4, 2)) - randomForest::partialPlot(model, - pred.data = design_covariates, - x.var = top_predictors[1], - main = paste("Partial Dependence -", top_predictors[1]), - xlab = top_predictors[1], - ylab = paste("Predicted", pool, "-", pft_label), - col = "steelblue", - lwd = 2 - ) - - # Panel 3: Partial plot for second predictor - randomForest::partialPlot(model, - pred.data = design_covariates, - x.var = top_predictors[2], - main = paste("Partial Dependence -", top_predictors[2]), - xlab = top_predictors[2], - ylab = paste("Predicted", pool, "-", pft_label), - col = "steelblue", - lwd = 2 - ) + if (requireNamespace("iml", quietly = TRUE)) { + requireNamespace("randomForest", quietly = TRUE) + pred_fun <- function(m, newdata) stats::predict(m, newdata) + predictor_obj <- iml::Predictor$new(model = rf, data = x, y = NULL, predict.function = pred_fun) + fe1 <- iml::FeatureEffect$new(predictor_obj, feature = top_predictors[1], method = "pdp") 
+ fe2 <- iml::FeatureEffect$new(predictor_obj, feature = top_predictors[2], method = "pdp") + plot(fe1) + plot(fe2) + } else { + randomForest::partialPlot(rf, + pred.data = x, x.var = top_predictors[1], + main = paste("Partial Dependence -", top_predictors[1]), + xlab = top_predictors[1], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 + ) + randomForest::partialPlot(rf, + pred.data = x, x.var = top_predictors[2], + main = paste("Partial Dependence -", top_predictors[2]), + xlab = top_predictors[2], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 + ) + } dev.off() - PEcAn.logger::logger.info( - "Saved importance and partial plots for", - sp, " to ", importance_partial_plot_fig - ) } @@ -137,66 +172,46 @@ for (i in seq_len(nrow(spec_table))) { # robust marginal effect estimates even with correlated predictors. # library(iml) -PEcAn.logger::logger.info("***Starting ALE plots***") -for (i in seq_len(nrow(spec_table))) { - sp <- spec_table$spec[i] - pft_label <- spec_table$pft[i] - pool <- spec_table$model_output[i] - model <- downscale_output_list[[sp]][["model"]][[1]] - - top_predictors <- importance_summary |> - dplyr::filter(model_output == pool, pft == pft_label) |> - dplyr::arrange(dplyr::desc(median_importance)) |> - dplyr::slice_head(n = 2) |> - dplyr::pull(predictor) +PEcAn.logger::logger.info("Starting ALE and ICE plots for all specs") +for (row in seq_len(nrow(spec_table))) { + pft_i <- spec_table$pft[row] + pool <- spec_table$model_output[row] + PEcAn.logger::logger.info("Starting ALE/ICE plots for spec:", paste(pft_i, pool, sep = "::")) + spec_key <- paste0(janitor::make_clean_names(pft_i), "_", janitor::make_clean_names(pool)) + mdl_path <- file.path(cache_dir, "models", paste0(spec_key, "_models.rds")) + trn_path <- file.path(cache_dir, "training_data", paste0(spec_key, "_training.csv")) + if (!file.exists(mdl_path) || !file.exists(trn_path)) { + PEcAn.logger::logger.warn("Saved model/training data missing 
for ", pft_i, "::", pool, "; skipping ALE/ICE") + next + } + models <- readRDS(mdl_path) + df_spec <- readr::read_csv(trn_path, show_col_types = FALSE) + rf <- models[[1]] + x <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame() + requireNamespace("randomForest", quietly = TRUE) predictor_obj <- iml::Predictor$new( - model = model, - data = design_covariates, - y = NULL, + model = rf, data = x, y = NULL, predict.function = function(m, newdata) stats::predict(m, newdata) ) - - for (j in seq_along(top_predictors)) { - pred_var_name <- top_predictors[j] - PEcAn.logger::logger.info("Starting ALE calculation for", sp, "predictor:", pred_var_name) - ale <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ale") - ggplot2::ggsave( - filename = here::here("figures", paste0(gsub("::", "_", sp), "_ALE_predictor", j, ".png")), - plot = plot(ale) + ggplot2::ggtitle(paste("ALE for", pred_var_name, "on", pool, "-", pft_label)), - width = 6, height = 4, units = "in", dpi = 300 - ) - } -} - -## ICE Plots -PEcAn.logger::logger.info("Creating ICE plots for top predictors") -for (i in seq_len(nrow(spec_table))) { - sp <- spec_table$spec[i] - pft_label <- spec_table$pft[i] - pool <- spec_table$model_output[i] - model <- downscale_output_list[[sp]][["model"]][[1]] - top_predictors <- importance_summary |> - dplyr::filter(model_output == pool, pft == pft_label) |> + dplyr::filter(model_output == pool, pft == pft_i) |> dplyr::arrange(dplyr::desc(median_importance)) |> dplyr::slice_head(n = 2) |> dplyr::pull(predictor) - - predictor_obj <- iml::Predictor$new( - model = model, - data = design_covariates, - y = NULL, # y is not used by predict.randomForest - predict.function = function(m, newdata) stats::predict(m, newdata) - ) - for (j in seq_along(top_predictors)) { pred_var_name <- top_predictors[j] + ale <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ale") + ggsave_optimized( + filename = here::here("figures", 
paste0(gsub(" ", "_", pft_i), "_", pool, "_ALE_predictor", j, ".svg")), + plot = plot(ale) + ggplot2::ggtitle(paste("ALE for", pred_var_name, "on", pool, "-", pft_i)), + width = 6, height = 4, units = "in" + ) ice <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ice") - ggplot2::ggsave( - filename = here::here("figures", paste0(gsub("::", "_", sp), "_ICE_predictor", j, ".png")), - plot = plot(ice) + ggplot2::ggtitle(paste("ICE for", pred_var_name, "on", pool, "-", pft_label)), - width = 6, height = 4, units = "in", dpi = 300 + ggsave_optimized( + filename = here::here("figures", paste0(gsub(" ", "_", pft_i), "_", pool, "_ICE_predictor", j, ".svg")), + plot = plot(ice) + ggplot2::ggtitle(paste("ICE for", pred_var_name, "on", pool, "-", pft_i)), + width = 6, height = 4, units = "in" ) } } From a2e81d8bba9796eed95419afac6a9c2341cbd0c0 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Tue, 23 Sep 2025 01:30:47 -0400 Subject: [PATCH 29/70] Update document formats to non-self-contained and adjust references in reports --- README.md | 8 -- docs/mixed_system_prototype.qmd | 3 +- docs/workflow_documentation.md | 121 ++++++++++++++++------------- reports/design_points_analysis.qmd | 3 +- reports/downscaling_results.qmd | 2 +- reports/variable_importance.qmd | 5 +- 6 files changed, 73 insertions(+), 69 deletions(-) diff --git a/README.md b/README.md index b07e3bf..4500a52 100644 --- a/README.md +++ b/README.md @@ -115,14 +115,6 @@ quarto render These steps will publish to https://ccmmf.github.io/downscaling. -This does not commit compiled HTML to `main`. Instead, Quarto pushes the built site to a separate `gh-pages` branch. - -One-time setup in GitHub: -1. Settings → Pages → Set Source to “Deploy from a branch”. -2. Select Branch: `gh-pages` and Folder: `/ (root)`. 
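As background for the partial-dependence and ICE plots generated in the analysis script above: an ICE curve traces a single observation's prediction as one feature is swept over a grid, and the partial-dependence curve is the pointwise average of the ICE curves. A minimal sketch of that relationship, in Python with a toy model for illustration only (the workflow itself uses the R `iml` and `randomForest` packages; none of the names below come from the repository):

```python
def ice_curves(predict, rows, feature, grid):
    """One ICE curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for g in grid:
            x = dict(row)      # copy so the original row is untouched
            x[feature] = g     # overwrite only the feature of interest
            curve.append(predict(x))
        curves.append(curve)
    return curves

def pdp(predict, rows, feature, grid):
    """Partial dependence = pointwise mean of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    n = len(curves)
    return [sum(c[i] for c in curves) / n for i in range(len(grid))]

# Toy model with an interaction: ICE slopes differ, the PDP averages them
predict = lambda x: x["a"] * x["b"]
rows = [{"a": 0, "b": 1}, {"a": 0, "b": 3}]
grid = [0.0, 1.0, 2.0]
print(ice_curves(predict, rows, "a", grid))  # [[0.0, 1.0, 2.0], [0.0, 3.0, 6.0]]
print(pdp(predict, rows, "a", grid))         # [0.0, 2.0, 4.0]
```

Heterogeneous ICE slopes, as in this toy interaction, are exactly the cases an averaged PDP can hide, which is one reason to plot both.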
- -Publish from your machine after rendering: - ```bash # Build site locally (runs R code and embeds results) quarto render diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index 240d968..74e88bc 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -5,7 +5,8 @@ date: "`r Sys.Date()`" quarto-cache: true format: html: - self-contained: true + self-contained: false + df-print: default execute: cache: true echo: false diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 2ca1702..b00ea4f 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -1,21 +1,22 @@ --- title: "Downscaling Workflow Technical Documentation" +author: "David LeBauer" +date: "`r Sys.Date()`" format: html: - self-contained: true - embed-resources: true + self-contained: false toc: true execute: echo: false --- -# Technical Documentation {#sec-workflow-overview} +# Technical Documentation The downscaling workflow predicts carbon pools (Soil Organic Carbon and Aboveground Biomass) for cropland fields in California and then aggregates these predictions to the county scale. It uses an ensemble-based approach to uncertainty propagation and analysis, maintaining ensemble structure to propagate errors through the prediction and aggregation processes. -![Spatial downscaling workflow using machine learning with environmental covariates](figures/spatial_downscaling_workflow.png){width="5in"} +![Spatial downscaling workflow using machine learning with environmental covariates](../figures/spatial_downscaling_workflow.png){width="5in"} ## Terminology @@ -29,31 +30,33 @@ It uses an ensemble-based approach to uncertainty propagation and analysis, main The workflows are -1. **Site Selection**: uses environmental variables (in the future we will add agronomic practice changes) to create clusters and then select representative sites. The `design_points.csv` are then passed to the ensemble workflow. 
-2. **Ensemble Runs**: generate ensemble outputs. These are done in the [ccmmf/workflows](https://github.com/ccmmf/workflows) repository.
-3. **Downscaling**: uses ensemble outputs to make predictions for each field in CA, then aggregates to county level summaries and produces figures.
+1. **Site Selection**: uses environmental variables (later also management layers) to create clusters and then select representative sites. The resulting `design_points.csv` is then passed to the ensemble workflow.
+2. **Ensemble Runs**: generate ensemble outputs in the [ccmmf/workflows](https://github.com/ccmmf/workflows) repository.
+3. **Downscaling**: uses ensemble outputs to make predictions for each field in CA, then aggregates them to county-level summaries.
 
-Overview:
+## Overview
 
-```r
+```sh
 # Site Selection
-Rscript scripts/009_update_landiq.R --production true
-Rscript scripts/010_prepare_covariates.R --production true
-Rscript scripts/011_prepare_anchor_sites.R --production true
-Rscript scripts/020_cluster_and_select_design_points.R --production true
+Rscript scripts/009_update_landiq.R
+Rscript scripts/010_prepare_covariates.R
+Rscript scripts/011_prepare_anchor_sites.R
+Rscript scripts/020_cluster_and_select_design_points.R
+Rscript scripts/021_clustering_diagnostics.R
+
 # Downscaling and Aggregation
-Rscript scripts/030_extract_sipnet_output.R --production true
-Rscript scripts/040_downscale.R --production true
-Rscript scripts/041_aggregate_to_county.R --production true
-Rscript scripts/042_downscale_analysis.R --production true
-Rscript scripts/043_county_level_plots.R --production true
+Rscript scripts/030_extract_sipnet_output.R
+Rscript scripts/031_aggregate_sipnet_output.R
+Rscript scripts/040_downscale_and_aggregate.R
+Rscript scripts/041_downscale_analysis.R
+Rscript scripts/042_county_level_plots.R
 ```
 
 ## Workflow Steps
 
 ### Configuration
 
-Workflow settings are configured in `000-config.R`.
+Workflow settings are configured in `000-config.R`.
The configuration script reads the CCMMF directory from the environment variable `CCMMF_DIR` (set in .Renviron), and uses it to define paths for inputs and outputs. @@ -84,13 +87,11 @@ git clone git@github.com:ccmmf/downscaling - sets repositories from which R packages are installed - runs `renv/activate.R` - `000-config.R` - - Parses a `--production` flag via `argparse` to toggle development vs production mode. In non-interactive runs, default is development (`--production FALSE`); interactive sessions override to production. Example: `Rscript scripts/040_downscale.R --production true`. - - Sets directories for inputs and outputs used by the pipelines. - - Sets parallelization via `future::plan(multicore, workers = availableCores() - 1)`. - - Controls variables to extract via `outputs_to_extract` default: `c("TotSoilCarb", "AGB")`. - - Auto-sources all helper functions in `R/` and runs unit tests under `tests/testthat` at startup. + - set `pecan_outdir` based on the CCMMF_DIR. + - confirm that relative paths (`data_raw`, `data`, `cache`) are correct. + - detect and use resources for parallel processing (with future package); default is `available cores - 1` + - PRODUCTION mode setting. For testing, set `PRODUCTION` to `FALSE`. This is _much_ faster and requires fewer computing resources because it subsets large datasets. Once a test run is successful, set `PRODUCTION` to `TRUE` to run the full workflow. - +# ### 1. Data Preparation @@ -232,14 +233,15 @@ Rscript scripts/030_extract_sipnet_output.R Extracts and formats SIPNET outputs for downscaling: -- Extract output variables (AGB, TotSoilCarb) from SIPNET simulations (configurable via `outputs_to_extract`). -- Aggregate site-level ensemble outputs into a long EFI-style format: `datetime, site_id, lat, lon, pft, parameter, variable, prediction`. 
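The long "EFI-style" layout referenced in this hunk — one row per datetime × site × ensemble member × variable, with the ensemble member ID stored in the `parameter` column — can be sketched as follows. Python is used for illustration only (the workflow itself is R), and all site names and values below are invented:

```python
def to_long(preds, datetime_str, pft):
    """Flatten nested {site_id: {ensemble_id: {variable: value}}} model output
    into long-format rows, one per site x ensemble member x variable.
    Column names follow the EFI convention cited in the documentation;
    lat/lon are omitted here for brevity."""
    rows = []
    for site_id, by_ens in preds.items():
        for ens_id, by_var in by_ens.items():
            for variable, value in by_var.items():
                rows.append({
                    "datetime": datetime_str,
                    "site_id": site_id,
                    "pft": pft,
                    "parameter": ens_id,   # EFI: ensemble member ID
                    "variable": variable,
                    "prediction": value,
                })
    return rows

# Invented toy values: 2 sites x 2 ensemble members x 2 variables -> 8 rows
preds = {
    "site_A": {1: {"AGB": 2.1, "TotSoilCarb": 9.5},
               2: {"AGB": 2.4, "TotSoilCarb": 9.1}},
    "site_B": {1: {"AGB": 1.0, "TotSoilCarb": 7.2},
               2: {"AGB": 1.2, "TotSoilCarb": 7.6}},
}
rows = to_long(preds, "2016-12-31", "woody perennial crop")
print(len(rows))  # 8
```

Keeping one row per ensemble member, rather than averaging at this stage, is what lets later steps propagate uncertainty through downscaling and aggregation.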
+- Extract output variables (AGB, TotSoilCarb) from SIPNET simulations +- Aggregate site-level ensemble outputs into long and 4D array formats +- Save CSV and NetCDF files following EFI standards **Inputs:** - `out/ENS--/YYYY.nc` **Outputs:** -- `out/ensemble_output.csv`: Long format data for selected variables +- `out/ensemble_output.csv`: Long format data ### 5. Mixed Cropping Systems @@ -247,11 +249,10 @@ Extracts and formats SIPNET outputs for downscaling: Rscript scripts/031_aggregate_sipnet_output.R ``` -Simulates mixed-cropping scenarios by combining outputs across two PFTs using the `combine_mixed_crops()` function. This is described in more detail in [Mixed System Prototype](../docs/mixed_system_prototype.qmd). +Simulates mixed-cropping scenarios by combining outputs across two PFTs using a mixed aggregation function (see Mixed System Prototype). Two methods are supported: -There are two methods for combining outputs: -- `weighted`: area-partitioned mix where `woody_cover + annual_cover = 1` -- `incremental`: preserve woody baseline (`woody_cover = 1`) and add the annual delta scaled by `annual_cover`. +- weighted: area-partitioned mix where `woody_cover + annual_cover = 1` +- incremental: preserve woody baseline (`woody_cover = 1`) and add the annual delta scaled by `annual_cover` The current analysis uses the weighted method to represent ground cover in orchards and vineyards. @@ -260,10 +261,8 @@ Outputs include `multi_pft_ensemble_output.csv`, `combined_ensemble_output.csv`, ### 6. 
Downscale, Aggregate to County, and Plot ```sh -Rscript scripts/040_downscale.R --production true -Rscript scripts/041_aggregate_to_county.R --production true -Rscript scripts/042_downscale_analysis.R --production true -Rscript scripts/043_county_level_plots.R --production true +Rscript scripts/040_downscale_and_aggregate.R +Rscript scripts/041_downscale_analysis.R ``` Builds Random Forest models to predict carbon pools for all fields; aggregates to county-level; summarizes variable importance; and produces maps: @@ -271,33 +270,47 @@ Builds Random Forest models to predict carbon pools for all fields; aggregates t - Train models on SIPNET ensemble runs at design points - Use environmental covariates to downscale predictions to all fields - Aggregate to county-level estimates -- Output maps and statistics of carbon density and stocls. +- Output maps and statistics of carbon density and totals **Inputs:** - `out/ensemble_output.csv`: SIPNET outputs - `data/site_covariates.csv`: Environmental covariates -Outputs from `040_downscale.R`: -- `out/downscaled_preds.csv`: Per-field predictions with ensemble, area, and county -- `out/downscaled_preds_metadata.json`: Metadata documenting ensembles, PFTs, variables, and date range -- `out/training_sites.csv`: Training site IDs per PFT × pool -- `out/vi___by_ensemble.csv`: Variable importance per ensemble -- `out/downscaled_deltas.csv`: Start --> end deltas used for delta maps +**Outputs from `040_downscale_and_aggregate.R`:** +- `cache/downscaling_output.RData`: Checkpoint for downstream analysis +- `model_outdir/county_summaries.csv`: County statistics (means/SDs across ensembles for stocks and densities) -Outputs from `041_aggregate_to_county.R`: -- `out/county_summaries.csv`: County statistics (means/SDs across ensembles for stocks and densities) +**Outputs from `041_downscale_analysis.R` (saved in `figures/`):** +- `_importance_partial_plots.png`: Variable importance with partial plots for top predictors +- 
`_ALE_predictor.png` and `_ICE_predictor.png`: ALE and ICE plots for top predictors -Outputs from `043_county_level_plots.R` (saved in `figures/`): -- County-level maps of carbon stock and density by PFT and pool. -- Difference maps (mixed − woody) and delta maps (start→end). +**Outputs from `042_county_level_plots.R` (saved in `figures/`):** +- `county__carbon_stock.png` and `county__carbon_density.png`: County-level maps of carbon stock and density by pool -## Technical Notes +## Technical Reference -**Ensemble Structure** +### Ensemble Structure Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. -**Implementation notes:** -- EFI-standard column `parameter` (ensemble ID) is renamed to `ensemble` in some steps for clarity. -- Pool variables are in `kg/m²` and are converted to `Mg/ha` for densities via `PEcAn.utils::ud_convert()`; field-level stocks are computed as `density × area`. + +# References + +**EFI Standards** + +Dietze, Michael C., R. Quinn Thomas, Jody Peters, Carl Boettiger, Gerbrand Koren, Alexey N. Shiklomanov, and Jaime Ashander. 2023. "A Community Convention for Ecological Forecasting: Output Files and Metadata Version 1.0." Ecosphere 14 (11): e4686. https://doi.org/10.1002/ecs2.4686. + +**Data Sources** + +Land IQ, LLC. California Crop Mapping (2014). California Department of Water Resources, 2017. https://data.cnra.ca.gov/dataset/statewide-crop-mapping. + +Hengl, T. et al. 2017. "SoilGrids250m: Global Gridded Soil Information Based on Machine Learning." PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748 + +Hersbach, H. et al. 2020. "The ERA5 Global Reanalysis." Quarterly Journal of the Royal Meteorological Society 146: 1999–2049. 
https://doi.org/10.1002/qj.3803 + +**Models** + +Braswell, Bobby H., William J. Sacks, Ernst Linder, and David S. Schimel. 2005. "Estimating Diurnal to Annual Ecosystem Parameters by Synthesis of a Carbon Flux Model with Eddy Covariance Net Ecosystem Exchange Observations." Global Change Biology 11 (2): 335–55. https://doi.org/10.1111/j.1365-2486.2005.00897.x. + +Liaw, Andy, and Matthew Wiener. 2002. "Classification and Regression by randomForest." R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. diff --git a/reports/design_points_analysis.qmd b/reports/design_points_analysis.qmd index 77e0b15..81572ef 100644 --- a/reports/design_points_analysis.qmd +++ b/reports/design_points_analysis.qmd @@ -2,8 +2,7 @@ title: "Design Points Analysis" format: html: - self-contained: true - embed-resources: true + self-contained: false df-print: paged toc: true --- diff --git a/reports/downscaling_results.qmd b/reports/downscaling_results.qmd index 3afb0fb..43318b9 100644 --- a/reports/downscaling_results.qmd +++ b/reports/downscaling_results.qmd @@ -23,7 +23,7 @@ For detailed information about methods and workflow, see the [Workflow Documenta Estimates for woody and mixed woody + annual systems are done for all fields in the LandIQ dataset with tree and vinyard crops; annual systems are done for all fields with herbaceous annual crops. -In the mixed scenario, ground cover is planted in all orchards. The methods for simulating the addition of ground cover to orchards, and is described in ![](../docs/mixed_system_prototype.qmd). Briefly, it combines downscaled model outputs from the woody perennial crop and annual crop PFTs to represent the effect of ground cover on carbon pools. +In the mixed scenario, ground cover is planted in all orchards. The methods for simulating the addition of ground cover to orchards are described in [Mixed System Prototype](../docs/mixed_system_prototype.qmd). 
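The two PFT-combination rules described in the workflow documentation above — `weighted` and `incremental` — reduce to simple per-ensemble arithmetic on the woody and annual outputs. A sketch in Python for illustration only; `combine_mixed` is a hypothetical name and is not the repository's `combine_mixed_crops()` implementation:

```python
def combine_mixed(woody, annual, annual_cover, method="weighted"):
    """Combine woody- and annual-PFT outputs for one site/ensemble/variable.

    weighted:    area-partitioned mix, with woody_cover + annual_cover = 1
    incremental: keep the full woody baseline (woody_cover = 1) and add the
                 annual contribution scaled by annual_cover
    """
    if method == "weighted":
        return (1.0 - annual_cover) * woody + annual_cover * annual
    if method == "incremental":
        return woody + annual_cover * annual
    raise ValueError(f"unknown method: {method}")

# 50% ground cover in an orchard, the setting used in the current analysis
print(combine_mixed(10.0, 4.0, 0.5, method="weighted"))     # 7.0
print(combine_mixed(10.0, 4.0, 0.5, method="incremental"))  # 12.0
```

With `annual_cover = 0` both rules reduce to the woody baseline; the weighted rule is the one used to represent ground cover in orchards and vineyards.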
Briefly, it combines downscaled model outputs from the woody perennial crop and annual crop PFTs to represent the effect of ground cover on carbon pools. # Results diff --git a/reports/variable_importance.qmd b/reports/variable_importance.qmd index fe3c486..4851e84 100644 --- a/reports/variable_importance.qmd +++ b/reports/variable_importance.qmd @@ -2,8 +2,7 @@ title: "Variable Importance and Effects" format: html: - self-contained: true - embed-resources: true + self-contained: false df-print: paged toc: true --- @@ -61,7 +60,7 @@ if (length(imp_pngs) == 0) { } ``` -# ALE and ICE Effects +## ALE and ICE Effects ```{r, echo=FALSE, message=FALSE, warning=FALSE, results='asis'} ale_svgs <- list.files(fig_dir, pattern = "_ALE_predictor[0-9]+\\.svg$", full.names = TRUE) From d89297edcc7d65d04671da6598960a0cab250da1 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Wed, 24 Sep 2025 19:37:31 -0400 Subject: [PATCH 30/70] Remove unused masking threshold variable from configuration --- 000-config.R | 4 ---- 1 file changed, 4 deletions(-) diff --git a/000-config.R b/000-config.R index 3eeb874..6dd2e6c 100644 --- a/000-config.R +++ b/000-config.R @@ -53,10 +53,6 @@ model_outdir <- file.path(pecan_outdir, "out") set.seed(42) ca_albers_crs <- 3310 -# Plot masking threshold for county-level maps (share of statewide total carbon) -# Example: 0.01 = mask counties below 1% of statewide total carbon for that PFT/pool -MASK_THRESHOLD <- 0.01 - #### Messages #### msg <- glue::glue( "\n\n", From 57c2f90a9609a7bb430b24c0f17ce4d8270e55df Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Wed, 24 Sep 2025 19:52:17 -0400 Subject: [PATCH 31/70] Add county comparison sections and update workflow documentation, remove masks, re-organize plots for ease of comparison --- docs/mixed_system_prototype.qmd | 90 +++++++++++- docs/workflow_documentation.md | 13 +- .../spatial_downscaling_workflow.png | Bin figures/spatial_downscaling_workflow.webp | Bin 0 -> 66886 bytes 
reports/downscaling_results.qmd | 135 ++++++++++-------- scripts/040_downscale.R | 65 ++++++++- scripts/041_aggregate_to_county.R | 1 - scripts/043_county_level_plots.R | 86 +++-------- 8 files changed, 249 insertions(+), 141 deletions(-) rename {docs/figures => figures}/spatial_downscaling_workflow.png (100%) create mode 100644 figures/spatial_downscaling_workflow.webp diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index 74e88bc..5f7f7ea 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -236,6 +236,74 @@ vars <- unique(combined_output$variable) ``` +## County Comparisons on Woody Sites (Downscaled) + +This section compares three treatment scenarios on the same set of woody sites using the downscaled outputs: + +- 100% woody (baseline) +- 100% woody + 50% annual (overlap) +- 100% annual (counterfactual on woody sites) + +```{r woody-treatments} +treat_path <- file.path(model_outdir, "treatments_woody_sites.csv") +if (file.exists(treat_path)) { + tr <- readr::read_csv(treat_path, show_col_types = FALSE) + + # County-level per-ensemble totals and densities, then average across ensembles + tr_county <- tr |> + dplyr::group_by(model_output, scenario, county, ensemble) |> + dplyr::summarise(total_Mg = sum(total_c_Mg, na.rm = TRUE), total_ha = sum(area_ha, na.rm = TRUE), .groups = "drop") |> + dplyr::mutate( + density_Mg_ha = dplyr::if_else(total_ha > 0, total_Mg / total_ha, NA_real_), + total_Tg = PEcAn.utils::ud_convert(total_Mg, "Mg", "Tg") + ) |> + dplyr::group_by(model_output, scenario, county) |> + dplyr::summarise( + mean_total_Tg = mean(total_Tg, na.rm = TRUE), + mean_density_Mg_ha = mean(density_Mg_ha, na.rm = TRUE), + .groups = "drop" + ) + + # Bring in county boundaries for maps + county_boundaries <- sf::st_read(file.path(data_dir, "ca_counties.gpkg"), quiet = TRUE) + + # Differences vs 100% woody baseline + base <- tr_county |> + dplyr::filter(scenario == "woody_100") |> + 
dplyr::select(model_output, county, base_total_Tg = mean_total_Tg, base_density_Mg_ha = mean_density_Mg_ha) + diffs <- tr_county |> + dplyr::filter(scenario != "woody_100") |> + dplyr::left_join(base, by = c("model_output", "county")) |> + dplyr::mutate( + delta_total_Tg = mean_total_Tg - base_total_Tg, + delta_density_Mg_ha = mean_density_Mg_ha - base_density_Mg_ha + ) |> + dplyr::left_join(county_boundaries, by = "county") + + # Simple maps: density deltas by scenario and pool + ggplot2::ggplot(diffs, ggplot2::aes(geometry = geom, fill = delta_density_Mg_ha)) + + ggplot2::geom_sf(color = "black", fill = NA) + + ggplot2::geom_sf() + + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + + ggplot2::facet_grid(model_output ~ scenario) + + ggplot2::theme_minimal() + + ggplot2::labs(title = "County-level density delta vs 100% woody (on woody sites)", fill = "Delta (Mg/ha)") + + # Table preview + DT::datatable( + diffs |> + dplyr::select(model_output, scenario, county, delta_total_Tg, delta_density_Mg_ha) |> + dplyr::arrange(model_output, scenario, county), + options = list(pageLength = 10, scrollY = "50vh", scroller = TRUE, deferRender = TRUE), + extensions = c('Scroller'), + rownames = FALSE + ) |> + DT::formatSignif(columns = c("delta_total_Tg", "delta_density_Mg_ha"), digits = 2) +} else { + cat("treatments_woody_sites.csv not found; run scripts/040_downscale.R to generate.") +} +``` + **Selecting Representative Sites** Next we select representative sites based on their productivity. We use AGB in the 100% woody scenario as an indicator of productivity. We select sites that are near the 15th, 50th, and 85th percentiles of AGB and categorize them as low, medium, and high productivity. @@ -499,6 +567,26 @@ ggplot() + ) ``` +## Effect on County Level Carbon Stocks + +Here we present county-level maps showing the impact of adding 50% ground cover to orchards on carbon stocks and densities. 
Positive values indicate an increase relative to woody-only. + +These maps are generated from the downscaled outputs that are presented in the [Downscaling Results](downscaling_results.qmd#results) page. + +### Change in Carbon Stocks (Tg): + +![](../figures/county_diff_woody_plus_annual_minus_woody_TotSoilCarb_carbon_stock.webp) + + +![](../figures/county_diff_woody_plus_annual_minus_woody_AGB_carbon_stock.webp) + + +### Change in Carbon Density (Mg/ha): + +![](../figures/county_diff_woody_plus_annual_minus_woody_TotSoilCarb_carbon_density.webp) + +![](../figures/county_diff_woody_plus_annual_minus_woody_AGB_carbon_density.webp) + ## Other Approaches Considered @@ -668,4 +756,4 @@ for (v in vars) { | Annual → Perennial | Change PFT; partial biomass retained. | High | Low | Low | | Agroforestry / Woody Buffers | Independent woody runs; spatial aggregation. | High | Low | Low | -: Healthy Soils Program practices, model representation, and priority for implementation in SIPNET. {#tbl-hsp} \ No newline at end of file +: Healthy Soils Program practices, model representation, and priority for implementation in SIPNET. {#tbl-hsp} diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index b00ea4f..9c07f25 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -16,7 +16,7 @@ The downscaling workflow predicts carbon pools (Soil Organic Carbon and Abovegro It uses an ensemble-based approach to uncertainty propagation and analysis, maintaining ensemble structure to propagate errors through the prediction and aggregation processes. 
-![Spatial downscaling workflow using machine learning with environmental covariates](../figures/spatial_downscaling_workflow.png){width="5in"} +![Spatial downscaling workflow using machine learning with environmental covariates](figures/spatial_downscaling_workflow.webp){width="5in"} ## Terminology @@ -82,7 +82,6 @@ git clone git@github.com:ccmmf/downscaling These are in a subdirectory of the CCMMF directory in order to make them available across all users (and because on some computers, they exceed allocated space in the home directory). - `R_LIBS_USER` must point to the platform and R version specific subdirectory inside `RENV_PATHS_LIBRARY`. - Example: `/projectnb/dietzelab/ccmmf/renv-library/linux-almalinux-8.10/R-4.4/x86_64-pc-linux-gnu` - `.Rprofile` - sets repositories from which R packages are installed - runs `renv/activate.R` @@ -92,16 +91,6 @@ git clone git@github.com:ccmmf/downscaling - detect and use resources for parallel processing (with future package); default is `available cores - 1` - PRODUCTION mode setting. For testing, set `PRODUCTION` to `FALSE`. This is _much_ faster and requires fewer computing resources because it subsets large datasets. Once a test run is successful, set `PRODUCTION` to `TRUE` to run the full workflow. -**Others:** - -_these shouldn't need to be changed unless you want to change the default behavior of the workflow_ - -- `renv.lock` is used for package management with `renv`. - - See [project renv setup docs](docs/renv_setup.md) for instructions about using `renv` for these workflows. - - See [renv package documentation](https://rstudio.github.io/renv/articles/renv.html) for more details. - -# - ### 1. 
Data Preparation ```sh diff --git a/docs/figures/spatial_downscaling_workflow.png b/figures/spatial_downscaling_workflow.png similarity index 100% rename from docs/figures/spatial_downscaling_workflow.png rename to figures/spatial_downscaling_workflow.png diff --git a/figures/spatial_downscaling_workflow.webp b/figures/spatial_downscaling_workflow.webp new file mode 100644 index 0000000000000000000000000000000000000000..2fac4a21649c6d216ba595e71ec7aa19f412a953 GIT binary patch literal 66886 zcmce-V{~NQ+BO=aW81cEJL#Zf+qToOZQHhO+cr8*$H_@Qz4yD{{hc$upJ$EwQKM$n zsB7)zQzq8{re!c&`29gO#$q(iU%xlG(Bu-33LO^8SE}jDu zYG!T2vfurA_xrf<5Hc-%jNXxt#DKt9yJ{df^DVmZZPPjULk>}=e$pvi&Clm_UTBil ztILo@t2pFn;G@|U@7L#myNunZB_7%9XL{F$2i=%ZuII5&`Fp{m*t^M_OAXlr-8FB~ z=Y{uqj?TsBWp>-=tIx)?i_XfspikT9(7DXL!0p9vyN~0~@JFe&@z2kX zsBP`UPl`{x50^Ker)Vo}$`7G8wCB(J>`gDaHR))Ha8KR)Z)Fe5 z@1GC61wNFo?aw}!pT~R$pFf{t-;&?qZo8wql0WQTkKej)_+ots@M1nWKXKld->To} zZt$pVanH4YJ`z9JKBx29Ha?cz7vJDsY;%+i($xgq8Bh7Q8w_0wB_g1bCJa~;;^LIPwv z!xdx^w#LT|Up@r0X0!or-tTY?aDBvs!72opdXZDR)4LW!s;4(;o8 zG_@(Zdh$Es==~|n69M1;CL5A7Yr=Yb9ZXYr^5=G&9GNJ=IMW^Z?x^S(rMP@3EY3dNp<>FSo z8g(V3iCxkwO=L3WFuNXDQ3v3&QnI{wE1{wq=wbBxHplw|HmET^OEtP`YQqcR^VSwf)3qQc7s$KzCb;% z(7c)hVWP4erW56%|IUfV7~Cgq@uY!wKB&p22XunhCW_jS_qsXVss0_ioc_Xnc~hXN zyd25q$~vyDBZbyg(0W+o@Lc#q`IH{?`ie8^S;K%tu=n?AbDw*G?{6Wt0CJ=xFzx=_ z`=k5D5jLh?uG^f>{!JeB3HctKtySQa>7OCF-g3Kq-gI#T6JFHHu>2@q$Qng7u97+o zV({NA`G34e(1unbeu5IF+)w;IPW-j;GwIeZQsLMCZ3+fJt`>!K$oHQtN*rFu>prxfa4!wrG{8l7+0~M0CW%!#T1$ah65)(2^Xy$ee&_`oVrP zMs42B4l=T+XuCQQ9;$S~*sFF9?uGuC`197W^2X=>VJMPG5nt;^Nb;eP|&MmW4(~O z2j0Dbib)F!rn$M(=B#Rmz_T5duD!HfwliSV`<}pbw@KwKlSGix275aY|2Ok#;siU)m*oMDyB#r z=cyHG+4HQkWoTTV&B)hdO5_r4%y+m5d5u|9b|wE|TxB;9t7OM;&nW!dg?@q#RevpEQKu{S0HOB2usl|C8P3oc8* zjbLSEguzCk6E||QnygRfO*=iA-a`241srvUg4|9gWaecxffjsA+=nNX`w+`}rI718 z8cmo(*0k|HXZ1wYV2JVhHNes{Aa)eBqP%gBRCf}bPmsu*Lvke{8R)+$7#`I{aLRJv zy^xbZhJm$w3OYl;DxMi)WlTzKC-psphIUBIOxUM}Xgf7_tgXcbX!4DR=V_+oPW 
zPri5wYKMtd`Wc=HGL(3hlKs-}Mc)7qku>GfjvYfo$%lv#;@aZO0FZMEEgw0C&mV!3 zO#%i4!;7A0QA70Fb9U@9$m=<*S52pV+Rb5;AF8_>lOfrp!&IN1GTsgH3A0>$na@o9 zi36kD@I;s07Es*p$!n$dB-FgAdedNGcLw%kY3s{R1=4LAD;tW@fno+}b6!0_Q?{NB z*Abd1n05PLmoHvRwCKdk*6s&rUe-iHG4+|7r-=OS?oFl;TlvIy95}N+2*xfOn+ER@ zNUzJEUk5l6nB)XFHqdPVj>mAc6)>E8J$1!NLJ(P0G-=?uzh)Q(sC=G*$|TVtW#73+ zsOen7TYfSVNwO7T@hVBjs8@AihzgZEMzD(Sj!kL?S@x$Y>QE75ybSQEt z77bKJeM`Ek9}jqwB_i4ST45G51pniVH;vRW@_MyQiE%&6t&V*uh z5&%JaC=@6HT!9CbB1#gbo!q9@2&F0Td1`M(Do~**xAc*2?I!~D_NZ>fV(z5~#}~A> z<(g*MmfA0=LEeZkWb;OHAsk{cLQ2_G13Qf3>`sC-?D&P8Sg#7ltl3yJ=m)m0t>7IG zllQ?!D17hW0O|$vV39+jnybL}gIgG+oTpM)=k~=l<$^Weo*z`GoM$%c z^(7ZW3d z{d7s0Hsa2pClK)A8=n*oy*RvfZ=7_>5F0nZjm_x!gKB?ZYJ^qJ6I1qWCh^7s5ubbn z^{QwV9p{-KD>O)HzuL)OR_ixH-Myu|j|DL`Dc|{r0*?aZ*iR;o-?M{2S-Go^cT@`K zDtrhQN>qmCfcgQLaS7`g&lg#2urRgJaRq8AP|4z~6}cf-!Xv+RRN~+Sm*(Ab zD@BQsIo9L*n79B>$=B7T>q}7?!6Yq0JRLPqao&aSttSfers7%%)>pIpzqcw0oyH#J zZlqfU)uC2mvUNAoDCU7bjB&{i8NK=yU5V>8WSmbxW*YC1u6L0d>M#~0&=zL8Ho7{3 z9zJI9E<1rz4D}8!{BrR&4`x08=bUd+iu}jN|Cbt``y4qc-UQxrfS=HkM0}-%HRC@N zl0Aq3@{NpprbVE55Jr2A(=~(B&O0i~PqqGceoHl{zFKx!1B`I`V2!6F4HoAP(*8Q5qX;&Bcydz(RKwlF0U*`t8e4|X<|{^(ivskdW*hzw)eXhIH8er> zl&`Tl%*?0!XSkRGME=NwQ53v;X+^R99FA+r;3|rs?U}*v4U*Xup7>TZ`U&l`A{sAq z+%>UHD8nGrLcZkXGod7qZ4q(|xA-^6Lnt;xd~DUDj(C=#xk3&ZMLV4{DS*<@S%9FZ z7RlvSt!R9zc?u^$AxEAg1Ly3SetEx`)b5b^3rk;0O`|f6^mp z9d@>a60|X+i6R8ny5-n(eclUwM@yKp0vPh0z*+Ji1W2i3+l~+@y4>z3>j0|TVb-S? 

8PF&YaQhZC7&4N3w=o${xSFK-3LlC(#D>&rRgCG3t#o+?M1fv}rZudpuV^%X! zW;c`6$WMhPcL|AYeQ%I0Oh4Q+*^J+Vn}gTxy?R{OnI z#NeeYd_kxl_#G4^r+5dcX4))O1@9IsWb6ruN}r5qdNjR53>@GOhb)kNHN*7h=t<@P z#Z~FyVjN7-0zXX4yhLrXw$?0!8E`$6yey*@kJ}j>CI_C1h}UK2b!a7 z@$X^y$t0-KxLyc`Cx}{KWl)R+4oPpzr|s9ANkH?NS2r3sp14TJ2Sp zU2lynG9RAPwGU|`5UrVwg?@oBYJ(N;t<+^+FpidFk+zjYgiB0g`*7gX9F&iUqE+Dy zuKx}+kwf50`^@`Eaf0RU?H$?>F16umS_T|_DzGY3_?Q0?-z8O3@5=w^5>vD`BZo~h z^W6rd)S9F`16T>+R1JHlx+!tE06@9`8OYvw)eM4wH~njpe>^b3B=7Ocn^q|6pl}CC z)6pDkLI^vyK#AEfO?XHFUX(Ju)mv2k!*S?#7> z(*;|iO$C9{C<^O)?2KN1pu?X%ddl8W&y(qaQ`KI}<~GFwZvr<>!onbeZDI~?rt7hXU{0+d^4@P{ zneJvxtbl>t3{*Fvv`&UYfd~;lZ)(0~t()#fGe}?G2RYXV zeOyugyLrQio@!IX(XxbGb@J|HJ(wCKbp?=%W%Fc2iMTKjb|Wm>T}VCX3C;Swfl`1g zud`~-Q1z`kvE>aL9;3gyThn*Wsu&kNif5CI zg7z>1IIXH+!*qU=&sW98EzC70Y$%Y#;Dn&Lmz&in)v-y#3KbU+)t-Ud%-Nh=5Q*Zr z4s`Cao(4?A71Yl=D5iAHc3ou(6KdBRr=|VkK@9~vgh>8#$1D-R5!PAnkjU@iHgJX! z;QW5~iD&>dWC?5)nUm!~n_D~OFPtA=>>Q)0H#bWG0W{l=xb}B8K31bm=e<2{3upH1 zsiLJNb3DP2P+u(_^xdOI0&ChJH8A_*6WRi~V^Z_cstZ+#%YIQ(bnz=4NA1{^Poy1d z7%j8PSkm(dc{kSObsl56zD>#wNZJHEhr(VhlO1pe5dwS{of9_xWO`A{z7kfAr*cLx z6A@Ioo$+90(y_i<_L>;o+P=PCQX(rzBS~To?J*?&Y)bwTlis+Xc|&NeC^_b2jHyY5 zp(HH~ap^|eoy%Yyu`RnGWna^SznoWFSSl;6uJO~zE7X|P3luNB-=5XYT>#bovKLS= z!xTLzKb@rOgJWN{kx!setV#QCV^u#q<0%RUXjvi^j*@>uXgb z^ViR5Ms$wjEYw*?!Lth{=un$~$byjV^m0h)Eve7=3KyLL=v0G1OJ}~fYBI!AE)V&x z#d7ujpHN1qnG7-u6o)XBKSsi^)Hl~0=1E-|5)2XQC$KftDr@8-AOKX}*#OK=ay*<^ z6m|SWN`doT}zAm<# zbc;GH%55~yzH(gw>r73;D*Y8gN$U=PR*g)^lL}SZI&tjmV(ktt1po4*k2Mjmg4HsO zV+l7v!FTE`&fD6&GovUqgTG$ z#oSpt)B5M2H+3^`ojnGI<+!X1|dYU~l%EjzY>vga5g{|p` zv<&M&0QZ7`IsKX1E*f4C14;bcNc}V01~Rdwu(WR7uJzL2T6-ZO#wsBMC_D1BdeKxt zR6q{9bGmFMSoTPK)CeMJd*o4$iJ3dRqm6t7Jqjmd;%1x(Kp7u1+AS^;Sl@1e$?N9t zk{f8^pbR8~dzoX0kCAv%33^7(@R=+Uh)ss*R)OgrdG!#b*aWvi8|NWU(t*6-Mj9%Q z)HQ(iUrZ|Yztg(luoxzbtbvJO^f{@?Lu;(;{1uWTnb5JbOo)dN^EGYRL><^|+>a6d zpgQvxF^mJbM*L3nT42hPIxy_s?B;&c6J8GqJcr)N6IOtlUJ)z*esKtW*sT`b{@UlL zWaS5)S^Dur(-U3XRj+NBM>y(+_=)n>Gop8P#L$3=L|Tl5O97b<5O8aYe~A8^k?DqmV_L3l=Jo~ZFqlk)amsO$ 
zd3Ni9?Z}2b&vSI#=uvpo(o8H2s@c7iKEQ$FIdjqF}Oj%cq7u_Hkb{H8;%X&Y2OjSA)6& zB&)|DuO41KQARF6`ok!{LLV3NA`eT@bx_e1rLfKtYBI=H7N)%3DSR5Gc6mn zlK_{G<;7_s5S*=P`o7cofB1e{Cc`_ZsC>zonJsqmvg97Jf1h!f^khDn3TzFmtHsyW z(i^I*KTa8hWDLm>>BhK&K0S};j&5k(ASU3<`;3{@cn|4b&))aCx%Q$kx_;{2T;=qj9nXCikxVWq?RgLi=AcUbQzL8A7 z*J+-_*<6r6rIws{`fuAUPVZSLf9Bgy^SX}*IG)oa;-3*k$;TF!y}{nXa)7`6=4^uawKkVn9^OAw@YM^Krz@ z4%@ZvxQQuV2}*Gy?2H$&9ZC~B3{SI|8rEx(c+KB7UV#10%bxlk<9+(+{3nQaNiE#1 zskkO8g$Q&op@@jK*7F@W#XTjQk!sX$<1AkvprCXaNcw=OLkc_>2wBpNsLCGd1b3Zo^ zy>PHKfBY9`ESfq$7U=l6$FS_Or&$YZpT!6TWO)_$eSkRVWYECXjy;F8HA;ERm*pp+ zhC6!E)s;G7n=QQA>}_}^jDYkGhlU>J27!ce)8MLT>7weOGmhrh!C#q|`s#zMQlqJ3 zKg2gs{5hsUi1dQ+vd0V4jk6OP6q-x1##6Lh>%my~63?tdUm#r5TJrxD!-Nni-*(fZ zkxfk#X|8JIlhda^6n|B=JQZXxgNaue_T5Ud*WL71+MJQqk@Kv_ThuH(&~xoi=IYK* zZJ{26sA3h3BCd}pg;~>nAZ7y}pP-BSSmVuV^sG>0kpMF~@7|p@G>@!+XW6Kq00uB4 z_$=koZp1>tko#RA0H7T)i4=t{y`bVdFY6>9sYYNK@NKWXA#v2S{Q%F~t6S6z&;u~j z{_~srXoufh;+ILVvNgQH-)es7hC$*T%yBx|58Iz|qbS6k%%7{rKV#n*=fA3hqrURt z*5Q@5qI6z1G1w;7G7Kduv8!5j8s6E=^vN<*vuioECybhWOW}}AeD|-nNKXl|5~^Ma1JmF-$YNN%&-9z^2Jay|BE7DQ z#;Lf4nD{xp45(eAD<>|=h$_v1ualG!t#*9VOf*f`vu=zjq@m8|tgn4Q{5cxZ~@JHcto};17(p{g8jZ!{;1f?b`5m+n4bfy574j`IRwCm6 z0yAO_t`~0VDb?+chV$pIGk#OF#pfqu@V(A($FaE<2`AhM056^leYTXS>KpUg+L}nA zekTEK6cW)GIJt2ngy_r}cLV-C$V^if@yT#mmbeesPeoFLhCQGEVa_s-2!R2VWEI9Z z6RpdtoOSu-o%|pYut~b+CV-<1{Zx}^859}KyTQ9rfG>!(fy)rskJ~hH2x`n|%VEwp zrwwW$e6;zPR)0*MBTai(kAIvKg}{HDXC6@GR}t{?CT%eM2gSO}*59GOrX;3z&xbZ9 z`1y z{gdT4m6-~xjPN>QUO51*m#*vklcb$T`%0He$akD_R*{)j{2zTc8T%LWZG*w4eDAE} z%Ar&TvQr8tXWY_B76WwMjN#kIXtII#1^Bc(9wkZ#LD!V!e(uMRsnl#t#$JW{bQ98I z35BB*<>nTSMk@1E`O&q-P$!@aZhG#$Qt7<^c3;UHYGjZ6IZUty{FGh1KbR3E%IlR* zOI_!} z7o@vWH<`u@IGlc<@Y**Uc}#d_IAra9fEC3)ZSWl$dE5V zkK@-w6?%g0O9kB>SZRz^4Tp+w*X$(~Vm8S0COa)5$t1tbkm9dC!|(lgGNrM_z3F_c zY{`NWdNlK9XghiZe$@=Y(pB_$8XZ4ZC2AowtzIVwc&@5!OlE|NFtE(TnW3zeoiaZ1 zuJtQwa zbIZ#JOg+ypI!m6MCFI0z9E&qOqv}e!*x>1L05eCE3>$Qz&(CXM-Q6Y`N{T8gw**NZ zu-NnuO!fd@iElCJe=iJ64fr#IvKg82IphcuK-PVDoFA($t!MqC 
z`Q=rWa14OJ{VPTc>YZC#M~yGLb7)|OhQGIHMV)L*;C<8U!nyvuDZ}5B42N2P8@JYZ zmx@b%#)d9?{Vnbqhse&}eQ1dE^iln80UR>XviJRP-tITTh7TH0ML|`umv@1-V z8XcxV@73msVwfTqPI1jSe?T+u4RtS0upy|GSJG{%Y;6F@Q^8k}K+$j_Q6H|hj5lpK zoa^apU+cvr!Zv%Qp0BUuBHNU2m+_A0Ff$!Xkk$}I@Oq-Tx2nTPhh(Qj+z2`8PDez2 z=VYmd?OIJw^ngM0`Db|B;gu+4i=}&qR|EVGwGBd^3*)=qX`-Q_67z@p8!E`1C+A_|&+jcbLGI-&_Dv@2J<@oQe9PCuaTxkf5AU;*UP!AzXUtCb42{S+1J1 zUTog+4`5_#zK&0-K@PNuXd8>{T32v!6Yn?IF%;+B{9k|{H+YdPBHM5*I=G}$P9}$% z(FK^UA&@yp(LPhIS@1pt_5MkRs%M|Ks>3A9Lucl3eoZV^3Zp=^L!XUL#>75E(75oz zQnbiG0t0k;4eC+h+7RHl!k&UVxCRdh(wkm__@Xfp#$>2`UfwN*>-Mc(YhM5q=)kOD|E=J$I{#oayHV^Yzr#5LEun=cU!hS z!I)T^XjWbNW(L|u9!}5eQmfi-)Tg8Dh1v!~sA`ng7t3=jbcj0S!4c>WVs2nd*9y4Bfki3_$@yKydo(U7gDYYg! zs$g+jNf_!1>pR&5Fti^MG;sIp(NP!PCqbgtyr#(rRfIXaxioe6LmH{~I(`0aui^_z z!TxHi9kdERQ>9Yg>km)7Db#?cp@D?J{=hx@M;`2#u9=Jq>QooglWL!plkoU6vUtXK zs`y3@=Ew|K?IVP!RH8d?FrGxJ6H--^;kdd@Sa{pG?48O~SmWPn4F^~M3w)UfNSq_D zn*aJgW8@I}mnJusC94g~v_lyB8tR}IE&0lzW?D?B%h$RlGLD^BG%KMB6s@>^{T-qKDm-~%ZsW8vwisi> zfJnm7zj%xNl2^;EFe(8apb*BQ#J5;~#u=`(_QmBS&jAVBaRQ@Cp<-$TW|Ow*a`<*_ z-N&M$hKvsDk7aHghTAyUWu?OKOS~!)$U2#f>6Z&Bz_gm#FvbjJh%GVDGYGB(PgHwN zb$*K`Ka5E{3GHk0px9yvdwRrk|37p`5Pm(H?c^G=Q_PC0a95ly`HMKOmCShiTo7YH z)puFC&t`n)|2`JyPWY`w;8toSw{T?Zcs;eC+CaP$Nxz{Z)l$M*tua5hNG0WSwSZsb zJcD14*%N}!baCiZUN;-$|C5vZ3GGMMa~86hO!jV?PxMF z`%!bP|9b|VO^ygN-y)F1NZe39d!QW8?7({CI{QdEOc~@1<0-nMru3iVD0J_odQ;(x zDe0M;1|n<&pwAjK1ZbVxjKyk(1h<61v|-TZ|qJ?#WUB+4yq%`@4*7PZ{ZHkgTB)RAT>w|~g)$vO(RuMuJ{nW0_bG=OIQkL* zVJI6*orh>>5brUT?WflJYG1#TM%j98vTn@NfVDfLpCa>$weK=Wu;6qSj|ScvP34?AuOi!4llH zLn@yiIcUaH%s01fMU^MqiXc~|25*v1_r9N-_+6S;9T!GxeAQH9@el*s@dnis_le^+ zb8X&iFIVqxARR*gr%SD41-7Ah#E$diz< zR6E*z8uTc?mD*2)aTa7;PgLvwQUVqZUo+7llumUwhmUBOdfLJVnXik)KX&}o zI56m?l3QL*52SPI4BQtW43UfkQ9|d!Re8b8cr~D9g(`vuSyNK+q?NcA+Ce^A9TL{c zY|1TA%U8UuY*A1Hc&&|m2GDxT!Tmq=-VOlyq_I@Dhpe=%U?neETzrNq$cGoL4RZ%PI zw865zVT1O-s~Z6nZgpMHHeOz_%&C9Q+wp4_*asKeO%$O>TD%*x#Jq=9Y98`Qx59e0 z|Ndb!z0@(%2tctu?^^OhENh;%m4ir=))8Qx0})IR15oX6ddrKp*;H2PWv__g(|pU% 
zE+HVzYhG@)5L*N^kK5gDgCktH(ebn3z(I&7oQ07K%}vG(VMt)wamB`uSe$4C`B?ud zVYX7=u|)JkDYpv0!_oF|9;R@N3nHgby%Ndcr4mv|EP9oDa6nL!MO+5>@9-|#nK2gC zsA*RIy@aqmg1JgFC|>Zwu)M-;IgeCY8x@1)HOYPqrfJO=l(!Zt_(NdC6L|p?p&C%; z$a7QKnrC1gRStR?3h`Fqv8$j#Uqln4c8c;Gj}%w%_CKP9-5kZX!TNkhs_{PvMw(mI zX`ggOZHSlV(-#93a?1^*U(SD;DKEb^n>qHJ*mypab0y5wVi)1t@368cEhKRDPgivK z+Y<&ufYwB$r?6QjAW(z$*|$TaA;8Os7v#`DgF53iWiECxK^G{AWFrTQu?3ZO^vJLZ z+q-o-pI7u!eULuxG1r3W@UBf*%DsRtGB<^iw02I4L=_&%&}%o3PB2T2@ik57M5m+X z?yBl8Gr8c%0#2a+aUPg)g?5YtegKEvA&G-~tQRk}@oOd4wdfY7=Ctxui{eh*i6Twm z$6F;3O~&37`ynTagsBuAqjgw=B?t~+Gw|#N4?_mozPe>GJBGq3V2C@ciH*7K;{7pF za*s5yrTs_T4p)^5?njtfVYQ2n09#lNDp9?eQ5YhlvOj0C=MRrsh54ZzHFNX;c2?>L zy|Cw{5?PAY;qD_sky8I{>w3YZrAahC>-_bra@+nwHgQ@!uf_x>l!NxBeD zdgT$3I{xTc`MevGJt1exq zWut~9NdYPpsJN+tHRx_sQ0eUlXzWS1=KJlZp4vr#k%)<)`J!E*&Q;>`66X1^Vixxt zA62hoTN}w>?6IiAk82*z!^0L_ly%AMi_E4sQW{9 zQE)rc{M{n2sf1HS=KpQ$TgihXtSX;8L(PeCx+$P@>I!8spEZUlJuRK`rB2x#kYtoy z4N+frfrmbp*9Anfj?o6}iJI!G{!xCu*zQQM5sDrD!rm5)IP#7<-HT&22E<*^{3bKf zYAL)14A=h;a;Kq*I0kd?M0}f%K4t|>qOX&KR2#Fy~n8U;Zl@1M7W?TW zUzn<9gqHR6!mn%nJFW`?83T;)Y0IVLSm|A#83Xj~>$6Um+oQ~EUSk6=&O!j1sS0B~ zEx;il45w*W6^dGI|GK5f9`UFGcjJQP0QzynL=MSLeVnfT!W3%kPA>_f(!`YK*~;(i zAx5sm*Q)y%mQbL56q)QSEi7!&teW*fy8Ly&rT1r&lb%HDh4wKq5yV6e$xf6L=Eu;^ z#Qxx_9g9=p2PVxYGy`4RlroV=LQ@XSXmwM`GyMR;p3cc+S^;4~T9c}YN%#V*^s;GU zef^wSK*k@p$}lN+Sh|C_oHu-dZ7C}XYOf!2%&`GI6Su2`A5Sq_!w~E%?^Y)+^aS-GRu&DuCk2zVlwP?~~j# zSR?%5#yB|leGv^W;i%5^g7W!#4~~xNo|n)m98GcgjY1pH!jnI$LW+|^qPPs%;9y3U z6EmY85gXArI1#TeFMJ+3`Y79KDsLsv0LU==Tq_6~-c`=6(6WrIYD@N^BG<)>vU?0j zv(96;uFX1MZX)>A+aM(lNF2J?ftQ5ZZu*da5r;ek2NM;pbf#7gRg0;EB~e}CL{4pe z!y0OI9vQIqE?snY3dOfpi`c?+F5G|MKRYi8-%D}s^vxOms}{SXL8k5c@;a>*f}KBO zkagzhA03o8#;ib!QBBln7jPUgBsf#|1tYluo9FY&hVf#Rg!%(_e{hpfiYFy|G=Y-g zv{N`Fvt1gnZEv(to+$ieqC=UCIfS=X;XW2HI-gBMwrRp})AS-Q6WiTI%oAncbm4yj z%iLs1#hXpz(MtOVi~s-t3-HC95~u(HVi>By`7b6LNFL15cm_eX6o?0Q)U=9{$%WTb zArzY<=B^Qr1NehjQhv;58Tma+I$q;a4FF=0Oe@uJ^y(Z3DI;+N>QNgZT~bh{wERpC&vyLWFKrgHD1!xdc3Bq$Q3Iqh&Kt 
+#### Aboveground Biomass (AGB) -### Difference: Woody + Annual minus Woody +##### Annual Crop -These maps show the impact of adding 50% annual ground cover in orchards. Positive values indicate an increase relative to woody-only. +![](../figures/county_annual_crop_AGB_carbon_stock.webp) -- Density (Mg/ha): ![](../figures/county_diff_woody_plus_annual_minus_woody_TotSoilCarb_carbon_density.webp) -- Stock (Tg): ![](../figures/county_diff_woody_plus_annual_minus_woody_TotSoilCarb_carbon_stock.webp) +##### Woody Perennial Crop (Orchards & Vineyards) -### Aboveground Biomass (AGB) by PFT +![](../figures/county_woody_perennial_crop_AGB_carbon_stock.webp) -#### Annual Crop +##### Woody + Annual (Orchards & Vineyards with Ground Cover) -- Stock: ![](../figures/county_annual_crop_AGB_carbon_stock.webp) -- Density: ![](../figures/county_annual_crop_AGB_carbon_density.webp) +![](../figures/county_woody_annual_AGB_carbon_stock.webp) -#### Woody Perennial Crop +### Density -- Stock: ![](../figures/county_woody_perennial_crop_AGB_carbon_stock.webp) -- Density: ![](../figures/county_woody_perennial_crop_AGB_carbon_density.webp) +#### Soil Carbon (TotSoilCarb) - -### Difference: Woody + Annual minus Woody (AGB) +![](../figures/county_annual_crop_TotSoilCarb_carbon_density.webp) -- Density (Mg/ha): ![](../figures/county_diff_woody_plus_annual_minus_woody_AGB_carbon_density.webp) -- Stock (Tg): ![](../figures/county_diff_woody_plus_annual_minus_woody_AGB_carbon_stock.webp) +##### Woody Perennial Crop (Orchards & Vineyards) +![](../figures/county_woody_perennial_crop_TotSoilCarb_carbon_density.webp) + +##### Woody + Annual (Orchards & Vineyards with Ground Cover) +![](../figures/county_woody_annual_TotSoilCarb_carbon_density.webp) + +#### Aboveground Biomass (AGB) + +##### Annual Crop + 
+![](../figures/county_annual_crop_AGB_carbon_density.webp) + +##### Woody Perennial Crop (Orchards & Vineyards) + +![](../figures/county_woody_perennial_crop_AGB_carbon_density.webp) + +##### Woody + Annual (Orchards & Vineyards with Ground Cover) + +![](../figures/county_woody_annual_AGB_carbon_density.webp) ### Tables: County Stocks and Density (by PFT) and PFT Comparison -The first table summarizes county-level metrics for each PFT. The second table pivots TotSoilCarb density by PFT to simplify cross-PFT comparisons. +The first table summarizes county-level metrics for each PFT. The second table pivots carbon pools to highlight differences across cropping systems. -```{r, include=FALSE} -options(ccmmf.quiet_banner = TRUE) +```{r, options, include=FALSE} +options(ccmmf.quiet_banner = TRUE) source(here::here("000-config.R")) ``` -```{r, message=FALSE, warning=FALSE} +```{r, load_inputs, message=FALSE, warning=FALSE} # Load county summaries data county_summaries <- readr::read_csv( @@ -122,14 +128,14 @@ DT::datatable( county_summaries_table, extensions = c('Scroller','SearchPanes','Select'), options = list( - dom = 'Plfrtip', + dom = 'Plrtip', pageLength = 10, - searchHighlight = TRUE, + searchHighlight = FALSE, deferRender = TRUE, scroller = TRUE, scrollY = "60vh", searchPanes = list(cascadePanes = TRUE, initCollapsed = TRUE), - columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1,2))) + columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1))) ), class = "stripe hover compact", rownames = FALSE, escape = FALSE @@ -165,9 +171,9 @@ DT::datatable( density_wide, extensions = c('Scroller','SearchPanes','Select'), options = list( - dom = 'Plfrtip', + dom = 'Plrtip', pageLength = 10, - searchHighlight = TRUE, + searchHighlight = FALSE, deferRender = TRUE, scroller = TRUE, scrollY = "60vh", @@ -175,24 +181,37 @@ DT::datatable( columnDefs = list(list(searchPanes = list(show = TRUE), targets = c(0,1))) ), class = "stripe hover 
compact", - rownames = FALSE, escape = FALSE, - filter = 'top' + rownames = FALSE, escape = FALSE ) ``` + \ No newline at end of file diff --git a/scripts/040_downscale.R b/scripts/040_downscale.R index 6e98c44..6d22391 100644 --- a/scripts/040_downscale.R +++ b/scripts/040_downscale.R @@ -46,8 +46,28 @@ if (!exists("ca_fields_full")) { } ca_fields <- ca_fields_full |> + sf::st_drop_geometry() |> dplyr::select(site_id, county, area_ha) +# Normalize reference table to one row per site_id and warn if duplicates exist +dup_counts <- ca_fields |> + dplyr::count(site_id, name = "n") |> + dplyr::filter(n > 1) +if (nrow(dup_counts) > 0) { + PEcAn.logger::logger.warn( + "ca_fields has duplicate site_id rows: ", nrow(dup_counts), + "; collapsing to first observed county/area per site_id. Examples: ", + paste(utils::head(dup_counts$site_id, 5), collapse = ", ") + ) +} +ca_fields <- ca_fields |> + dplyr::group_by(site_id) |> + dplyr::summarise( + county = dplyr::first(county), + area_ha = dplyr::first(area_ha), + .groups = "drop" + ) + ca_field_attributes <- readr::read_csv(file.path(data_dir, "ca_field_attributes.csv")) # Determine PFTs and map ensemble keys (e.g. 
'woody') to field labels @@ -734,9 +754,9 @@ if (length(target_woody_sites) == 0) { dplyr::filter(site_id %in% target_woody_sites) |> dplyr::rename(annual_end = prediction) - # Join by site_id and ensemble to align predictions + # Join by site_id and ensemble to align predictions (include annual_start for SOC) mix_df <- woody_df |> - dplyr::inner_join(ann_end_df, by = c("site_id", "ensemble")) |> + dplyr::inner_join(ann_end_df, by = c("site_id", "ensemble")) |> dplyr::inner_join(ann_start_df, by = c("site_id", "ensemble")) if (nrow(mix_df) == 0) { @@ -750,7 +770,7 @@ if (length(target_woody_sites) == 0) { mixed_pred = combine_mixed_crops( woody_value = .data$woody_pred, annual_value = .data$annual_end, - annual_init = .data$annual_start, + annual_init = if (pool == "AGB") 0 else .data$annual_start, annual_cover = f_annual, woody_cover = 1.0, method = "incremental" @@ -767,6 +787,37 @@ if (length(target_woody_sites) == 0) { dplyr::select(site_id, pft, ensemble, c_density_Mg_ha, total_c_Mg, area_ha, county, model_output) mixed_records[[pool]] <- mix_df + + # Also save per-site treatment scenarios on woody fields for comparisons + woody_scn <- woody_df |> + dplyr::left_join(ca_fields, by = "site_id") |> + dplyr::mutate( + pft = pft_i, + model_output = pool, + scenario = "woody_100", + c_density_Mg_ha = PEcAn.utils::ud_convert(woody_pred, "kg/m2", "Mg/ha"), + total_c_Mg = c_density_Mg_ha * area_ha + ) |> + dplyr::select(site_id, pft, ensemble, scenario, c_density_Mg_ha, total_c_Mg, area_ha, county, model_output) + + annual_scn <- ann_end_df |> + dplyr::left_join(ca_fields, by = "site_id") |> + dplyr::mutate( + pft = pft_i, + model_output = pool, + scenario = "annual_100", + c_density_Mg_ha = PEcAn.utils::ud_convert(annual_end, "kg/m2", "Mg/ha"), + total_c_Mg = c_density_Mg_ha * area_ha + ) |> + dplyr::select(site_id, pft, ensemble, scenario, c_density_Mg_ha, total_c_Mg, area_ha, county, model_output) + + mixed_scn <- mix_df |> + dplyr::mutate(scenario = 
"woody_50_annual_50") |> + dplyr::select(site_id, pft, ensemble, scenario, c_density_Mg_ha, total_c_Mg, area_ha, county, model_output) + + # accumulate + if (!exists("treatment_records", inherits = FALSE)) treatment_records <- list() + treatment_records[[length(treatment_records) + 1L]] <- dplyr::bind_rows(woody_scn, annual_scn, mixed_scn) } # Append mixed records if any @@ -827,6 +878,14 @@ if (length(delta_output_records) > 0) { PEcAn.logger::logger.info("Delta predictions written to", file.path(model_outdir, "downscaled_deltas.csv")) } +# Write treatment comparisons for woody sites if available +if (exists("treatment_records") && length(treatment_records) > 0) { + treatments_df <- dplyr::bind_rows(treatment_records) + out_treat <- file.path(model_outdir, "treatments_woody_sites.csv") + readr::write_csv(treatments_df, out_treat) + PEcAn.logger::logger.info("Treatment scenarios written to", out_treat) +} + PEcAn.logger::logger.info("Downscaled predictions written to", file.path(model_outdir, "downscaled_preds.csv")) PEcAn.logger::logger.info( "Total script elapsed time (s):", diff --git a/scripts/041_aggregate_to_county.R b/scripts/041_aggregate_to_county.R index 2656b1a..71cf333 100644 --- a/scripts/041_aggregate_to_county.R +++ b/scripts/041_aggregate_to_county.R @@ -11,7 +11,6 @@ PEcAn.logger::logger.info("***Starting Aggregation to County Level***") source("000-config.R") downscale_preds_csv <- file.path(model_outdir, "downscaled_preds.csv") -library(readr) downscale_preds <- vroom::vroom( downscale_preds_csv, col_types = readr::cols( diff --git a/scripts/043_county_level_plots.R b/scripts/043_county_level_plots.R index 9999497..19de3f8 100644 --- a/scripts/043_county_level_plots.R +++ b/scripts/043_county_level_plots.R @@ -31,22 +31,8 @@ combos <- co_preds_to_plot |> p <- purrr::pmap( list(combos$pft, combos$model_output, combos$units), function(.pft, pool, unit) { - # Cumulative-share mask: exclude smallest-contributing counties whose aggregate - # share of 
statewide total carbon is below MASK_THRESHOLD for this PFT + pool - mask_df <- county_summaries |> - dplyr::filter(pft == .pft, model_output == pool) |> - dplyr::mutate(share = mean_total_c_Tg / sum(mean_total_c_Tg, na.rm = TRUE)) |> - dplyr::arrange(share) |> - dplyr::mutate(cum_share = cumsum(dplyr::coalesce(share, 0))) |> - dplyr::mutate(mask_ok = cum_share >= get0("MASK_THRESHOLD", ifnotfound = 0.01)) |> - dplyr::select(county, mask_ok) - dat <- dplyr::filter(co_preds_to_plot, pft == .pft, model_output == pool, units == unit) |> - dplyr::left_join(mask_df, by = "county") |> - dplyr::mutate( - .mask_ok = dplyr::coalesce(mask_ok, FALSE), - value_plot = dplyr::if_else(.mask_ok, value, NA_real_) - ) + dplyr::mutate(value_plot = value) .plt <- ggplot2::ggplot( dat, ggplot2::aes(geometry = geom, fill = value_plot) @@ -58,11 +44,6 @@ p <- purrr::pmap( ggplot2::facet_grid(model_output ~ stat) + ggplot2::labs( title = paste("County-Level", pool, "-", .pft), - subtitle = paste0( - "Excluding smallest counties whose combined share < ", - scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), - " of statewide total carbon" - ), fill = unit ) + ggplot2::guides(fill = ggplot2::guide_colorbar(title.position = "top")) @@ -103,6 +84,21 @@ mix <- dp |> wood <- dp |> dplyr::filter(pft == "woody perennial crop") |> dplyr::select(site_id, ensemble, model_output, county, area_ha_woody = area_ha, total_c_Mg_woody = total_c_Mg) + +# Log if duplicate keys remain (should be rare after upstream normalization) +mix_dups <- mix |> + dplyr::count(site_id, ensemble, model_output, county) |> + dplyr::filter(n > 1) +wood_dups <- wood |> + dplyr::count(site_id, ensemble, model_output, county) |> + dplyr::filter(n > 1) +if (nrow(mix_dups) > 0 || nrow(wood_dups) > 0) { + PEcAn.logger::logger.severe(paste0( + "Duplicate keys detected prior to join (mix=", nrow(mix_dups), ", wood=", nrow(wood_dups), "). ", + "Fix upstream duplication; refusing to silently aggregate." 
+ )) +} + diff_county <- mix |> dplyr::inner_join(wood, by = c("site_id", "ensemble", "model_output", "county")) |> dplyr::mutate(diff_total_Mg = total_c_Mg_mix - total_c_Mg_woody, area_ha = dplyr::coalesce(area_ha_woody, area_ha_mix)) |> @@ -116,31 +112,14 @@ diff_county <- mix |> for (pool in unique(stats::na.omit(diff_county$model_output))) { dat_pool <- dplyr::filter(diff_county, model_output == pool) - # Cumulative-share mask based on absolute change magnitudes - sum_abs <- sum(abs(dat_pool$mean_diff_total_Tg), na.rm = TRUE) - if (is.finite(sum_abs) && sum_abs > 0) { - dat_pool_mask <- dat_pool |> - dplyr::mutate(share = abs(mean_diff_total_Tg) / sum_abs) |> - dplyr::arrange(share) |> - dplyr::mutate(cum_share = cumsum(dplyr::coalesce(share, 0))) |> - dplyr::mutate(mask_ok = cum_share >= get0("MASK_THRESHOLD", ifnotfound = 0.01)) - } else { - dat_pool_mask <- dplyr::mutate(dat_pool, mask_ok = TRUE) - } - # Density difference map (Mg/ha) - p_density <- ggplot2::ggplot(dat_pool_mask, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_diff_density_Mg_ha, NA_real_))) + + p_density <- ggplot2::ggplot(dat_pool, ggplot2::aes(geometry = geom, fill = mean_diff_density_Mg_ha)) + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + ggplot2::geom_sf() + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + ggplot2::theme_minimal() + ggplot2::labs( title = paste("Difference in Carbon Density (Mg/ha): (woody + annual) - (woody)", pool), - subtitle = paste0( - "Excluding smallest counties whose combined share < ", - scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), - " of statewide total change" - ), fill = "Delta (Mg/ha)" ) ggsave_optimized( @@ -150,18 +129,13 @@ for (pool in unique(stats::na.omit(diff_county$model_output))) { ) # Stock difference map (Tg) - p_stock <- ggplot2::ggplot(dat_pool_mask, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, 
mean_diff_total_Tg, NA_real_))) + + p_stock <- ggplot2::ggplot(dat_pool, ggplot2::aes(geometry = geom, fill = mean_diff_total_Tg)) + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + ggplot2::geom_sf() + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + ggplot2::theme_minimal() + ggplot2::labs( title = paste("Difference in Carbon Stock (Tg): (woody + annual) - (woody)", pool), - subtitle = paste0( - "Excluding smallest counties whose combined share < ", - scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), - " of statewide total change" - ), fill = "Delta (Tg)" ) ggsave_optimized( @@ -211,30 +185,15 @@ if (file.exists(delta_csv)) { combos_delta, function(pft, model_output) { datp <- dplyr::filter(delta_county, pft == !!pft, model_output == !!model_output) - sum_abs <- sum(abs(datp$mean_delta_total_Tg), na.rm = TRUE) - if (is.finite(sum_abs) && sum_abs > 0) { - datp <- datp |> - dplyr::mutate(share = abs(mean_delta_total_Tg) / sum_abs) |> - dplyr::arrange(share) |> - dplyr::mutate(cum_share = cumsum(dplyr::coalesce(share, 0))) |> - dplyr::mutate(mask_ok = cum_share >= get0("MASK_THRESHOLD", ifnotfound = 0.01)) - } else { - datp <- dplyr::mutate(datp, mask_ok = TRUE) - } pft_key <- stringr::str_replace_all(pft, "[^A-Za-z0-9]+", "_") # density - p_den <- ggplot2::ggplot(datp, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_delta_density_Mg_ha, NA_real_))) + + p_den <- ggplot2::ggplot(datp, ggplot2::aes(geometry = geom, fill = mean_delta_density_Mg_ha)) + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + ggplot2::geom_sf() + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + ggplot2::theme_minimal() + ggplot2::labs( title = paste("Delta Density (start->end)", model_output, "-", pft), - subtitle = paste0( - "Excluding smallest counties whose combined 
share < ", - scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), - " of statewide total change" - ), fill = "Delta (Mg/ha)" ) ggsave_optimized( @@ -242,18 +201,13 @@ if (file.exists(delta_csv)) { plot = p_den, width = 10, height = 5, units = "in", dpi = 96, bg = "white" ) # stock - p_stk <- ggplot2::ggplot(datp, ggplot2::aes(geometry = geom, fill = dplyr::if_else(mask_ok, mean_delta_total_Tg, NA_real_))) + + p_stk <- ggplot2::ggplot(datp, ggplot2::aes(geometry = geom, fill = mean_delta_total_Tg)) + ggplot2::geom_sf(data = county_boundaries, fill = "white", color = "black") + ggplot2::geom_sf() + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + ggplot2::theme_minimal() + ggplot2::labs( title = paste("Delta Stock (start->end)", model_output, "-", pft), - subtitle = paste0( - "Excluding smallest counties whose combined share < ", - scales::percent(get0("MASK_THRESHOLD", ifnotfound = 0.01)), - " of statewide total change" - ), fill = "Delta (Tg)" ) ggsave_optimized(filename = here::here("figures", paste0("county_delta_", pft_key, "_", model_output, "_carbon_stock.webp")), plot = p_stk, width = 10, height = 5, units = "in", dpi = 96, bg = "white") From 47c0aa1405741e5a98f75e05c158926ea37e2796 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Thu, 25 Sep 2025 01:02:31 -0400 Subject: [PATCH 32/70] Update file paths in mixed_system_prototype.qmd and refine variable importance report in variable_importance.qmd --- docs/mixed_system_prototype.qmd | 2 +- reports/variable_importance.qmd | 141 ++++++++++--------------------- scripts/042_downscale_analysis.R | 32 +++---- 3 files changed, 63 insertions(+), 112 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index 5f7f7ea..0c3d0df 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -571,7 +571,7 @@ ggplot() + Here we present county-level maps showing the impact of adding 50% 
ground cover to orchards on carbon stocks and densities. Positive values indicate an increase relative to woody-only. -These maps are generated from the downscaled outputs that are presented in the [Downscaling Results](downscaling_results.qmd#results) page. +These maps are generated from the downscaled outputs that are presented in the [Downscaling Results](../reports/downscaling_results.qmd#results) page. ### Change in Carbon Stocks (Tg): diff --git a/reports/variable_importance.qmd b/reports/variable_importance.qmd index 4851e84..e27a64a 100644 --- a/reports/variable_importance.qmd +++ b/reports/variable_importance.qmd @@ -1,114 +1,63 @@ --- -title: "Variable Importance and Effects" +title: "Downscaling Analysis: Variable Importance and Effects" format: html: self-contained: false df-print: paged toc: true +execute: + freeze: false # force re-execution to refresh figures after new model runs --- # Overview +This page answers two questions for each model (pool × PFT): -Diagnostics reflect exactly what scripts/040_downscale.R and scripts/042_downscale_analysis.R produce. - -- Variable importance: per-ensemble importance computed during downscaling (040) and summarized by median and IQR across ensembles (042). -- Partial dependence (PDP): top two predictors (by median importance) using the saved RF model and training data from 040. -- ALE (Accumulated Local Effects) and ICE: optional if the iml package is available; computed from the same saved model and training data. - -Notes - -- Importance units depend on the RF backend: %IncMSE or an equivalent importance metric; higher values indicate stronger influence. -- PDP assumes feature independence; may mislead under strong collinearity. ALE is preferred in that case and is centered at 0. -- The mixed scenario (woody + annual) has no standalone model; therefore VI/PDP/ALE/ICE are not generated for that PFT. 
-- PDPs are generated from a single saved RF model (first ensemble) for illustration, while importance ranks are aggregated across ensembles. - -# Importance + Partial Dependence +1) Which predictors matter most? (Top-5 table across ensembles) +2) How do top predictors affect predictions? (PDP panels) +Notes: importance is aggregated across ensembles; PDPs come from one saved RF model and are for quick intuition. ALE/ICE details are folded at the bottom. + ```{r, include=FALSE} options(ccmmf.quiet_banner = TRUE) source(here::here("000-config.R")) -``` -```{r, echo=FALSE, message=FALSE, warning=FALSE, results='asis'} - -fig_dir <- here::here("figures") -imp_pngs <- list.files(fig_dir, pattern = "_importance_partial_plots\\.png$", full.names = TRUE) -if (length(imp_pngs) == 0) { - cat("No importance/partial plots found. Run scripts/042_downscale_analysis.R to generate them.") -} else { - # Parse filenames into PFT and pool: __importance_partial_plots.png - meta <- tibble::tibble(file = imp_pngs) |> - dplyr::mutate(name = basename(file)) |> - # capture any pool token (last underscore-delimited token before suffix) - tidyr::extract(name, into = c("pft", "pool"), - regex = "^(.*)_([^_]+)_importance_partial_plots\\.png$", - remove = FALSE) |> - dplyr::arrange(pool, pft) - - pools <- unique(meta$pool) - for (po in pools) { - cat("\n## ", po, " by PFT\n\n", sep = "") - mf <- dplyr::filter(meta, pool == po) - for (i in seq_len(nrow(mf))) { - cat("### ", gsub("_", " ", mf$pft[i]), "\n\n", sep = "") - knitr::include_graphics(mf$file[i]) - cat("\n") - cat("- Left: variable importance (median across ensembles) with interquartile ranges.\n") - cat("- Middle/Right: PDP for the top two predictors.\n\n") - } - } +summ_vi <- function(pft, pool, top_n = 5) { + spec_key <- paste0(janitor::make_clean_names(pft), "_", janitor::make_clean_names(pool)) + vi_file <- file.path(model_outdir, paste0("vi_", spec_key, "_by_ensemble.csv")) + if (!file.exists(vi_file)) return(NULL) + 
readr::read_csv(vi_file, show_col_types = FALSE) |> + group_by(pft, model_output, predictor) |> + summarize( + median_importance = median(importance, na.rm = TRUE), + iqr_importance = IQR(importance, na.rm = TRUE), + .groups = "drop" + ) |> + arrange(desc(median_importance)) |> + slice_head(n = top_n) } -``` -## ALE and ICE Effects - -```{r, echo=FALSE, message=FALSE, warning=FALSE, results='asis'} -ale_svgs <- list.files(fig_dir, pattern = "_ALE_predictor[0-9]+\\.svg$", full.names = TRUE) -ice_svgs <- list.files(fig_dir, pattern = "_ICE_predictor[0-9]+\\.svg$", full.names = TRUE) - -if (length(ale_svgs) == 0 && length(ice_svgs) == 0) { - cat("ALE/ICE plots not found (package 'iml' may be missing during generation).\n") -} else { - parse_fx <- function(files, kind) { - tibble::tibble(file = files) |> - dplyr::mutate(name = basename(file)) |> - # capture any pool token (last underscore-delimited token before kind) - tidyr::extract( - name, - into = c("pft", "pool", "pred_idx"), - regex = paste0("^(.*)_([^_]+)_", kind, "_predictor([0-9]+)\\.svg$"), - remove = FALSE - ) |> - dplyr::mutate(pred_idx = as.integer(pred_idx)) - } - - ale_meta <- parse_fx(ale_svgs, "ALE") - ice_meta <- parse_fx(ice_svgs, "ICE") - - # Merge ALE and ICE by pft/pool/pred_idx - meta <- dplyr::full_join( - dplyr::rename(ale_meta, ale_file = file), - dplyr::rename(ice_meta, ice_file = file), - by = c("pft", "pool", "pred_idx") - ) |> - dplyr::arrange(pool, pft, pred_idx) - - pools <- unique(meta$pool) - for (po in pools) { - cat("\n## ", po, " – ALE and ICE\n\n", sep = "") - mf <- dplyr::filter(meta, pool == po) - pf <- unique(mf$pft) - for (pf_i in pf) { - cat("### ", gsub("_", " ", pf_i), "\n\n", sep = "") - for (k in unique(mf$pred_idx[mf$pft == pf_i])) { - row <- dplyr::filter(mf, pft == pf_i, pred_idx == k)[1, ] - cat("#### Top predictor ", k, "\n\n", sep = "") - if (!is.na(row$ale_file)) knitr::include_graphics(row$ale_file) - if (!is.na(row$ice_file)) knitr::include_graphics(row$ice_file) 
- cat("\n- ALE: centered effect; more robust under collinearity.\n", - "- ICE: per-observation curves; heterogeneity and interactions appear as diverse or crossing trajectories.\n\n") - } - } - } -} +pfts <- c("annual crop", "woody perennial crop") ``` + +# TotSoilCarb + +#### Annual crop + +[![Annual crop – TotSoilCarb: importance + PDP](../figures/annual_crop_tot_soil_carb_importance_partial_plots.png){width=100%}](../figures/annual_crop_tot_soil_carb_importance_partial_plots.png) + +#### Woody perennial crop + +[![Woody perennial crop – TotSoilCarb: importance + PDP](../figures/woody_perennial_crop_tot_soil_carb_importance_partial_plots.png){width=100%}](../figures/woody_perennial_crop_tot_soil_carb_importance_partial_plots.png) + + +# AGB + +#### Annual crop + +[![Annual crop – AGB: importance + PDP](../figures/annual_crop_agb_importance_partial_plots.png){width=100%}](../figures/annual_crop_agb_importance_partial_plots.png) + +#### Woody perennial crop + +[![Woody perennial crop – AGB: importance + PDP](../figures/woody_perennial_crop_agb_importance_partial_plots.png){width=100%}](../figures/woody_perennial_crop_agb_importance_partial_plots.png) + + diff --git a/scripts/042_downscale_analysis.R b/scripts/042_downscale_analysis.R index a76a5b7..12504f8 100644 --- a/scripts/042_downscale_analysis.R +++ b/scripts/042_downscale_analysis.R @@ -3,10 +3,6 @@ source("000-config.R") # Variable Importance (VI) -# Lighter defaults in development to avoid memory/timeouts -rf_ntree <- if (PRODUCTION) 500 else 300 -rf_sample_n <- 20000 # per-ensemble sampling for importance - preds_csv <- file.path(model_outdir, "downscaled_preds.csv") meta_json <- file.path(model_outdir, "downscaled_preds_metadata.json") @@ -24,7 +20,7 @@ downscale_preds <- vroom::vroom( ) ) -meta <- tryCatch(jsonlite::read_json(meta_json, simplifyVector = TRUE), error = function(e) list()) +meta <- jsonlite::read_json(meta_json, simplifyVector = TRUE) ensemble_ids <- if (!is.null(meta$ensembles)) 
meta$ensembles else sort(unique(downscale_preds$ensemble)) covariates_csv <- file.path(data_dir, "site_covariates.csv") @@ -90,13 +86,15 @@ for (row in seq_len(nrow(spec_table))) { mdl_path <- file.path(cache_dir, "models", paste0(spec_key, "_models.rds")) trn_path <- file.path(cache_dir, "training_data", paste0(spec_key, "_training.csv")) rf <- NULL - x <- NULL + + # Here, the cached training CSV corresponds to the design-point covariates used to train the RF. + design_covariates <- NULL if (file.exists(mdl_path) && file.exists(trn_path)) { models <- readRDS(mdl_path) df_spec <- readr::read_csv(trn_path, show_col_types = FALSE) rf <- models[[1]] if (inherits(rf, "randomForest")) { - x <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame() + design_covariates <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame() } } if (is.null(rf)) { @@ -116,9 +114,11 @@ for (row in seq_len(nrow(spec_table))) { } PEcAn.logger::logger.info("Creating importance and partial plots for", paste(pft_i, pool, sep = "::")) + clean_pft <- janitor::make_clean_names(pft_i) + clean_pool <- janitor::make_clean_names(pool) importance_partial_plot_fig <- here::here( "figures", - paste0(gsub(" ", "_", pft_i), "_", pool, "_importance_partial_plots.png") + paste0(clean_pft, "_", clean_pool, "_importance_partial_plots.png") ) png(filename = importance_partial_plot_fig, width = 14, height = 6, units = "in", res = 300, bg = "white") @@ -147,24 +147,25 @@ for (row in seq_len(nrow(spec_table))) { if (requireNamespace("iml", quietly = TRUE)) { requireNamespace("randomForest", quietly = TRUE) pred_fun <- function(m, newdata) stats::predict(m, newdata) - predictor_obj <- iml::Predictor$new(model = rf, data = x, y = NULL, predict.function = pred_fun) + predictor_obj <- iml::Predictor$new(model = rf, data = design_covariates, y = NULL, predict.function = pred_fun) fe1 <- iml::FeatureEffect$new(predictor_obj, feature = top_predictors[1], method = "pdp") fe2 <- 
iml::FeatureEffect$new(predictor_obj, feature = top_predictors[2], method = "pdp") plot(fe1) plot(fe2) } else { randomForest::partialPlot(rf, - pred.data = x, x.var = top_predictors[1], + pred.data = design_covariates, x.var = top_predictors[1], main = paste("Partial Dependence -", top_predictors[1]), xlab = top_predictors[1], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 ) randomForest::partialPlot(rf, - pred.data = x, x.var = top_predictors[2], + pred.data = design_covariates, x.var = top_predictors[2], main = paste("Partial Dependence -", top_predictors[2]), xlab = top_predictors[2], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 ) } dev.off() + PEcAn.logger::logger.info("Saved importance/PDP figure:", importance_partial_plot_fig) } @@ -188,10 +189,11 @@ for (row in seq_len(nrow(spec_table))) { models <- readRDS(mdl_path) df_spec <- readr::read_csv(trn_path, show_col_types = FALSE) rf <- models[[1]] - x <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame() + # Use design-point covariates for ALE/ICE as in the original script + design_covariates <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame() requireNamespace("randomForest", quietly = TRUE) predictor_obj <- iml::Predictor$new( - model = rf, data = x, y = NULL, + model = rf, data = design_covariates, y = NULL, predict.function = function(m, newdata) stats::predict(m, newdata) ) top_predictors <- importance_summary |> @@ -203,13 +205,13 @@ for (row in seq_len(nrow(spec_table))) { pred_var_name <- top_predictors[j] ale <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ale") ggsave_optimized( - filename = here::here("figures", paste0(gsub(" ", "_", pft_i), "_", pool, "_ALE_predictor", j, ".svg")), + filename = here::here("figures", paste0(janitor::make_clean_names(pft_i), "_", janitor::make_clean_names(pool), "_ALE_predictor", j, ".svg")), plot = plot(ale) + ggplot2::ggtitle(paste("ALE 
for", pred_var_name, "on", pool, "-", pft_i)),
       width = 6, height = 4, units = "in"
     )
     ice <- iml::FeatureEffect$new(predictor_obj, feature = pred_var_name, method = "ice")
     ggsave_optimized(
-      filename = here::here("figures", paste0(gsub(" ", "_", pft_i), "_", pool, "_ICE_predictor", j, ".svg")),
+      filename = here::here("figures", paste0(janitor::make_clean_names(pft_i), "_", janitor::make_clean_names(pool), "_ICE_predictor", j, ".svg")),
       plot = plot(ice) + ggplot2::ggtitle(paste("ICE for", pred_var_name, "on", pool, "-", pft_i)),
       width = 6, height = 4, units = "in"
     )

From 9e3bc0b7757d137d45d1663fd840646fec5afad2 Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Fri, 26 Sep 2025 17:45:29 -0400
Subject: [PATCH 33/70] Change default mode production=true, refine README and
 workflow documentation. Misc updates mostly to formatting.

---
 000-config.R                         |   2 +-
 README.md                            | 144 +++++----------------------
 _quarto.yml                          |   6 +-
 docs/workflow_documentation.md       |  98 +++++++++---------
 index.qmd                            |   5 -
 reports/design_points_analysis.qmd   |   2 -
 reports/variable_importance.qmd      |  59 +++--------
 scripts/021_clustering_diagnostics.R |  39 ++++++++
 scripts/042_downscale_analysis.R     |  34 +++----
 9 files changed, 149 insertions(+), 240 deletions(-)

diff --git a/000-config.R b/000-config.R
index 6dd2e6c..4551257 100644
--- a/000-config.R
+++ b/000-config.R
@@ -5,7 +5,7 @@ parser <- argparse::ArgumentParser()
 ## Set development vs production mode ##
 # Dev mode speeds up workflows by subsetting data for testing and debugging
 parser$add_argument("--production",
-  type = "logical", default = FALSE,
+  type = "logical", default = TRUE,
   help = "Set to true for production mode, false for faster development (default: TRUE)"
 )
 args <- parser$parse_args()
diff --git a/README.md b/README.md
index 4500a52..8230942 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,21 @@
-[![](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
-# Downscaling Workflow
+# Downscaling Workflow

-## 
Overview - -The downscaling workflow predicts carbon pools (SOC and AGB) for California crop fields and aggregates predictions to the county level. It uses ensemble-based uncertainty propagation via SIPNET model runs and Random Forest downscaling. - -**Key components:** -- Environmental covariate extraction from multiple sources (ERA5, SoilGrids, TWI) -- Design point selection using k-means clustering -- SIPNET model runs at design points -- Random Forest models to downscale from design points to all fields -- County-level aggregation of carbon estimates +
[repo status: WIP badge]
-### Concepts +## Overview -- **Anchor sites:** Fields with ground truth validation data (e.g., Ameriflux sites) -- **Design points:** Representative fields selected for SIPNET simulation based on environmental clustering -- **Downscaling:** Process of extending predictions from design points to all California crop fields. +This workflow estimates carbon pools (SOC and AGB) for California crop fields and aggregates to the county level. -## Environment and Setup +Key components: -### Key Configuration Files +- Environmental covariates (ERA5, SoilGrids, TWI) +- Design point selection via k-means +- SIPNET simulations at design points [done externally] +- Random Forest downscaling to all fields +- County-level aggregation -- `.future.R` - Sets up parallel processing -- `000-config.R` - Configuration file for the workflow (called in each workflow script) +Configuration: see `000-config.R` for paths, variables, and parallel settings. ## Quick start @@ -32,117 +23,36 @@ The downscaling workflow predicts carbon pools (SOC and AGB) for California crop # Load modules (geo cluster example) module load R/4.4.0 gdal proj geos sqlite udunits quarto -# Decide where the shared CCMMF area lives (or set in .Renviron) -export CCMMF_DATA_DIR=/projectnb/dietzelab/ccmmf # or $HOME/ccmmf-dev +# Point to the shared CCMMF directory (or set in .Renviron) +export CCMMF_DIR=/projectnb/dietzelab/ccmmf # or $HOME/ccmmf-dev -git clone https://github.com/ccmmf/downscale.git -cd downscale +git clone https://github.com/ccmmf/downscaling.git +cd downscaling # Restore exact packages for this workflow R -e 'if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv"); renv::restore()' - -# Run the pipeline [Not implemented] -# R -e 'targets::tar_make()' ``` -## Running the Workflow +### Run Sequence -The workflow consists of these main scripts: +See full details in . Typical sequence: ```bash -# 1. 
Prepare environmental covariates +# Data prep and clustering Rscript scripts/010_prepare_covariates.R - -# 2. Assign anchor sites to fields Rscript scripts/011_prepare_anchor_sites.R - -# 3. Select design points via clustering Rscript scripts/020_cluster_and_select_design_points.R - -# 4. Cluster diagnostics and visualization Rscript scripts/021_clustering_diagnostics.R -# 5. Extract SIPNET output from model runs -Rscript scripts/030_extract_sipnet_output.R - -# 6. Downscale and aggregate to county level -Rscript scripts/040_downscale_and_aggregate.R - -# 7. Downscale analysis and interpretation -Rscript scripts/041_downscale_analysis.R - -# 8. Generate results documentation -quarto render reports/downscaling_results.qmd -``` - -## Advanced Setup and Use - -### Interactive Session on BU Cluster - -```sh -# On geo.bu.edu: -qrsh -l h_rt=3:00:00 -pe omp 16 -l buyin -``` - -### Job Submission - -This is an example of how a script can be run on an HPC - -```sh -qsub \ - -l h_rt=1:00:00 \ - -pe omp 8 \ - -o logs/03.out \ - -e logs/03.err \ - -b y Rscript downscale/999_workflow_step.R -``` - -### Building Documentation Site with Quarto - -**Preview** - -```bash -quarto preview -``` - -**Build** - -```bash -quarto render -``` - -#### Publish to GitHub Pages (gh-pages) +# Extract SIPNET outputs and create mixed-PFT scenarios +Rscript scripts/030_extract_sipnet_output.R +Rscript scripts/031_aggregate_sipnet_output.R -These steps will publish to https://ccmmf.github.io/downscaling. 
+# Downscale and aggregate +Rscript scripts/040_downscale.R +Rscript scripts/041_aggregate_to_county.R -```bash -# Build site locally (runs R code and embeds results) -quarto render - -# Publish the built _site/ to gh-pages -quarto publish gh-pages +# Analysis and figures +Rscript scripts/042_downscale_analysis.R +Rscript scripts/043_county_level_plots.R ``` - -### Adding Content - -- Add any `.qmd` or `.md` file -- Add links in `_quarto.yml` - - under `project.render` - - in the appropriate section under `website.navbar.left`. - - -#### Notes on Quarto Configuration - -- **Global HTML Settings:** - - `self-contained: true`, `embed-resources: true`, `df-print: paged`, `toc: true`. - -- **Execution:** - - Uses `freeze: auto` to cache outputs; re-executes only when inputs change. - - Some pages (e.g., `reports/downscaling_results.qmd`) depend on paths in `000-config.R`. Build locally where paths exist. - -- **Configuration:** - - Quarto settings are in `_quarto.yml` (generates standalone HTML). - - Does not use GitHub Actions due to reliance on large local datasets. - -- **Home Page:** - - Defined in `index.qmd` (includes this `README.md`). 
diff --git a/_quarto.yml b/_quarto.yml index bf00e34..f8266b0 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -1,6 +1,10 @@ project: type: website output-dir: _site + preview: + watch-inputs: true + resources: + - README.md render: # Explicit pages to render - index.qmd # renders README @@ -25,7 +29,7 @@ website: - href: reports/design_points_analysis.qmd text: Design Points Analysis - href: reports/variable_importance.qmd - text: Variable Importance + text: Downscaling Model Evaluation - text: Docs menu: - href: docs/workflow_documentation.md diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 9c07f25..350a766 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -1,5 +1,5 @@ --- -title: "Downscaling Workflow Technical Documentation" +title: "Technical Documentation" author: "David LeBauer" date: "`r Sys.Date()`" format: @@ -10,13 +10,11 @@ execute: echo: false --- -# Technical Documentation - The downscaling workflow predicts carbon pools (Soil Organic Carbon and Aboveground Biomass) for cropland fields in California and then aggregates these predictions to the county scale. It uses an ensemble-based approach to uncertainty propagation and analysis, maintaining ensemble structure to propagate errors through the prediction and aggregation processes. -![Spatial downscaling workflow using machine learning with environmental covariates](figures/spatial_downscaling_workflow.webp){width="5in"} +![Spatial downscaling workflow using machine learning with environmental covariates](../figures/spatial_downscaling_workflow.webp){width="5in"} ## Terminology @@ -34,23 +32,9 @@ The workflows are 2. Ensemble in ccmmf/workflows repository, generates ensemble outputs 3. **Downscaling**: uses ensemble outputs to make predictions for each field in CA then aggregate to county level summaries. 
-## Overview - -```sh -# Site Selection -Rscript scripts/009_update_landiq.R -Rscript scripts/010_prepare_covariates.R -Rscript scripts/011_prepare_anchor_sites.R -Rscript scripts/020_cluster_and_select_design_points.R -Rscript scripts/021_clustering_diagnostics.R - -# Downscaling and Aggregation -Rscript scripts/030_extract_sipnet_output.R -Rscript scripts/031_aggregate_sipnet_output.R -Rscript scripts/040_downscale_and_aggregate.R -Rscript scripts/041_downscale_analysis.R -Rscript scripts/042_county_level_plots.R -``` + ## Workflow Steps @@ -250,8 +234,10 @@ Outputs include `multi_pft_ensemble_output.csv`, `combined_ensemble_output.csv`, ### 6. Downscale, Aggregate to County, and Plot ```sh -Rscript scripts/040_downscale_and_aggregate.R -Rscript scripts/041_downscale_analysis.R +Rscript scripts/040_downscale.R +Rscript scripts/041_aggregate_to_county.R +Rscript scripts/042_downscale_analysis.R +Rscript scripts/043_county_level_plots.R ``` Builds Random Forest models to predict carbon pools for all fields; aggregates to county-level; summarizes variable importance; and produces maps: @@ -262,44 +248,60 @@ Builds Random Forest models to predict carbon pools for all fields; aggregates t - Output maps and statistics of carbon density and totals **Inputs:** -- `out/ensemble_output.csv`: SIPNET outputs +- `model_outdir/ensemble_output.csv`: SIPNET outputs extracted in step 4 - `data/site_covariates.csv`: Environmental covariates -**Outputs from `040_downscale_and_aggregate.R`:** -- `cache/downscaling_output.RData`: Checkpoint for downstream analysis -- `model_outdir/county_summaries.csv`: County statistics (means/SDs across ensembles for stocks and densities) +**Outputs from `040_downscale.R`:** +- `model_outdir/downscaled_preds.csv`: Per-site, per-ensemble predictions with totals and densities +- `model_outdir/downscaled_preds_metadata.json`: Metadata for predictions +- `model_outdir/training_sites.csv`: Training site list per PFT and pool +- 
`cache/models/*_models.rds`: Saved RF models per spec +- `cache/training_data/*_training.csv`: Training covariate matrices per spec +- `model_outdir/downscaled_deltas.csv`: Optional start→end deltas when available -**Outputs from `041_downscale_analysis.R` (saved in `figures/`):** -- `_importance_partial_plots.png`: Variable importance with partial plots for top predictors -- `_ALE_predictor.png` and `_ICE_predictor.png`: ALE and ICE plots for top predictors - -**Outputs from `042_county_level_plots.R` (saved in `figures/`):** -- `county__carbon_stock.png` and `county__carbon_density.png`: County-level maps of carbon stock and density by pool - -## Technical Reference - -### Ensemble Structure +**Outputs from `041_aggregate_to_county.R`:** +- `model_outdir/county_summaries.csv`: County statistics (means/SDs across ensembles for stocks and densities) -Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. +**Outputs from `042_downscale_analysis.R` (saved in `figures/`):** +- `__importance_partial_plots.png`: Variable importance with partial plots for top predictors +- `__ALE_predictor.svg` and `__ICE_predictor.svg`: ALE and ICE plots +**Outputs from `043_county_level_plots.R` (saved in `figures/`):** +- `county___carbon_stock.webp` and `county___carbon_density.webp`: County-level maps +- `county_diff_woody_plus_annual_minus_woody__carbon_{density,stock}.webp`: Scenario difference maps +- `county_delta___carbon_{density,stock}.webp`: Start→end delta maps when available +## Running on BU Cluster -# References +Interactive session (example): -**EFI Standards** +```sh +qrsh -l h_rt=3:00:00 -pe omp 16 -l buyin +``` -Dietze, Michael C., R. 
Quinn Thomas, Jody Peters, Carl Boettiger, Gerbrand Koren, Alexey N. Shiklomanov, and Jaime Ashander. 2023. "A Community Convention for Ecological Forecasting: Output Files and Metadata Version 1.0." Ecosphere 14 (11): e4686. https://doi.org/10.1002/ecs2.4686. +Submit a batch job (example using downscaling step): -**Data Sources** +```sh +qsub \ + -l h_rt=4:00:00 \ + -pe omp 8 \ + -o logs/040.out \ + -e logs/040.err \ + -b y Rscript scripts/040_downscale.R +``` -Land IQ, LLC. California Crop Mapping (2014). California Department of Water Resources, 2017. https://data.cnra.ca.gov/dataset/statewide-crop-mapping. +## Documentation Site (Quarto) -Hengl, T. et al. 2017. "SoilGrids250m: Global Gridded Soil Information Based on Machine Learning." PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748 +- Preview: `quarto preview` +- Build: `quarto render` +- Publish to GitHub Pages: `quarto publish gh-pages` (publishes `_site/`) -Hersbach, H. et al. 2020. "The ERA5 Global Reanalysis." Quarterly Journal of the Royal Meteorological Society 146: 1999–2049. https://doi.org/10.1002/qj.3803 +Notes: +- Quarto config is in `_quarto.yml` and the home page is `index.qmd` (includes `README.md`). +- Uses `freeze: auto` to cache outputs; rebuild where data paths exist (see `000-config.R`). -**Models** +## Technical Reference -Braswell, Bobby H., William J. Sacks, Ernst Linder, and David S. Schimel. 2005. "Estimating Diurnal to Annual Ecosystem Parameters by Synthesis of a Carbon Flux Model with Eddy Covariance Net Ecosystem Exchange Observations." Global Change Biology 11 (2): 335–55. https://doi.org/10.1111/j.1365-2486.2005.00897.x. +### Ensemble Structure -Liaw, Andy, and Matthew Wiener. 2002. "Classification and Regression by randomForest." R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/. +Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. 
This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. \ No newline at end of file diff --git a/index.qmd b/index.qmd index 7d3dbf8..437ca5c 100644 --- a/index.qmd +++ b/index.qmd @@ -1,7 +1,2 @@ ---- -title: "Downscaling" -description: "Project overview and quick links" ---- {{< include README.md >}} - diff --git a/reports/design_points_analysis.qmd b/reports/design_points_analysis.qmd index 81572ef..10f2f1f 100644 --- a/reports/design_points_analysis.qmd +++ b/reports/design_points_analysis.qmd @@ -7,8 +7,6 @@ format: toc: true --- -# Design Point Distribution - Design points are representative locations selected to capture the range of environmental conditions across California croplands from CADWR (2018). These are the locations where SIPNET is run to produce outputs later used in downscaling to all ~600k crop fields. diff --git a/reports/variable_importance.qmd b/reports/variable_importance.qmd index e27a64a..2d73a54 100644 --- a/reports/variable_importance.qmd +++ b/reports/variable_importance.qmd @@ -1,63 +1,32 @@ --- -title: "Downscaling Analysis: Variable Importance and Effects" +title: "RF Model Fit and Importance" format: html: self-contained: false - df-print: paged toc: true execute: - freeze: false # force re-execution to refresh figures after new model runs + freeze: false --- -# Overview -This page answers two questions for each model (pool × PFT): +# Variable importance and Partial Dependence Plots (PDP) -1) Which predictors matter most? (Top-5 table across ensembles) -2) How do top predictors affect predictions? (PDP panels) +This page reports variable-importance and partial dependence plots for the Random Forest models used to downscale AGB and SOC. -Notes: importance is aggregated across ensembles; PDPs come from one saved RF model and are for quick intuition. 
ALE/ICE details are folded at the bottom.
-
-```{r, include=FALSE}
-options(ccmmf.quiet_banner = TRUE)
-source(here::here("000-config.R"))
+In the first plot, the x-axis shows the importance of each variable, measured as the increase in mean squared error (MSE) when that variable is permuted (i.e., its values are randomly shuffled).
+Higher values indicate that the variable is more important for accurate predictions.
-summ_vi <- function(pft, pool, top_n = 5) {
-  spec_key <- paste0(janitor::make_clean_names(pft), "_", janitor::make_clean_names(pool))
-  vi_file <- file.path(model_outdir, paste0("vi_", spec_key, "_by_ensemble.csv"))
-  if (!file.exists(vi_file)) return(NULL)
-  readr::read_csv(vi_file, show_col_types = FALSE) |>
-    group_by(pft, model_output, predictor) |>
-    summarize(
-      median_importance = median(importance, na.rm = TRUE),
-      iqr_importance = IQR(importance, na.rm = TRUE),
-      .groups = "drop"
-    ) |>
-    arrange(desc(median_importance)) |>
-    slice_head(n = top_n)
-}
+The next two plots show the partial dependence of the response variable (AGB or SOC) on the two most important predictors, averaging predictions over the observed values of the other predictors (i.e., the marginal effect of each predictor on the response variable). Here, the y-axis is normalized.
-pfts <- c("annual crop", "woody perennial crop")
-```
+### AGB
-# TotSoilCarb
+Here we see that vapor pressure deficit (vapr) is the most important variable, followed by precipitation (precip). Temperature (temp) and solar radiation (srad) are also important.
-#### Annual crop
+PDPs show a negative trend: increasing vapr leads to lower AGB, while increasing precip leads to higher AGB up to a point, after which it declines before leveling off. 
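To make the PDP construction described above concrete, here is a minimal self-contained sketch. It uses simulated data and a simple linear model rather than the workflow's Random Forests, and reuses the names `vapr` and `precip` only for illustration:

```r
# Partial dependence of AGB on vapr: for each grid value of vapr, replace the
# vapr column in every row and average the model's predictions, i.e.
# marginalize over the observed values of the other predictor (precip).
set.seed(42)
n <- 200
dat <- data.frame(
  vapr   = runif(n, 0.5, 3),   # vapor pressure deficit (kPa), simulated
  precip = runif(n, 100, 900)  # precipitation (mm), simulated
)
dat$agb <- 5 - 1.5 * dat$vapr + 0.002 * dat$precip + rnorm(n, sd = 0.2)

fit <- lm(agb ~ vapr + precip, data = dat)

grid <- seq(0.5, 3, length.out = 25)
pdp_vapr <- sapply(grid, function(v) {
  mean(predict(fit, transform(dat, vapr = v)))
})
```

Swapping `fit` for a fitted `randomForest` object gives the same averaging that `randomForest::partialPlot()` and `iml::FeatureEffect$new(..., method = "pdp")` perform internally.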
-[![Annual crop – TotSoilCarb: importance + PDP](../figures/annual_crop_tot_soil_carb_importance_partial_plots.png){width=100%}](../figures/annual_crop_tot_soil_carb_importance_partial_plots.png)
+![](../figures/agb_importance.png){width=100%}
-#### Woody perennial crop
+### SOC (TotSoilCarb)
-[![Woody perennial crop – TotSoilCarb: importance + PDP](../figures/woody_perennial_crop_tot_soil_carb_importance_partial_plots.png){width=100%}](../figures/woody_perennial_crop_tot_soil_carb_importance_partial_plots.png)
+Here we see that vapor pressure deficit is again the most important variable, followed by temperature. PDPs show negative trends: increasing vapr or temp leads to lower SOC.
-
-# AGB
-
-#### Annual crop
-
-[![Annual crop – AGB: importance + PDP](../figures/annual_crop_agb_importance_partial_plots.png){width=100%}](../figures/annual_crop_agb_importance_partial_plots.png)
-
-#### Woody perennial crop
-
-[![Woody perennial crop – AGB: importance + PDP](../figures/woody_perennial_crop_agb_importance_partial_plots.png){width=100%}](../figures/woody_perennial_crop_agb_importance_partial_plots.png)
-
-
+![](../figures/soc_importance.png){width=100%}
diff --git a/scripts/021_clustering_diagnostics.R b/scripts/021_clustering_diagnostics.R
index 985f4dc..36ff1b3 100644
--- a/scripts/021_clustering_diagnostics.R
+++ b/scripts/021_clustering_diagnostics.R
@@ -55,6 +55,45 @@ cluster_plot <- ggplot(
 ggsave_optimized("figures/cluster_plot.svg", plot = cluster_plot)
 
+#' ### Which covariates define the clusters? (Unsupervised VI)
+#'
+#' We estimate each variable's contribution to cluster separation using the
+#' proportion of variance explained by clusters (eta-squared, η²):
+#' η² = between-cluster variance / total variance. Higher values indicate
+#' variables whose means differ more strongly across clusters. 
+
+# Compute eta-squared (η²) per numeric variable
+num_vars <- sites_clustered |>
+  dplyr::select(where(is.numeric)) |>
+  names()
+num_vars <- setdiff(num_vars, c("cluster"))
+
+eta2_tbl <- purrr::map_dfr(num_vars, function(vn) {
+  x <- sites_clustered[[vn]]
+  cl <- as.factor(sites_clustered$cluster)
+  m <- mean(x, na.rm = TRUE)
+  # counts and means by cluster (handle NAs per group)
+  g_mean <- tapply(x, cl, function(v) mean(v, na.rm = TRUE))
+  g_n <- tapply(x, cl, function(v) sum(!is.na(v)))
+  N <- sum(!is.na(x))
+  total <- stats::var(x, na.rm = TRUE) * max(N - 1, 1)
+  between <- sum(g_n * (g_mean - m)^2, na.rm = TRUE)
+  eta2 <- ifelse(total > 0, between / total, NA_real_)
+  tibble::tibble(variable = vn, eta2 = eta2)
+}) |>
+  dplyr::arrange(dplyr::desc(eta2))
+
+vi_cluster_plot <- ggplot(eta2_tbl, aes(x = reorder(variable, eta2), y = eta2)) +
+  geom_col(fill = "steelblue") +
+  coord_flip() +
+  labs(
+    x = "Predictor", y = expression(eta^2 ~ " (between / total variance)"),
+    title = "K-means cluster separation by predictor (η²)"
+  ) +
+  theme_minimal()
+
+ggsave_optimized("figures/cluster_variable_importance.svg", plot = vi_cluster_plot)
+
 #'
 #' #### Stratification by Crops and Climate Regions
 #'
diff --git a/scripts/042_downscale_analysis.R b/scripts/042_downscale_analysis.R
index 12504f8..7478ae4 100644
--- a/scripts/042_downscale_analysis.R
+++ b/scripts/042_downscale_analysis.R
@@ -94,7 +94,8 @@ for (row in seq_len(nrow(spec_table))) {
     df_spec <- readr::read_csv(trn_path, show_col_types = FALSE)
     rf <- models[[1]]
     if (inherits(rf, "randomForest")) {
-      design_covariates <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |> as.data.frame()
+      design_covariates <- dplyr::select(df_spec, dplyr::all_of(covariate_names)) |>
+        as.data.frame()
     }
   }
   if (is.null(rf)) {
@@ -144,26 +145,17 @@ for (row in seq_len(nrow(spec_table))) {
   # Panel 2 and 3: Partial plots for top predictors
   par(mar = c(5, 5, 4, 2))
-  if (requireNamespace("iml", quietly = TRUE)) {
requireNamespace("randomForest", quietly = TRUE) - pred_fun <- function(m, newdata) stats::predict(m, newdata) - predictor_obj <- iml::Predictor$new(model = rf, data = design_covariates, y = NULL, predict.function = pred_fun) - fe1 <- iml::FeatureEffect$new(predictor_obj, feature = top_predictors[1], method = "pdp") - fe2 <- iml::FeatureEffect$new(predictor_obj, feature = top_predictors[2], method = "pdp") - plot(fe1) - plot(fe2) - } else { - randomForest::partialPlot(rf, - pred.data = design_covariates, x.var = top_predictors[1], - main = paste("Partial Dependence -", top_predictors[1]), - xlab = top_predictors[1], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 - ) - randomForest::partialPlot(rf, - pred.data = design_covariates, x.var = top_predictors[2], - main = paste("Partial Dependence -", top_predictors[2]), - xlab = top_predictors[2], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 - ) - } + + randomForest::partialPlot(rf, + pred.data = design_covariates, x.var = top_predictors[1], ylim = yl, + main = paste("Partial Dependence -", top_predictors[1]), + xlab = top_predictors[1], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 + ) + randomForest::partialPlot(rf, + pred.data = design_covariates, x.var = top_predictors[2], ylim = yl, + main = paste("Partial Dependence -", top_predictors[2]), + xlab = top_predictors[2], ylab = paste("Predicted", pool, "-", pft_i), col = "steelblue", lwd = 2 + ) dev.off() PEcAn.logger::logger.info("Saved importance/PDP figure:", importance_partial_plot_fig) } From f3a5a9d214da91afa19c27ecf47970865c6f084d Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 26 Sep 2025 17:48:43 -0400 Subject: [PATCH 34/70] fix link --- README.md | 2 +- docs/workflow_documentation.md | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 8230942..a49f2ca 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ R -e 'if 
(!requireNamespace("renv", quietly = TRUE)) install.packages("renv"); r ### Run Sequence -See full details in . Typical sequence: +See full details about how to set up and run the workflows in the [Technical Documentation](docs/workflow_documentation.md#sec-tech-doc). ```bash # Data prep and clustering diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 350a766..606265d 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -10,6 +10,8 @@ execute: echo: false --- +# Technical Documentation {#sec-tech-doc} + The downscaling workflow predicts carbon pools (Soil Organic Carbon and Aboveground Biomass) for cropland fields in California and then aggregates these predictions to the county scale. It uses an ensemble-based approach to uncertainty propagation and analysis, maintaining ensemble structure to propagate errors through the prediction and aggregation processes. @@ -304,4 +306,4 @@ Notes: ### Ensemble Structure -Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. \ No newline at end of file +Each ensemble member represents a plausible realization given parameter and meteorological uncertainty. This ensemble structure is maintained throughout the workflow to properly propagate uncertainty. For example, downscaling is done for each ensemble member separately, and then the results are aggregated to county-level statistics. 
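The ensemble bookkeeping described above can be sketched in a few lines. This is a toy illustration with simulated numbers; the real workflow reads per-site, per-ensemble predictions (e.g., `downscaled_preds.csv`), and the column names here are illustrative only:

```r
library(dplyr)

set.seed(1)
# One downscaled prediction per site per ensemble member
preds <- expand.grid(site_id = 1:4, ensemble = 1:10)
preds$soc_total <- rnorm(nrow(preds), mean = 50, sd = 5)

county_summary <- preds |>
  group_by(ensemble) |>
  summarize(county_soc = sum(soc_total), .groups = "drop") |>       # aggregate within each member
  summarize(mean_soc = mean(county_soc), sd_soc = sd(county_soc))  # then summarize across members
```

Summing within each ensemble member before taking the mean and SD across members preserves the across-site correlation structure within a member, which would be lost by computing per-site SDs first.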
From 3b0b24e8619bf981404bbf2c40853e2cc1ae5d2f Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 26 Sep 2025 17:51:08 -0400 Subject: [PATCH 35/70] fix link --- _quarto.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_quarto.yml b/_quarto.yml index f8266b0..09f8252 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -33,7 +33,7 @@ website: - text: Docs menu: - href: docs/workflow_documentation.md - text: Workflow Documentation + text: Technical Documentation - href: docs/mixed_system_prototype.qmd text: Mixed System Prototype - href: docs/references.md From 04065d91ec39b22058dcfb89f3ed6c2abd46dc92 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Fri, 26 Sep 2025 18:20:53 -0400 Subject: [PATCH 36/70] Move sidebar to left and add gh link; add alert callout to README --- README.md | 7 +++++-- _quarto.yml | 19 +++++++++++++------ docs/workflow_documentation.md | 2 -- reports/variable_importance.qmd | 2 +- 4 files changed, 19 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index a49f2ca..c4f8f6a 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,10 @@ -# Downscaling Workflow +# Downscaling And Aggregation Workflow + +::: {.callout-warning} +This proof of concept is untested and subject to change. Interpret results as illustrative. +::: -

repo status: WIP

## Overview diff --git a/_quarto.yml b/_quarto.yml index 09f8252..d702238 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -18,26 +18,33 @@ project: website: title: "Downscaling" search: true - navbar: - left: + sidebar: + style: docked + collapse-level: 1 + contents: - href: index.qmd text: Home - - text: Reports - menu: + - section: Reports + contents: - href: reports/downscaling_results.qmd text: Downscaling Results - href: reports/design_points_analysis.qmd text: Design Points Analysis - href: reports/variable_importance.qmd text: Downscaling Model Evaluation - - text: Docs - menu: + - section: Docs + contents: - href: docs/workflow_documentation.md text: Technical Documentation - href: docs/mixed_system_prototype.qmd text: Mixed System Prototype - href: docs/references.md text: References + tools: + - icon: exclamation-octagon + text: "Status: Draft" + - icon: github + href: https://github.com/ccmmf/downscaling format: html: diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 606265d..32fce2a 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -10,8 +10,6 @@ execute: echo: false --- -# Technical Documentation {#sec-tech-doc} - The downscaling workflow predicts carbon pools (Soil Organic Carbon and Aboveground Biomass) for cropland fields in California and then aggregates these predictions to the county scale. It uses an ensemble-based approach to uncertainty propagation and analysis, maintaining ensemble structure to propagate errors through the prediction and aggregation processes. 
diff --git a/reports/variable_importance.qmd b/reports/variable_importance.qmd index 2d73a54..a31fdb4 100644 --- a/reports/variable_importance.qmd +++ b/reports/variable_importance.qmd @@ -1,5 +1,5 @@ --- -title: "RF Model Fit and Importance" +title: "Downscaling Model Evaluation" format: html: self-contained: false From a44d5a0c7cbb9ccb91399bd3bf0829d68d0a8113 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 30 Nov 2025 19:02:12 -0700 Subject: [PATCH 37/70] Apply suggestion from @infotroph Co-authored-by: Chris Black --- 000-config.R | 4 ---- 1 file changed, 4 deletions(-) diff --git a/000-config.R b/000-config.R index 4551257..0db152f 100644 --- a/000-config.R +++ b/000-config.R @@ -30,10 +30,6 @@ pecan_outdir <- file.path(ccmmf_dir, "modelout", "ccmmf_phase_2b_mixed_pfts_2025 # **Variables to extract** # see docs/workflow_documentation.qmd for complete list of outputs -# outputs_to_extract <- c( -# "TotSoilCarb", -# "AGB" -# ) outputs_to_extract <- c("TotSoilCarb", "AGB") if (!PRODUCTION) { # can subset for testing From ec9e30bf49ffe37c4a4ac3ae1de18eb83fbcf401 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 30 Nov 2025 19:03:40 -0700 Subject: [PATCH 38/70] Apply suggestion from @dlebauer --- R/combine_mixed_crops.R | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index 677455a..13f23da 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -8,9 +8,7 @@ #' and treats annual as an increment relative to an annual initial baseline: #' `annual_delta = annual_value - annual_init`; `result = woody_value + annual_cover * annual_delta`. #' -#' Vectorization & recycling are handled via `vctrs::vec_recycle_common()`. All scalar -#' inputs are broadcast to the common length. Length mismatches other than 1 or the -#' common size trigger a severe error. +#' All inputs must be vectors each of length 1 or a shared common length. 
#' #' Validation rules (severe errors unless otherwise noted): #' * No input values may be NA (including covers, pool sizes, annual_init if required) From 0627842bd57e60a0fc8e90a9387285feceb76690 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 30 Nov 2025 19:08:54 -0700 Subject: [PATCH 39/70] Apply suggestion from @infotroph Co-authored-by: Chris Black --- R/combine_mixed_crops.R | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index 13f23da..2b88456 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -21,8 +21,8 @@ #' @param woody_value numeric. Pool size for the woody PFT (kg/m2). #' @param annual_value numeric. Pool size for the annual PFT (kg/m2). #' @param annual_init numeric, required for method = "incremental"; the initial annual pool. -#' @param annual_cover numeric. Fractional cover of the annual PFT (f_annual, 0-1). -#' @param woody_cover numeric. Fractional cover of the woody PFT (f_woody, 0-1). Must be 1 for incremental. +#' @param annual_cover numeric. Fractional cover of the annual PFT (0-1). +#' @param woody_cover numeric. Fractional cover of the woody PFT (0-1). Must be 1 when `method` is "incremental". #' @param method character. One of "weighted" or "incremental". #' #' @return numeric vector of combined values. 
From 169a2293df7d1422c6226c668548e0cac2c40a44 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 30 Nov 2025 19:15:05 -0700 Subject: [PATCH 40/70] Apply suggestion from @infotroph Co-authored-by: Chris Black --- R/combine_mixed_crops.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index 2b88456..d8ee946 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -41,9 +41,9 @@ #' ) combine_mixed_crops <- function(woody_value, annual_value, - annual_init = NULL, annual_cover, woody_cover, + annual_init = NULL, method = c("weighted", "incremental")) { method <- match.arg(method) From 422d3f0d8e46964e0d8b0c34691e0bb55a6f164b Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 30 Nov 2025 19:17:07 -0700 Subject: [PATCH 41/70] Apply suggestion from @dlebauer --- R/ggsave_optimized.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/ggsave_optimized.R b/R/ggsave_optimized.R index 3bc4185..7930586 100644 --- a/R/ggsave_optimized.R +++ b/R/ggsave_optimized.R @@ -1,4 +1,4 @@ -#' Save Optimized Image Formats +#' Save Web Optimized Image Formats #' #' Saves a ggplot object to a file, supporting both vector and raster formats. #' Raster formats are processed with `magick` for additional optimizations. 
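The two combination rules documented in `combine_mixed_crops()` above can be condensed into a minimal sketch. `combine_sketch()` is a hypothetical stand-in, not the PEcAn implementation, and it omits the recycling and NA validation that the real function performs:

```r
# Minimal sketch of the "weighted" and "incremental" rules described in the
# combine_mixed_crops() documentation (assumed simplifications; no recycling).
combine_sketch <- function(woody_value, annual_value, annual_cover,
                           woody_cover, annual_init = NULL,
                           method = c("weighted", "incremental")) {
  method <- match.arg(method)
  if (method == "weighted") {
    # PFTs partition the area, so covers must sum to 1
    stopifnot(all(abs(woody_cover + annual_cover - 1) < 1e-3))
    woody_cover * woody_value + annual_cover * annual_value
  } else {
    # Full-area woody baseline plus the annual increment on its cover fraction
    stopifnot(all(woody_cover == 1), !is.null(annual_init))
    woody_value + annual_cover * (annual_value - annual_init)
  }
}

# Weighted: 70% woody at 4 kg/m2 + 30% annual at 1 kg/m2 -> 3.1 kg/m2
w <- combine_sketch(4, 1, annual_cover = 0.3, woody_cover = 0.7)

# Incremental: cover crop on 40% of the orchard floor adds its delta
# 4 + 0.4 * (1.5 - 1.0) -> 4.2 kg/m2
i <- combine_sketch(4, 1.5, annual_cover = 0.4, woody_cover = 1,
                    annual_init = 1.0, method = "incremental")
```

The incremental rule keeps the woody run as the full-area baseline, which is why it requires `woody_cover == 1`; only the annual PFT's change from its initial pool is scaled by cover.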
From e63313cdd9640e77c92dad04f9d1aeafab191f19 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 1 Feb 2026 12:06:36 -0700 Subject: [PATCH 42/70] Update 000-config.R Co-authored-by: Chris Black --- 000-config.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/000-config.R b/000-config.R index 0db152f..90e69bc 100644 --- a/000-config.R +++ b/000-config.R @@ -6,7 +6,7 @@ parser <- argparse::ArgumentParser() # Dev mode speeds up workflows by subsetting data for testing and debugging parser$add_argument("--production", type = "logical", default = TRUE, - help = "Set to true for production mode, false for faster development (default: FALSE)" + help = "Set to true for production mode, false for faster development (default: TRUE)" ) args <- parser$parse_args() PRODUCTION <- args$production From ea84ef9f33a65a0e1241b1107e2a070e3c59dee3 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Sun, 1 Feb 2026 12:06:56 -0700 Subject: [PATCH 43/70] Update combine_mixed_crops.R --- R/combine_mixed_crops.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index d8ee946..d19d513 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -95,7 +95,7 @@ combine_mixed_crops <- function(woody_value, out_of_range_woody <- (woody_cover < 0 - tol) | (woody_cover > 1 + tol) if (any(out_of_range_annual | out_of_range_woody, na.rm = TRUE)) { n_bad <- sum(out_of_range_annual | out_of_range_woody, na.rm = TRUE) - PEcAn.logger::logger.severe(paste0(n_bad, " rows have cover fractions outside [0,1] (tol).")) + PEcAn.logger::logger.severe("weighted: cover fractions must be in the range [0,1] (+/- tol).", n_bad, "rows violate.") } if (method == "incremental") { From 95971583d99f1a6d72837410346e809864b4223d Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 18:02:36 -0500 Subject: [PATCH 44/70] track compressed archive path; log pecan_archive_tgz --- 000-config.R | 5 +++++ 
scripts/030_extract_sipnet_output.R | 21 ++++++++------------- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/000-config.R b/000-config.R index 4551257..ae0922a 100644 --- a/000-config.R +++ b/000-config.R @@ -28,6 +28,9 @@ if (ccmmf_dir == "") { } pecan_outdir <- file.path(ccmmf_dir, "modelout", "ccmmf_phase_2b_mixed_pfts_20250701") +# PEcAn model output to be analyzed +pecan_archive_tgz <- file.path(ccmmf_dir, "lebauer_agu_2025_20251210.tgz") + # **Variables to extract** # see docs/workflow_documentation.qmd for complete list of outputs # outputs_to_extract <- c( @@ -65,6 +68,8 @@ msg <- glue::glue( "- raw_data_dir. : {raw_data_dir}\n", "- pecan_outdir. : {pecan_outdir}\n", "- model_outdir. : {model_outdir}\n\n", + "- pecan_archive_tgz : {ifelse(is.na(pecan_archive_tgz), '', pecan_archive_tgz)}\n", + "- pecan_archive_force : {pecan_archive_force}\n\n", "### Other Settings ###\n", "- will extract variables: {paste(outputs_to_extract, collapse = ', ')}\n", "- ca_albers_crs : {ca_albers_crs}{if(ca_albers_crs == 3310) ', which is NAD83 / California Albers' else ''}\n" diff --git a/scripts/030_extract_sipnet_output.R b/scripts/030_extract_sipnet_output.R index 5b818ca..6891300 100644 --- a/scripts/030_extract_sipnet_output.R +++ b/scripts/030_extract_sipnet_output.R @@ -1,5 +1,5 @@ # This file processess the output from SIPNET ensemble runs and generates -# A long format CSV (time, site, ensemble, variable) +# A long format CSV (time, site, ensemble, variable) # that follows the Ecological Forecasting Initiative (EFI) forecast standard # Helper functions in R/efi_long_to_arrays.R will convert this to @@ -7,11 +7,6 @@ # 2. 
A NetCDF file (time, site, ensemble, variable) # TODO: write out EML metadata in order to be fully EFI compliant - -## First, uncompress the model output -# tar --use-compress-program="pigz -d" -xf ccmmf_phase_2a_DRAFT_output_20250516.tgz -# tar --use-compress-program="pigz -d" -xf ccmmf_phase_2a_DRAFT_output_20250516.tgz --wildcards '*.nc' - ## Second, make sure ccmmf_dir and pecan_outdir are defined in the config file source("000-config.R") PEcAn.logger::logger.info("***Starting SIPNET output extraction***") @@ -80,7 +75,7 @@ site_ids <- site_meta |> ens_ids <- 1:ensemble_size variables <- outputs_to_extract # TODO standardize this name; variables is ambiguous - # but is used by the PEcAn read.output function +# but is used by the PEcAn read.output function if (!PRODUCTION) { ## -----TESTING SUBSET----## @@ -133,7 +128,7 @@ ens_results_raw <- furrr::future_pmap_dfr( # .Bug report: https://github.com/r-quantities/units/issues/409 # Fixed in units version >= 0.8.7 .options = furrr::furrr_options(seed = TRUE) -) +) ens_results <- ens_results_raw |> dplyr::group_by(parameter, base_site_id, pft, year) |> @@ -156,12 +151,12 @@ ens_results <- ens_results_raw |> std_vars <- PEcAn.utils::standard_vars pool_vars <- std_vars |> - dplyr::filter(stringr::str_detect(tolower(Category), "pool")) |> - dplyr::pull(Variable.Name) + dplyr::filter(stringr::str_detect(tolower(Category), "pool")) |> + dplyr::pull(Variable.Name) flux_vars <- std_vars |> - dplyr::filter(stringr::str_detect(tolower(Category), "flux")) |> - dplyr::pull(Variable.Name) + dplyr::filter(stringr::str_detect(tolower(Category), "flux")) |> + dplyr::pull(Variable.Name) ens_results <- ens_results |> dplyr::mutate( @@ -191,7 +186,7 @@ if (any(ens_results$variable_type == "flux")) { # restore logging logger_level <- PEcAn.logger::logger.setLevel(logger_level) - + ensemble_output_csv <- file.path(model_outdir, "ensemble_output.csv") readr::write_csv(ens_results, ensemble_output_csv) PEcAn.logger::logger.info( From 
6fbedb536446dcf5fc789c949e23446a0d60a79b Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 18:04:17 -0500 Subject: [PATCH 45/70] add configural scales, high res export --- scripts/043_county_level_plots.R | 98 +++++++++++++++++++++++++++++++- 1 file changed, 97 insertions(+), 1 deletion(-) diff --git a/scripts/043_county_level_plots.R b/scripts/043_county_level_plots.R index 19de3f8..e6ec75e 100644 --- a/scripts/043_county_level_plots.R +++ b/scripts/043_county_level_plots.R @@ -1,4 +1,23 @@ ## Create county-level plots +## +## Usage: +## Rscript scripts/043_county_level_plots.R +## +## Inputs (from 000-config.R paths): +## - model_outdir/out/county_summaries.csv (from 041_aggregate_to_county.R) +## - model_outdir/out/downscaled_preds.csv (for mixed-scenario diffs and deltas) +## - model_outdir/out/downscaled_deltas.csv (optional; for delta maps) +## - data/ca_counties.gpkg (county boundaries) +## +## Outputs: +## - figures/county___carbon_{density,stock}.webp +## - figures/county_diff_woody_plus_annual_minus_woody__carbon_{density,stock}.webp +## - figures/county_delta___carbon_{density,stock}.webp +## - Additional poster-grade exports for selected maps (PDF/SVG/PNG) +## +## Notes: +## - Requires R/ggsave_optimized.R to be sourced via 000-config.R +## - Uses facet columns for Mean and SD in each output source("000-config.R") PEcAn.logger::logger.info("Creating county-level plots") county_boundaries <- sf::st_read(file.path(data_dir, "ca_counties.gpkg")) @@ -28,6 +47,48 @@ combos <- co_preds_to_plot |> dplyr::distinct(pft, model_output, units) |> dplyr::arrange(pft, model_output, units) +# Optional filters via env vars to generate a subset only +pft_filter <- Sys.getenv("PFT_FILTER", unset = "") +pool_filter <- Sys.getenv("POOL_FILTER", unset = "") +units_filter <- Sys.getenv("UNITS_FILTER", unset = "") +if (nzchar(pft_filter)) combos <- dplyr::filter(combos, pft == !!pft_filter) +if (nzchar(pool_filter)) combos <- dplyr::filter(combos, 
model_output == !!pool_filter) +if (nzchar(units_filter)) combos <- dplyr::filter(combos, units == !!units_filter) + +# Color scale controls +# Priority: COLOR_SCALE if set -> POSTER_PALETTE if TRUE -> default (plasma) +use_poster_palette <- isTRUE(as.logical(Sys.getenv("POSTER_PALETTE", "FALSE"))) +color_scale_opt <- tolower(Sys.getenv("COLOR_SCALE", unset = "")) +custom_colors <- Sys.getenv("CUSTOM_COLORS", unset = "") # comma-separated hex +custom_values <- Sys.getenv("CUSTOM_VALUES", unset = "") # comma-separated numerics in [0,1] + +fill_scale <- NULL +if (nzchar(color_scale_opt)) { + if (color_scale_opt %in% c("viridis","plasma","magma","inferno","cividis")) { + fill_scale <- ggplot2::scale_fill_viridis_c(option = color_scale_opt, na.value = "white") + } else if (color_scale_opt == "custom" && nzchar(custom_colors)) { + cols <- trimws(strsplit(custom_colors, ",", fixed = TRUE)[[1]]) + vals <- if (nzchar(custom_values)) as.numeric(trimws(strsplit(custom_values, ",", fixed = TRUE)[[1]])) else NULL + if (is.null(vals)) { + fill_scale <- ggplot2::scale_fill_gradientn(colors = cols, na.value = "white") + } else { + fill_scale <- ggplot2::scale_fill_gradientn(colors = cols, values = vals, na.value = "white") + } + } +} +if (is.null(fill_scale)) { + if (use_poster_palette) { + # Poster palette requested earlier: light green -> poster green -> poster purple + fill_scale <- ggplot2::scale_fill_gradientn( + colors = c("#d7f2d3", "#62b874", "#4a2f86"), + values = c(0, 0.6, 1), + na.value = "white" + ) + } else { + fill_scale <- ggplot2::scale_fill_viridis_c(option = "plasma", na.value = "white") + } +} + p <- purrr::pmap( list(combos$pft, combos$model_output, combos$units), function(.pft, pool, unit) { @@ -39,7 +100,7 @@ p <- purrr::pmap( ) + ggplot2::geom_sf(data = county_boundaries, mapping = ggplot2::aes(geometry = geom), fill = "white", color = "black", inherit.aes = FALSE) + ggplot2::geom_sf() + - ggplot2::scale_fill_viridis_c(option = "plasma", na.value = "white") 
+ + fill_scale + ggplot2::theme_minimal() + ggplot2::facet_grid(model_output ~ stat) + ggplot2::labs( @@ -60,6 +121,41 @@ p <- purrr::pmap( width = 10, height = 5, units = "in", dpi = 96, bg = "white" ) + # Also export poster-grade formats for the AGB density map of woody perennials + if (.pft == "woody perennial crop" && pool == "AGB" && unit_key == "density") { + basefile <- here::here("figures", paste0("county_", pft_key, "_", pool, "_carbon_", unit_key)) + PEcAn.logger::logger.info("Exporting poster-grade versions (28.36x14.18 in):", paste0(basename(basefile), "{.pdf,.svg,.png}")) + # Vector formats (preferred for print) at 14.18 in tall (2:1 aspect) + ggsave_optimized(filename = paste0(basefile, ".pdf"), plot = .plt, width = 28.36, height = 14.18, units = "in", bg = "white") + ggsave_optimized(filename = paste0(basefile, ".svg"), plot = .plt, width = 28.36, height = 14.18, units = "in", bg = "white") + # High-resolution raster as fallback + ggsave_optimized(filename = paste0(basefile, ".png"), plot = .plt, width = 28.36, height = 14.18, units = "in", dpi = 300, bg = "white") + } + # High-res PNG for woody perennials SOC stock (12" tall for print) + if (.pft == "woody perennial crop" && pool == "TotSoilCarb" && unit_key == "stock") { + basefile_soc_stock <- here::here("figures", paste0("county_", pft_key, "_", pool, "_carbon_", unit_key)) + # Maintain 2:1 aspect (width:height) used above: width=24in, height=12in + PEcAn.logger::logger.info("Exporting 12in-tall high-res PNG:", paste0(basename(basefile_soc_stock), ".png")) + ggsave_optimized(filename = paste0(basefile_soc_stock, ".png"), plot = .plt, + width = 28.36, height = 14.18, units = "in", dpi = 300, bg = "white") + } + # High-res exports for annual crop SOC stock + if (.pft == "annual crop" && pool == "TotSoilCarb" && unit_key == "stock") { + basefile_ann_soc_stock <- here::here("figures", paste0("county_", pft_key, "_", pool, "_carbon_", unit_key)) + PEcAn.logger::logger.info("Exporting high-res annual 
SOC stock (28.36x14.18 in):", paste0(basename(basefile_ann_soc_stock), "{.pdf,.png}")) + ggsave_optimized(filename = paste0(basefile_ann_soc_stock, ".pdf"), plot = .plt, + width = 28.36, height = 14.18, units = "in", bg = "white") + ggsave_optimized(filename = paste0(basefile_ann_soc_stock, ".png"), plot = .plt, + width = 28.36, height = 14.18, units = "in", dpi = 300, bg = "white") + } + # Export high-res PDF/PNG for County-level woody+herbaceous (mixed) SOC maps + if (pool == "TotSoilCarb" && (grepl("woody", .pft, ignore.case = TRUE)) && + (grepl("annual", .pft, ignore.case = TRUE) || grepl("herb", .pft, ignore.case = TRUE))) { + basefile_soc <- here::here("figures", paste0("county_", pft_key, "_", pool, "_carbon_", unit_key)) + PEcAn.logger::logger.info("Exporting high-res SOC versions (28.36x14.18 in):", paste0(basename(basefile_soc), "{.pdf,.png}")) + ggsave_optimized(filename = paste0(basefile_soc, ".pdf"), plot = .plt, width = 28.36, height = 14.18, units = "in", bg = "white") + ggsave_optimized(filename = paste0(basefile_soc, ".png"), plot = .plt, width = 28.36, height = 14.18, units = "in", dpi = 300, bg = "white") + } return(.plt) } ) From d8c57e5b7f5d62339a83890d9387585870030e8a Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 18:40:45 -0500 Subject: [PATCH 46/70] update ggsave_optimized to also support high res * color-blind friendly figs --- R/ggsave_optimized.R | 71 +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 67 insertions(+), 4 deletions(-) diff --git a/R/ggsave_optimized.R b/R/ggsave_optimized.R index 3bc4185..1448036 100644 --- a/R/ggsave_optimized.R +++ b/R/ggsave_optimized.R @@ -11,6 +11,11 @@ #' @param dpi Numeric. Resolution for raster formats. Default is 96. #' @param quality Numeric. JPEG/WebP quality (1-100). Default is 80. #' @param bg Character. Background color. Default is "white". +#' @param use_cairo Logical. Use Cairo/svglite for vector devices if available. Default TRUE. +#' @param use_ragg Logical. 
Use ragg devices for raster if available. Default TRUE. +#' @param png_quantize Logical. Quantize PNG colors via magick to reduce size. Default TRUE. +#' @param png_max Integer. Max colors for PNG quantization. Default 256. +#' @param webp_lossless Logical. Encode WebP losslessly (overrides quality). Default FALSE. #' @param ... Additional arguments passed to `ggplot2::ggsave`. #' #' @return Invisibly returns the filename of the saved plot. @@ -28,8 +33,35 @@ ggsave_optimized <- function( dpi = 96, quality = 80, bg = "white", + use_cairo = TRUE, + use_ragg = TRUE, + png_quantize = TRUE, + png_max = 256, + webp_lossless = FALSE, ...) { ext <- tolower(tools::file_ext(filename)) + # Choose device where applicable + dev <- NULL + if (ext %in% c("pdf", "svg", "eps")) { + if (isTRUE(use_cairo) && isTRUE(grDevices::capabilities()["cairo"])) { + if (ext == "pdf") dev <- grDevices::cairo_pdf + if (ext == "svg") { + if (requireNamespace("svglite", quietly = TRUE)) { + dev <- svglite::svglite + } else { + dev <- grDevices::svg + } + } + if (ext == "eps") dev <- grDevices::cairo_ps + } else { + if (ext == "svg") dev <- grDevices::svg + } + } else if (ext %in% c("png", "jpg", "jpeg")) { + if (isTRUE(use_ragg) && requireNamespace("ragg", quietly = TRUE)) { + if (ext == "png") dev <- ragg::agg_png + if (ext %in% c("jpg", "jpeg")) dev <- ragg::agg_jpeg + } + } # Use ggsave for vector formats and raster formats other than WebP if (ext %in% c("svg", "pdf", "eps", "png", "jpg", "jpeg")) { @@ -41,13 +73,17 @@ ggsave_optimized <- function( units = units, dpi = dpi, bg = bg, + device = dev, + limitsize = FALSE, ... 
) # Optimize PNGs with magick if (ext == "png") { - img <- magick::image_read(filename) - img <- magick::image_quantize(img, max = 256) - magick::image_write(img, path = filename, format = "png", compression_level = 9) + if (isTRUE(png_quantize)) { + img <- magick::image_read(filename) + img <- magick::image_quantize(img, max = png_max) + magick::image_write(img, path = filename, format = "png", compression_level = 9) + } } return(invisible(filename)) } @@ -63,10 +99,37 @@ ggsave_optimized <- function( units = units, dpi = dpi, bg = bg, + device = if (isTRUE(use_ragg) && requireNamespace("ragg", quietly = TRUE)) ragg::agg_png else NULL, + limitsize = FALSE, + ... + ) + img <- magick::image_read(tmp) + if (isTRUE(webp_lossless)) { + magick::image_write(img, path = filename, format = "webp", quality = 100) + } else { + magick::image_write(img, path = filename, format = "webp", quality = quality) + } + unlink(tmp) + return(invisible(filename)) + } + + # Support TIFF via magick + if (ext %in% c("tif", "tiff")) { + tmp <- withr::local_tempfile(fileext = ".png") + ggplot2::ggsave( + filename = tmp, + plot = plot, + width = width, + height = height, + units = units, + dpi = dpi, + bg = bg, + device = if (isTRUE(use_ragg) && requireNamespace("ragg", quietly = TRUE)) ragg::agg_png else NULL, + limitsize = FALSE, ... 
) img <- magick::image_read(tmp) - magick::image_write(img, path = filename, format = "webp", quality = quality) + magick::image_write(img, path = filename, format = "tiff") unlink(tmp) return(invisible(filename)) } From 0bdf3a49d456a9d5c45f1545409d96af5eef2bb6 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 18:42:09 -0500 Subject: [PATCH 47/70] document uncompressing model outputs --- docs/workflow_documentation.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 32fce2a..d337fe0 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -200,6 +200,29 @@ PEcAn standard units are SI, following the Climate Forecasting standards: ### 4. Extract SIPNET Output + +First, uncompress the model output. Only the netCDF files are needed. + +```sh +# Set paths +ccmmf_dir=/projectnb2/dietzelab/ccmmf +archive_file="$ccmmf_dir/lebauer_agu_2025_20251210.tgz" +output_dir="$ccmmf_dir/modelout/ccmmf_phase_3_scenarios_20251210" + +# Ensure output directory exists +mkdir -p "$output_dir" + +# +tar --use-compress-program="pigz -d" -xf \ + "$archive_file" \ + -C "$output_dir" \ + --strip-components=1 \ + --no-same-owner #\ +# --wildcards \ +# 'lebauer_agu_2025/output_/out/' \ +# 'lebauer_agu_2025/output_/out/ENS--/[0-9][0-9][0-9][0-9].nc' +``` + ```sh Rscript scripts/030_extract_sipnet_output.R ``` From fffdd706f6c1adf369aa219e2967094286ac3d51 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 19:17:38 -0500 Subject: [PATCH 48/70] clarify that combine_mixed_crops works on pools and cumulative fluxes; pass annual_init unconditionally --- R/combine_mixed_crops.R | 20 +- docs/mixed_system_prototype.qmd | 349 ++++++++++++++++---------------- docs/workflow_documentation.md | 7 +- 3 files changed, 198 insertions(+), 178 deletions(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index d19d513..b389422 100644 --- 
a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -1,7 +1,7 @@ #' Combine two-PFT outputs to represent mixed cropping systems #' -#' Rules for combining woody and annual PFT outputs to represent a -#' mixed cropping system. Supports two methods: +#' Rules for combining woody and annual PFT outputs (stocks or accumulated +#' flux totals) to represent a mixed cropping system. Supports two methods: #' - "weighted": PFTs partition area (woody_cover + annual_cover = 1) and the #' output is a weighted average: `woody_cover * woody_value + annual_cover * annual_value`. #' - "incremental": preserves the full-area woody baseline (requires `woody_cover == 1`) @@ -10,6 +10,8 @@ #' #' All inputs must be vectors each of length 1 or a shared common length. #' +#' This function is intended to be applied to pools or integrated fluxes (e.g., annual NPP, annual N2O flux). +#' #' Validation rules (severe errors unless otherwise noted): #' * No input values may be NA (including covers, pool sizes, annual_init if required) #' * Covers must lie within [0,1] @@ -18,8 +20,12 @@ #' * Method "weighted": rows whose `woody_cover + annual_cover` differ from 1 by more than tolerance #' are set to NA in the result; a single aggregated warning is emitted listing the count #' -#' @param woody_value numeric. Pool size for the woody PFT (kg/m2). -#' @param annual_value numeric. Pool size for the annual PFT (kg/m2). +#' Inputs may be any quantity expressed per unit area (stocks such as +#' kg/m2 or fluxes accumulated over a defined time step, e.g., kg/m2 per +#' hour or year). +#' +#' @param woody_value numeric. Pool or accumulated flux for the woody PFT (kg/m2). +#' @param annual_value numeric. Pool or accumulated flux for the annual PFT (kg/m2). #' @param annual_init numeric, required for method = "incremental"; the initial annual pool. #' @param annual_cover numeric. Fractional cover of the annual PFT (0-1). #' @param woody_cover numeric. Fractional cover of the woody PFT (0-1). 
Must be 1 when `method` is "incremental". @@ -54,9 +60,9 @@ combine_mixed_crops <- function(woody_value, woody_value = woody_value, annual_value = annual_value, annual_cover = annual_cover, - woody_cover = woody_cover + woody_cover = woody_cover, + annual_init = annual_init ) - if (!is.null(annual_init)) recycle_inputs$annual_init <- annual_init recycled <- vctrs::vec_recycle_common( !!!recycle_inputs, @@ -67,7 +73,7 @@ annual_value <- recycled$annual_value annual_cover <- recycled$annual_cover woody_cover <- recycled$woody_cover - if (!is.null(annual_init)) annual_init <- recycled$annual_init + annual_init <- recycled$annual_init # Internal tolerance for floating point comparisons tol <- 1e-3 diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index 0c3d0df..a422eae 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -77,6 +77,9 @@ ### Mixture Methods {#sec-mixture-methods} +These formulas are implemented in `combine_mixed_crops()`. This function expects additive quantities +per unit area (stocks or flux totals already integrated over a defined time step). + #### Two Cases for Mixed Cropping Systems: Discrete and Overlap In a mixed cropping system, we define $f_{woody}$ and $f_{annual}$ as the percent contribution of a crop or plant functional type to ecosystem dynamics. This contribution is not "canopy cover" since that changes over time. Think of this as "percent of a monoculture". This method will build on SIPNET's single PFT design - the outputs from two separate single PFT runs will be combined. 
@@ -150,8 +153,8 @@ These simulations were run from 2016 through 2024 across 10 sites, with an ensem library(tidyverse) library(lubridate) library(here) -library(stringr) # for label wrapping -library(grid) # grid::unit() for panel spacing +library(stringr) # for label wrapping +library(grid) # grid::unit() for panel spacing source(here::here("000-config.R")) if (!dir.exists(model_outdir)) { @@ -166,33 +169,37 @@ mix_label_fn <- function(x) stringr::str_replace_all(x, " \\+ ", "\n+ ") # labeller that applies the above to mix_description (fallback to a normal wrap for other strips) mix_labeller <- labeller( - mix_description = mix_label_fn, - .default = label_wrap_gen(width = 12) + mix_description = mix_label_fn, + .default = label_wrap_gen(width = 12) ) pal_mix <- c( - monoculture = "#1b9e77", - discrete = "#d95f02", - overlap = "#7570b3" + monoculture = "#1b9e77", + discrete = "#d95f02", + overlap = "#7570b3" ) base_theme <- theme_minimal(base_size = 10) + - theme( - legend.position = "top", - strip.text = element_text(size = 9, lineheight = 0.95, - margin = margin(2, 2, 2, 2)), - axis.text.x = element_text(angle = 0, hjust = 0.5), - plot.margin = margin(5, 5, 5, 5), - plot.title.position = "plot", - # center title and subtitle - plot.title = element_text(hjust = 0.5), - plot.subtitle = element_text(hjust = 0.5) - ) + theme( + legend.position = "top", + strip.text = element_text( + size = 9, lineheight = 0.95, + margin = margin(2, 2, 2, 2) + ), + axis.text.x = element_text(angle = 0, hjust = 0.5), + plot.margin = margin(5, 5, 5, 5), + plot.title.position = "plot", + # center title and subtitle + plot.title = element_text(hjust = 0.5), + plot.subtitle = element_text(hjust = 0.5) + ) wrap_labeller <- labeller(.default = label_wrap_gen(width = 12)) -year_scale_dt <- scale_x_datetime(date_breaks = "1 year", - date_labels = "%Y", - expand = expansion(mult = c(0.01, 0.02))) +year_scale_dt <- scale_x_datetime( + date_breaks = "1 year", + date_labels = "%Y", + expand = 
expansion(mult = c(0.01, 0.02)) +) ``` ## Mixed-system dataset @@ -223,12 +230,12 @@ combined_output <- readr::read_csv( ) ), mix_type = case_when( - mix_description %in% c("100% annual", "100% woody") ~ "monoculture", - scenario == "discrete" ~ "discrete", - scenario == "overlap" ~ "overlap", - TRUE ~ "other" - ) - ) |> + mix_description %in% c("100% annual", "100% woody") ~ "monoculture", + scenario == "discrete" ~ "discrete", + scenario == "overlap" ~ "overlap", + TRUE ~ "other" + ) + ) |> arrange(mix_description) # variables present (used by time-series sections below) @@ -247,60 +254,60 @@ This section compares three treatment scenarios on the same set of woody sites u ```{r woody-treatments} treat_path <- file.path(model_outdir, "treatments_woody_sites.csv") if (file.exists(treat_path)) { - tr <- readr::read_csv(treat_path, show_col_types = FALSE) - - # County-level per-ensemble totals and densities, then average across ensembles - tr_county <- tr |> - dplyr::group_by(model_output, scenario, county, ensemble) |> - dplyr::summarise(total_Mg = sum(total_c_Mg, na.rm = TRUE), total_ha = sum(area_ha, na.rm = TRUE), .groups = "drop") |> - dplyr::mutate( - density_Mg_ha = dplyr::if_else(total_ha > 0, total_Mg / total_ha, NA_real_), - total_Tg = PEcAn.utils::ud_convert(total_Mg, "Mg", "Tg") - ) |> - dplyr::group_by(model_output, scenario, county) |> - dplyr::summarise( - mean_total_Tg = mean(total_Tg, na.rm = TRUE), - mean_density_Mg_ha = mean(density_Mg_ha, na.rm = TRUE), - .groups = "drop" - ) + tr <- readr::read_csv(treat_path, show_col_types = FALSE) + + # County-level per-ensemble totals and densities, then average across ensembles + tr_county <- tr |> + dplyr::group_by(model_output, scenario, county, ensemble) |> + dplyr::summarise(total_Mg = sum(total_c_Mg, na.rm = TRUE), total_ha = sum(area_ha, na.rm = TRUE), .groups = "drop") |> + dplyr::mutate( + density_Mg_ha = dplyr::if_else(total_ha > 0, total_Mg / total_ha, NA_real_), + total_Tg = 
PEcAn.utils::ud_convert(total_Mg, "Mg", "Tg") + ) |> + dplyr::group_by(model_output, scenario, county) |> + dplyr::summarise( + mean_total_Tg = mean(total_Tg, na.rm = TRUE), + mean_density_Mg_ha = mean(density_Mg_ha, na.rm = TRUE), + .groups = "drop" + ) - # Bring in county boundaries for maps - county_boundaries <- sf::st_read(file.path(data_dir, "ca_counties.gpkg"), quiet = TRUE) - - # Differences vs 100% woody baseline - base <- tr_county |> - dplyr::filter(scenario == "woody_100") |> - dplyr::select(model_output, county, base_total_Tg = mean_total_Tg, base_density_Mg_ha = mean_density_Mg_ha) - diffs <- tr_county |> - dplyr::filter(scenario != "woody_100") |> - dplyr::left_join(base, by = c("model_output", "county")) |> - dplyr::mutate( - delta_total_Tg = mean_total_Tg - base_total_Tg, - delta_density_Mg_ha = mean_density_Mg_ha - base_density_Mg_ha + # Bring in county boundaries for maps + county_boundaries <- sf::st_read(file.path(data_dir, "ca_counties.gpkg"), quiet = TRUE) + + # Differences vs 100% woody baseline + base <- tr_county |> + dplyr::filter(scenario == "woody_100") |> + dplyr::select(model_output, county, base_total_Tg = mean_total_Tg, base_density_Mg_ha = mean_density_Mg_ha) + diffs <- tr_county |> + dplyr::filter(scenario != "woody_100") |> + dplyr::left_join(base, by = c("model_output", "county")) |> + dplyr::mutate( + delta_total_Tg = mean_total_Tg - base_total_Tg, + delta_density_Mg_ha = mean_density_Mg_ha - base_density_Mg_ha + ) |> + dplyr::left_join(county_boundaries, by = "county") + + # Simple maps: density deltas by scenario and pool + ggplot2::ggplot(diffs, ggplot2::aes(geometry = geom, fill = delta_density_Mg_ha)) + + ggplot2::geom_sf(color = "black", fill = NA) + + ggplot2::geom_sf() + + ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + + ggplot2::facet_grid(model_output ~ scenario) + + ggplot2::theme_minimal() + + ggplot2::labs(title = "County-level density delta 
vs 100% woody (on woody sites)", fill = "Delta (Mg/ha)") + + # Table preview + DT::datatable( + diffs |> + dplyr::select(model_output, scenario, county, delta_total_Tg, delta_density_Mg_ha) |> + dplyr::arrange(model_output, scenario, county), + options = list(pageLength = 10, scrollY = "50vh", scroller = TRUE, deferRender = TRUE), + extensions = c("Scroller"), + rownames = FALSE ) |> - dplyr::left_join(county_boundaries, by = "county") - - # Simple maps: density deltas by scenario and pool - ggplot2::ggplot(diffs, ggplot2::aes(geometry = geom, fill = delta_density_Mg_ha)) + - ggplot2::geom_sf(color = "black", fill = NA) + - ggplot2::geom_sf() + - ggplot2::scale_fill_gradient2(low = "royalblue", mid = "white", high = "firebrick", midpoint = 0, na.value = "white") + - ggplot2::facet_grid(model_output ~ scenario) + - ggplot2::theme_minimal() + - ggplot2::labs(title = "County-level density delta vs 100% woody (on woody sites)", fill = "Delta (Mg/ha)") - - # Table preview - DT::datatable( - diffs |> - dplyr::select(model_output, scenario, county, delta_total_Tg, delta_density_Mg_ha) |> - dplyr::arrange(model_output, scenario, county), - options = list(pageLength = 10, scrollY = "50vh", scroller = TRUE, deferRender = TRUE), - extensions = c('Scroller'), - rownames = FALSE - ) |> - DT::formatSignif(columns = c("delta_total_Tg", "delta_density_Mg_ha"), digits = 2) + DT::formatSignif(columns = c("delta_total_Tg", "delta_density_Mg_ha"), digits = 2) } else { - cat("treatments_woody_sites.csv not found; run scripts/040_downscale.R to generate.") + cat("treatments_woody_sites.csv not found; run scripts/040_downscale.R to generate.") } ``` @@ -310,45 +317,48 @@ Next we select representative sites based on their productivity. 
We use AGB in t ```{r select-representative-sites} agb_site <- combined_output |> - filter(mix_description == "100% woody", variable == "AGB") |> - group_by(site_id) |> - summarise(agb_mean = mean(value_combined), .groups = "drop") + filter(mix_description == "100% woody", variable == "AGB") |> + group_by(site_id) |> + summarise(agb_mean = mean(value_combined), .groups = "drop") if (nrow(agb_site) >= 3) { - qs_agb <- quantile( - agb_site$agb_mean, - probs = c(0.15, 0.50, 0.85) - ) - analysis_sites <- purrr::map_chr(qs_agb, function(qv) { - agb_site |> - mutate(diff = abs(agb_mean - qv)) |> - slice_min(diff, n = 1, with_ties = FALSE) |> - pull(site_id) - }) |> unique() - - if (length(analysis_sites) != 3) { - PEcAn.logger::logger.error("Representative sites not found for all quantiles") - } - site_map <- tibble( - site_id = analysis_sites, - agb_class = factor(c("low_agb", "med_agb", "high_agb"), - levels = c("low_agb", "med_agb", "high_agb")) - ) + qs_agb <- quantile( + agb_site$agb_mean, + probs = c(0.15, 0.50, 0.85) + ) + analysis_sites <- purrr::map_chr(qs_agb, function(qv) { + agb_site |> + mutate(diff = abs(agb_mean - qv)) |> + slice_min(diff, n = 1, with_ties = FALSE) |> + pull(site_id) + }) |> unique() + + if (length(analysis_sites) != 3) { + PEcAn.logger::logger.error("Representative sites not found for all quantiles") + } + site_map <- tibble( + site_id = analysis_sites, + agb_class = factor(c("low_agb", "med_agb", "high_agb"), + levels = c("low_agb", "med_agb", "high_agb") + ) + ) } else { - PEcAn.logger::logger.severe("Looks like something went wrong", - "There are only", nrow(agb_site), " sites with AGB data. ") + PEcAn.logger::logger.severe( + "Looks like something went wrong", + "There are only", nrow(agb_site), " sites with AGB data. 
" + ) } # join and relabel classes for plotting (use human-friendly labels) output_representative_sites <- site_map |> - left_join(combined_output, by = "site_id") |> - mutate( - agb_class = factor( - agb_class, - levels = c("low_agb", "med_agb", "high_agb"), - labels = c("low AGB site", "med AGB site", "high AGB site") + left_join(combined_output, by = "site_id") |> + mutate( + agb_class = factor( + agb_class, + levels = c("low_agb", "med_agb", "high_agb"), + labels = c("low AGB site", "med AGB site", "high AGB site") + ) ) - ) ``` @@ -360,30 +370,30 @@ TODO add initial conditions to table ```{r summary-table} summary_stats <- combined_output |> - filter(year(datetime) == max(year(datetime))) |> - group_by(variable, mix_description, mix_type) |> - summarise( - mean = mean(value_combined), - sd = sd(value_combined), - .groups = "drop" - ) |> - mutate( - # format as "mean (sd)" with one decimal place - val = sprintf("%.1f (%.1f)", mean, sd) - ) |> - select(-mean, -sd) |> - tidyr::pivot_wider( - id_cols = c(mix_description, mix_type), - names_from = variable, - values_from = val - ) + filter(year(datetime) == max(year(datetime))) |> + group_by(variable, mix_description, mix_type) |> + summarise( + mean = mean(value_combined), + sd = sd(value_combined), + .groups = "drop" + ) |> + mutate( + # format as "mean (sd)" with one decimal place + val = sprintf("%.1f (%.1f)", mean, sd) + ) |> + select(-mean, -sd) |> + tidyr::pivot_wider( + id_cols = c(mix_description, mix_type), + names_from = variable, + values_from = val + ) # TODO future iterations, don't assume columns; # use glue to construct dynamically knitr::kable( - summary_stats, - col.names = c("Mixture", "Type", "AGB", "SOC"), - caption = "Summary statistics (mean and standard deviation) for carbon pools by mixture type." + summary_stats, + col.names = c("Mixture", "Type", "AGB", "SOC"), + caption = "Summary statistics (mean and standard deviation) for carbon pools by mixture type." 
) ``` @@ -405,61 +415,61 @@ last_year <- lubridate::year(last_dt) # Final-time values per (site x ensemble) combined_output_final_site_ens <- combined_output |> - filter(datetime == last_dt) |> - group_by(variable, mix_description, mix_type, site_id, ensemble_id) |> - summarise( - value = median(value_combined), - .groups = "drop" - ) + filter(datetime == last_dt) |> + group_by(variable, mix_description, mix_type, site_id, ensemble_id) |> + summarise( + value = median(value_combined), + .groups = "drop" + ) # Points: site-level means across ensembles at final time combined_output_site_mean <- combined_output_final_site_ens |> - group_by(variable, mix_description, mix_type, site_id) |> - summarise( - site_mean = mean(value), - .groups = "drop" - ) + group_by(variable, mix_description, mix_type, site_id) |> + summarise( + site_mean = mean(value), + .groups = "drop" + ) # Error bars + median dot: summarise across sites (using site-level means) combined_output_summaries <- combined_output_site_mean |> - group_by(variable, mix_description, mix_type) |> - summarise( - q25 = quantile(site_mean, 0.25), - q50 = quantile(site_mean, 0.5), - q75 = quantile(site_mean, 0.75), - .groups = "drop" - ) - - # Plot: ensemble points (jitter) + IQR + median; facet by variable - ggplot() + - geom_jitter( + group_by(variable, mix_description, mix_type) |> + summarise( + q25 = quantile(site_mean, 0.25), + q50 = quantile(site_mean, 0.5), + q75 = quantile(site_mean, 0.75), + .groups = "drop" + ) + +# Plot: ensemble points (jitter) + IQR + median; facet by variable +ggplot() + + geom_jitter( data = combined_output_site_mean, aes(x = mix_description, y = site_mean, color = mix_type), width = 0.15, alpha = 0.5, size = 1.2 - ) + - geom_errorbar( + ) + + geom_errorbar( data = combined_output_summaries, aes(x = mix_description, ymin = q25, ymax = q75, color = mix_type), width = 0.4, linewidth = 0.6 - ) + - geom_point( + ) + + geom_point( data = combined_output_summaries, aes(x = mix_description, y 
= q50, color = mix_type), size = 2 - ) + - facet_wrap(~ variable, scales = "free_y", nrow = 1, labeller = mix_labeller) + # use mix_labeller - scale_x_discrete(labels = mix_label_fn) + # wrap x labels into two lines - scale_color_manual(values = pal_mix) + - labs( + ) + + facet_wrap(~variable, scales = "free_y", nrow = 1, labeller = mix_labeller) + # use mix_labeller + scale_x_discrete(labels = mix_label_fn) + # wrap x labels into two lines + scale_color_manual(values = pal_mix) + + labs( title = paste0("End-Year Carbon Pool Sizes (", last_year, ")"), subtitle = "Points = site means; bars = IQR; dot = median", x = "Mixture", y = expression("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), color = "Type" - ) + - base_theme + - theme(axis.text.x = element_text(angle = 35, hjust = 1)) - + ) + + base_theme + + theme(axis.text.x = element_text(angle = 35, hjust = 1)) + ``` ### Effect Size (Final Year AGB and SOC Compared to Annual Monoculture) @@ -666,16 +676,17 @@ for (v in vars) { aes(datetime, q50, group = mix_description), color = "black", linewidth = .5 ) + - facet_wrap(~mix_description, - nrow = 1, - labeller = mix_labeller, - scales = "free_y") + + facet_wrap(~mix_description, + nrow = 1, + labeller = mix_labeller, + scales = "free_y" + ) + year_scale_dt + scale_color_manual(values = pal_mix) + labs( title = paste("Time Series (Median & IQR):", v), subtitle = "Lines=ensembles; black=median; ribbon=IQR", - x = NULL, y = bquote("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), + x = NULL, y = bquote("Carbon Pool Size (" * kg ~ C ~ m^-2 * ")"), color = "Category" ) + base_theme diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index d337fe0..3a1f019 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -245,12 +245,15 @@ Extracts and formats SIPNET outputs for downscaling: Rscript scripts/031_aggregate_sipnet_output.R ``` -Simulates mixed-cropping scenarios by combining outputs across two PFTs using a mixed aggregation 
function (see Mixed System Prototype). Two methods are supported:
+Simulates mixed-cropping scenarios by combining outputs across two PFTs using `combine_mixed_crops()` (see Mixed System Prototype). Two methods are supported:
 
 - weighted: area-partitioned mix where `woody_cover + annual_cover = 1`
 - incremental: preserve woody baseline (`woody_cover = 1`) and add the annual delta scaled by `annual_cover`
 
-The current analysis uses the weighted method to represent ground cover in orchards and vineyards.
+`combine_mixed_crops()` is pool-agnostic: pass any additive quantity expressed per unit area, including instantaneous stocks
+(`kg/m^2`) or flux totals that have already been accumulated over the SIPNET output interval (e.g., hourly or annual `kg/m^2` of NEE).
 
 Outputs include `multi_pft_ensemble_output.csv`, `combined_ensemble_output.csv`, and `ensemble_output_with_mixed.csv` with a synthetic mixed PFT.
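The two combination rules above amount to simple per-area arithmetic on any additive pool. As an editorial illustration of that semantics only — the `combine_mixed` name below is hypothetical, and the PR's actual R implementation (`combine_mixed_crops()`) additionally handles vector recycling, NA checks, and tolerance-based logging — the rules can be sketched as:

```python
import math

def combine_mixed(woody_value, annual_value, annual_cover, woody_cover,
                  annual_init=0.0, method="weighted", tol=1e-3):
    """Hypothetical sketch of the two aggregation methods described above.

    Values are per-unit-area pools (e.g., kg C m^-2); covers are fractions.
    """
    if method == "weighted":
        # Area-partitioned mix: cover fractions must sum to 1 (within tol);
        # the R version warns and returns NA for violating rows.
        if abs(woody_cover + annual_cover - 1.0) > tol:
            return math.nan
        return woody_cover * woody_value + annual_cover * annual_value
    # Incremental: preserve the woody baseline (woody_cover must be 1) and
    # add only the annual delta since initialization, scaled by its cover.
    if abs(woody_cover - 1.0) > tol:
        raise ValueError("incremental: woody_cover must be 1 (+/- %g)" % tol)
    return woody_value + annual_cover * (annual_value - annual_init)

# 50/50 field partition vs. full orchard with 50% annual understory:
weighted = combine_mixed(4.0, 2.0, annual_cover=0.5, woody_cover=0.5)
incremental = combine_mixed(4.0, 2.0, annual_cover=0.5, woody_cover=1.0,
                            annual_init=0.5, method="incremental")
```

The weighted call splits the area between the two PFTs, while the incremental call keeps the full orchard pool and adds only the annual understory's net accumulation (`annual_value - annual_init`).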
From e1dcaeda8c878b674cc7464146a2a438c1e35572 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 19:21:18 -0500 Subject: [PATCH 49/70] dropped intermediate list and now call vec_recycle_common with named vectors --- R/combine_mixed_crops.R | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index b389422..4ec0bd2 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -53,20 +53,23 @@ combine_mixed_crops <- function(woody_value, method = c("weighted", "incremental")) { method <- match.arg(method) + # Internal tolerance for floating point comparisons + tol <- 1e-3 # Accept scalars for cover and vectors for values - # Collect inputs for recycling (add annual_init only if provided) - recycle_inputs <- list( + recycled <- vctrs::vec_recycle_common( woody_value = woody_value, annual_value = annual_value, annual_cover = annual_cover, woody_cover = woody_cover, - annual_init = annual_init - ) - - recycled <- vctrs::vec_recycle_common( - !!!recycle_inputs, - .size = vctrs::vec_size_common(!!!recycle_inputs) + annual_init = annual_init, + .size = vctrs::vec_size_common( + woody_value, + annual_value, + annual_cover, + woody_cover, + annual_init + ) ) woody_value <- recycled$woody_value @@ -74,8 +77,6 @@ combine_mixed_crops <- function(woody_value, annual_cover <- recycled$annual_cover woody_cover <- recycled$woody_cover annual_init <- recycled$annual_init - # Internal tolerance for floating point comparisons - tol <- 1e-3 # NA checks (annual_init only if required for incremental) na_checks <- list( From 485331b19e3319f63ca7cac32fc9843e88e3dc17 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 17:21:28 -0700 Subject: [PATCH 50/70] Apply suggestion from @dlebauer --- R/combine_mixed_crops.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index d19d513..9fd74ed 100644 --- 
a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -102,7 +102,7 @@ combine_mixed_crops <- function(woody_value, not_one <- abs(woody_cover - 1) > tol if (any(not_one, na.rm = TRUE)) { n_bad <- sum(not_one, na.rm = TRUE) - PEcAn.logger::logger.severe(paste0("incremental: woody_cover must be 1 (+/- ", tol, "); ", n_bad, " rows violate.")) + PEcAn.logger::logger.severe("incremental: woody_cover must be 1 (+/- ", tol, "); ", n_bad, " rows violate.") } res <- woody_value + annual_cover * (annual_value - annual_init) return(as.numeric(res)) From da369e4c8dd84755941f712f6f69d2136a3eaaf9 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 17:24:18 -0700 Subject: [PATCH 51/70] Apply suggestion from @divine7022 Co-authored-by: Akash B V --- R/match_site_ids_by_location.R | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/R/match_site_ids_by_location.R b/R/match_site_ids_by_location.R index 61f3b6a..6cfd899 100644 --- a/R/match_site_ids_by_location.R +++ b/R/match_site_ids_by_location.R @@ -17,7 +17,17 @@ #' @param map_all logical. If TRUE, compute nearest distances for all target rows. #' If FALSE, only compute nearest distances for IDs missing from reference; #' matched-by-ID rows are returned with distance 0. -#' @return a tibble with mapping and distances (same number of rows as target_df) +#' @return a tibble with one row per target row, containing: +#' \describe{ +#' \item{target_site_id}{Original target site ID} +#' \item{matched_site_id}{Matched reference site ID (may differ from target if not found by ID)} +#' \item{target_lat, target_lon}{Original target coordinates} +#' \item{ref_lat, ref_lon}{Matched reference coordinates} +#' \item{distance_m}{Distance between target and matched reference (meters)} +#' \item{close}{Proximity classification: "same location", "very close", "close", "moderate", "far"} +#' } +#' Rows are returned in the same order as target_df. 
+ match_site_ids_by_location <- function( target_df, reference_df, From c13eb17261056bc8550f264f3e876c6d73f4d1a0 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 19:29:55 -0500 Subject: [PATCH 52/70] clarify docs related to matching first on ids then location; correct guidance for setting crs --- R/match_site_ids_by_location.R | 60 ++++++++++++++++++++++------------ 1 file changed, 40 insertions(+), 20 deletions(-) diff --git a/R/match_site_ids_by_location.R b/R/match_site_ids_by_location.R index 61f3b6a..910fc07 100644 --- a/R/match_site_ids_by_location.R +++ b/R/match_site_ids_by_location.R @@ -1,8 +1,11 @@ -#' Match site IDs by geographic location +#' Match site IDs by ID, then nearest location (or location only) #' -#' Matches site IDs from a target data.frame to the nearest reference site IDs based on latitude/longitude. -#' Always returns a data.frame with one row per target row containing: -#' target_site_id, matched_site_id, coords, distance_m, and proximity class. +#' By default, preserves any exact ID matches between `target_df` and `reference_df`, +#' and for targets whose IDs are not present in `reference_df`, matches the nearest +#' reference site by location. Optionally, set `prefer_location = TRUE` to ignore +#' IDs entirely and match every target to its nearest reference by location. +#' Returns one row per target with: target_site_id, matched_site_id, coordinates, +#' distance_m, and a coarse proximity class. #' #' @param target_df data.frame with at least id/lat/lon columns #' @param reference_df data.frame with at least id/lat/lon columns @@ -12,11 +15,21 @@ #' @param target_lon_col character. Longitude column in target_df (default "lon") #' @param reference_lat_col character. Latitude column in reference_df (default "lat") #' @param reference_lon_col character. Longitude column in reference_df (default "lon") -#' @param crs character. Coordinate reference system of input points (default "EPSG:4326"). 
-#' For California analyses, use "EPSG:3310" (NAD83 / California Albers). +#' @param crs character. CRS of the INPUT coordinates (default "EPSG:4326"). +#' This must reflect how your `lat`/`lon` columns are expressed: +#' use a geographic CRS like "EPSG:4326" when coordinates are decimal degrees; +#' only use a projected CRS (e.g., "EPSG:3310") if your coordinates are +#' already in that projection and in linear units. This function does not +#' reproject the input; set the CRS to what it is. If you need to transform +#' coordinates, do so prior to calling (e.g., with `terra::project`). +#' @param prefer_location logical. If TRUE, ignore IDs and match all targets by +#' nearest location (location-only mode). If FALSE (default), first match +#' by ID and only compute nearest for targets whose IDs are missing. #' @param map_all logical. If TRUE, compute nearest distances for all target rows. #' If FALSE, only compute nearest distances for IDs missing from reference; -#' matched-by-ID rows are returned with distance 0. +#' ID-matched rows are returned with distance 0. Note: `map_all` does not +#' change which ID is returned; it only controls distance calculations. +#' @param max_distance numeric. Maximum allowable distance (m) for a match; error if exceeded (default 100 m). 
#' @return a tibble with mapping and distances (same number of rows as target_df) match_site_ids_by_location <- function( target_df, @@ -28,6 +41,7 @@ match_site_ids_by_location <- function( reference_lat_col = "lat", reference_lon_col = "lon", crs = "EPSG:4326", + prefer_location = FALSE, map_all = FALSE, max_distance = 100) { # Validate columns @@ -47,24 +61,30 @@ match_site_ids_by_location <- function( ) } - # Identify matched and mismatched rows by ID + # Annotate rows and, unless in location-only mode, split by ID membership by_id <- stats::setNames(reference_id_col, target_id_col) target_df <- target_df |> dplyr::mutate(`..row..` = dplyr::row_number()) - matched_id <- target_df |> - dplyr::inner_join(reference_df, by = by_id, suffix = c(".t", ".r")) + matched_id <- NULL + mismatched_id <- target_df + if (!isTRUE(prefer_location)) { + matched_id <- target_df |> + dplyr::inner_join(reference_df, by = by_id, suffix = c(".t", ".r")) - mismatched_id <- target_df |> - dplyr::anti_join(reference_df, by = by_id) + mismatched_id <- target_df |> + dplyr::anti_join(reference_df, by = by_id) - n_needs <- nrow(mismatched_id) - if (n_needs == 0) { - PEcAn.logger::logger.info("All target IDs found in reference by ID.") + n_needs <- nrow(mismatched_id) + if (n_needs == 0) { + PEcAn.logger::logger.info("All target IDs found in reference by ID.") + } else { + PEcAn.logger::logger.warn( + paste(n_needs, "target sites not in reference by ID; matching by nearest location.") + ) + } } else { - PEcAn.logger::logger.warn( - paste(n_needs, "target sites not in reference by ID; matching by nearest location.") - ) + PEcAn.logger::logger.info("prefer_location=TRUE: matching ALL targets to nearest reference by location (ignoring IDs).") } # Compute nearest for mismatches (always) @@ -96,9 +116,9 @@ match_site_ids_by_location <- function( ) } - # Build mapping for matched-by-ID rows + # Build mapping for matched-by-ID rows (skipped in location-only mode) mapping_match <- NULL - if 
(nrow(matched_id) > 0) { + if (!isTRUE(prefer_location) && nrow(matched_id) > 0) { # Prepare base table mapping_match <- tibble::tibble( `..row..` = matched_id$`..row..`, From 27d9895d9badac63db20ed6701c0cd82447903de Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 23:05:51 -0500 Subject: [PATCH 53/70] check that target and reference dfs don't contain duplicate site_ids with different coordinates --- R/match_site_ids_by_location.R | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/R/match_site_ids_by_location.R b/R/match_site_ids_by_location.R index 910fc07..c6b1a51 100644 --- a/R/match_site_ids_by_location.R +++ b/R/match_site_ids_by_location.R @@ -61,6 +61,26 @@ match_site_ids_by_location <- function( ) } + # target_df: check for records with the same site_id but different coordinates + id_coord <- target_df |> dplyr::distinct( + dplyr::across(dplyr::all_of(c(target_id_col, target_lat_col, target_lon_col))) + ) + if (any(duplicated(id_coord[[target_id_col]]))) { + PEcAn.logger::logger.severe( + "target_df has duplicate IDs with different coordinates; ensure sites are unique." + ) + } + + # reference_df: check for records with the same site_id but different coordinates + ref_id_coord <- reference_df |> dplyr::distinct( + dplyr::across(dplyr::all_of(c(reference_id_col, reference_lat_col, reference_lon_col))) + ) + if (any(duplicated(ref_id_coord[[reference_id_col]]))) { + PEcAn.logger::logger.severe( + "reference_df has duplicate IDs with different coordinates; ensure sites are unique." 
+ ) + } + # Annotate rows and, unless in location-only mode, split by ID membership by_id <- stats::setNames(reference_id_col, target_id_col) target_df <- target_df |> @@ -155,8 +175,7 @@ match_site_ids_by_location <- function( TRUE ~ "far (>5000m)" ) ) |> - dplyr::select(-`..row..`) |> - dplyr::distinct() + dplyr::select(-`..row..`) # Enforce maximum allowable distance if (any(mapping$distance_m > max_distance, na.rm = TRUE)) { From 503e9960ecdd93ddbeaf16e41031beffca027064 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 23:08:20 -0500 Subject: [PATCH 54/70] remove redundant params (just use inherited params) --- R/match_site_ids_by_location.R | 2 -- 1 file changed, 2 deletions(-) diff --git a/R/match_site_ids_by_location.R b/R/match_site_ids_by_location.R index c6b1a51..9d56170 100644 --- a/R/match_site_ids_by_location.R +++ b/R/match_site_ids_by_location.R @@ -193,8 +193,6 @@ match_site_ids_by_location <- function( #' If some target IDs don't exist in reference, update them to the nearest reference IDs by location. #' If all target IDs exist in reference, returns the original target_df unchanged. 
#' -#' @param target_df data.frame -#' @param reference_df data.frame #' @inheritParams match_site_ids_by_location #' @return data.frame with potentially updated ID column update_site_ids_by_location <- function( From ceee65b00a45acd48d1394cb522fd703f976891f Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Mon, 2 Feb 2026 23:13:41 -0500 Subject: [PATCH 55/70] check that n>1 and sd >0 for ensemble stats --- scripts/041_aggregate_to_county.R | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/scripts/041_aggregate_to_county.R b/scripts/041_aggregate_to_county.R index 71cf333..02a8348 100644 --- a/scripts/041_aggregate_to_county.R +++ b/scripts/041_aggregate_to_county.R @@ -106,6 +106,21 @@ county_summaries <- ens_county_preds |> sd_total_ha = sd(total_ha), .groups = "drop" ) |> + ( + function(df) { + if (any(df$n == 1, na.rm = TRUE)) { + PEcAn.logger::logger.severe( + "At least one (model_output, pft, county) group has n == 1; variability across ensembles cannot be assessed." + ) + } + if (any(df$sd_total_c_Tg == 0, na.rm = TRUE)) { + PEcAn.logger::logger.severe( + "At least one (model_output, pft, county) group has zero variability across ensembles (sd_total_c_Tg == 0)." 
+        )
+      }
+      df
+    }
+  )() |>
   dplyr::mutate(
     # Only save 3 significant digits
     dplyr::across(

From 5b882dcaff14736782f970bb0217fdd7b55dae78 Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Mon, 2 Feb 2026 23:15:57 -0500
Subject: [PATCH 56/70] return numeric

---
 R/combine_mixed_crops.R | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R
index 4ec0bd2..9e634ed 100644
--- a/R/combine_mixed_crops.R
+++ b/R/combine_mixed_crops.R
@@ -112,7 +112,7 @@ combine_mixed_crops <- function(woody_value,
     PEcAn.logger::logger.severe(paste0("incremental: woody_cover must be 1 (+/- ", tol, "); ", n_bad, " rows violate."))
   }
   res <- woody_value + annual_cover * (annual_value - annual_init)
-  return(as.numeric(res))
+  return(res)
 }
 
 # weighted method
@@ -124,5 +124,5 @@ combine_mixed_crops <- function(woody_value,
     PEcAn.logger::logger.warn(paste0("weighted: ", n_bad, " rows with cover fractions not summing to 1 (tol); results set to NA for those rows."))
     res[bad_sum] <- NA_real_
   }
-  return(as.numeric(res))
+  return(res)
 }

From 4e67335a47e9a2573ab2e96765b7e3352346fb3d Mon Sep 17 00:00:00 2001
From: David LeBauer
Date: Mon, 2 Feb 2026 21:24:15 -0700
Subject: [PATCH 57/70] Apply suggestion from @infotroph

Co-authored-by: Chris Black

---
 tests/testthat/test-match_site_ids.R | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tests/testthat/test-match_site_ids.R b/tests/testthat/test-match_site_ids.R
index be58a51..8a7e5e9 100644
--- a/tests/testthat/test-match_site_ids.R
+++ b/tests/testthat/test-match_site_ids.R
@@ -5,11 +5,7 @@ test_that("match_site_ids_by_location returns full mapping when all IDs exist",
     lat = c(34.0, 35.0),
     lon = c(-118.0, -119.0)
   )
-  reference <- data.frame(
-    site_id = c("A", "B"),
-    lat = c(34.0, 35.0),
-    lon = c(-118.0, -119.0)
-  )
+  reference <- target
 
   map <- match_site_ids_by_location(target, reference)
   expect_s3_class(map, "data.frame")

From 68c8b80bf6ec5a3f1d8c4098d236eb2d63dce38c
Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Tue, 3 Feb 2026 14:25:27 -0500 Subject: [PATCH 58/70] addressed remaining PR comments --- scripts/009_update_landiq.R | 72 +++++++++++++++------------ scripts/021_clustering_diagnostics.R | 38 ++++++++------ scripts/031_aggregate_sipnet_output.R | 65 +++++++++++++++--------- 3 files changed, 105 insertions(+), 70 deletions(-) diff --git a/scripts/009_update_landiq.R b/scripts/009_update_landiq.R index 1f19a51..4c43844 100644 --- a/scripts/009_update_landiq.R +++ b/scripts/009_update_landiq.R @@ -1,6 +1,6 @@ -# This script is used to explore LandIQ data -# and reconcile design points generated in previous iterations -# with updated LandIQ data. +# This script is used to explore LandIQ data +# and reconcile design points generated in previous iterations +# with updated LandIQ data. # Will be obsolete once we re-generate new design points in phase 3 ## One off code used to fix LandIQ data has been moved to a gist:.groups @@ -18,7 +18,7 @@ cluster_library(cl, "dplyr") ## Load Configuration ## later can be replaced w/ config.yml or pecan.xml -config_file <- here::here("000-config.R") +config_file <- here::here("000-config.R") if (file.exists(config_file)) { source(config_file) } else { @@ -33,8 +33,8 @@ crops_all <- data.table::fread( filter(!is.na(CLASS)) |> mutate( SUBCLASS = replace_na(SUBCLASS, 0) - ) - + ) + crop_pft_map <- readr::read_csv( file.path(raw_data_dir, "cadwr_land_use", "CARB_PFTs_table.csv") ) |> @@ -56,7 +56,7 @@ crops_all |> values_from = n, values_fill = 0 ) - + # Perform join to map PFTs crops_all_pft <- crops_all |> left_join( @@ -65,7 +65,7 @@ crops_all_pft <- crops_all |> "CLASS" = "crop_type", "SUBCLASS" = "crop_code" ) - ) + ) # Split into multidplyr_df by year and season # For parallel dplyr crops_all_pft_x <- crops_all_pft |> @@ -75,9 +75,11 @@ crops_all_pft_x <- crops_all_pft |> # 1) How many rows lost pft_group (i.e. join failures) by year/season? 
crops_all_pft_x |> - summarise(n_total = n(), - n_missing = sum(is.na(pft_group)), - pct_missing = round(n_missing / n_total * 100)) |> + summarise( + n_total = n(), + n_missing = sum(is.na(pft_group)), + pct_missing = round(n_missing / n_total * 100) + ) |> collect() |> print(n = Inf) @@ -150,12 +152,12 @@ woody_herb_summary <- crops_all_pft_x |> woody_pcnt = sum(PCNT[pft_group == "woody"], na.rm = TRUE), herb_pcnt = sum(PCNT[pft_group == "herbaceous"], na.rm = TRUE) ) |> - ungroup() |> - filter(n_pft == 2) - + ungroup() |> + filter(n_pft == 2) + z <- woody_herb_summary |> collect() |> - #filter(woody_pcnt > 0, herb_pcnt > 0) |> + # filter(woody_pcnt > 0, herb_pcnt > 0) |> mutate( total_pcnt = woody_pcnt + herb_pcnt ) @@ -186,7 +188,7 @@ woody_herb_fields_by_year_season |> summarize(n_fields = n_distinct(UniqueID)) |> arrange(year, season) -### Trying reconcile 2016 and 2018 LandIQ data +### Trying reconcile 2016 and 2018 LandIQ data crops_all_2016 <- crops_all |> filter(year == 2016) @@ -196,7 +198,7 @@ crops_all_2018 <- crops_all |> dwr_2018 <- terra::vect( file.path(raw_data_dir, "cadwr_land_use", "LandIQ_shapefiles", "i15_Crop_Mapping_2018_SHP", "i15_Crop_Mapping_2018.shp") -) |> +) |> terra::project("epsg:3310") dwr_2016 <- terra::vect( @@ -210,11 +212,11 @@ dwr_x <- terra::intersect(dwr_2018, dwr_2016) ## Goal is to reconcile dwr_2016 Unique_ID with dwr_2018 UniqueID ## join them by geometry, creating id_2016 and id_2018 ## find where UniqueID has changed (or not) -## where 2016 is missing from 2018, find nearest polygon and calculate distance +## where 2016 is missing from 2018, find nearest polygon and calculate distance dwr_merged <- terra::intersect(dwr_2018, dwr_2016) |> # Project to WGS84 (decimal degrees) before converting to dataframe terra::project("epsg:4326") |> - as.data.frame(geom = "xy") |> + as.data.frame(geom = "xy") |> select(contains("id"), x, y) |> rename(lon = x, lat = y) |> mutate( @@ -229,9 +231,10 @@ unmatched <- 
anti_join(design_points, design_points_ids, by = "site_id") |> # Then append to the design_points_ids if (nrow(unmatched) > 0) { unmatched_vect <- terra::vect( - unmatched, geom = c("lon", "lat"), crs = "epsg:4326" + unmatched, + geom = c("lon", "lat"), crs = "epsg:4326" ) |> terra::project("epsg:3310") - + # Calculate distance matrix and find nearest polygons nearest_data <- tibble( site_id = unmatched$site_id, @@ -260,19 +263,22 @@ if (nrow(unmatched) > 0) { PEcAn.logger::logger.info("Design points joined to nearest polygon:") nearest_data |> - knitr::kable(digits = 5) - + knitr::kable(digits = 5) + # Combine intersected points with nearest points design_points_ids_updated <- bind_rows(design_points_ids, nearest_data) } - - -write.csv(design_points_ids_updated |> - select(UniqueID, site_id, lat, lon) |> - left_join(design_points |> - select(site_id, pft), - by = "site_id") |> - arrange(pft, lat, lon), + + +write.csv( + design_points_ids_updated |> + select(UniqueID, site_id, lat, lon) |> + left_join( + design_points |> + select(site_id, pft), + by = "site_id" + ) |> + arrange(pft, lat, lon), here::here("data/design_points.csv"), row.names = FALSE ) @@ -300,7 +306,7 @@ crops_all |> arrange(desc(n_counties)) |> print(n = 15) -### +### grass_fields <- crops_all |> filter(CLASS == "P", MULTIUSE == "S", season == 2) |> @@ -315,7 +321,7 @@ grass_fields <- crops_all |> ids = paste(unique(UniqueID), collapse = ",") ) |> ungroup() |> - sample_n(10) |> + slice_sample(n = 10) |> rename(lat = centy, lon = centx) |> terra::vect(crs = "epsg:3857") |> terra::project("epsg:4269") |> diff --git a/scripts/021_clustering_diagnostics.R b/scripts/021_clustering_diagnostics.R index 36ff1b3..a234507 100644 --- a/scripts/021_clustering_diagnostics.R +++ b/scripts/021_clustering_diagnostics.R @@ -7,7 +7,14 @@ ca_climregions <- sf::st_read(file.path(data_dir, "ca_climregions.gpkg")) ######### Cluster Diagnostics ################ -sites_clustered <- readRDS(file.path(cache_dir, 
"sites_clustered.rds")) +sites_clustered_path <- file.path(cache_dir, "sites_clustered.rds") +if (!file.exists(sites_clustered_path)) { + PEcAn.logger::logger.severe("Expected clustering output not found:", sites_clustered_path) +} +sites_clustered <- readRDS(sites_clustered_path) +if (!("cluster" %in% names(sites_clustered))) { + PEcAn.logger::logger.severe("Clustering object lacks 'cluster' column; check upstream clustering step.") +} # Summarize clusters cluster_summary <- sites_clustered |> dplyr::group_by(cluster) |> @@ -16,16 +23,22 @@ cluster_summary <- sites_clustered |> knitr::kable(cluster_summary, digits = 0) # Plot all pairwise numeric variables -ggpairs_plot <- sites_clustered |> - dplyr::select(-site_id) |> - # need small # pfts for ggpairs - dplyr::sample_n(min(nrow(sites_clustered), 1000)) |> - GGally::ggpairs( - # plot all values except site_id and cluster - columns = setdiff(names(sites_clustered), c("site_id", "cluster")), - mapping = aes(color = as.factor(cluster), alpha = 0.8) - ) + - theme_minimal() + +withr::with_seed(42, { + ggpairs_plot <- sites_clustered |> + dplyr::select(-site_id) |> + # need small # pfts for ggpairs + dplyr::slice_sample( + n = min(nrow(sites_clustered), 10000) + ) |> + GGally::ggpairs( + # plot all values except site_id and cluster + columns = setdiff(names(sites_clustered), c("site_id", "cluster")), + mapping = aes(color = as.factor(cluster), alpha = 0.8) + ) + + theme_minimal() +}) + ggsave_optimized( "figures/cluster_pairs.webp", plot = ggpairs_plot, @@ -157,9 +170,6 @@ design_points_clust <- design_points |> dplyr::mutate(cluster = as.factor(cluster)) |> sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) -ca_fields_pts <- ca_fields |> - sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) - design_pt_plot <- ggplot() + geom_sf(data = ca_climregions, fill = "white") + theme_minimal() + diff --git a/scripts/031_aggregate_sipnet_output.R b/scripts/031_aggregate_sipnet_output.R index 8a2835d..215b4aa 100644 --- 
a/scripts/031_aggregate_sipnet_output.R +++ b/scripts/031_aggregate_sipnet_output.R @@ -1,5 +1,5 @@ ## Simulate multi-PFT scenarios by aggregating SIPNET output from two PFTs -## Two approaches: +## Two approaches: ## - overlap (e.g. orchard + herbaceous ground cover; geometric overlap) ## - discrete (e.g. annual crop monoculture with hedgerows; partitions area) ## @@ -7,7 +7,7 @@ ## ensemble_output.csv (long) ## site_cover_fractions.csv with columns: ## site_id, year, woody_cover, annual_cover, scenario (scenario ∈ {overlap, discrete}) -## For development, a mock grid of (woody_cover, annual_cover, scenario) is generated and +## For development, a mock grid of (woody_cover, annual_cover, scenario) is generated and ## applied to all sites) ## ## Notation: @@ -17,7 +17,6 @@ source("000-config.R") PEcAn.logger::logger.info("*** Starting multi-PFT aggregation ***") -source(here::here("R", "mixed_aggregation.R")) # ---- Load ensemble output ---------------------------------------------------- ensemble_output_csv <- file.path(model_outdir, "ensemble_output.csv") @@ -40,7 +39,7 @@ ensemble_data <- readr::read_csv(ensemble_output_csv) |> # distinct site-year combinations distinct_site_year <- ensemble_data |> dplyr::mutate(year = lubridate::year(datetime)) |> - dplyr::distinct(site_id, year) + dplyr::distinct(site_id, year) # Scenarios for development # - 100% woody perennial (orchard / monoculture) @@ -51,7 +50,7 @@ distinct_site_year <- ensemble_data |> # - Annual crop + 50% woody hedgerows scenarios <- tibble::tibble( annual_cover = c(0, 1, 0.25, 0.5, 0.75, 0.5), - woody_cover = c(1, 0, 1, 1, 0.25, 0.5), + woody_cover = c(1, 0, 1, 1, 0.25, 0.5), mix_description = c( "100% woody", "100% annual", "100% woody + 25% annual", "100% woody + 50% annual", @@ -68,7 +67,7 @@ scenarios <- tibble::tibble( # pfts woody_pft <- "woody perennial crop" annual_pft <- "annual crop" -mixed_overlap_pft <- "woody_annual_overlap_100_50" # new synthetic PFT label +mixed_overlap_pft <- 
"woody_annual_overlap_100_50" # new synthetic PFT label # ensemble members ensemble_ids <- unique(ensemble_data$ensemble_id) @@ -80,15 +79,14 @@ annual_init <- ensemble_data |> dplyr::slice_min(order_by = datetime, n = 1, with_ties = FALSE) |> dplyr::mutate( annual_init = dplyr::case_when( - # TODO: think through this - # we want to add AGB to the ecosystem level value - # for SOC, we only want the diff - # this probably isn't the best place to store this logic - # also, + # For incremental (overlap) mixing, use a baseline for annuals: + # - AGB: baseline ~ 0 at planting; carry forward only the increment. + # - TotSoilCarb (SOC): baseline is the initial SOC stock (value at first timestep). + # baseline SOC is used to compute delta SOC increment variable == "AGB" ~ 0, variable == "TotSoilCarb" ~ value - ) - ) |> + ) + ) |> dplyr::select(site_id, ensemble_id, variable, annual_init) # ---- Reshape ensemble output (wide by PFT) ----------------------------------- @@ -101,7 +99,7 @@ annual_init <- ensemble_data |> tidyr::pivot_wider(names_from = pft, values_from = value) scenarios_x_vars <- scenarios |> - tidyr::crossing(ensemble_id = ensemble_ids) + tidyr::crossing(ensemble_id = ensemble_ids) ens_wide <- .ens_wide |> dplyr::rename( @@ -136,33 +134,49 @@ if (n_bad > 0) { # ---- Check for missing values --------------------------------------- # TBD -if(any(is.na(ens_wide))) { +if (any(is.na(ens_wide))) { PEcAn.logger::logger.severe( "Missing values found in ensemble wide data. 
Examples:\n" ) head(ens_wide[is.na(ens_wide)], 10) } -# ---- Combine values (row-wise) ---------------------------------------------- +# ---- Combine values (vectorized by method) ---------------------------------- -ens_combined <- ens_wide |> - dplyr::rowwise() |> +# Weighted case (discrete area partition) +ens_weighted <- ens_wide |> + dplyr::filter(method == "weighted") |> dplyr::mutate( - value_combined = combine_value( + value_combined = combine_mixed_crops( woody_value = woody_value, annual_value = annual_value, - annual_init = ifelse(method == "incremental", annual_init, NULL), annual_cover = annual_cover, woody_cover = woody_cover, - method = method + method = "weighted" ) - ) + ) + +# Incremental case (overlap; preserve woody baseline, add annual increment) +ens_incremental <- ens_wide |> + dplyr::filter(method == "incremental") |> + dplyr::mutate( + value_combined = combine_mixed_crops( + woody_value = woody_value, + annual_value = annual_value, + annual_cover = annual_cover, + woody_cover = woody_cover, + annual_init = annual_init, + method = "incremental" + ) + ) + +ens_combined <- dplyr::bind_rows(ens_weighted, ens_incremental) ################################################################################ # ---- Write outputs ----------------------------------------------------------- -# +# ens_wide_csv <- file.path(model_outdir, "multi_pft_ensemble_output.csv") readr::write_csv(ens_wide, ens_wide_csv) PEcAn.logger::logger.info("Wrote wide diagnostic: ", ens_wide_csv) @@ -180,6 +194,11 @@ efi_original <- ensemble_data |> mixed_overlap_rows <- ens_combined |> dplyr::ungroup() |> dplyr::filter( + # Use a single overlap scenario for EFI export: + # orchard with 50% ground cover + # this can be configurable if indicated during calibration / validation. 
+ # This parameter can be calibrated based on observed data, + # and can vary, or not, by location mix_description == "100% woody + 50% annual" ) |> dplyr::select( From 4bbc105de9470cb9720809775f9ce33414195331 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:47:05 -0500 Subject: [PATCH 59/70] applied inline suggestions --- R/combine_mixed_crops.R | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index db50f5c..8dea5f0 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -98,11 +98,11 @@ combine_mixed_crops <- function(woody_value, } # Range checks for covers - out_of_range_annual <- (annual_cover < 0 - tol) | (annual_cover > 1 + tol) - out_of_range_woody <- (woody_cover < 0 - tol) | (woody_cover > 1 + tol) + out_of_range_annual <- (annual_cover < -tol) | (annual_cover > 1 + tol) + out_of_range_woody <- (woody_cover < -tol) | (woody_cover > 1 + tol) if (any(out_of_range_annual | out_of_range_woody, na.rm = TRUE)) { n_bad <- sum(out_of_range_annual | out_of_range_woody, na.rm = TRUE) - PEcAn.logger::logger.severe("weighted: cover fractions outside must be in the range [0,1] (+/- tol).", n_bad, "rows violate.") + PEcAn.logger::logger.severe("weighted: cover fractions must be in the range [0,1] (+/- tol). 
", n_bad, " rows violate this constraint.") } if (method == "incremental") { From 0485176980d374059c3a398f2736ec9dca2aea56 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:47:48 -0500 Subject: [PATCH 60/70] fix typos --- docs/mixed_system_prototype.qmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/mixed_system_prototype.qmd b/docs/mixed_system_prototype.qmd index a422eae..e0ed128 100644 --- a/docs/mixed_system_prototype.qmd +++ b/docs/mixed_system_prototype.qmd @@ -1,7 +1,7 @@ --- title: "Mixed-System Prototype (Two-PFT Aggregation)" author: "David LeBauer" -date: "`r Sys.Date()`" +date: today quarto-cache: true format: html: @@ -14,7 +14,7 @@ execute: message: false --- -This document prototypes a workflow for modeling croplands with multiple crops ("mixed cropping systems"). +This document presents a prototype workflow for modeling croplands with multiple crops ("mixed cropping systems"). ## Challenge @@ -48,7 +48,7 @@ Technical details of these steps are described in more detail below. - Use the same meteorological driver data (ERA5 ensemble). - Use the same soil texture and soil organic carbon. 3. Settings File: - - For each mixed site, create separate a separate configuration file for each PFT; append `-PFT` to the site ID. + - For each mixed site, create a separate configuration file for each PFT; append `-PFT` to the site ID. 4. For herbaceous crops: - Annual herbaceous: assume start with bare ground (zero biomass). - Perennial herbaceous: start as bare ground run; consider methods for estimating initial biomass in later iterations. 
From d3a8fb45c053cd06573db878f23ce9d5bfac786e Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:49:04 -0500 Subject: [PATCH 61/70] remove redundant line --- docs/workflow_documentation.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/workflow_documentation.md b/docs/workflow_documentation.md index 3a1f019..b35976e 100644 --- a/docs/workflow_documentation.md +++ b/docs/workflow_documentation.md @@ -252,7 +252,6 @@ Simulates mixed-cropping scenarios by combining outputs across two PFTs using `c `combine_mixed_crops()` is pool-agnostic: pass any additive quantity expressed per unit area, including instantaneous stocks (`kg/m^2`) or total flux totals that have already been accumulated over the SIPNET output interval (e.g., hourly or annual `kg/m^2` of NEE). -(`kg/m^2`) or total flux totals that have already been accumulated over the SIPNET output interval (e.g., hourly or annual `kg/m^2` of NEE). (`kg/m^2`) or total flux over a defined time step. Outputs include `multi_pft_ensemble_output.csv`, `combined_ensemble_output.csv`, and `ensemble_output_with_mixed.csv` with a synthetic mixed PFT.
From 42ae7e897e563130fe33ce5953fa61cb58b Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:49:39 -0500 Subject: [PATCH 62/70] format table --- reports/design_points_analysis.qmd | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/reports/design_points_analysis.qmd b/reports/design_points_analysis.qmd index 10f2f1f..cbc0a63 100644 --- a/reports/design_points_analysis.qmd +++ b/reports/design_points_analysis.qmd @@ -15,15 +15,16 @@ The following figures illustrate the distribution of these design points, which The environmental covariates used for clustering are: -| Variable | Description | Source | Units | -|----------|-------------|--------|-------| -| temp | Mean annual temperature | ERA5 | °C | -| precip | Mean annual precipitation | ERA5 | mm/year | -| srad | Solar radiation | ERA5 | W/m² | -| vapr | Vapor pressure deficit | ERA5 | kPa | -| clay | Clay content | SoilGrids | % | -| ocd | Organic carbon density | SoilGrids | g/kg | -| twi | Topographic wetness index | SRTM-derived | - | +| Variable | Description | Source | Units | +|----------|-----------------------------|---------------|-----------| +| temp | Mean annual temperature | ERA5 | °C | +| precip | Mean annual precipitation | ERA5 | mm/year | +| srad | Solar radiation | ERA5 | W/m² | +| vapr | Vapor pressure deficit | ERA5 | kPa | +| clay | Clay content | SoilGrids | % | +| ocd | Organic carbon density | SoilGrids | g/kg | +| twi | Topographic wetness index | SRTM-derived | – | + ## Map of Selected Design Points @@ -45,7 +46,7 @@ The clusters should show distinct groupings based on the environmental covariate ## Environmental Characteristics of each Cluster -These plots present This the normalized mean values of environmental covariates for each cluster. +These plots present the normalized mean values of environmental covariates for each cluster.
This summary highlights the characteristics of each cluster based on the environmental covariates used for clustering. From a0f6605b6434d4f61af72faab08321b81fc32d7c Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:50:27 -0500 Subject: [PATCH 63/70] fix typo --- reports/downscaling_results.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reports/downscaling_results.qmd b/reports/downscaling_results.qmd index 48b56cf..a85112f 100644 --- a/reports/downscaling_results.qmd +++ b/reports/downscaling_results.qmd @@ -95,7 +95,7 @@ The maps below are organized to compare across PFTs. First we show total county The first table summarizes county-level metrics for each PFT. The second table pivots carbon pools to highlight differences across cropping systems.. ```{r, options, include=FALSE} -options(ccmmf.quiet_bannerj = TRUE) +options(ccmmf.quiet_banner = TRUE) source(here::here("000-config.R")) ``` From 5272ddd682abd2e4ea158750940e15acbebc8ac2 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:53:11 -0500 Subject: [PATCH 64/70] after logging an error, the script continues execution. 
added logger.severe() to halt --- scripts/009_update_landiq.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/009_update_landiq.R b/scripts/009_update_landiq.R index 4c43844..bb585c8 100644 --- a/scripts/009_update_landiq.R +++ b/scripts/009_update_landiq.R @@ -22,7 +22,7 @@ config_file <- here::here("000-config.R") if (file.exists(config_file)) { source(config_file) } else { - PEcAn.logger::logger.error( + PEcAn.logger::logger.severe( "Config file not found, are you in the correct directory?", getwd() ) } From 24d65d38b764a68a9c15bd591d883ee62e230688 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:53:53 -0500 Subject: [PATCH 65/70] applied inline suggestions --- scripts/040_downscale.R | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/scripts/040_downscale.R b/scripts/040_downscale.R index 6d22391..47f60e1 100644 --- a/scripts/040_downscale.R +++ b/scripts/040_downscale.R @@ -41,6 +41,9 @@ end_date <- lubridate::as_date(max(ensemble_data$datetime)) #--- load ca_fields ------------------------------------------------ # this is a convenience time saver for development +# cache sf object to avoid repeated reads in interactive sessions. +# TODO: consider memoise::memoise() for production robustness or +# refactor to pass as function argument. if (!exists("ca_fields_full")) { ca_fields_full <- sf::read_sf(file.path(data_dir, "ca_fields.gpkg")) } @@ -581,9 +584,7 @@ PEcAn.logger::logger.info( log_mem("Post primary downscaling loop :: ") PEcAn.logger::logger.info( - paste0(" Finished downscaling "), - "\n\nCongratulations! You are almost there!\n\n", - rep("", 10) + "Finished downscaling.\nCongratulations! 
You are almost there.\n" ) ### --- Print Metrics for Each Ensemble Member ---#### From 25acf8438bfc4265dc81200706f81701dd901996 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:54:29 -0500 Subject: [PATCH 66/70] applied inline suggestion --- scripts/041_aggregate_to_county.R | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/scripts/041_aggregate_to_county.R b/scripts/041_aggregate_to_county.R index 02a8348..cb9875e 100644 --- a/scripts/041_aggregate_to_county.R +++ b/scripts/041_aggregate_to_county.R @@ -65,9 +65,8 @@ ens_county_preds <- downscale_preds |> n = dplyr::n(), total_c_Mg = sum(total_c_Mg), # total Mg C per county total_ha = sum(area_ha), - .groups = "drop_last" + .groups = "drop" ) |> - dplyr::ungroup() |> # counties with no fields will result in NA below dplyr::filter(total_ha > 0) |> dplyr::mutate( @@ -136,7 +135,5 @@ readr::write_csv( PEcAn.logger::logger.info("County summaries written to", file.path(model_outdir, "county_summaries.csv")) PEcAn.logger::logger.info( - rep(" ", 5), "\n\n", - "Finished aggregation to County level", "\n\n", - rep(" ", 5) + "Finished aggregation to county level.\n" ) From 657b531743a1dbfd2e587ac28f19a6342de650e4 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 09:57:46 -0500 Subject: [PATCH 67/70] silently suppressing source errors with try masks issues; added tryCatch with a warning so failures are visible in test logs --- tests/testthat/helper.R | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tests/testthat/helper.R b/tests/testthat/helper.R index cf38ef2..92ab346 100644 --- a/tests/testthat/helper.R +++ b/tests/testthat/helper.R @@ -8,6 +8,11 @@ withr::defer(PEcAn.logger::logger.setLevel(level)) r_dir <- here::here("R") r_scripts <- list.files(r_dir, pattern = "\\.R$", full.names = TRUE) for (f in r_scripts) { - # source quietly; re-definitions are harmless for tests - try(source(f, local = TRUE), silent = TRUE) + result <- tryCatch(
source(f, local = TRUE), + error = function(e) { + warning("Failed to source ", basename(f), ": ", conditionMessage(e)) + NULL + } + ) } From 7575cde04226ee9c1a43f450b1284947198bd180 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 10:01:37 -0500 Subject: [PATCH 68/70] added additional test cases to cover: (1) vector inputs to ensure vectorization works correctly, (2) edge case where annual_init == annual_value (delta = 0), and (3) the +/- 0.1% tolerance boundary for woody_cover in incremental mode; to improve confidence in edge case handling --- tests/testthat/test-combine_mixed_crops.R | 55 +++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/tests/testthat/test-combine_mixed_crops.R b/tests/testthat/test-combine_mixed_crops.R index c3635e2..251f070 100644 --- a/tests/testthat/test-combine_mixed_crops.R +++ b/tests/testthat/test-combine_mixed_crops.R @@ -71,3 +71,58 @@ test_that("length mismatch rejected", { "Can't recycle" ) }) + +test_that("weighted mixing works with vector inputs", { + res <- combine_mixed_crops( + woody_value = c(100, 200, 150), + annual_value = c(50, 100, 75), + annual_cover = 0.2, + woody_cover = 0.8, + method = "weighted" + ) + expected <- 0.8 * c(100, 200, 150) + 0.2 * c(50, 100, 75) + expect_equal(res, expected, tolerance = 1e-10) + expect_length(res, 3) +}) + +test_that("incremental mixing handles delta = 0 (annual_init equals annual_value)", { + res <- combine_mixed_crops( + woody_value = 200, + annual_value = 150, # same as annual_init + annual_init = 150, + annual_cover = 0.5, + woody_cover = 1.0, + method = "incremental" + ) + # delta = 150 - 150 = 0, so result should equal woody_value + expect_equal(res, 200) +}) + +test_that("incremental accepts woody_cover within tolerance of 1", { + # woody_cover = 0.999 should be accepted (within 0.1% of 1) + res <- combine_mixed_crops( + woody_value = 200, + annual_value = 220, + annual_init = 200, + annual_cover = 0.3, + woody_cover = 0.999, # within 1e-3
tolerance + method = "incremental" + ) + expected <- 200 + 0.3 * (220 - 200) + expect_equal(res, expected, tolerance = 1e-10) +}) + +test_that("incremental rejects woody_cover outside tolerance of 1", { + # woody_cover = 0.99 is outside the 0.1% tolerance + expect_error( + combine_mixed_crops( + woody_value = 200, + annual_value = 220, + annual_init = 200, + annual_cover = 0.3, + woody_cover = 0.99, # outside tolerance (1% away) + method = "incremental" + ), + "woody_cover must be 1" + ) +}) \ No newline at end of file From ae4b8ef26f1ac72fc6fd5f42fc14ef26506bbb26 Mon Sep 17 00:00:00 2001 From: divne7022 Date: Thu, 5 Feb 2026 10:16:13 -0500 Subject: [PATCH 69/70] specified annual_init unit and made doc clear --- R/combine_mixed_crops.R | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/R/combine_mixed_crops.R b/R/combine_mixed_crops.R index 8dea5f0..5afea59 100644 --- a/R/combine_mixed_crops.R +++ b/R/combine_mixed_crops.R @@ -26,7 +26,8 @@ #' #' @param woody_value numeric. Pool or accumulated flux for the woody PFT (kg/m2). #' @param annual_value numeric. Pool or accumulated flux for the annual PFT (kg/m2). -#' @param annual_init numeric, required for method = "incremental"; the initial annual pool. +#' @param annual_init numeric, required for method = "incremental"; the initial annual pool +#' size at t0 (kg C m-2). Used to compute the delta: (annual_value - annual_init). #' @param annual_cover numeric. Fractional cover of the annual PFT (0-1). #' @param woody_cover numeric. Fractional cover of the woody PFT (0-1). Must be 1 when `method` is "incremental". #' @param method character. One of "weighted" or "incremental". 
From 38086fdc2d36b24a8b0ebdfdc5b78cb0974f0b42 Mon Sep 17 00:00:00 2001 From: David LeBauer Date: Thu, 5 Feb 2026 15:04:56 -0700 Subject: [PATCH 70/70] Update scripts/040_downscale.R --- scripts/040_downscale.R | 3 +++ 1 file changed, 3 insertions(+) diff --git a/scripts/040_downscale.R b/scripts/040_downscale.R index 47f60e1..cce3ac9 100644 --- a/scripts/040_downscale.R +++ b/scripts/040_downscale.R @@ -507,6 +507,9 @@ for (pool in outputs_to_extract) { } else if (!is.null(mdl$y)) { y_train <- mdl$y } + # TODO OOB r2 is within-training only (bootstrap from same sites) + # Add spatial cross-validation (e.g., leave-one-site-out or spatial + # blocking). r2_oob <- extract_oob_r2(mdl, y_train) vi_rows[[length(vi_rows) + 1L]] <- tibble::tibble( pft = pft_i,