Multisite SA support with input coordination and README tracking #3660

divine7022 · 2025-11-10T14:36:13Z

Description

enable SA + ensemble size > 1 through input coordination
workaround: add input_design parameter to write.sa.config to coordinate multiple inputs (met, IC, soil, etc.) across SA runs.
enhanced README.txt files to track all inputs used per run
fix overwriting issue of sample.Rdata in case of multisite setting;
workaround : implemented logic to append each site's data to existing samples.Rdata instead of overwriting

Motivation and Context

Review Time Estimate

Immediately
Within one week
When possible

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
My name is in the list of CITATION.cff
I agree that PEcAn Project may distribute my contribution under any or all of
- the same license as the existing code,
- and/or the BSD 3-clause license.
I have updated the CHANGELOG.md.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

…g issue of sample.Rdata for multisite setting

…t to track all inputs used per run

infotroph

Thanks for working on this! Big picture, I don't follow how appending to samples.Rdata will work for the processes that need to read from it downstream and I wonder how much of the information appended is already captured by the input design. I left some more specific comments inline, but it's worth being sure the overall design is right before worrying about most of them.

base/workflow/R/run.write.configs.R

modules/uncertainty/R/sensitivity.R

infotroph · 2025-11-10T21:23:05Z

modules/uncertainty/R/sensitivity.R

+    # This checks for met.id tag in the settings under sensitivity analysis -
+    # if it's not there it creates it. Then it's gonna use what it created.
+    if (is.null(settings$sensitivity.analysis$met.id)) {
+      settings$sensitivity.analysis$met.id <- 1


👍 to the overall goal of picking one met (/other input) path to use across the whole SA.

But calling this "met.id" is confusing to me -- looks like this is basically an index into the ensemble, whereas I would usually expect "met ID" to mean an identifier for which source (e.g. CRUNCEP ERA5, etc) the met data came from.

Could it instead be something like settings$sensitivity.analysis$met_path and contain the actual filepath to be used by the SA instead of an index that needs to be extracted from the ensemble? That'd be clearer to me, and would also allow a user to pass an arbitrary path to the SA even if it doesn't exist in the ensemble (not that I expect most people to want that, but it could come in handy sometimes)

...But wait! Doesn't input_design handle this for us just below?

good question, it duplicates with input_design( new approach )

and also one of the reason is to be consistent with the ensemble pattern, so SA doing different (path) would be inconsistent

and works when input_design is NOT provided (single-site or legacy workflows)

The met.id resolution is indeed redundant when input_design is present. The met.id logic serves as a backward compatibility fallback for workflows that don't use input_design. However, when input_design exists it supersedes met.id and handles ALL inputs (met, IC, soil, etc.) .

Alternatively we could skip met.id resolution entirely when input_design exists

short-term fix (This PR):
keep current approach, add clarifying comment:

# select single met path for SA runs without input_design (backward compatibility) # when input_design is provided (recommended), it supersedes this selection # and allows different met files per SA run if (is.list(settings$run$inputs$met$path)) { if (is.null(settings$sensitivity.analysis$met.id)) { settings$sensitivity.analysis$met.id <- 1 } settings$run$inputs$met$path <- settings$run$inputs$met$path[[settings$sensitivity.analysis$met.id]] }

Long-term (Further iteration):
Document input_design as recommended approach , consider deprecating met.id in favor of input_design.

Re met path: You've talked me out of the idea, especially since it would need to be site-specific -- so my proposal wouldn't need just one settings$sensitivity.analysis$met_path, it'd be at least a settings$sensitivity.analysis$<site>$met_path for every site. I do still suggest naming it something like met.index instead of met.id.

Re input design: I'll defer to @dlebauer but my sense is that since we've bought into making input_design mandatory for other parts of the workflow and it's specifically designed to be where we keep all the pieces of the sampling design together in one place, it may be sensible to make it mandatory here too rather than duplicate its functionality.

I do still suggest naming it something like met.index instead of met.id

that's renaming to met.index would be nice clarity, but i am afraid of breaking backward compatibility for existing workflows may have <met.id> , and a direct rename would silently ignore those settings ?

I'm in favor of having a consistent input_design object. It is hard to see an advantage of different approaches if one generalize solution is available.

I am curious - where is met.id used? I can't find the string in either the PEcAn repository or in xml settings files used by CCMMF or Dongchen's workflows.

@dlebauer it's only used immediately below to set the met$path.

I agree with the consensus to drop the met.id solution proposed here and just use input_design. If one wants to only vary parameters and hold everything else fixed it's really easy to generate an input_design that's all 1's for everything other than parameters. The met.id "solution" isn't really a solution anyways as it only fixes met ensemble members and not the other ensembled inputs (phenology, soil physics, initial conditions, etc.)

infotroph · 2025-11-11T07:41:40Z

base/workflow/R/run.write.configs.R

+          # combine parameter samples for same PFT across sites
+          ensemble.samples[[pft_name]] <- rbind(
+            existing_env$ensemble.samples[[pft_name]],
+            ensemble.samples[[pft_name]]


I'm not seeing how one figures out after the fact which samples were used for which site

The site information is preserved through the run id's; Each run id contains the site id
(e.g, ENS-00001-1000004925 --- > site 1000004925), and this becomes the output folder name. Downstream functions don't parse the site id, they just iterate through all run id's and read from folders named with those id's.

# site 1 : runs.samples$ensemble <- data.frame( id = c("ENS-00001-1000004925", # site id "ENS-00002-1000004925") ) ensemble.samples$temperate <- data.frame( growth_resp = c(0.28, 0.31) # rows 1-2 ) # after site 2 merging: runs.samples$ensemble <- data.frame( id = c("ENS-00001-1000004925", # site 1, run 1 "ENS-00002-1000004925", # site 1, run 2 "ENS-00001-1000004928", # site 2, run 1 <-- Different site id "ENS-00002-1000004928") # site 2, run 2 ) ensemble.samples$temperate <- data.frame( growth_resp = c(0.28, 0.31, 0.33, 0.30) # rows match run id's )

In that case I don't think ensemble.samples should grow at all -- The point of passing in an input design is to enforce that every run with a given ensemble member uses the same value of growth_resp (and every other parameter too), so ensemble.samples needs to stay n_ens lines long. I guess one could repeat the same set of values for each site, but that seems VERY inefficient.

@infotroph @mdietze @dlebauer want to make sure that i got right logic with the merging approach ?

Explained in comments

PEcAn.logger::logger.info("###### Finished writing model run config files #####") PEcAn.logger::logger.info("config files samples in ", file.path(settings$outdir, "run")) # ----------------------------------------------------------------------- # multisite merging logic # ----------------------------------------------------------------------- # In multisite ensemble analysis, we maintain parameter # consistency across sites while allowing run tracking to grow. # This ensures that the same parameter samples are applied across all sites, # enabling valid comparison of model performance under different environmental # conditions while tracking site-specific run execution. # ----------------------------------------------------------------------- samples.file <- file.path(settings$outdir, "samples.Rdata") if (file.exists(samples.file)) { PEcAn.logger::logger.info("Merging with existing samples.Rdata to support multisite run...") # load existing data existing_env <- new.env() load(samples.file, envir = existing_env) # Ensemble parameters must remain identical across sites to answer: # How do the SAME parameter sets perform under DIFFERENT conditions ? if (!is.null(existing_env$ensemble.samples)) { # does the existing list have names ? # this validates we have a properly structured ensemble.samples object if (length(names(existing_env$ensemble.samples)) > 0) { # use existing parameter set to maintain ensemble consistency # all sites use identical parameter samples for ensemble analysis ensemble.samples <- existing_env$ensemble.samples } else { # existing parameter structure is invalid - use current site's parameters PEcAn.logger::logger.warn("Existing ensemble.samples structure is invalid. Using current site parameters as new baseline.") } } # maintain consistency in trait and sensitivity analysis samples if (!is.null(existing_env$trait.samples)) { if (length(names(existing_env$trait.samples)) > 0) { trait.samples <- existing_env$trait.samples } } if (!is.null(existing_env$sa.samples)) { if (length(names(existing_env$sa.samples)) > 0) { sa.samples <- existing_env$sa.samples } } # Run IDs append (this SHOULD grow) # --------------------------------------------------------------------- # Run IDs accumulate to track which parameter sets execute at which sites # eg. with 2 ens members across 2 sites: # - site 1: ENS-00001-1000004925, ENS-00002-1000004925 # - site 2: ENS-00001-1000004928, ENS-00002-1000004928 # - final: 4 run IDs tracking same 2 parameter sets across 2 sites # --------------------------------------------------------------------- if (!is.null(existing_env$runs.samples)) { # ensemble runs tracking (data frame -> rbind) # rbind here because run IDs must accumulate across sites if (!is.null(runs.samples$ensemble)) { if (!is.null(existing_env$runs.samples$ensemble)) { # check we are not rbinding garbage. Check for 'id' column. if("id" %in% names(existing_env$runs.samples$ensemble)) { runs.samples$ensemble <- rbind(existing_env$runs.samples$ensemble, runs.samples$ensemble) # run IDs grow to track which sites use which ensemble members } } } else if (!is.null(existing_env$runs.samples$ensemble)) { runs.samples$ensemble <- existing_env$runs.samples$ensemble } # SA runs tracking (list structure -> modifyList) if (!is.null(runs.samples$sa)) { if (!is.null(existing_env$runs.samples$sa)) { runs.samples$sa <- utils::modifyList(existing_env$runs.samples$sa, runs.samples$sa) } } else if (!is.null(existing_env$runs.samples$sa)) { runs.samples$sa <- existing_env$runs.samples$sa } } # metadata (pft names, trait names) should merge to create # coverage across all sites while avoiding duplication # --------------------------------------------------------------------- if (!is.null(existing_env$pft.names)) { pft.names <- unique(c(existing_env$pft.names, pft.names)) } if (!is.null(existing_env$trait.names)) { if (length(names(existing_env$trait.names)) > 0) { trait.names <- utils::modifyList(existing_env$trait.names, trait.names) } } } save(ensemble.samples, trait.samples, sa.samples, runs.samples, pft.names, trait.names, file = samples.file ) PEcAn.logger::logger.info("parameter values for runs in ", samples.file) options(scipen = scipen) invisible(settings) return(settings) }

This seems overly complicated. get.param.samples should generate one matrix of parameters per PFT and currently there is not within-PFT variability in parameter samples across sites (though this may change in the future). For a particular site, the PFT and input_design should determine which parameter matrix you use, and which row in that matrix you select, respectively. There should be no need to rbind matrices.

I agree with Mike here and am leaning toward an even stronger claim that run.write.configs should not be rewriting samples.Rdata at all. Its contents are all either generated once by get.parameter.samples before run.write.configs starts (ensemble.samples, sa.samples, trait.samples) or are not actually samples and can be taken from runs.txt (runs.samples)

not actually samples and can be taken from runs.txt (runs.samples)

does the downstream analysis expect runs.samples to be in samples.Rdata ?

I looked into read.ensemble.output and read.sa.output to verify if we can stop writing samples.Rdata, but those functions explicitly load samples.Rdata to find the run ids :
ens.run.ids <- samples$runs.samples$ensemble
sa.run.ids <- samples$runs.samples$sa

they do not currently have logic to parse runs.txt

so here we can ---
1 ) reload the existing parameters (avoid duplication/growth) but I must append the run ids to runs.samples and save the file. (run.write.cofigs)
2 ) or refactor downstream to read runs.txt ( does this feel like to be a separate PR ? )

Don't know what you mean by "stop writing samples.Rdata". No one is saying this isn't needed, we're saying it should already exist before run.write.configs is called (it is now generated by the new input design function, which should be called before the write configs functions) and it shouldn't need to be modified.

Only start model runs should rely on runs.txt. There's not enough information in there for anything else downstream

Don't know what you mean by "stop writing samples.Rdata".

I was responding specifically to chris's suggestion that runs.samples might be redundant cuz that info

"can be taken from runs.txt".

I was verifying if run.write.configs could stop appending even the new run ids(runs.samples) to samples.Rdata ( relaying on runs.txt instead) , but downstream (read.ensemble.output , .. ) depends on finding those ids in samples.Rdata.

But it's clear now

Only start model runs should rely on runs.txt.

infotroph · 2025-11-11T16:35:56Z

modules/uncertainty/R/sensitivity.R

-          # write run information to disk
+
+          # Apply input design coordination for SA runs
+          settings_copy <- settings


It's confusing that below here some values are taken from settings and others from settings_copy. Is there actually a need for two separate objects or could these mutations be applied straight to settings?

SA loop without settings_copy:

for (trait in traits) { for (quantile in quantiles) { settings$run$inputs$met$path <- input_paths[[run_index]] # mutates ... # write config write.config.SIPNET(settings, ...) # next iteration uses MUTATED settings; # quantile 2 sees met path from quantile 1 } }

so by having settings_copy it modify copy only (settings_copy$run$inputs$met$path <- input_paths[[run_index]]) and write.config.* gets clean settings each iteration.

If we mutated settings directly, run 2's config would inherit run 1's modified input paths, causing incorrect configs.

The answer for question: why settings is used for some values ?
cat("model id : ", settings$model$id, "\n") # never changes across runs
cat("site id : ", settings$run$site$id, "\n") # same for all SA runs

BUT inputs must come from settings_copy:
do.call(my.write.config, settings = settings_copy) # has resolved input paths

Right, but the path is updated earlier in the loop than the write.config call, so by the time write.config gets called for run 2 the path from run 1 has already been overwritten by the path from run 2. As long as you're sure that every iteration will update the same fields of the settings object, it doesn't matter whether it started from a "clean" object or not.

Similarly, if you are using settings_copy then it also contains all the fields from settings that aren't updated each iteration, so there's definitely no reason to interleave uses of both.

I agree that it makes sense to use settings_copy instead of settings. It seems this only concerns the README block that pulls fields from the original settings. If we just switch those cat() calls to read from settings_copy, it’ll be obvious that each iteration uses one clean settings object end-to-end.

I've proposed this change below in https://github.com/PecanProject/pecan/pull/3660/files/f075e4d7d0f9ad85ee340bbe01f7dca464988677#r2561591261

modules/uncertainty/R/sensitivity.R

base/workflow/R/run.write.configs.R

…e env load

…comments

infotroph · 2025-11-24T08:26:28Z

base/workflow/R/run.write.configs.R

+          # combine parameter samples for same PFT across sites
+          ensemble.samples[[pft_name]] <- rbind(
+            existing_env$ensemble.samples[[pft_name]],
+            ensemble.samples[[pft_name]]


In that case I don't think ensemble.samples should grow at all -- The point of passing in an input design is to enforce that every run with a given ensemble member uses the same value of growth_resp (and every other parameter too), so ensemble.samples needs to stay n_ens lines long. I guess one could repeat the same set of values for each site, but that seems VERY inefficient.

infotroph · 2025-11-24T08:32:31Z

base/workflow/R/run.write.configs.R

+    if (!is.null(existing_data$trait.samples)) {
+      trait.samples <- utils::modifyList(existing_data$trait.samples, trait.samples)


Trait.samples ought to be generated exactly once at the beginning of setup (to define the set of posteriors that ensemble.samples is selected from) and not differ from site to site. Since it may contain MCMC results with potentially as many as millions of samples per trait, I really don't think we want to be duplicating it here.

infotroph · 2025-11-24T09:03:06Z

modules/uncertainty/R/sensitivity.R

-          # write run information to disk
+
+          # Apply input design coordination for SA runs
+          settings_copy <- settings


Right, but the path is updated earlier in the loop than the write.config call, so by the time write.config gets called for run 2 the path from run 1 has already been overwritten by the path from run 2. As long as you're sure that every iteration will update the same fields of the settings object, it doesn't matter whether it started from a "clean" object or not.

Similarly, if you are using settings_copy then it also contains all the fields from settings that aren't updated each iteration, so there's definitely no reason to interleave uses of both.

dlebauer · 2025-11-25T21:55:14Z

modules/uncertainty/R/sensitivity.R

              "workflow id : ", workflow.id, "\n",
              "ensemble id : ", ensemble.id, "\n",
-              "pft name    : ", names(trait.samples)[i], "\n",
+              "pft name    : ", names(trait.samples)[pft_idx], "\n",
              "quantile    : ", quantile.str, "\n",
              "trait       : ", trait, "\n",
              "run id      : ", run.id, "\n",
              "model       : ", model, "\n",
              "model id    : ", settings$model$id, "\n",
              "site        : ", settings$run$site$name, "\n",
              "site  id    : ", settings$run$site$id, "\n",
-              "met data    : ", settings$run$site$met, "\n",
+              sa_input_info,
              "start date  : ", settings$run$start.date, "\n",
              "end date    : ", settings$run$end.date, "\n",
              "hostname    : ", settings$host$name, "\n",
              "rundir      : ", file.path(settings$host$rundir, run.id), "\n",
              "outdir      : ", file.path(settings$host$outdir, run.id), "\n",
              file = file.path(settings$rundir, run.id, "README.txt"),


This should address the suggestion in #3660 (comment) to avoid mixing use of settings and settings_copy once the copy is made.

Suggested change

"workflow id : ", workflow.id, "\n",

"ensemble id : ", ensemble.id, "\n",

"pft name : ", names(trait.samples)[i], "\n",

"pft name : ", names(trait.samples)[pft_idx], "\n",

"quantile : ", quantile.str, "\n",

"trait : ", trait, "\n",

"run id : ", run.id, "\n",

"model : ", model, "\n",

"model id : ", settings$model$id, "\n",

"site : ", settings$run$site$name, "\n",

"site id : ", settings$run$site$id, "\n",

"met data : ", settings$run$site$met, "\n",

sa_input_info,

"start date : ", settings$run$start.date, "\n",

"end date : ", settings$run$end.date, "\n",

"hostname : ", settings$host$name, "\n",

"rundir : ", file.path(settings$host$rundir, run.id), "\n",

"outdir : ", file.path(settings$host$outdir, run.id), "\n",

file = file.path(settings$rundir, run.id, "README.txt"),

"workflow id : ", workflow.id, "\n",

"ensemble id : ", ensemble.id, "\n",

"pft name : ", names(trait.samples)[pft_idx], "\n",

"quantile : ", quantile.str, "\n",

"trait : ", trait, "\n",

"run id : ", run.id, "\n",

"model : ", model, "\n",

"model id : ", settings_copy$model$id, "\n",

"site : ", settings_copy$run$site$name, "\n",

"site id : ", settings_copy$run$site$id, "\n",

sa_input_info,

"start date : ", settings_copy$run$start.date, "\n",

"end date : ", settings_copy$run$end.date, "\n",

"hostname : ", settings_copy$host$name, "\n",

"rundir : ", file.path(settings_copy$host$rundir, run.id), "\n",

"outdir : ", file.path(settings_copy$host$outdir, run.id), "\n",

file = file.path(settings_copy$rundir, run.id, "README.txt"),

dlebauer · 2025-11-25T21:56:21Z

modules/uncertainty/R/sensitivity.R

                                               run.id))
          cat(
            run.id,
            file = file.path(settings$rundir, "runs.txt"),


Suggested change

file = file.path(settings_copy$rundir, "runs.txt"),

divine7022 added 3 commits November 10, 2025 14:02

pass input_design param to write.sa.config and resolve the overwritin…

0c3d0fc

…g issue of sample.Rdata for multisite setting

add input_design param for input coordination and enhancing README.tx…

18a3564

…t to track all inputs used per run

update .Rd

f075e4d

github-actions bot added Modules Base labels Nov 10, 2025

infotroph reviewed Nov 11, 2025

View reviewed changes

divine7022 added 3 commits November 12, 2025 09:16

fix indent, remove end-of-line whitespace and consolidated to a singl…

f7b0c3a

…e env load

fix indent, remove end-of-line whitespace, remove auto linter and add…

81f50f6

…comments

add explicit namespace to modifyList calls

5dcfe59

dlebauer self-requested a review November 13, 2025 18:33

infotroph reviewed Nov 24, 2025

View reviewed changes

dlebauer mentioned this pull request Nov 25, 2025

Document joint input design #3677

Open

dlebauer reviewed Nov 25, 2025

View reviewed changes

		if (!is.null(existing_data$trait.samples)) {
		trait.samples <- utils::modifyList(existing_data$trait.samples, trait.samples)

Multisite SA support with input coordination and README tracking #3660

Are you sure you want to change the base?

Multisite SA support with input coordination and README tracking #3660

Conversation

divine7022 commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Review Time Estimate

Types of changes

Checklist:

Uh oh!

infotroph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

infotroph Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divine7022 Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dlebauer Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divine7022 commented Nov 10, 2025 •

edited

Loading

infotroph Nov 12, 2025 •

edited

Loading

divine7022 Nov 28, 2025 •

edited

Loading

dlebauer Nov 25, 2025 •

edited

Loading