Process validation and its role in uncertainty decomposition #158
divine7022
started this conversation in
Ideas
My understanding of how process validation fits into our uncertainty workflow: conceptually, I am thinking of total forecast uncertainty as

Var(Y) = Vθ + VX + VIC + Vint + Vprocess + Vobs

where we have already implemented the parameter, driver, IC, and interaction terms via OAT/Sobol and ensemble variance.
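To make the decomposition concrete, here is a minimal sketch of OAT-style variance partitioning on a toy additive model. Everything here (the model `f`, the distributions, the variable names) is illustrative, not part of the project's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Toy model: Y = f(theta, X, IC) with additive contributions, so the
# one-at-a-time (OAT) decomposition should be nearly exact here.
theta = rng.normal(1.0, 0.2, n)   # parameter draws
X = rng.normal(5.0, 1.0, n)      # driver draws
IC = rng.normal(0.5, 0.1, n)     # initial-condition draws

def f(theta, X, IC):
    return theta * 2.0 + X * 0.5 + IC * 3.0

Y = f(theta, X, IC)
V_total = Y.var()

# OAT: vary one input at a time, holding the others at their means
V_theta = f(theta, X.mean(), IC.mean()).var()
V_X = f(theta.mean(), X, IC.mean()).var()
V_IC = f(theta.mean(), X.mean(), IC).var()

# Interaction term = whatever the additive OAT terms fail to explain
V_int = V_total - (V_theta + V_X + V_IC)
```

For this additive toy model V_int comes out near zero; in a real nonlinear ecosystem model it would not, which is exactly why the Vint term is in the budget.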
What is process error?
Process error represents the model's structural inability to perfectly represent reality. This is fundamentally different from parameter uncertainty, where the equations are correct but the values are uncertain.
For agricultural systems specifically, process error dominates over stochasticity because:
• no wildfire disturbance
• controlled planting (no dispersal stochasticity)
• low background mortality
• managed pest control
This means residual error ≈ structural/process error for croplands.

From earlier discussion with @mdietze and @dlebauer, my understanding is that the gold-standard way to estimate process error is dynamic estimation via SDA, which iteratively learns process error through Bayesian joint estimation of process error, state, and observation error through time. That requires an operational SDA pipeline and is out of scope for the current project. Post-hoc validation residuals (RMSE, bias, etc.) can't be treated as process error directly because they conflate process, observation, and accumulated errors.
Given that, my proposed integration strategy is:
Phase 1 (current scope): Parallel reporting
• Uncertainty report: variance partitioning over parameters, drivers, ICs, and interactions
• Validation report: RMSE / bias / R² by variable, site, or PFT
without folding validation residuals into the variance budget.
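The Phase 1 validation metrics can be sketched in a few lines. The `obs` and `pred` arrays below are placeholder values standing in for field observations and model output:

```python
import numpy as np

# Placeholder data: observed vs. predicted values for one variable/site
obs = np.array([2.1, 3.4, 2.9, 4.0, 3.2])
pred = np.array([2.0, 3.1, 3.3, 3.8, 3.0])

resid = pred - obs
rmse = float(np.sqrt(np.mean(resid ** 2)))   # root-mean-square error
bias = float(np.mean(resid))                 # mean signed error

# R² computed against the observed mean (coefficient of determination)
ss_res = float(np.sum((obs - pred) ** 2))
ss_tot = float(np.sum((obs - obs.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE={rmse:.3f}  bias={bias:.3f}  R2={r2:.3f}")
```

In the actual report these would be computed per variable, site, or PFT, but kept in a separate table rather than added into the variance budget.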
Phase 2 (optional extension, only if useful): Residual variance attribution
As a conservative extension, we could compute an "unexplained variance" term

V_unexplained = V_total,observed − (Vθ + VX + VIC)

and report bounds on V_process using observation-error estimates, rather than a point estimate, with explicit caveats.
My assumption is that Phase 1 is the expected deliverable, and Phase 2 would only be exploratory if there's interest and sufficient validation data.
Questions for discussion
1. How should validation be integrated into the uncertainty report?
(a) Separate report section (minimal integration), or
(b) Explicit "unexplained variance" term in the decomposition (deeper integration)?
2. If we do report V_unexplained, should it be:
(a) A point estimate with caveats, or
(b) A confidence interval acknowledging we can't fully separate sources?