1b downscaling workflow #2

dlebauer · 2025-02-03T07:34:46Z

First draft of code to select design points, downscale SIPNET output to fields, and aggregate to county-level summaries.

This PR implements the core functionality for the MVP of Phase 1b (Scale up Statewide Perennial Woody Crop Inventory).

Key features include:

Anchor Sites: CSV file of 25 locations for calibration/validation
Data Preparation: extract environmental covariates for each LandIQ parcel and join anchor sites to LandIQ parcels
Design Point Selection: combine k-means clustering with anchor site merging to generate design points
SIPNET Output Extraction: extract outputs from multi-site ensemble SIPNET runs
Downscaling & Aggregation: apply Random Forest to downscale SIPNET outputs to fields and aggregate predictions to county-level inventories
Documentation & Outputs: summarize the workflow and results in 04_downscaling_documentation_results.qmd. The report can be reviewed by downloading and uncompressing 04_downscaling_documentation_results 2.html.zip (updated 2025-04-15).
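The design point selection step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's 01_cluster_and_select_design_points.R: it assumes a per-parcel table of numeric covariates, runs stats::kmeans on standardized covariates, takes the parcel nearest each cluster centroid as a design point, and always keeps the anchor sites. select_design_points(), the column names, and k = 10 are hypothetical, and the data are synthetic.

```r
# Sketch of design point selection (hypothetical names, synthetic data):
# k-means on scaled covariates, pick the parcel nearest each centroid,
# then merge in the anchor sites.
select_design_points <- function(parcels, covariates, k, anchor_ids, seed = 1) {
  # standardize so that no single covariate dominates the distance metric
  x <- scale(as.matrix(parcels[, covariates]))
  set.seed(seed)
  km <- stats::kmeans(x, centers = k, nstart = 25)
  nearest <- vapply(seq_len(k), function(j) {
    members <- which(km$cluster == j)
    # squared Euclidean distance from each member to its cluster centroid
    d <- rowSums(sweep(x[members, , drop = FALSE], 2, km$centers[j, ])^2)
    as.character(parcels$id[members[which.min(d)]])
  }, character(1))
  # anchor sites are always kept; unique() drops a centroid pick that is also an anchor
  unique(c(anchor_ids, nearest))
}

# Example with synthetic covariates (not project data)
set.seed(42)
parcels <- data.frame(
  id = sprintf("p%03d", 1:200),
  clay = runif(200, 5, 50),
  temp = runif(200, 10, 25),
  precip = runif(200, 100, 1200)
)
design_points <- select_design_points(
  parcels, c("clay", "temp", "precip"), k = 10, anchor_ids = c("p001", "p002")
)
```

Choosing the member nearest each centroid (rather than the centroid itself) keeps every design point tied to a real parcel with real covariate values.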

Note: I don't expect this to run flawlessly from beginning to end, but it should work if install_github is used to install the PEcAn packages (assim.sequential and data.land) from specific branches and if the input data for this repository on geo.bu.edu is used.

Related: this code uses branches from the dlebauer/pecan repository in two places:

Code for processing manually curated LandIQ data is in "[WIP] Proposed vector data format and application to LandIQ data" (PecanProject/pecan#3423); this will need to be updated to a more recent version.
Downscaling code is in modules/assim.sequential/R/ensemble_downscale.R, from "Update SDA Downscaling to allow sf" (PecanProject/pecan#3431).

Review Requests

I am primarily interested in feedback on the logic of the workflows and correctness of implementation.
This PR focuses on MVP requirements for Phase 1b. When providing suggestions, please distinguish what is required for the Phase 1b MVP from what should or could be included in future iterations.


dlebauer

Still halfway through, but I've finished fixing the 00 and 01 scripts.

data/anchor_sites_ids.csv

dlebauer · 2025-04-07T22:31:01Z

downscale/00-prepare.R

+#' TODO: move a copy of these files to data_dir
+#' 
+## ----load-soilgrids-----------------------------------------------------------
+soilgrids_north_america_clay_tif <- '/projectnb/dietzelab/dongchen/anchorSites/NA_runs/soil_nc/soilgrids_250m/clay/clay_0-5cm_mean/clay/clay_0-5cm_mean.tif'


(hence the TODO on line 123), but ideally running workflows wouldn't require editing the scripts. I'm still wondering what the best way is to handle all of these machine-specific paths. A bespoke workflow-level settings file?
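One possible answer, sketched here as an alternative to a dedicated settings file: resolve machine-specific paths from environment variables with repository-relative defaults, so each machine sets its own values (for example in ~/.Renviron) and the scripts stay unchanged. The variable names CCMMF_DATA_DIR and CCMMF_SOILGRIDS_CLAY_TIF, the helper workflow_paths(), and the default file layout are all hypothetical.

```r
# Sketch: resolve machine-specific paths from environment variables,
# falling back to repository-relative defaults. All names are hypothetical.
workflow_paths <- function() {
  data_dir <- Sys.getenv("CCMMF_DATA_DIR", unset = "data")
  list(
    data_dir = data_dir,
    soilgrids_clay_tif = Sys.getenv(
      "CCMMF_SOILGRIDS_CLAY_TIF",
      unset = file.path(data_dir, "soilgrids", "clay_0-5cm_mean.tif")
    )
  )
}

paths <- workflow_paths()
# In 00-prepare.R, paths$soilgrids_clay_tif would replace the hard-coded
# /projectnb/... path; on geo.bu.edu the variable could point at the original file.
```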

downscale/03_downscale_and_agregate.R

downscale/02_extract_sipnet_output.R

downscale/04_downscaling_documentation_results.qmd

downscale/01_cluster_and_select_design_points.R

dlebauer · 2025-04-11T00:02:49Z

downscale/01_cluster_and_select_design_points.R

+#' ### Check Clustering
+#'
+## ----check-clustering---------------------------------------------------------
+# Summarize clusters


These comment styles are the result of using knitr::purl when I converted from Rmd or qmd to an R script. Although I don't have immediate plans to do so, the idea is that the script could still be knit into a useful document. But yes, this will clean up a lot of redundant comments!

downscale/01_cluster_and_select_design_points.R

downscale/00-prepare.R

downscale/01_cluster_and_select_design_points.R


downscale/02_extract_sipnet_output.R

infotroph · 2025-04-11T20:28:20Z

.future.R

@@ -0,0 +1,10 @@
+# This file will load any time the future package is loaded.


Thanks for moving this out of the script; it's definitely an improvement. I do have some reservations about this approach, though:

Having it as a dotfile makes it harder to know it's there. I can imagine myself running one script without any future-specific details at the front of my mind and being baffled about where the "using cores" message was coming from.

It goes against the general Git advice that when the point of a file is to contain user-specific values, it's usually better not to commit it, even if there's nothing sensitive or secret about the values it stores.

I assume you intend this to set one plan for the whole project and have all runs use it. Will there be times when that's not true (e.g. are some steps IO-bound, such that throwing more cores at them causes a slowdown rather than a speedup)?

having it as a dotfile ...

This filename is recommended in the docs. We can change it; I'm trying to reduce the amount of repeated code.

when the point of a file is to contain user-specific values

How are these user-specific?

set one plan for the project

It could be moved so that it is workflow-specific.

Will there be times that's not true

There could be, in which case a script could temporarily change the plan.

How are these user-specific?

I'm channeling the standard, frustrating-but-persuasive-to-me advice that "the only safe default is one core until the user explicitly says otherwise." Whether or not future's core detection is robust, it's still not a given that the user wants to use all, or all but one, of the available cores; similarly, the user's choice of plan ("multisession", "multicore", "cluster", etc.) is one we can't really predict. I get that everything here is overridable, but I'm starting from the belief that it's better to leave this entirely to the user rather than assume anything.

To resolve this thread I'm OK with merging .future.R and revisiting in the next iteration, but we should put it where it will be useful:

future only checks the working directory and doesn't walk up the parent tree, so I think this should be placed in downscale/ rather than the repo root, yeah?

BTW, a caution I discovered while testing: .future.R is only read when the package is attached (e.g. via library(future)), not when future is merely loaded through future::. Compare Rscript -e 'future::plan()' with Rscript -e 'library(future); plan()'.
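A minimal sketch of what a downscale/.future.R could look like under the "one core until the user says otherwise" default. DOWNSCALE_WORKERS and choose_workers() are hypothetical names, not part of this PR; the only future functions used are plan(), sequential, and multisession. Because the file is only sourced when future is attached, scripts relying on it would need library(future) rather than only future::.

```r
# downscale/.future.R (sketch): sourced when the future package is attached
# with downscale/ as the working directory. Runs sequentially by default;
# the user opts in to parallelism via the hypothetical DOWNSCALE_WORKERS variable.

# Hypothetical helper: requested worker count, falling back to 1 when the
# variable is unset, empty, non-numeric, or less than 1.
choose_workers <- function(var = "DOWNSCALE_WORKERS") {
  n <- suppressWarnings(as.integer(Sys.getenv(var, unset = "1")))
  if (is.na(n) || n < 1L) 1L else n
}

if (requireNamespace("future", quietly = TRUE)) {
  workers <- choose_workers()
  if (workers == 1L) {
    future::plan(future::sequential)
  } else {
    future::plan(future::multisession, workers = workers)
  }
  message("future plan: ", workers, " worker(s)")
}
```

With this default nothing runs in parallel unless the user sets the variable (for example in ~/.Renviron), which leaves the choice to the user while still keeping plan() calls out of the individual scripts.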

downscale/02_extract_sipnet_output.R


downscale/02_extract_sipnet_output.R


infotroph

Thank you for all the refinements! I think the remaining conversations can be for future improvement.

Last two requests before merging: please move .future.R into downscale/ and rename 03_downscale_and_agre... to 03_downscale_and_aggre...

dlebauer added 4 commits January 29, 2025 12:09

first draft of data prep steps for downscaling workflow

0aa071f

first drafts of data prep and clustering workflows

e46420b

major revisions to prepare and clustering workflows

d4a7753

very rough draft for design point simulations

4369a2f

dlebauer changed the title ~~First draft of 1b deliverables~~ [WIP] First draft of 1b deliverables Feb 3, 2025

dlebauer added 25 commits February 5, 2025 04:13

Updated data preparation and clustering workflows so that they now compile

5b20512

Created new SIPNETWOPET model as a surrogate for design point runs in order to test downscaling

4778094

update readme and fix SIPNETWOPET

6ca6d6d

move sipnetwopet to standalone script

8cde4b9

converted qmd to R script

191c895

rename 00-prepare.qmd --> 00-prepare.R

d3bb8f4

Merge branch 'main' of github.com:ccmmf/workflows into 1b

dc71719

convert 01 and 02 qmd to R

2c8ddbf

Add downscaling and aggregation workflow for woody crop SOC stocks, purled qmd-->R

543264f

Refactor SIPNETWOPET workflow: streamline data processing and still fiddling with downscaling

c4cd51d

Initial draft w/ targets package

bf672b0

Initial draft w/ targets package

968f153

Merge branch '1b' of github.com:ccmmf/workflows into 1b

eff17bb

add first version of script to extract sipnet output

6b64c7a

create EFI standard tables and arrays

7c461ff

correct netcdf format

22c3625

convert sipnetwopet simulated data as arrays

1e7b09c

save objects as RDS instead of RData

f4aacfb

first draft of county aggregated SOC and AGB

46a61d5

added anchor sites table

72c5d7a

lots of clean up; too much to document, sorry

0d1801d

Merge branch 'main' of github.com:ccmmf/workflows into 1b

3880a26

added first draft of documentation

d468ea4

updated draft of downscaling workflow

19eec45

update qmd docs to generate self-contained html

273f861

dlebauer and others added 3 commits April 9, 2025 20:58

addressing PR review feedback

2c43668

Update downscale/00-prepare.R

9482450

Co-authored-by: Chris Black

merge

5d111e8

dlebauer commented Apr 11, 2025


dlebauer added 2 commits April 11, 2025 01:07

Address PR reviews - lots of changes in one commit

6dceea5

round lat,lon in anchor_sites_ids.csv

0958654

infotroph reviewed Apr 11, 2025


downscale/00-prepare.R (outdated)

infotroph reviewed Apr 11, 2025


downscale/01_cluster_and_select_design_points.R (outdated)

dlebauer and others added 3 commits April 11, 2025 13:06

Update downscale/00-prepare.R

b71741e

Co-authored-by: Chris Black

Update downscale/00-prepare.R

3c054d6

Update downscale/02_extract_sipnet_output.R

84c135c

Co-authored-by: Chris Black

dlebauer commented Apr 11, 2025


downscale/02_extract_sipnet_output.R (outdated)

Update downscale/02_extract_sipnet_output.R

48d79d4

infotroph reviewed Apr 11, 2025


dlebauer mentioned this pull request Apr 11, 2025

Get variable importance from RF models ccmmf/organization#114

Closed

infotroph reviewed Apr 11, 2025


downscale/02_extract_sipnet_output.R

dlebauer and others added 3 commits April 11, 2025 19:00

Refactor output file names for consistency and clarity in downscaling scripts

5377545

Merge branch '1b' of github.com:ccmmf/workflows into 1b

fb6aeb3

Update downscale/02_extract_sipnet_output.R

42a4f3e

Co-authored-by: Chris Black

dlebauer commented Apr 15, 2025


downscale/02_extract_sipnet_output.R (outdated)

dlebauer and others added 4 commits April 15, 2025 13:37

Update downscale/02_extract_sipnet_output.R

96c604c

fixing plots, including fixing units and removing partial dependency plots until I finish debugging

77fe472

Merge branch '1b' of github.com:ccmmf/workflows into 1b

367f983

Update downscale/01_cluster_and_select_design_points.R

4c38cb1

Co-authored-by: Chris Black

infotroph approved these changes Apr 16, 2025


dlebauer added 2 commits April 16, 2025 17:30

moved .future.R and fixed spelling in 03_ **aggregate**

2271d7d

Merge branch '1b' of github.com:ccmmf/workflows into 1b

90deb6e

dlebauer merged commit 7b5de11 into main Apr 16, 2025

dlebauer deleted the 1b branch April 16, 2025 21:30

dlebauer mentioned this pull request May 5, 2025

Reorganize downscaling workflow & generate design points for herbaceous crops ccmmf/downscaling#1

Merged

3 participants

dlebauer commented Feb 3, 2025 •

edited

Loading