-
Notifications
You must be signed in to change notification settings - Fork 2
1b downscaling workflow #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… order to test downscaling
…iddling with downscaling
Co-authored-by: Chris Black <[email protected]>
dlebauer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still halfway through, but I've finished fixing the 00 and 01 scripts
| #' TODO: move a copy of these files to data_dir | ||
| #' | ||
| ## ----load-soilgrids----------------------------------------------------------- | ||
| soilgrids_north_america_clay_tif <- '/projectnb/dietzelab/dongchen/anchorSites/NA_runs/soil_nc/soilgrids_250m/clay/clay_0-5cm_mean/clay/clay_0-5cm_mean.tif' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(thus the todo on line 123) but ... ideally running workflows wouldn't require changing the scripts. I'm still wondering what is the best way to handle all of these machine specific paths - a bespoke workflow-level settings file?
| #' ### Check Clustering | ||
| #' | ||
| ## ----check-clustering--------------------------------------------------------- | ||
| # Summarize clusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comment styles are the result of using knitr::purl when I converted from Rmd or qmd to an R script. Although I don't have immediate plans, the idea is that this could still be knit into a useful document. But yes this will clean up a lot of redundant comments!
Co-authored-by: Chris Black <[email protected]>
Co-authored-by: Chris Black <[email protected]>
| @@ -0,0 +1,10 @@ | |||
| # This file will load any time the future package is loaded. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for moving this out of the script --definitely an improvement. I do have some reservations with this approach too:
- Having it as a dotfile makes it harder to know it's there. I can imagine myself trying to run one script while not having any future-specific details at the front of my mind and being baffled trying to work out where the "using cores" message was coming from.
- Goes against general Git advice that when the point of a file is to contain user-specific values, it's usually better not to commit it even if there's nothing sensitive/secret about the values it stores.
- I assume you're intending this to set one plan for the project and have all runs use it. Will there be times that's not true (e.g. are some steps IO-bound such that throwing more cores at them causes slowdown rather than speedup)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having it as a dotfile ...
This filename is recommended in the docs. We can change it, I'm trying to reduce the amount of reused code.
when the point of a file is to contain user-specific values
How are these user-specific?
set one plan for the project
Could be moved so it is workflow specific.
Will there be times that's not true
There could be, in which case a script could temporarily change the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these user-specific?
I'm channeling the standard frustrating-but-persuasive-to-me advice that "the only safe default is one core until the user explicitly says otherwise." Whether or not future's available core detection is robust, it's still not a given that the user wants to use all or all-but-one of them, and similarly the user's choice of plan "multisession"/"multicore"/"cluster"/etc is one that we can't really predict. I get that everything here is overrideable, but am starting from the belief it's better for leave this entirely to the user rather than assume anything.
To resolve this thread I'm OK with merging .future.R and revisiting in the next iteration, but we should put it where it will be useful:
- future only checks the working directory and doesn't walk up the parent tree, so I think this should be placed in
downscaling/rather than the repo root, yeah?
BTW, a caution I discovered while testing: .future.R is actually only read when the package is attached, e.g. compare Rscript -e 'future::plan()' to Rscript -e 'library(future); plan()'
…plots until I finish debugging
Co-authored-by: Chris Black <[email protected]>
infotroph
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the refinements! I think the remaining conversations can be for future improvement.
Last two requests before merging: Please move .future.R into downscaling/ and rename 03_downscale_and_agre... to 03_downscale_and_aggre..
First draft of code to select design points, downscale SIPNET output to fields, and aggregate to county level summaries.
This PR implements the core functionalities for the MVP of Phase 1b (Scale up Statewide Perennial Woody Crop Inventory).
Key features include:
04_downscaling_documentation_results 2.html.zip (updated 2025-04-15)
Note: I don't expect that this will run from beginning to end flawlessly, but it should work if the
install_githubis used to install PEcAn packages (assim.sequential and data.land) from specific branches. and using input data in this repository that is on geo.bu.edu.Related: This code uses branches from the dlebauer/pecan repository in two places. This includes:
modules/assim.sequential/R/ensemble_downscale.RReview Requests
I am primarily interested in feedback on the logic of the workflows and correctness of implementation.
This PR focuses on MVP requirements for Phase 1b. When providing suggestions please distinguish what is required for phase 1b MVP and what should or could be included in future iterations.