Awaiting publication for source_id entries CAM-MPAS-HR and CAM-MPAS-LR #1105

Open
durack1 opened this issue Jun 16, 2022 · 31 comments
Labels
Awaiting data publication awaiting data to be published on ESGF

Comments

@durack1
Member

durack1 commented Jun 16, 2022

From: "Harrop, Bryce E"
Date: Thursday, June 16, 2022 at 9:15 AM
To: "Durack, Paul J."
Subject: RE: CMIP6 data licensing update

Hi Paul,

Thanks for reaching out. Things have been busy out here, but are going well. We encountered several severe problems running our model after HPC system maintenance, which have caused us significant slowdowns. We are still planning to submit data. We have the historical portion of the HighResMIP experiment complete, but we still need to run the CMOR post-processing on it. Sorry for being so slow on our end.

-Bryce

@beharrop @lrleung @kosaka90 ping

@durack1
Member Author

durack1 commented Aug 5, 2022

@beharrop just circling around. It doesn't appear that CAM-MPAS-HR or CAM-MPAS-LR are published yet. Are you expecting to publish these imminently?

@beharrop

beharrop commented Aug 7, 2022

@durack1 we hit some snags with the HPC we have been running simulations on that set us back several months (the same system maintenance referenced in the previous comment). We have only just recently gotten a version of the code back up and running, so we hope to have progress for this soon. Sorry again to be so slow, but we are still working on getting the data ready for publication.

@durack1
Member Author

durack1 commented Aug 8, 2022

@beharrop thanks for the update, let's leave this open and await progress then

@durack1
Member Author

durack1 commented Oct 4, 2022

Hi @beharrop just circling around, is this issue still valid (you plan to publish data soon), or should we close it out and deregister the CAM* configs?

@beharrop

beharrop commented Oct 4, 2022

Hi @durack1, yes the plan is still valid. I hope to have the historical data ready to publish by the end of 2022. The projections are still delayed, but we are working to complete them.

@durack1
Member Author

durack1 commented Oct 4, 2022

@beharrop great, thanks for the timeline update - until December then..

@durack1
Member Author

durack1 commented Feb 22, 2023

@beharrop happy 2023! I am just circling around on the unpublished CAM-MPAS-HR and CAM-MPAS-LR registrations, got any updates for us?

@beharrop

Hi @durack1, happy 2023! I have an update and a question for you. After struggling with changes to the machine we were running on, we discovered we needed to update the code to get the model to run again. The code changes, unfortunately, led to differences in the cloud fields, so the future portion of our HighResMIP experiment would have a step change in climate owing to code differences. We decided to port the code to a different machine that is newer and faster and rerun the entire historical and future segments. My question is, do you think it is worth publishing any of the data from the current historical portion? If we did that, we would end up with two historical segments and one future segment, and users would have to know which historical to pair with the future to get the correct climate change signal. I am a little worried that would be confusing, but if you want, we can CMORize the current historical data and get that up on ESGF. What do you think?

@durack1
Member Author

durack1 commented Feb 22, 2023

@beharrop interesting question! The decision is really up to you, but I would personally be hesitant to recommend publication of data that is not reproducible. Having said that, this is likely the case for a large portion of the contributions to date, which have had machine (and code/compiler) changes since the original submission in 2018.

On a pragmatic front, you could in theory publish the first hist-1950? with r1i1p1f1, and the second sims with r2i1p1f1, or a variant of this if your physics (p) have changed. What most folks would then assume is that the r2i1p1f1 hist-1950 is the precursor sim from which the highres-future r2i1p1f1 is branched. That's what I would assume.
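
For readers unfamiliar with the labels: a CMIP6 variant label has the form r<N>i<N>p<N>f<N> (realization, initialization, physics and forcing indices). The pairing convention described above can be sketched in Python as below. This is only an illustration, assuming that a future run is paired with the historical run carrying the same label; the helper names `parse_variant_label` and `same_variant` are hypothetical and are not part of CMOR or any ESGF tool.

```python
import re
from typing import NamedTuple


class VariantLabel(NamedTuple):
    """The four integer indices encoded in a CMIP6 variant label."""
    realization: int
    initialization: int
    physics: int
    forcing: int


# Matches labels such as "r1i1p1f1" or "r2i1p1f1".
_VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label: str) -> VariantLabel:
    """Split a variant label into its indices (hypothetical helper)."""
    match = _VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid variant label: {label!r}")
    return VariantLabel(*(int(group) for group in match.groups()))


def same_variant(hist_label: str, future_label: str) -> bool:
    """Assumed convention: a highres-future run is paired with the
    hist-1950 run that carries an identical variant label."""
    return parse_variant_label(hist_label) == parse_variant_label(future_label)


# The rerun historical (r2) pairs with the rerun future (r2);
# the original historical (r1) does not.
print(same_variant("r2i1p1f1", "r2i1p1f1"))
print(same_variant("r1i1p1f1", "r2i1p1f1"))
```

If the physics had changed between the two sets of runs, the `p` index would differ instead of (or as well as) the `r` index, and the same comparison would still separate the two sets.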

@beharrop

@durack1 Thanks for the suggestions. Our group is leaning toward not publishing the other data to make things clearer. Fingers crossed the new runs can get done soon. Sorry again this is taking so long.

@durack1
Member Author

durack1 commented Feb 23, 2023

@beharrop sounds good, glad you are making progress and publication is imminent!

@durack1
Member Author

durack1 commented Nov 19, 2023

@beharrop it's been a while, wondering how you're going with this?

@beharrop

@durack1, progress is going nicely. We've nearly completed the historical LR rerun and the HR is moving along at a decent clip.

@durack1
Member Author

durack1 commented Nov 20, 2023

@beharrop nice. Are you planning on prepping and publishing these data soon?

@beharrop

@durack1, yea, I'll aim to get the historical portions published as soon as possible rather than wait for the projection part to finish

@durack1
Member Author

durack1 commented Aug 2, 2024

@beharrop just circling around on this - any progress to report? We are starting to move toward sunsetting the CMIP6 project, so am closing up remaining loose ends - new data can continue to be published into the https://aims2.llnl.gov/search?project=CMIP6Plus project

@beharrop

beharrop commented Aug 3, 2024

@durack1 the simulations (LR and HR) are into the projection phase. Still need to process the data for the historical and run it through the CMOR process.

@durack1
Member Author

durack1 commented Aug 5, 2024

@beharrop sounds good. As you have the models registered here, it will be relatively straightforward to configure CMOR to write data using the cmip6-cmor-tables

@durack1
Member Author

durack1 commented Dec 18, 2024

@beharrop just circling. We are now starting to move toward closing down publications to the CMIP6 project, with the opportunity to migrate new datasets to CMIP6Plus as an interim step before CMIP7 comes online ~mid next year.

If you are not planning on publishing these data soon, I suggest we deregister the CAM-MPAS-* models, and deal with registration when you have these simulations ready for rewriting and publication

@beharrop

@durack1, thanks for checking in. The simulations are complete (finished last week during AGU). It is on my to-do list to get the data processing run ASAP. Do you know if I can submit data in intervals (e.g. submit monthly data and then circle back for daily later)?

@durack1
Member Author

durack1 commented Dec 18, 2024

@beharrop nice, good to know you're almost there. You can do things on a timeline that works for you, with the full knowledge that where and how these publications occur has already started migrating post-CMIP6.

The E3SM team is now publishing their data at ANL (see here), and work is underway on a new ESGF index, which means the existing SOLR-based CMIP6 index will cease operation mid-2025.

There are plans to migrate the existing CMIP6 index across to new infrastructure, but that requires data to already be published and present in the index.. Which gets us back to the original query of timelines..

@durack1
Member Author

durack1 commented Mar 1, 2025

@beharrop, I'm just circling. As broadcast earlier this month, the CMIP6 project will close to new publications in just over a month—see https://wcrp-cmip.org/esgf-information.

Let me know if you need any more info.

@beharrop

Hi @durack1, I have the monthly data processed and I'm ready to get it published while I work on higher frequency data. The data are currently on NERSC's Perlmutter, but presumably they need to be moved to one of the DOE ESGF nodes. I've been trying to do some googling about how to get that set up, but you have a lot more expertise on this than I do. Any recommendations on where I can learn the steps needed to get this moving?

@durack1
Member Author

durack1 commented Mar 17, 2025

@beharrop wonderful - we're off!

Just looping a couple of folks in, connecting you to folks active with E3SM publication - @sashakames @TonyB9000 @rljacob @chengzhuzhang

@sashakames

We may want to rope in @climate-dude (Forrest Hoffman) to see if ORNL would be a suitable host. In the meantime we are working on standing up data node infrastructure at NERSC to serve data in place.

@chengzhuzhang

> Hi @durack1, I have the monthly data processed and I'm ready to get it published while I work on higher frequency data. The data are currently on NERSC's Perlmutter, but presumably they need to be moved to one of the DOE ESGF nodes. I've been trying to do some googling about how to get that set up, but you have a lot more expertise on this than I do. Any recommendations on where I can learn the steps needed to get this moving?

@beharrop The source_ids don't look familiar. But you mentioned HighResMIP, so I suppose this belongs to a CMIP6 experiment. Given that CMIP6 will close its data index by April 5th, it may not be realistic to publish following the CMIP6 workflow at this point. From the E3SM side, our plan is to resume publishing CMIP6 data in May with an updated ESGF publisher, using our workflow set up at ANL. We will start an E3SM-CMIP6-Supplement project to host all the remaining CMIP6-era data from E3SM. We can share this workflow if that sounds reasonable. Of course, if this is not E3SM data, you can use a different project name.

@beharrop

Thanks, @chengzhuzhang, for the update. These are not E3SM runs; they are MPAS using CAM physics run as part of the WACCEM RGMA project. They follow the CMIP6 HighResMIP protocol. We had hoped to publish the data as part of phase 6, but there were some very significant delays along the way. Ideally we would like the data to be discoverable along with the HighResMIP simulations using other models. Do the workflow changes you alluded to essentially just mean a set of changes to the metadata json files?

@chengzhuzhang

chengzhuzhang commented Mar 21, 2025

@beharrop I missed important details from your original post. It looks like the monthly data are already CMORized and ready to publish! In this case, once the data are moved to one of the DOE ESGF nodes (ANL or ORNL), the publication step is fast and straightforward. @TonyB9000 has been publishing on ANL, but we are tight on disk space (LCRC). How large is the dataset? We might be able to help if the data size is not too large.

@beharrop

@chengzhuzhang, yes the monthly data are CMORized and are sitting on Perlmutter at the moment. I have HR and LR runs for historical (1950-2014) and projection (2015-2050) with the following volumes:
HR hist: 369G
HR proj: 205G
LR hist: 24G
LR proj: 14G
So a total of 612G for the monthly. We do have higher frequency data (daily and hourly) that are being processed at NERSC currently.

@chengzhuzhang

@beharrop Thanks! The total volume is a bit over half a TB and I think we can manage it. Could you Globus-transfer the files over to LCRC (/lcrc/group/e3sm/), and then @TonyB9000 can take over and try publishing?

@beharrop

@chengzhuzhang, @TonyB9000, I am moving the data to /lcrc/group/e3sm/ac.bharrop/HRMIP/ currently. Looks like it's already finished.
