Description
What is wrong?
We have had multiple ocean post failures that follow a failed forecast job (the forecast fails for various reasons that are unrelated to this issue). The underlying problem is that, on restart, we check whether the MOM6 restart file has been written, but that does not guarantee that the last time-averaged output has also been written. The model therefore restarts successfully from that restart, and we then get ocean post failures that we cannot fix because the ocean output was never fully written.
What should have happened?
We should restart the model from a point in time up to which both the output and the restarts have been written.
What machines are impacted?
All or N/A
What global-workflow hash are you using?
develop 54dc7d2
Steps to reproduce
Potentially hard to reproduce, because the model essentially has to crash after the MOM6 restart file is written but before the MOM6 output has finished writing. (Note: this could also be an issue if restart write times do not line up with the intervals of the averaged output.)
Additional information
This is a rewording of the issue originally reported here: #3773
After discussions with @jiandewang and @sanAkel, we have determined that restarts and output are written asynchronously, so a restart having been written is not an indicator that the output has been written.
If needed, or if it would help, @sanAkel has offered to modify the MOM6 code to echo when:
a. output is written out, as in NOAA-EMC/MOM6#148
b. a restart has (also) been written out, to another ASCII file (different from a) that can be queried, say by the workflow.
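As a rough illustration of how the workflow might query such a file, here is a minimal Python sketch. It assumes a hypothetical marker file with one `yyyymmddhhmmss` stamp per line. The actual file name and layout would be whatever the MOM6 change ends up writing, and the function name `read_stamps` is made up for this example.

```python
from datetime import datetime
from pathlib import Path

# Assumed stamp layout: yyyymmddhhmmss (hypothetical marker-file format).
STAMP_FORMAT = "%Y%m%d%H%M%S"


def read_stamps(path):
    """Return the sorted datetimes found in a marker file.

    Lines that are not exactly 14 digits (for example a line that was
    only partially written when the model crashed) are skipped.
    A missing file yields an empty list.
    """
    marker = Path(path)
    if not marker.is_file():
        return []
    stamps = []
    for line in marker.read_text().splitlines():
        token = line.strip()
        if len(token) != 14 or not (token.isascii() and token.isdigit()):
            continue
        try:
            stamps.append(datetime.strptime(token, STAMP_FORMAT))
        except ValueError:
            # 14 digits but not a valid date/time, e.g. month 13.
            continue
    return sorted(stamps)
```

Skipping malformed lines is deliberate: a crash can leave a truncated final line, and that line should not count as evidence that anything was written.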
Do you have a proposed solution?
Several options:
- Go back one restart. This, however, requires more restarts to be written to make sure we can restart within a window, and it potentially means restarting earlier than necessary in other cases.
- Add a check to make sure the ocean model has also written out its output. This will be a multi-step process:
  - Update MOM6 to write something to the log file to indicate that the model output has been written ("modify MOM_sum_output.F90 to have capability of write date in yyyymmddhhmmss format", MOM6#151)
  - Ensure the MOM6 update makes its way to global-workflow after it is in the ufs-weather-model
  - In g-w, when we check whether a restart exists to determine if we can use that restart, we should also do that same log check for the MOM6 output. This code is here: https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/forecast_det.sh#L83-L86
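The selection logic behind the second option could look roughly like the Python sketch below. It is not the code in `forecast_det.sh` (which is bash), and `pick_restart` is a hypothetical name. It assumes the restart times and the output-written times are already available as datetimes, and that output is written in chronological order, so the latest output stamp bounds what is safely on disk.

```python
from datetime import datetime


def pick_restart(restart_times, output_times):
    """Return the latest restart time that is not later than the last
    written output, or None if no restart qualifies.

    Assumption: output is written in chronological order, so every
    output up to max(output_times) is already on disk.
    """
    if not output_times:
        return None
    last_output = max(output_times)
    usable = [t for t in restart_times if t <= last_output]
    return max(usable) if usable else None


# Example: restarts exist at 06Z and 12Z, but output was only written
# through 09Z before the crash, so the 12Z restart is not safe to use.
restarts = [datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 12)]
outputs = [datetime(2024, 1, 1, 3), datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 9)]
chosen = pick_restart(restarts, outputs)
```

In this example the sketch falls back to the 06Z restart. This gives the same result as "go back one restart", but only when the output is actually missing.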