-
Notifications
You must be signed in to change notification settings - Fork 16
Avoid closing invalid HDF5 IDs by checking H5Iis_valid first
#17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Avoid closing invalid HDF5 IDs by checking H5Iis_valid first
#17
Conversation
| namespace Closers { | ||
| struct CloseHDF5Attribute { | ||
| static inline void Close(hid_t h) { | ||
| if(h == H5I_INVALID_HID) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhoneyager could you quickly advise us on whether it is a good idea to put in these safety checks (to prevent the closers from acting on an invalid hdf5 object)? Would you agree that if the hid is H5_INVALID_HID, then there is no object to be closed and it's okay to forego calling the associated hdf5 close function. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an object to be closed, you don't have the handle for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- In most cases I'd suggest trying explicit calls to
H5Iis_validinstead. This would cover bothH5I_INVALID_HIDand previously-valid identifiers that have already been released elsewhere. - HDF5 datatypes, however, are a bit funky as they refer to both user-defined types (actual handles that need to be freed) and predefined types. I vaguely recall a bug where the predefined datatypes were not detected as valid handles. The upstream HDF5 codes refactored datatypes around v1.12-1.14, so you should check in case of inconsistent behavior.
With the root issue, were you able to determine where the invalid type was being generated in the first place? Normally in the ioda code an invalid handle should be detected upon generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm recompiling and retesting with H5Iis_valid.
We haven't identified the source of the invalid type.
Another way to solve this problem is to not add a check for it and let the error be reported. If an invalid ID in a Closer is genuinely an error, and never expected, then it lets HDF5 warn us about the situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the JEDI executables completed successfully. We got the output ioda files, the updated analysis files and we can cycling for one week.
Curious about SkyLab: does it have those unnecessary "error" messages after a successful JEDI run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen that particular error for a very long time, but my working snapshot is out of date.
The handle release codes run inside destructors, and it's hard to break execution there, so that's why you get a log message at most.
If you need to debug where this is happening, you can catch new messages entering the HDF5 stack via their H5Eset_auto2() function. This function in ioda provides an example on how to extract the messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhoneyager thanks for your help with this!
I don't think we've been seeing these errors in skylab experiments (at least I haven't heard reports of them). @fabiolrdiniz have you ever seen the error mentioned in the internal PR (jcsda-internal/ioda/pull/1571).
Once this gets sorted out, please make the corresponding changes in the internal PR (jcsda-internal/ioda/pull/1571). Our process treats the internal repos as the authoritative source. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srherbener, I've seen these errors early this week while debugging an experiment that @BenjaminRuston was running. I assumed there was something wrong with his build and that wasn't related to what we were looking for. It looks like others are seeing the same. This is what I saved from that:
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
#000: /apps/contrib/spack-stack/spack-stack-1.8.0/cache/build_stage/role-epic/spack-stage-hdf5-1.14.3-h75y5p7cv3n6fqkqjfoahuyfdutestkb/spack-src/src/H5T.c line 1888 in H5Tclose(): not a datatype
major: Invalid arguments to routine
minor: Inappropriate typeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabiolrdiniz That the same information we got. Thanks for posting them.
I posted my output in the interval PR page.
https://github.com/JCSDA-internal/ioda/pull/1571#issuecomment-3549967135
|
Does anyone know the commit or date at which these errors started showing up? |
@SamuelTrahanNOAA In my impression, I saw these messages at least since we started focusing on the development of RRFSv2 (Aug or Sept, 2024). |
|
I've switched to using H5Iis_valid to detect invalid handles, as suggested by @rhoneyager-tomorrow. It'll detect other types of invalid handles, such as negative numbers or handles that have been closed. The program still runs without error messages with these changes. |
H5I_INVALID_IDH5Iis_valid(h))
H5Iis_valid(h))H5Iis_valid(h)
H5Iis_valid(h)H5Iis_valid
H5Iis_validH5Iis_valid first
Description
Don't pass H5I_INVALID_ID to any HDF5 library close routines.
The MPAS JEDI "rrfs-workflow" workflow has seen log messages about closing invalid HDF5 types. They look like
.../src/H5T.c line 1911 in H5Tclose(): not a datatype. It was caused by closingH5I_INVALID_ID. That's the constant the HDF5 library uses for the failure to create or open something, or simply for an ID that wasn't initialized.Issue(s) addressed
I'm not aware of any open issues for it; @guoqing-noaa alerted me to the problem outside of github
Dependencies
None.
Impact
Eliminates error messages
.../src/H5T.c line 1911 in H5Tclose(): not a datatypeChecklist