Skip to content

Conversation

@SamuelTrahanNOAA
Copy link

Description

Don't pass H5I_INVALID_ID to any HDF5 library close routines.

The MPAS JEDI "rrfs-workflow" workflow has seen log messages about closing invalid HDF5 types. They look like .../src/H5T.c line 1911 in H5Tclose(): not a datatype. It was caused by closing H5I_INVALID_ID. That's the constant the HDF5 library uses for the failure to create or open something, or simply for an ID that wasn't initialized.

Issue(s) addressed

I'm not aware of any open issues for it; @guoqing-noaa alerted me to the problem outside of github

Dependencies

None.

Impact

Eliminates error messages .../src/H5T.c line 1911 in H5Tclose(): not a datatype

Checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have run the unit tests before creating the PR

@SamuelTrahanNOAA SamuelTrahanNOAA changed the base branch from master to develop November 18, 2025 22:56
namespace Closers {
struct CloseHDF5Attribute {
static inline void Close(hid_t h) {
if(h == H5I_INVALID_HID) return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhoneyager could you quickly advise us on whether it is a good idea to put in these safety checks (to prevent the closers from acting on an invalid hdf5 object)? Would you agree that if the hid is H5_INVALID_HID, then there is no object to be closed and it's okay to forego calling the associated hdf5 close function. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an object to be closed, you don't have the handle for it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In most cases I'd suggest trying explicit calls to H5Iis_valid instead. This would cover both H5I_INVALID_HID and previously-valid identifiers that have already been released elsewhere.
  2. HDF5 datatypes, however, are a bit funky as they refer to both user-defined types (actual handles that need to be freed) and predefined types. I vaguely recall a bug where the predefined datatypes were not detected as valid handles. The upstream HDF5 codes refactored datatypes around v1.12-1.14, so you should check in case of inconsistent behavior.

With the root issue, were you able to determine where the invalid type was being generated in the first place? Normally in the ioda code an invalid handle should be detected upon generation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm recompiling and retesting with H5Iis_valid.

We haven't identified the source of the invalid type.

Another way to solve this problem is to not add a check for it and let the error be reported. If an invalid ID in a Closer is genuinely an error, and never expected, then it lets HDF5 warn us about the situation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the JEDI executables completed successfully. We got the output ioda files, the updated analysis files and we can cycling for one week.

Curious about SkyLab: does it have those unnecessary "error" messages after a successful JEDI run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen that particular error for a very long time, but my working snapshot is out of date.

The handle release codes run inside destructors, and it's hard to break execution there, so that's why you get a log message at most.

If you need to debug where this is happening, you can catch new messages entering the HDF5 stack via their H5Eset_auto2() function. This function in ioda provides an example on how to extract the messages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhoneyager thanks for your help with this!

I don't think we've been seeing these errors in skylab experiments (at least I haven't heard reports of them). @fabiolrdiniz have you ever seen the error mentioned in the internal PR (jcsda-internal/ioda/pull/1571).

Once this gets sorted out, please make the corresponding changes in the internal PR (jcsda-internal/ioda/pull/1571). Our process treats the internal repos as the authoritative source. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srherbener, I've seen these errors early this week while debugging an experiment that @BenjaminRuston was running. I assumed there was something wrong with his build and that wasn't related to what we were looking for. It looks like others are seeing the same. This is what I saved from that:

HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: /apps/contrib/spack-stack/spack-stack-1.8.0/cache/build_stage/role-epic/spack-stage-hdf5-1.14.3-h75y5p7cv3n6fqkqjfoahuyfdutestkb/spack-src/src/H5T.c line 1888 in H5Tclose(): not a datatype
    major: Invalid arguments to routine
    minor: Inappropriate type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiolrdiniz That the same information we got. Thanks for posting them.

I posted my output in the interval PR page.
https://github.com/JCSDA-internal/ioda/pull/1571#issuecomment-3549967135

@SamuelTrahanNOAA
Copy link
Author

Does anyone know the commit or date at which these errors started showing up?

@guoqing-noaa
Copy link

Does anyone know the commit or date at which these errors started showing up?

@SamuelTrahanNOAA In my impression, I saw these messages at least since we started focusing on the development of RRFSv2 (Aug or Sept, 2024).

@SamuelTrahanNOAA
Copy link
Author

I've switched to using H5Iis_valid to detect invalid handles, as suggested by @rhoneyager-tomorrow. It'll detect other types of invalid handles, such as negative numbers or handles that have been closed.

The program still runs without error messages with these changes.

@SamuelTrahanNOAA SamuelTrahanNOAA changed the title Don't tell HDF5 to close H5I_INVALID_ID Don't tell HDF5 to close invalid HDF5 IDs (H5Iis_valid(h)) Nov 21, 2025
@SamuelTrahanNOAA SamuelTrahanNOAA changed the title Don't tell HDF5 to close invalid HDF5 IDs (H5Iis_valid(h)) Don't tell HDF5 to close invalid HDF5 IDs using H5Iis_valid(h) Nov 21, 2025
@SamuelTrahanNOAA SamuelTrahanNOAA changed the title Don't tell HDF5 to close invalid HDF5 IDs using H5Iis_valid(h) Don't tell HDF5 to close invalid HDF5 IDs using H5Iis_valid Nov 21, 2025
@SamuelTrahanNOAA SamuelTrahanNOAA changed the title Don't tell HDF5 to close invalid HDF5 IDs using H5Iis_valid Avoid closing invalid HDF5 IDs by checking H5Iis_valid first Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants