missing_constant treated differently to valid_minimim/valid_maximum #1387

rjwilson-LASP · 2025-09-29T17:25:55Z

rjwilson-LASP
Sep 29, 2025

The Validator tool gives me errors about the missing_constant value here, for a number in CSV files to 3 decimal places.
e.g. I'd assume the xml code should be this:

        <Field_Delimited>
          <name>name here</name>
          <field_number>38</field_number>
          <data_type>ASCII_Real</data_type>
          <field_format>%10.3f</field_format>
          <unit>eV</unit>
          <description>
            Species 3 Temperature Uncertainty
          </description>
          <Special_Constants>
            <missing_constant>   -1.000</missing_constant>
            <valid_maximum>   10000.000</valid_maximum>
            <valid_minimum>       0.001</valid_minimum>
          </Special_Constants>
        </Field_Delimited>

The validator gives warnings every time the CSV has a -1.000 in that column.
Yet if I alter one line above to be:

            <missing_constant>   -1</missing_constant>

The validator is then happy, despite the CSV being unaltered with -1.000's in it.

I've had similar issues with column formats of %10.3f, %9.4f and %8.6f, all with missing_constant values of -1 (to however many decimal places).
But weirdly, one column had a format of %9.3f (with a missing constant of 99999.999) and this didn't complain. (perhaps because not negative?)

That seems unexpected, and I'd prefer to keep all values here to the same precision.
Is this a bug in the code?

Answered by al-niessner

Oct 3, 2025

Let me be bullet point it because we went of track somewhere.

For missing constant -1.000:

validate does string comparison prior to converting to floats exactly because bits are hard.
validate cannot strip the white space from your XML strings without potentially changing its meaning
There would be no change to floating point numbers if the white space was not there or matched what was in the CSV.

For missing constant -9.999999e4:

validate does string comparison prior to converting to floats (same as -1.000)
validate cannot strip white space or do any other string manipulation (move decimal) without potentially changing its meaning
There would be no change to floating point numbers if …

View full answer

rjwilson-LASP · 2025-09-30T19:21:15Z

rjwilson-LASP
Sep 30, 2025
Author

I get a similar sort of error with:

          <data_type>ASCII_Real</data_type>
          <field_format>%14.7E</field_format>
          <Special_Constants>
            <missing_constant>-9.9999999E+04</missing_constant>
            <valid_maximum>    1.0000000E-08</valid_maximum>
            <valid_minimum>    1.0000000E-17</valid_minimum>
          </Special_Constants>

Which the validator warning of: "Field has a value '-99999.999' that is less than the defined minimum value '1.0000000E-17'. "
I thought missing_constant could be outside of the valid_min/max?

0 replies

jordanpadams · 2025-09-30T20:12:05Z

jordanpadams
Sep 30, 2025
Maintainer

@rjwilson-LASP what version of validate are you running?

also, just curious, how did you create this ticket? It looks like it is missing an issue template, so curious how was created without one.

0 replies

rjwilson-LASP · 2025-09-30T20:41:18Z

rjwilson-LASP
Sep 30, 2025
Author

Hi, am using 3.7.1 which gave both the above issues, but had 3.6.3 previously and was getting the first issue. (The second I only made and tried an XML file after I had 3.7.1).
As for how I reported this, I was on the GitHub website, clicked issue and started one. Not sure exactly what the button was, and I am generally unfamiliar with GitHub, so extremely possible I did something the wrong way.

0 replies

al-niessner · 2025-10-01T15:35:18Z

al-niessner
Oct 1, 2025
Maintainer

@jordanpadams can we trim the white space of a special constant? My reading is no because it is supposed to be what the user types is literal truth - remember the special constant is not the same type as the cell nor is required to be. Example: fixed width column of 3 characters. missing constant is a. Trimming would give 'a'. However a valid field would be a that also trims to 'a'. Silly, yes. Still, it demonstrates why we should not trim.

@rjwilson-LASP

There are fixable and non fixable items in your statements. First, comparing special constants with values in a table or array are tricky because their respective types do not need to be the same according to the PDS documentation.

The special constant -1 works but -1.000 does not. It is a floating point conversion problem. When validate asks Java to convert -1.000 from the special constants it gets a different bit pattern then when converting from the CSV. I have no idea why other than -1.000 does not have an exact floating point representation causing a == b to fail. When given -1, validate converts it to an integer allowing a == b in this case.

Validate compares the special constants and values as strings before doing int and float conversions. By removing the white space in <missing_constant> -1.000</missing_constant> it will do a better job. It may still fail if your CSV does not represent it as -1.000 (say -1) as you are back to float conversions. I would test this but I do not see a data set to work with on this ticket.

The same is happening with -9.9999999E+04. The imprecise float representation of that number makes it one bit away from -9999.999 and thus fails. If you changed the special constant to <missing_constant>-9999.999</missing_constant> - assuming that is how it is in your CSV - then it should work.

I understand that the white space in your XML is an attempt to make the XML more readable and removing it is not ideal. I also understand that your choice of how values are represented may have meaning as well -(-9.999999e+04 vs -9999.999). However, the problems you are experiencing are bound by the limitations of floating point quantization and to avoid those limitations changing the representations may be the only solution.

0 replies

rjwilson-LASP · 2025-10-01T15:55:45Z

rjwilson-LASP
Oct 1, 2025
Author

Thanks for the thorough reply! So my summary, The Java code converts floats to bit patterns in an inconsistent way for the data file vs. values in the xml file, resulting in a different bit-pattern representation of the numbers. That seems to be a coding feature rather than a format issue, and to me it makes more sense to keep <missing_constant> in the same <field_format> as the data in the CSV file and live with the warnings out of validate. Thanks for the swift feedback!

0 replies

jordanpadams · 2025-10-01T16:45:23Z

jordanpadams
Oct 1, 2025
Maintainer

@al-niessner you are correct. right now, the standards do not provide enough information for us to say we can safely remove whitespace from special constants. that being said, this is currently being worked by the PDS4 Data Design Working Group to loosen that portion of the standard.

@rjwilson-LASP as @al-niessner, I believe the primary issue here is limitations of floating point quantization, not a coding feature. I believe this issue would arise in any software. @al-niessner may be able to shed some more light on this.

0 replies

rjwilson-LASP · 2025-10-01T17:05:32Z

rjwilson-LASP
Oct 1, 2025
Author

Since this is reopened (and thanks for continuing a potential resolution), @al-niessner wrote: "First, comparing special constants with values in a table or array are tricky because their respective types do not need to be the same according to the PDS documentation."

Can I ask which documentation that's from? I see "Note that the value that is chosen for a special constant must be the right data type for the field." from bottom of page 90 of PDS4 Data Providers Handbook

0 replies

al-niessner · 2025-10-01T17:39:27Z

al-niessner
Oct 1, 2025
Maintainer

Absolutely. The PDS4 Information Model Specification states:

valid_maximum in [Special_Constants](https://pds.nasa.gov/datastandards/documents/im/v1/index_1J00.html#class_pds_special_constants)
The valid_maximum attribute specifies the maximum valid value in the field or digital object with which the Special_Constants class is associated. Values above the valid_maximum have a special meaning. Values of this attribute should be represented in the same data_type as the elements in the object or field described. (Note that PDS3 had no qube-related valid_maximum values because all special constants were set below the valid_minimum.)

The keywords are "should be" and not "must be". Right data type does not necessarily mean matching data type?

Another project uses 0x7FF80000 as their special constant which is floating point NaN. We have mechanisms for allowing specific bit patterns to be turned into floats, like the NaN. It is the only way to guarantee the desired quantization of a textual number to bits. Their data is also binary.

0 replies

rjwilson-LASP · 2025-10-01T19:09:56Z

rjwilson-LASP
Oct 1, 2025
Author

At the risk of beginning to sound like a lawyer, I agree should is not the same as shall, but the example you gave is for valid_maximum (same for valid_minimum), which I've had no issues with. My issue is with the missing_constant entry, and the definition of it is actually less detailed in that model: "The missing_constant attribute provides a value that indicates the original value was missing, such as due to a gap in coverage. " No mentions of should/shalls or data_type at all, so we could assume this is not a requirement to match data_type. One potential lead, in the missing_constant entry in the model, it has an entry of 'Conceptual Domain: Short_String', whereas valid_minimum/maximum definitions have no Conceptual Domain entry. Perhaps we've simply found an item that is not sufficiently defined.

0 replies

al-niessner · 2025-10-01T19:53:55Z

al-niessner
Oct 1, 2025
Maintainer

lol - pretending to be a language lawyer is the worst part of my job

Same document and far, far less clear:

missing_constant in Special_Constants
The missing_constant attribute provides a value that indicates the original value was missing, such as due to a gap in coverage.

No should or shall or must anywhere. It can be any random thing like 'intentionally left blank'.

I think if you tried to force data types onto special constants you would invite more problems than you would solve. The problem is in floating point representation and expecting bit equality. Here is a short example in Python (Java is even more suseptible):

a = 0.1 + 0.2
b = 0.3

if a == b:
    print("a and b are equal")
else:
    print("a and b are not equal")
    print(f"a = {a:.17f}")
    print(f"b = {b:.17f}")

results in for me with Python 3.12:

a and b are not equal
a = 0.30000000000000004
b = 0.29999999999999999

0 replies

rjwilson-LASP · 2025-10-01T20:19:07Z

rjwilson-LASP
Oct 1, 2025
Author

Bits can be surprisingly annoying (much like dealing with time), even though they sound so simple.
However I see this as a code issue, in that it could be solved with extra code. We don't want to compare bit-patterns if a field-format is provided on reading in a text file, we want to compare numbers when expressed in that format.
e.g. to borrow your python example and the %10.3f format of my initial query, I'd have done it something like this:

a = 0.1 + 0.2
b = 0.3

frmt = '%10.3f' # from the field_format entry

if (frmt % a) == (frmt % b):
    print("a and b are equal in a format of %s"%frmt)
else:
    print("a and b are not equal in a format of %s"%frmt)

which returns:
a and b are equal in a format of %10.3f
I admit this string comparison will be slower, but it's more faithful to the intent, and potentially could only be used if a!=b directly (for speed).

0 replies

al-niessner · 2025-10-01T20:40:51Z

al-niessner
Oct 1, 2025
Maintainer

And now back to my original comments and questions to @jordanpadams . Since the images - the textual representation of your numbers - do not match, they must be bit compared since the value is a number. Hence the suggestion about removing white space and changing -9.99999e4 to -99999.999 if that is used in the CSV. Hence the example about not being able to strip the constants of white space. You can strip the constants of their white space, of course, but validate cannot without potentially changing the meaning of the special constant.

I get it. I really do. I bet, only because it seems common practice, that in your code you do not test x == -9.999999e4 but rather x < -99999 (equivalent to %10.3f comparing) since you never get close to -99999 anyway. Doing so erases all of those pesky bit problems. Validate cannot make such nice fixes because the generic case does allow for it - like the NaN stated earlier.

Yes, there are plenty of code blobs that can be added for a bespoke solution to your exact problem but they break in the general case like NaN. Gotta love NaN. validate is constrained by not stripping the image of white space for comparison and using Java libraries for converting numbers like 9.99999e4. All of the constraints are fixed when the value image, which validate has, and the image of the special constant exactly as in the XML, which validate has, match erasing all of the bit problems and your warnings.

0 replies

rjwilson-LASP · 2025-10-03T21:38:00Z

rjwilson-LASP
Oct 3, 2025
Author

Hi @al-niessner, I appreciate the back and forth, and my continued comments are merely intended as hopefully helpful feedback. And most of this is getting philosophical, we should be debating this over a coffee/beer, and if the chance ever happens, I'll be happy to buy the round.

My philosophy on this is that for binary data files, I would like a bit-pattern comparison, but for ASCII data files I would like the comparison to be string based if a field_format code was provided.
(I acknowledge that since the label file is in ASCII, writing a floating point number to represent a binary file's float could be tricky.)

Validate does check that all my ASCII values match the field_format I stated in the label (which is great), and I would have wished for a similar check on the missing_constant value too.

But I can understand why this gets complicated for a code that must work on all PDS4 file variants.

My (and just my) issue now is that when I run validate, I get thousands of warnings due to the float format comparison failing on the missing_constant value, making it hard to see any warnings that aren't due to that. A few pipes to grep on the output will likely help there though with some trial and error. And with that, I suspect we're ready to close this out. Thanks for the replies and explanations!

0 replies

al-niessner · 2025-10-03T21:49:55Z

al-niessner
Oct 3, 2025
Maintainer

Let me be bullet point it because we went of track somewhere.

For missing constant -1.000:

validate does string comparison prior to converting to floats exactly because bits are hard.
validate cannot strip the white space from your XML strings without potentially changing its meaning
There would be no change to floating point numbers if the white space was not there or matched what was in the CSV.

For missing constant -9.999999e4:

validate does string comparison prior to converting to floats (same as -1.000)
validate cannot strip white space or do any other string manipulation (move decimal) without potentially changing its meaning
There would be no change to floating point numbers if number's image in the CSV was the same - including white space etc - as the XML. They both have to -be 9.99999e4 OR -99999.999.

0 replies

rchenatjpl · 2025-11-19T04:53:00Z

rchenatjpl
Nov 19, 2025
Collaborator

Hi, @al-niessner , I'm a little lost in the words. Is this behavior right: missing_constant gets compared as a string, and if its numerical value is below valid_minimum, that's still going to draw a warning?

5 replies

al-niessner Nov 19, 2025
Maintainer

For max and min, cannot do strings because a < b for strings is not the same thing as a < b for integers or floats.

rjwilson-LASP Nov 20, 2025
Author

If I'm interpreting the question correctly, can we do a string comparison from missing_constant itself? (maybe minus leading/trailing whitespace)

al-niessner Nov 20, 2025
Maintainer

Yes, validate does the string comparison first, but sadly not trimming or stripping it first. As strings, we have treat them as literal - no trimming or stripping.

rjwilson-LASP Nov 20, 2025
Author

Okay, but I've files where the CSV has a value (showing commas for clarity) of ", -1.000,", and in the xml " <missing_constant> -1.000</missing_constant>", which is the exact same string (inc. white space) between the commas in the CSV or the >< in the xml, and validator gives me the warning "Field has a value '-1' that is less than the defined minimum value '0.001'." Seems like a literal string comparison should match there. Anyway, that was my personal confusion with this.

al-niessner Nov 20, 2025
Maintainer

If they are string image identical including white space and not being seen as identical, then that is a bug in validate. I can guess where too.

missing_constant treated differently to valid_minimim/valid_maximum #1387

Uh oh!

rjwilson-LASP Sep 29, 2025

Replies: 15 comments · 5 replies

Uh oh!

rjwilson-LASP Sep 30, 2025 Author

Uh oh!

jordanpadams Sep 30, 2025 Maintainer

Uh oh!

rjwilson-LASP Sep 30, 2025 Author

Uh oh!

al-niessner Oct 1, 2025 Maintainer

Uh oh!

rjwilson-LASP Oct 1, 2025 Author

Uh oh!

jordanpadams Oct 1, 2025 Maintainer

Uh oh!

rjwilson-LASP Oct 1, 2025 Author

Uh oh!

al-niessner Oct 1, 2025 Maintainer

Uh oh!

Uh oh!

rjwilson-LASP Oct 1, 2025 Author

Uh oh!

al-niessner Oct 1, 2025 Maintainer

Uh oh!

rjwilson-LASP Oct 1, 2025 Author

Uh oh!

al-niessner Oct 1, 2025 Maintainer

Uh oh!

rjwilson-LASP Oct 3, 2025 Author

Uh oh!

al-niessner Oct 3, 2025 Maintainer

Uh oh!

rchenatjpl Nov 19, 2025 Collaborator

Uh oh!

al-niessner Nov 19, 2025 Maintainer

Uh oh!

rjwilson-LASP Nov 20, 2025 Author

Uh oh!

al-niessner Nov 20, 2025 Maintainer

Uh oh!

rjwilson-LASP Nov 20, 2025 Author

Uh oh!

al-niessner Nov 20, 2025 Maintainer

rjwilson-LASP
Sep 29, 2025

Replies: 15 comments 5 replies

rjwilson-LASP
Sep 30, 2025
Author

jordanpadams
Sep 30, 2025
Maintainer

rjwilson-LASP
Sep 30, 2025
Author

al-niessner
Oct 1, 2025
Maintainer

rjwilson-LASP
Oct 1, 2025
Author

jordanpadams
Oct 1, 2025
Maintainer

rjwilson-LASP
Oct 1, 2025
Author

al-niessner
Oct 1, 2025
Maintainer

rjwilson-LASP
Oct 1, 2025
Author

al-niessner
Oct 1, 2025
Maintainer

rjwilson-LASP
Oct 1, 2025
Author

al-niessner
Oct 1, 2025
Maintainer

rjwilson-LASP
Oct 3, 2025
Author

al-niessner
Oct 3, 2025
Maintainer

rchenatjpl
Nov 19, 2025
Collaborator

al-niessner Nov 19, 2025
Maintainer

rjwilson-LASP Nov 20, 2025
Author

al-niessner Nov 20, 2025
Maintainer

rjwilson-LASP Nov 20, 2025
Author

al-niessner Nov 20, 2025
Maintainer