missing_constant treated differently to valid_minimum/valid_maximum #1387
-
The Validator tool gives me errors about the missing_constant value for a number stored in CSV files to 3 decimal places: it warns every time the CSV has a -1.000 in that column. If I change the missing_constant in the label to -1, the validator is then happy, despite the CSV being unaltered with -1.000s in it. I've had similar issues with column formats of %10.3f, %9.4f and %8.6f, all with missing_constant values of -1 (to however many decimal places). That seems unexpected, and I'd prefer to keep all values here to the same precision.
Replies: 15 comments 5 replies
-
I get a similar sort of error where the missing_constant sits below the valid_minimum, with the validator warning: "Field has a value '-99999.999' that is less than the defined minimum value '1.0000000E-17'."
-
@rjwilson-LASP what version of validate are you running? Also, just curious, how did you create this ticket? It looks like it is missing an issue template, so I'm curious how it was created without one.
-
Hi, I am using 3.7.1, which gave both of the above issues, but I had 3.6.3 previously and was getting the first issue. (The second I only saw after I made and tried an XML file once I had 3.7.1.)
-
@jordanpadams can we trim the white space of a special constant? My reading is no, because what the user types is supposed to be literal truth. Remember that the special constant is not the same type as the cell, nor is it required to be. Example: a fixed-width column of 3 characters whose missing constant is " -1" with a leading space; trimming it would change the constant's meaning.

There are fixable and non-fixable items in your statements. First, comparing special constants with values in a table or array is tricky because their respective types do not need to be the same according to the PDS documentation. The special constant -1 works but -1.000 does not. It is a floating point conversion problem: when validate asks Java to convert -1.000 from the special constant, it gets a different bit pattern than when converting from the CSV. I have no idea why, other than that -1.000 does not have an exact floating point representation, causing a == b to fail. When given -1, validate converts it to an integer, allowing a == b in this case. Validate compares the special constants and values as strings before doing int and float conversions, so by removing the white space in the XML the string comparison would succeed before any conversion happens.

The same is happening with -9.9999999E+04. The imprecise float representation of that number makes it one bit away from -99999.999 and thus fails. If you changed the special constant to -99999.999, matching the image in the CSV, the warning would go away.

I understand that the white space in your XML is an attempt to make the XML more readable, and removing it is not ideal. I also understand that your choice of how values are represented may have meaning as well (-9.9999999E+04 vs. -99999.999). However, the problems you are experiencing are bound by the limitations of floating point quantization, and to avoid those limitations changing the representations may be the only solution.
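To make the white-space point concrete, here is a minimal Python sketch (mine, not validate's actual code; the leading space in the constant is a made-up illustration of white space kept for XML readability):

```python
cell = "-1.000"        # image of the value as read from the CSV
constant = " -1.000"   # hypothetical special constant copied verbatim from the XML, leading space included

# The string (image) comparison fails because the images differ character for character
print(cell == constant)                # False

# Yet the numeric values agree, because float() ignores surrounding white space
print(float(cell) == float(constant))  # True
```

This is exactly why trimming cannot be automated safely: validate has no way to know whether that leading space is formatting noise or part of the constant's literal meaning.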
-
Thanks for the thorough reply! So, my summary: the Java code converts floats to bit patterns in an inconsistent way for the data file vs. values in the XML file, resulting in different bit-pattern representations of the numbers. That seems to be a coding feature rather than a format issue, and to me it makes more sense to keep <missing_constant> in the same <field_format> as the data in the CSV file and live with the warnings out of validate. Thanks for the swift feedback!
-
@al-niessner you are correct. Right now, the standards do not provide enough information for us to say we can safely remove whitespace from special constants. That being said, this is currently being worked by the PDS4 Data Design Working Group to loosen that portion of the standard. @rjwilson-LASP, as @al-niessner noted, I believe the primary issue here is the limitations of floating point quantization, not a coding feature. I believe this issue would arise in any software. @al-niessner may be able to shed some more light on this.
-
Since this is reopened (and thanks for continuing toward a potential resolution): @al-niessner wrote, "First, comparing special constants with values in a table or array are tricky because their respective types do not need to be the same according to the PDS documentation." Can I ask which documentation that's from? I see "Note that the value that is chosen for a special constant must be the right data type for the field." at the bottom of page 90 of the PDS4 Data Providers Handbook.
-
Absolutely. The PDS4 Information Model Specification states it with the keywords "should be", not "must be". Right data type does not necessarily mean matching data type. Another project uses 0x7FF80000 as their special constant, which is a floating point NaN. We have mechanisms for allowing specific bit patterns to be turned into floats, like the NaN; it is the only way to guarantee the desired quantization of a textual number to bits. Their data is also binary.
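As a sketch of the bit-pattern mechanism being described (my illustration, assuming the 0x7FF80000 pattern is interpreted as a 32-bit IEEE 754 float):

```python
import math
import struct

# Interpret the 32-bit pattern 0x7FF80000 as an IEEE 754 single-precision float
raw = (0x7FF80000).to_bytes(4, "big")
value = struct.unpack(">f", raw)[0]

# Exponent bits all set with a non-zero mantissa: this pattern is a NaN
print(math.isnan(value))  # True
```

A textual number like "nan" gives no control over which NaN payload you get; going through the bit pattern is the only way to pin down the exact bits.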
-
At the risk of beginning to sound like a lawyer: I agree "should" is not the same as "shall", but the example you gave is for valid_maximum (same for valid_minimum), which I've had no issues with. My issue is with the missing_constant entry, and its definition is actually less detailed in that model: "The missing_constant attribute provides a value that indicates the original value was missing, such as due to a gap in coverage." No mention of should/shall or data_type at all, so we could assume there is no requirement to match the data_type. One potential lead: the missing_constant entry in the model has an entry of 'Conceptual Domain: Short_String', whereas the valid_minimum/valid_maximum definitions have no Conceptual Domain entry. Perhaps we've simply found an item that is not sufficiently defined.
-
lol, pretending to be a language lawyer is the worst part of my job. Same document, and far, far less clear: no should or shall or must anywhere. It can be any random thing, like 'intentionally left blank'. I think if you tried to force data types onto special constants you would invite more problems than you would solve. The problem is in floating point representation and expecting bit equality; a short example in Python makes the point (Java is even more susceptible).
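The snippet in the original comment was lost; the following is my reconstruction of the kind of failure it likely showed, not the original code:

```python
a = 0.1 + 0.2
b = 0.3

# Bit equality fails even though both sides "should" be 0.3
print(a == b)        # False
print(repr(a))       # 0.30000000000000004
print(repr(b))       # 0.3

# Comparing the images rendered at a fixed precision succeeds
print(f"{a:.3f}" == f"{b:.3f}")  # True
```

Neither 0.1 nor 0.2 has an exact binary representation, so the sum lands one unit in the last place away from the closest double to 0.3.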
-
Bits can be surprisingly annoying (much like dealing with time), even though they sound so simple.
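A sketch of inspecting the underlying bit patterns directly (my illustration, using Python's struct module):

```python
import struct

def bits(x: float) -> str:
    """Hex image of the 64-bit IEEE 754 pattern of x."""
    return struct.pack(">d", x).hex()

print(bits(0.3))        # 3fd3333333333333
print(bits(0.1 + 0.2))  # 3fd3333333333334  (one bit away, so == fails)
```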
-
And now back to my original comments and questions to @jordanpadams. Since the images (the textual representations of your numbers) do not match, they must be bit compared, since the value is a number. Hence the suggestion about removing white space and changing -9.9999999e+04 to -99999.999 if that is what is used in the CSV, and hence the example about not being able to strip the constants of white space. You can strip the constants of their white space, of course, but validate cannot without potentially changing the meaning of the special constant.

I get it. I really do. I bet, only because it seems common practice, that in your code you do not test x == -9.9999999e+04 but rather x < -99999 (equivalent to a %10.3f comparison), since you never get close to -99999 anyway. Doing so erases all of those pesky bit problems. Validate cannot make such nice fixes because the generic case does not allow for it, like the NaN stated earlier. Yes, there are plenty of code blobs that could be added as a bespoke solution to your exact problem, but they break in the general case, like NaN. Gotta love NaN.

validate is constrained by not stripping the image of white space for comparison and by using Java libraries for converting numbers like -9.9999999e+04. All of those constraints are satisfied when the value image, which validate has, and the image of the special constant exactly as in the XML, which validate also has, match, erasing all of the bit problems and your warnings.
-
Hi @al-niessner, I appreciate the back and forth, and my continued comments are merely intended as hopefully helpful feedback. Most of this is getting philosophical; we should be debating it over a coffee/beer, and if the chance ever happens, I'll be happy to buy the round.

My philosophy is that for binary data files I would like a bit-pattern comparison, but for ASCII data files I would like the comparison to be string based if a field_format code was provided. Validate does check that all my ASCII values match the field_format I stated in the label (which is great), and I would have wished for a similar check on the missing_constant value too. But I can understand why this gets complicated for a code that must work on all PDS4 file variants.

My (and just my) issue now is that when I run validate, I get thousands of warnings due to the float comparison failing on the missing_constant value, making it hard to see any warnings that aren't due to that. A few pipes to grep on the output will likely help there, with some trial and error. And with that, I suspect we're ready to close this out. Thanks for the replies and explanations!
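A sketch of the grep approach (the warning text is taken from the message quoted earlier in the thread; the validate output lines here are simulated with printf for illustration):

```shell
# Simulated validate output; in practice you would pipe the real
# "validate ..." command into the same grep -v filter
printf '%s\n' \
  "WARNING  Field has a value '-99999.999' that is less than the defined minimum value '1.0000000E-17'." \
  "WARNING  Some unrelated problem worth reading" \
  | grep -v "less than the defined minimum"
```

Only the unrelated warning survives the filter, which makes the remaining output scannable.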
-
Let me bullet-point it, because we went off track somewhere.

For missing constant -1.000:
- validate first compares images as strings, so "-1.000" in the XML must match the CSV cell character for character, white space included
- when the images differ, validate falls back to numeric conversion, and the resulting bit patterns need not be equal, so a == b fails

For missing constant -9.9999999e4:
- the imprecise floating point representation of that number is one bit away from -99999.999, so bit equality fails
- writing the constant as -99999.999, matching the image in the CSV, erases the bit problem
-
Hi, @al-niessner, I'm a little lost in the words. Is this the correct behavior: the missing_constant gets compared as a string, and if its numerical value is below the valid_minimum, that's still going to draw a warning?