
Fix next_up and next_down behavior for zero float values #16745


Status: Open · wants to merge 4 commits into main

Conversation

@liamzwbao (Contributor) commented Jul 11, 2025

Which issue does this PR close?

Rationale for this change

The root cause of the linked issue is an invalid interval produced by satisfy_greater. For instance, calling satisfy_greater([0.0, 0.0], [-0.0, any], true) yields [0.0, 0.0], [-0.0, -ε] instead of the expected [0.0, 0.0], [-0.0, -0.0].

What changes are included in this PR?

As @berkaysynnada pointed out, the correct fix is to update next_up and next_down in rounding.rs, ensuring that next_up(-0.0) returns 0.0 and next_down(0.0) returns -0.0.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions bot added the logical-expr (Logical plan and expressions) label on Jul 11, 2025
@ozankabak (Contributor)

Thanks for taking a look at this.

A cursory look suggests that, when a strict inequality is being propagated, if the next value after the other side's lower bound is greater than the upper bound, the propagation result should be "infeasible".

@berkaysynnada will help with this issue

@berkaysynnada (Contributor)

Currently, when computing the next representable float from +0.0 or -0.0, the behavior incorrectly skips directly to the smallest subnormal (±ε) instead of transitioning between -0.0 and +0.0. For example, next_down(+0.0) returns -ε, but we expect it to return -0.0. Similarly, next_up(-0.0) returns +ε, but we expect it to return +0.0.

This causes intervals like [-0.0, -ε] instead of the expected [-0.0, -0.0]. In ScalarValue comparisons we already treat -0.0 and +0.0 as NOT equal, but the rounding logic was skipping over them and jumping directly to subnormals.

To fix this, I locally updated next_up and next_down to handle ±0.0 explicitly. In next_up, if the input is -0.0, it now returns +0.0 instead of +ε. In next_down, if the input is +0.0, it now returns -0.0 instead of -ε. All other cases remain as they were. This keeps the fix localized to the specific ±0.0 boundary without unnecessarily affecting the general behavior of interval arithmetic logic.

pub fn next_up<F: FloatBits + Copy>(float: F) -> F {
    let bits = float.to_bits();
    if float.float_is_nan() || bits == F::infinity().to_bits() {
        return float;
    }

    // Special case: -0.0 → +0.0
    if bits == F::NEG_ZERO {
        return F::from_bits(F::ZERO);
    }
    // ... rest of the function unchanged
}

pub fn next_down<F: FloatBits + Copy>(float: F) -> F {
    let bits = float.to_bits();
    if float.float_is_nan() || bits == F::neg_infinity().to_bits() {
        return float;
    }

    // Special case: +0.0 → -0.0
    if bits == F::ZERO {
        return F::from_bits(F::NEG_ZERO);
    }
    // ... rest of the function unchanged
}

With these changes, the interval calculations now respect the special ±0.0 representations before moving into the subnormal range. This aligns the rounding behavior with how ScalarValue comparisons already work and avoids producing unexpected intervals.
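For readers outside the codebase, the stepping logic above can be sketched in a self-contained, f64-only form (the PR's actual code is generic over a FloatBits trait; the function names and structure here are illustrative, not DataFusion's API):

```rust
// f64-only sketch of next_up/next_down with the ±0.0 special cases.
fn next_up(x: f64) -> f64 {
    if x.is_nan() || x == f64::INFINITY {
        return x;
    }
    let bits = x.to_bits();
    if bits == (-0.0f64).to_bits() {
        return 0.0; // -0.0 -> +0.0, instead of jumping to +ε
    }
    if x.is_sign_positive() {
        f64::from_bits(bits + 1) // positive: grow magnitude
    } else {
        f64::from_bits(bits - 1) // negative: shrink magnitude toward zero
    }
}

fn next_down(x: f64) -> f64 {
    if x.is_nan() || x == f64::NEG_INFINITY {
        return x;
    }
    let bits = x.to_bits();
    if bits == 0.0f64.to_bits() {
        return -0.0; // +0.0 -> -0.0, instead of jumping to -ε
    }
    if x.is_sign_negative() {
        f64::from_bits(bits + 1) // negative: grow magnitude
    } else {
        f64::from_bits(bits - 1) // positive: shrink magnitude toward zero
    }
}

fn main() {
    assert_eq!(next_up(-0.0).to_bits(), 0.0f64.to_bits()); // -0.0 -> +0.0
    assert_eq!(next_down(0.0).to_bits(), (-0.0f64).to_bits()); // +0.0 -> -0.0
    assert_eq!(next_up(0.0), f64::from_bits(1)); // +0.0 -> smallest subnormal
    assert_eq!(next_up(f64::INFINITY), f64::INFINITY);
    assert!(next_up(f64::NAN).is_nan());
    println!("zero-boundary stepping checks passed");
}
```

The sign-magnitude layout of IEEE 754 bits is what makes the ±1-on-bits trick work; the only discontinuity is the two zero representations, which is exactly what the special cases patch over.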

@liamzwbao force-pushed the issue-16736-interval-arithmetic branch from def4520 to 7074029 on July 12, 2025 15:14
@github-actions bot added the common (Related to common crate) label on Jul 12, 2025
@berkaysynnada (Contributor)

BTW, maybe we should modify how floating-point ScalarValues are compared:

    (Some(f1), Some(f2)) => Some(f1.total_cmp(f2)),

I don't know why we are following the total order convention; maybe @comphead, the original author of that change, has an idea.
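For context, the two conventions differ exactly on the values at issue here; a standalone check using only Rust's standard library (not DataFusion code):

```rust
use std::cmp::Ordering;

fn main() {
    // total_cmp implements IEEE 754 totalOrder: -0.0 < +0.0, and NaN is ordered
    // (positive NaN sorts above +inf).
    assert_eq!((-0.0f64).total_cmp(&0.0), Ordering::Less);
    assert_eq!(f64::NAN.total_cmp(&f64::INFINITY), Ordering::Greater);

    // partial_cmp implements the usual IEEE 754 comparison predicates:
    // -0.0 == +0.0, and NaN is incomparable to everything.
    assert_eq!((-0.0f64).partial_cmp(&0.0), Some(Ordering::Equal));
    assert!(f64::NAN.partial_cmp(&0.0).is_none());
    println!("ordering convention checks passed");
}
```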

@liamzwbao (Contributor, Author)

Thank you for pointing that out, @berkaysynnada and @ozankabak! The fix in next_up and next_down makes more sense.

As for the total_cmp changes, that was originally raised by @comphead.

@liamzwbao liamzwbao marked this pull request as ready for review July 12, 2025 16:11
@liamzwbao changed the title from "Fix invalid intervals in satisfy_greater" to "Fix next_up and next_down behavior for zero float values" on Jul 12, 2025
@berkaysynnada (Contributor)

> Thank you for pointing that out, @berkaysynnada and @ozankabak! The fix in next_up and next_down makes more sense.
>
> As for the total_cmp changes, that was originally raised by @comphead.

If this issue is not urgent for you, let's wait for a few days. This is not a trivial change, and we need to consider all consequences of the given decision. I need to do some readings and investigate how other engine/platforms behave.

@berkaysynnada (Contributor)

Hi again @liamzwbao. We've discussed this with @ozankabak, and the actual fix should be in the PartialOrd implementation of ScalarValue for floats. The comparison currently uses total_cmp, but I don't think there's a valid reason for that. We should revert it to use partial_cmp instead. After that, the interval_arithmetic code won't need any modifications.

@findepi (Member) commented Jul 23, 2025

The SQL ordering of float values clearly distinguishes between 0 and -0 and is a total ordering.

$ cargo run --bin datafusion-cli
> SELECT t, t::float AS f from (values ('1'), ('-1'), ('0'), ('-0'), ('0'), ('-0'), ('Inf'), ('-Inf'), ('nan')) _(t) ORDER BY f;
+------+------+
| t    | f    |
+------+------+
| -Inf | -inf |
| -1   | -1.0 |
| -0   | -0.0 |
| -0   | -0.0 |
| 0    | 0.0  |
| 0    | 0.0  |
| 1    | 1.0  |
| Inf  | inf  |
| nan  | NaN  |
+------+------+

The ScalarValue comparison must match that done in SQL.
The next_up and next_down should match that as well.
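The ORDER BY result above corresponds to sorting with a total order; a plain-Rust illustration (std only, not DataFusion's sort path) using total_cmp reproduces the same sequence:

```rust
fn main() {
    // Mirror the values from the SQL example: NaN, ±Inf, ±0, ±1.
    let mut v = [f64::NAN, 1.0, -0.0, f64::NEG_INFINITY, 0.0, f64::INFINITY, -1.0];
    // total_cmp gives a total order, so sort_by is well-defined even with NaN.
    v.sort_by(|a, b| a.total_cmp(b));
    // Resulting order: -inf, -1, -0.0, 0.0, 1, inf, NaN
    assert!(v[0] == f64::NEG_INFINITY);
    assert!(v[2].to_bits() == (-0.0f64).to_bits()); // -0.0 sorts before 0.0
    assert!(v[3].to_bits() == 0.0f64.to_bits());
    assert!(v[6].is_nan()); // NaN sorts above +inf
    println!("{:?}", v);
}
```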

@berkaysynnada (Contributor)

What you give as an example is correct but incomplete. In SQL, "ORDER BY" produces a total order and distinguishes -0.0 from +0.0. However, SQL comparisons follow IEEE 754 ordering semantics, so -0.0 == 0.0 is true, and both -0.0 < 0.0 and -0.0 > 0.0 are false.

So, for ScalarValue, PartialOrd should align with SQL’s comparison semantics (-0.0 == 0.0) and use partial_cmp.
The total ordering used for "ORDER BY" can still rely on total_cmp explicitly, without leaking into PartialOrd. (now I see that SQL’s "ORDER BY" breaks ties based on bits)
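Rust's built-in == and < on f64 follow the same IEEE 754 comparison predicates being described here, so the claimed semantics can be checked directly:

```rust
fn main() {
    // IEEE 754 comparison predicates, as used by SQL's = and < operators:
    assert!(-0.0f64 == 0.0f64);     // -0.0 = 0.0 is true
    assert!(!(-0.0f64 < 0.0f64));   // -0.0 < 0.0 is false
    assert!(!(0.0f64 < -0.0f64));   // 0.0 < -0.0 is false
    assert!(f64::NAN != f64::NAN);  // NaN compares unequal to everything
    println!("IEEE 754 comparison checks passed");
}
```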

@ozankabak (Contributor)

Well explained, @berkaysynnada. It seems like the ScalarValue code is deferring to total_cmp in the wrong place.

@findepi (Member) commented Jul 23, 2025

You're absolutely right. The ORDER BY ordering and < operator are not the same thing.
The ORDER BY places NaN as higher than +inf, while < operator should likely return false for any < comparison involving NaN (same for =).

> However, SQL comparisons follow IEEE 754 ordering semantics, so -0.0 == 0.0 is true, and both -0.0 < 0.0 and -0.0 > 0.0 are false.

Not sure whether you mean SQL spec, or what's implemented in DataFusion, or what DataFusion should be implementing?

What's currently implemented seems to be this:

> WITH fs AS (SELECT t::float AS f FROM (values ('0'), ('-0'), ('NaN')) _(t))
SELECT f1, f2, f1 = f2, f1 < f2, f2 < f1
FROM fs _(f1), fs _(f2);
+------+------+-------------+-------------+-------------+
| f1   | f2   | _.f1 = _.f2 | _.f1 < _.f2 | _.f2 < _.f1 |
+------+------+-------------+-------------+-------------+
| 0.0  | 0.0  | true        | false       | false       |
| 0.0  | -0.0 | false       | false       | true        | -- apparently 0 compares greater than -0
| 0.0  | NaN  | false       | true        | false       | -- apparently 0 compares less than NaN
| -0.0 | 0.0  | false       | true        | false       | -- apparently -0 compares less than 0
| -0.0 | -0.0 | true        | false       | false       |
| -0.0 | NaN  | false       | true        | false       | 
| NaN  | 0.0  | false       | false       | true        |
| NaN  | -0.0 | false       | false       | true        |
| NaN  | NaN  | true        | false       | false       | -- Is NaN = NaN following IEEE 754? at least in C it's false
+------+------+-------------+-------------+-------------+
9 row(s) fetched.
Elapsed 0.016 seconds.

@berkaysynnada (Contributor)

> Not sure whether you mean SQL spec, or what's implemented in DataFusion, or what DataFusion should be implementing?

I mean SQL spec and what DataFusion should be implementing.

+------+------+-------------+-------------+-------------+
| f1   | f2   | _.f1 = _.f2 | _.f1 < _.f2 | _.f2 < _.f1 |
+------+------+-------------+-------------+-------------+
...
| 0.0  | -0.0 | false       | false       | true        | -- apparently 0 compares greater than -0
...
| -0.0 | 0.0  | false       | true        | false       | -- apparently -0 compares less than 0
...
+------+------+-------------+-------------+-------------+

This is what's implemented in DataFusion, and it conflicts with the SQL spec.

@findepi (Member) commented Jul 23, 2025

> I mean SQL spec and what DataFusion should be implementing.

I wish DataFusion followed SQL spec in everything, but that's not the project design philosophy AFAICT.
That was supposed to be made more explicit in #13704 / @alamb's #13706

if we want to follow the SQL spec, you have my full support, but I'd encourage codifying it in a form of a referencible documentation.

BTW, last time I checked, the SQL spec didn't know about NaN values, and I don't think it really distinguishes between positive and negative zeros.

Following the #13706 proposal, here is a behavior check with PostgreSQL. It clearly compares negative and positive zero as equal under both the = and < operators:

postgres=# WITH fs AS (SELECT t::float AS f FROM (values ('0'), ('-0'), ('NaN')) _(t))
SELECT f1, f2, f1 = f2, f1 < f2, f2 < f1
FROM fs t(f1), fs u(f2);
 f1  | f2  | ?column? | ?column? | ?column?
-----+-----+----------+----------+----------
   0 |   0 | t        | f        | f
   0 |  -0 | t        | f        | f
   0 | NaN | f        | t        | f
  -0 |   0 | t        | f        | f
  -0 |  -0 | t        | f        | f
  -0 | NaN | f        | t        | f
 NaN |   0 | f        | f        | t
 NaN |  -0 | f        | f        | t
 NaN | NaN | t        | f        | f
(9 rows)

Here is the output from Trino. It clearly compares negative and positive zero as equal under both the = and < operators, and it also follows IEEE 754 for comparisons involving NaN (returning false even for "NaN = NaN"):

trino> WITH fs AS (SELECT CAST(t AS real) AS f FROM (values ('0'), ('-0'), ('NaN')) _(t))
    -> SELECT f1, f2, f1 = f2, f1 < f2, f2 < f1
    -> FROM fs _(f1), fs _(f2);
  f1  |  f2  | _col2 | _col3 | _col4
------+------+-------+-------+-------
  0.0 |  0.0 | true  | false | false
 -0.0 |  0.0 | true  | false | false
  NaN |  0.0 | false | false | false
  0.0 | -0.0 | true  | false | false
 -0.0 | -0.0 | true  | false | false
  NaN | -0.0 | false | false | false
  0.0 |  NaN | false | false | false
 -0.0 |  NaN | false | false | false
  NaN |  NaN | false | false | false
(9 rows)

@findepi (Member) commented Jul 23, 2025

Again, if we want to follow the SQL spec, you have my support, but that alone won't answer what float equality, comparison, and ordering should actually be.

Following IEEE 754 makes sense to me. For that purpose we should be cross-checking with Trino (and not with PostgreSQL!). It really goes far beyond this single PR, so @berkaysynnada can you please put relevant wording about IEEE 754 in the docs somewhere, so that we set the direction once and then execute on it? I think this is important enough to go through the mailing list.

@liamzwbao (Contributor, Author)

Thanks for your insights, @berkaysynnada @findepi! It seems we haven't reached a consensus yet. Should I proceed with the PartialOrd fix, or wait until we have a clearer doc to align on?

@tustvold (Contributor)

Copying here for visibility - #13704 (comment)

@ozankabak (Contributor) commented Jul 24, 2025

Since we do not yet fully understand what transitioning to partial ordering will entail (and we may not even want to do it, at the end), I think the best path forward is to go back to @berkaysynnada's original suggestion, which was to fix next_up and next_down. We can figure out what to do with float comparison semantics later on.

cc @findepi

@berkaysynnada will respond shortly with our final suggestion.

@berkaysynnada (Contributor)

@liamzwbao can you please cherry-pick this commit:

main...synnada-ai:datafusion-upstream:next-up-down

I believe there will be no objections to this change, as we're adopting the total order convention in interval arithmetic. It should also address your case: satisfy_greater([0.0, 0.0], [-0.0, any], true) now results in [-0.0, -0.0] for the right-hand interval.

Labels: common (Related to common crate), logical-expr (Logical plan and expressions)
Development

Successfully merging this pull request may close these issues.

Filtering and counting afterwards causes overflow panic in interval arithmetics
5 participants