
Fix next_up and next_down behavior for zero float values #16745


Status: Open · wants to merge 4 commits into main

Conversation

@liamzwbao (Contributor) commented Jul 11, 2025

Which issue does this PR close?

Rationale for this change

The root cause of the linked issue is an invalid interval produced by satisfy_greater. For instance, calling satisfy_greater([0.0, 0.0], [-0.0, any], true) yields [0.0, 0.0], [-0.0, -ε] instead of the expected [0.0, 0.0], [-0.0, -0.0].

What changes are included in this PR?

As @berkaysynnada pointed out, the correct fix is to update next_up and next_down in rounding.rs, ensuring that next_up(-0.0) returns 0.0 and next_down(0.0) returns -0.0.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions bot added the logical-expr (Logical plan and expressions) label on Jul 11, 2025
@ozankabak (Contributor)

Thanks for taking a look at this.

A cursory look suggests that, when a strict inequality is being propagated, if the next value after the other side's lower bound is greater than the upper bound, the propagation result should be "infeasible".

@berkaysynnada will help with this issue

@berkaysynnada (Contributor)

Currently, when computing the next representable float from +0.0 or -0.0, the behavior incorrectly skips directly to the smallest subnormal (±ε) instead of transitioning between -0.0 and +0.0. For example, next_down(+0.0) returns -ε, but we expect it to return -0.0. Similarly, next_up(-0.0) returns +ε, but we expect it to return +0.0.

This causes intervals like [-0.0, -ε] instead of the expected [-0.0, -0.0]. In ScalarValue comparisons we already treat -0.0 and +0.0 as NOT equal, but the rounding logic was skipping over them and jumping directly to subnormals.

To fix this, I locally updated next_up and next_down to handle ±0.0 explicitly. In next_up, if the input is -0.0, it now returns +0.0 instead of +ε. In next_down, if the input is +0.0, it now returns -0.0 instead of -ε. All other cases remain as they were. This keeps the fix localized to the specific ±0.0 boundary without unnecessarily affecting the general behavior of interval arithmetic logic.

pub fn next_up<F: FloatBits + Copy>(float: F) -> F {
    let bits = float.to_bits();
    if float.float_is_nan() || bits == F::infinity().to_bits() {
        return float;
    }

    // Special case: -0.0 → +0.0
    if bits == F::NEG_ZERO {
        return F::from_bits(F::ZERO);
    }
    // ... rest of the function unchanged
}

pub fn next_down<F: FloatBits + Copy>(float: F) -> F {
    let bits = float.to_bits();
    if float.float_is_nan() || bits == F::neg_infinity().to_bits() {
        return float;
    }

    // Special case: +0.0 → -0.0
    if bits == F::ZERO {
        return F::from_bits(F::NEG_ZERO);
    }
    // ... rest of the function unchanged
}

With these changes, the interval calculations now respect the special ±0.0 representations before moving into the subnormal range. This aligns the rounding behavior with how ScalarValue comparisons already work and avoids producing unexpected intervals.
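For readers outside the codebase, the stepping logic above can be sketched in a self-contained, f64-only form (the PR's actual code is generic over a FloatBits trait; the function names and structure here are illustrative, not DataFusion's API):

```rust
// f64-only sketch of next_up/next_down with the ±0.0 special cases.
fn next_up(x: f64) -> f64 {
    if x.is_nan() || x == f64::INFINITY {
        return x;
    }
    let bits = x.to_bits();
    if bits == (-0.0f64).to_bits() {
        return 0.0; // -0.0 -> +0.0, instead of jumping to +ε
    }
    if x.is_sign_positive() {
        f64::from_bits(bits + 1) // positive: grow magnitude
    } else {
        f64::from_bits(bits - 1) // negative: shrink magnitude toward zero
    }
}

fn next_down(x: f64) -> f64 {
    if x.is_nan() || x == f64::NEG_INFINITY {
        return x;
    }
    let bits = x.to_bits();
    if bits == 0.0f64.to_bits() {
        return -0.0; // +0.0 -> -0.0, instead of jumping to -ε
    }
    if x.is_sign_negative() {
        f64::from_bits(bits + 1) // negative: grow magnitude
    } else {
        f64::from_bits(bits - 1) // positive: shrink magnitude toward zero
    }
}

fn main() {
    assert_eq!(next_up(-0.0).to_bits(), 0.0f64.to_bits()); // -0.0 -> +0.0
    assert_eq!(next_down(0.0).to_bits(), (-0.0f64).to_bits()); // +0.0 -> -0.0
    assert_eq!(next_up(0.0), f64::from_bits(1)); // +0.0 -> smallest subnormal
    assert_eq!(next_up(f64::INFINITY), f64::INFINITY);
    assert!(next_up(f64::NAN).is_nan());
    println!("zero-boundary stepping checks passed");
}
```

The sign-magnitude layout of IEEE 754 bits is what makes the ±1-on-bits trick work; the only discontinuity is the two zero representations, which is exactly what the special cases patch over.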

@liamzwbao force-pushed the issue-16736-interval-arithmetic branch from def4520 to 7074029 on July 12, 2025 15:14
@github-actions bot added the common (Related to common crate) label on Jul 12, 2025
@berkaysynnada (Contributor)

BTW, maybe we should modify how floating-point ScalarValues are compared:

    (Some(f1), Some(f2)) => Some(f1.total_cmp(f2)),

I don't know why we are following the total order convention; maybe @comphead, the original author of that change, has an idea.
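For context, the two conventions differ exactly on the values at issue here; a standalone check using only Rust's standard library (not DataFusion code):

```rust
use std::cmp::Ordering;

fn main() {
    // total_cmp implements IEEE 754 totalOrder: -0.0 < +0.0, and NaN is ordered
    // (positive NaN sorts above +inf).
    assert_eq!((-0.0f64).total_cmp(&0.0), Ordering::Less);
    assert_eq!(f64::NAN.total_cmp(&f64::INFINITY), Ordering::Greater);

    // partial_cmp implements the usual IEEE 754 comparison predicates:
    // -0.0 == +0.0, and NaN is incomparable to everything.
    assert_eq!((-0.0f64).partial_cmp(&0.0), Some(Ordering::Equal));
    assert!(f64::NAN.partial_cmp(&0.0).is_none());
    println!("ordering convention checks passed");
}
```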

@liamzwbao (Contributor, Author)

Thank you for pointing that out, @berkaysynnada and @ozankabak! The fix in next_up and next_down makes more sense.

As for the total_cmp changes, that was originally raised by @comphead.

@liamzwbao liamzwbao marked this pull request as ready for review July 12, 2025 16:11
@liamzwbao changed the title from "Fix invalid intervals in satisfy_greater" to "Fix next_up and next_down behavior for zero float values" on Jul 12, 2025
@berkaysynnada (Contributor)

> Thank you for pointing that out, @berkaysynnada and @ozankabak! The fix in next_up and next_down makes more sense.
>
> As for the total_cmp changes, that was originally raised by @comphead.

If this issue is not urgent for you, let's wait for a few days. This is not a trivial change, and we need to consider all consequences of the given decision. I need to do some readings and investigate how other engine/platforms behave.

@berkaysynnada (Contributor)

Hi again @liamzwbao. We've discussed this with @ozankabak, and the actual fix should be in the PartialOrd implementation of ScalarValue for floats. The comparison currently uses total_cmp, but I don't think there's a valid reason for that. We should revert it to use partial_cmp instead. After that, the interval_arithmetic code won't need any modifications.

@findepi (Member) commented Jul 23, 2025

The SQL ordering of float values clearly distinguishes between 0 and -0 and is a total ordering.

$ cargo run --bin datafusion-cli
> SELECT t, t::float AS f from (values ('1'), ('-1'), ('0'), ('-0'), ('0'), ('-0'), ('Inf'), ('-Inf'), ('nan')) _(t) ORDER BY f;
+------+------+
| t    | f    |
+------+------+
| -Inf | -inf |
| -1   | -1.0 |
| -0   | -0.0 |
| -0   | -0.0 |
| 0    | 0.0  |
| 0    | 0.0  |
| 1    | 1.0  |
| Inf  | inf  |
| nan  | NaN  |
+------+------+

The ScalarValue comparison must match that done in SQL.
The next_up and next_down should match that as well.
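The ORDER BY result above corresponds to sorting with a total order; a plain-Rust illustration (std only, not DataFusion's sort path) using total_cmp reproduces the same sequence:

```rust
fn main() {
    // Mirror the values from the SQL example: NaN, ±Inf, ±0, ±1.
    let mut v = [f64::NAN, 1.0, -0.0, f64::NEG_INFINITY, 0.0, f64::INFINITY, -1.0];
    // total_cmp gives a total order, so sort_by is well-defined even with NaN.
    v.sort_by(|a, b| a.total_cmp(b));
    // Resulting order: -inf, -1, -0.0, 0.0, 1, inf, NaN
    assert!(v[0] == f64::NEG_INFINITY);
    assert!(v[2].to_bits() == (-0.0f64).to_bits()); // -0.0 sorts before 0.0
    assert!(v[3].to_bits() == 0.0f64.to_bits());
    assert!(v[6].is_nan()); // NaN sorts above +inf
    println!("{:?}", v);
}
```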

@berkaysynnada (Contributor)

What you give as an example is correct but incomplete. In SQL, "ORDER BY" produces a total order and distinguishes -0.0 from +0.0. However, SQL comparisons follow IEEE 754 ordering semantics, so -0.0 == 0.0 is true, and both -0.0 < 0.0 and -0.0 > 0.0 are false.

So, for ScalarValue, PartialOrd should align with SQL’s comparison semantics (-0.0 == 0.0) and use partial_cmp.
The total ordering used for "ORDER BY" can still rely on total_cmp explicitly, without leaking into PartialOrd. (now I see that SQL’s "ORDER BY" breaks ties based on bits)
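Rust's built-in == and < on f64 follow the same IEEE 754 comparison predicates being described here, so the claimed semantics can be checked directly:

```rust
fn main() {
    // IEEE 754 comparison predicates, as used by SQL's = and < operators:
    assert!(-0.0f64 == 0.0f64);     // -0.0 = 0.0 is true
    assert!(!(-0.0f64 < 0.0f64));   // -0.0 < 0.0 is false
    assert!(!(0.0f64 < -0.0f64));   // 0.0 < -0.0 is false
    assert!(f64::NAN != f64::NAN);  // NaN compares unequal to everything
    println!("IEEE 754 comparison checks passed");
}
```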

@ozankabak (Contributor)

Well explained, @berkaysynnada. It seems like the ScalarValue code is deferring to total_cmp in the wrong place.

@findepi (Member) commented Jul 23, 2025

You're absolutely right. The ORDER BY ordering and < operator are not the same thing.
The ORDER BY places NaN as higher than +inf, while < operator should likely return false for any < comparison involving NaN (same for =).

> However, SQL comparisons follow IEEE 754 ordering semantics, so -0.0 == 0.0 is true, and both -0.0 < 0.0 and -0.0 > 0.0 are false.

Not sure whether you mean SQL spec, or what's implemented in DataFusion, or what DataFusion should be implementing?

What's currently implemented seems to be this:

> WITH fs AS (SELECT t::float AS f FROM (values ('0'), ('-0'), ('NaN')) _(t))
SELECT f1, f2, f1 = f2, f1 < f2, f2 < f1
FROM fs _(f1), fs _(f2);
+------+------+-------------+-------------+-------------+
| f1   | f2   | _.f1 = _.f2 | _.f1 < _.f2 | _.f2 < _.f1 |
+------+------+-------------+-------------+-------------+
| 0.0  | 0.0  | true        | false       | false       |
| 0.0  | -0.0 | false       | false       | true        | -- apparently 0 compares greater than -0
| 0.0  | NaN  | false       | true        | false       | -- apparently 0 compares less than NaN
| -0.0 | 0.0  | false       | true        | false       | -- apparently -0 compares less than 0
| -0.0 | -0.0 | true        | false       | false       |
| -0.0 | NaN  | false       | true        | false       | 
| NaN  | 0.0  | false       | false       | true        |
| NaN  | -0.0 | false       | false       | true        |
| NaN  | NaN  | true        | false       | false       | -- Is NaN = NaN following IEEE 754? at least in C it's false
+------+------+-------------+-------------+-------------+
9 row(s) fetched.
Elapsed 0.016 seconds.

@berkaysynnada (Contributor)

> Not sure whether you mean SQL spec, or what's implemented in DataFusion, or what DataFusion should be implementing?

I mean SQL spec and what DataFusion should be implementing.

+------+------+-------------+-------------+-------------+
| f1   | f2   | _.f1 = _.f2 | _.f1 < _.f2 | _.f2 < _.f1 |
+------+------+-------------+-------------+-------------+
...
| 0.0  | -0.0 | false       | false       | true        | -- apparently 0 compares greater than -0
...
| -0.0 | 0.0  | false       | true        | false       | -- apparently -0 compares less than 0
...
+------+------+-------------+-------------+-------------+

This is what's implemented in DataFusion, and it conflicts with the SQL spec.

@findepi (Member) commented Jul 23, 2025

> I mean SQL spec and what DataFusion should be implementing.

I wish DataFusion followed SQL spec in everything, but that's not the project design philosophy AFAICT.
That was supposed to be made more explicit in #13704 / @alamb's #13706

if we want to follow the SQL spec, you have my full support, but I'd encourage codifying it in a form of a referencible documentation.

BTW, last time I checked, the SQL spec didn't know about NaN values, and I don't think it really distinguishes between positive and negative zeros.

Following the #13706 proposal, here is a behavior check with PostgreSQL. It clearly compares negative and positive zero as equal under both the = and < operators:

postgres=# WITH fs AS (SELECT t::float AS f FROM (values ('0'), ('-0'), ('NaN')) _(t))
SELECT f1, f2, f1 = f2, f1 < f2, f2 < f1
FROM fs t(f1), fs u(f2);
 f1  | f2  | ?column? | ?column? | ?column?
-----+-----+----------+----------+----------
   0 |   0 | t        | f        | f
   0 |  -0 | t        | f        | f
   0 | NaN | f        | t        | f
  -0 |   0 | t        | f        | f
  -0 |  -0 | t        | f        | f
  -0 | NaN | f        | t        | f
 NaN |   0 | f        | f        | t
 NaN |  -0 | f        | f        | t
 NaN | NaN | t        | f        | f
(9 rows)

Here is the output from Trino. It clearly compares negative and positive zero as equal under both the = and < operators, and it also follows IEEE 754 for comparisons involving NaN (returning false even for "NaN = NaN"):

trino> WITH fs AS (SELECT CAST(t AS real) AS f FROM (values ('0'), ('-0'), ('NaN')) _(t))
    -> SELECT f1, f2, f1 = f2, f1 < f2, f2 < f1
    -> FROM fs _(f1), fs _(f2);
  f1  |  f2  | _col2 | _col3 | _col4
------+------+-------+-------+-------
  0.0 |  0.0 | true  | false | false
 -0.0 |  0.0 | true  | false | false
  NaN |  0.0 | false | false | false
  0.0 | -0.0 | true  | false | false
 -0.0 | -0.0 | true  | false | false
  NaN | -0.0 | false | false | false
  0.0 |  NaN | false | false | false
 -0.0 |  NaN | false | false | false
  NaN |  NaN | false | false | false
(9 rows)

@findepi (Member) commented Jul 23, 2025

Again, if we want to follow the SQL spec, you have my support, but that alone won't answer what float equality, comparison, and ordering should actually be.

Following IEEE 754 makes sense to me. For that purpose we should be cross-checking with Trino (and not with PostgreSQL!). It really goes far beyond this single PR, so @berkaysynnada can you please put relevant wording about IEEE 754 in the docs somewhere, so that we set the direction once and then execute on it? I think this is important enough to go through the mailing list.

@liamzwbao (Contributor, Author)

Thanks for your insights, @berkaysynnada @findepi! It seems we haven't reached a consensus yet. Should I proceed with the PartialOrd fix, or wait until we have a clearer doc to align on?

@tustvold (Contributor)

Copying here for visibility - #13704 (comment)

@ozankabak (Contributor) commented Jul 24, 2025

Since we do not yet fully understand what transitioning to partial ordering will entail (and we may not even want to do it, at the end), I think the best path forward is to go back to @berkaysynnada's original suggestion, which was to fix next_up and next_down. We can figure out what to do with float comparison semantics later on.

cc @findepi

@berkaysynnada will respond shortly with our final suggestion.

@berkaysynnada (Contributor)

@liamzwbao can you please cherry-pick this commit:

main...synnada-ai:datafusion-upstream:next-up-down

I believe there will be no objections to this change, as we're adopting the total order convention in interval arithmetic. It should also address your case: satisfy_greater([0.0, 0.0], [-0.0, any], true) now results in [-0.0, -0.0] for the right-hand interval.

Labels: common (Related to common crate), logical-expr (Logical plan and expressions)
Development

Successfully merging this pull request may close these issues.

Filtering and counting afterwards causes overflow panic in interval arithmetics
5 participants