
Fixed #1061 failure in snippet unit tests due to the instability of np.sum for arrays with many small floating-point numbers #1080


Open
NimaSarajpoor wants to merge 4 commits into main

Conversation

NimaSarajpoor (Collaborator)

See issue #1061

NimaSarajpoor requested a review from seanlaw as a code owner on April 9, 2025

codecov bot commented Apr 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.63%. Comparing base (423c679) to head (f8bf7b6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1080   +/-   ##
=======================================
  Coverage   96.62%   96.63%           
=======================================
  Files          93       93           
  Lines       15376    15410   +34     
=======================================
+ Hits        14857    14891   +34     
  Misses        519      519           


seanlaw (Contributor) left a comment

@NimaSarajpoor Should the unit tests be failing?


# This test raises an error if arithmetic operation in ...
# ... `gpu_stump._compute_and_update_PI_kernel` does not
# generates the same result if values of variable for mean and std
Suggested correction: "generate"

Suggested correction: "same values AS"

NimaSarajpoor (Collaborator, Author) commented Apr 9, 2025

@seanlaw

Should the unit tests be failing?

For the last two commits, I expected all tests to pass. My expectation was wrong, as I had tested it locally in a different environment.

I was able to reproduce the error raised by macOS latest, Python 3.9. We are facing the same issue, i.e. the loss of precision in np.sum, again; BUT, this time, it is in the performant version (we are using np.sum in the performant code).

seanlaw (Contributor) commented Apr 9, 2025

I was able to reproduce the error raised by macOS latest, Python 3.9. We are facing the same issue, i.e. the loss of precision in np.sum, again; BUT, this time, it is in the performant version (we are using np.sum in the performant code).

I don't understand. How is np.sum in the performant version different from np.sum being used in naive.py? If they are "different" then can we implement a naive naive_np_sum function that mimics the same calculations as the real np.sum?

At the end of the day, what matters is that the answers are the same and not that the performant results are of ultra-high precision.
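
For context, np.sum uses pairwise (cascade) summation for float arrays rather than a simple left-to-right loop. A rough sketch of what a naive_np_sum mimicking that strategy could look like (the block size and splitting details here are illustrative; the exact blocking/unrolling varies across NumPy versions, so this only approximates the true accumulation order):

def naive_np_sum(a, block=128):
    # Hypothetical sketch of pairwise (cascade) summation: small chunks
    # are summed sequentially; larger arrays are split in half and the
    # partial sums are combined, which bounds the rounding-error growth.
    n = len(a)
    if n <= block:
        total = 0.0
        for x in a:  # sequential accumulation for small chunks
            total += x
        return total
    half = n // 2
    return naive_np_sum(a[:half], block) + naive_np_sum(a[half:], block)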

NimaSarajpoor (Collaborator, Author) commented Apr 10, 2025

How is np.sum in the performant version different from np.sum being used in naive.py?

np.sum itself is not different. The array that is passed as an argument to np.sum is different. And, depending on that array, np.sum sometimes shows its bad behaviour and sometimes it does not.

A SIMPLE EXAMPLE TO BETTER UNDERSTAND THE PROBLEM
Suppose that, in the naive version, there are two arrays, A and A'. We want to compare SUM(A) and SUM(A'), and based on that choose the snippet index and its corresponding subsequence. For the sake of simplicity, say there is the list lst = [A, A'], and I want to choose the index that corresponds to the array with the lowest sum. In our particular rare case, the arrays A and A' have the same elements, just in a different order. So, the expectation is to get index 0 as the result (a tie, broken in favour of the first index). But we recently realized that np.sum can be unstable, and we may get np.sum(A) > np.sum(A'), which gives index 1 as the output. As you pointed out, we do not need to be worried as long as the performant version shows the same behaviour. Here is the tricky part though: the arrays A and A' in the performant version can be different from their naive counterparts. The performant pair still have the same elements as each other, but np.sum might happen to give the correct output for them, i.e. we might get np.sum(A) == np.sum(A') in the performant version.
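
A minimal sketch of that order dependence (nothing beyond NumPy and the standard library is assumed; whether np.sum actually differs for a given shuffle depends on the platform and array size):

import math

import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1000.0, 1000.0, 1000)
A_prime = A.copy()
rng.shuffle(A_prime)  # same elements, different order

# np.sum accumulates in an order-dependent way, so the two sums can
# disagree in the last bits even though the multisets of elements match.
print(np.sum(A) == np.sum(A_prime))        # may be False
# math.fsum computes the exactly rounded sum, so it is order-independent.
print(math.fsum(A) == math.fsum(A_prime))  # always True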

A REAL EXAMPLE
To better understand this, we can take a look at this example:

# input
seed = 1615
np.random.seed(seed)
T = np.random.uniform(-1000.0, 1000.0, 64)

m = 10
s = 3
k = 3

mpdist_T_subseq_isconstant = functools.partial(
    naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
)

# other params with default
percentage = 1.0
mpdist_percentage = 0.05
mpdist_k = None

I would like to check the 2D array np.minimum(D, Q) in the last iteration of this for-loop in the snippet algorithm. I would like to check the rows at indices 1 and 3 (let's call them arrays a and b) and answer the following questions:

(1) Do `a` and `b` have the same elements?
(2) With `math.fsum`: SUM(a)==SUM(b)?
(3) With `np.sum`: SUM(a)==SUM(b)?

And I want to answer those questions for three cases: naive, performant, and performant with disabled JIT. You can see the results below for two different environments, followed by the code.

# macOS
# env1: python 3.9, numpy 2.0.2, numba 0.60.0

==================================================
naive
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False
==================================================
performant
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False
==================================================
performant with disabled JIT
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False

and, for a different environment:

# macOS
# env2: python 3.13, numpy 2.1.3, numba 0.61.0

==================================================
naive
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False
==================================================
performant
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True
==================================================
performant with disabled JIT
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True

Observation: In env1, np.sum shows its instability for both naive and performant. In env2, np.sum shows its instability only in the naive version. So, to avoid that, we can use math.fsum for naive (which is what we did in the latest commit). However, this now results in failure in env1. As shown above, we can use math.fsum in the performant version as well. I haven't checked yet to see how much it is slower than np.sum, and how much it affects the performance of snippet.
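
As a rough way to gauge that overhead (a sketch only; absolute numbers are machine-dependent): math.fsum is exactly rounded but runs as a Python-level loop, so it is typically much slower than NumPy's vectorized sum:

import math
import timeit

import numpy as np

a = np.random.uniform(-1000.0, 1000.0, 10_000)

# Compare the two summation strategies on the same data.
t_np = timeit.timeit(lambda: np.sum(a), number=1_000)
t_fsum = timeit.timeit(lambda: math.fsum(a), number=1_000)
print(f"np.sum:    {t_np:.4f} s")
print(f"math.fsum: {t_fsum:.4f} s")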

Code
import functools
import math

import naive  # reference implementation module from STUMPY's tests directory
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles


def print_info(D, m, k, sum_func):
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    indices = np.arange(D.shape[0], dtype=np.int64) * m
    for i in range(k):
        min_DQ = np.minimum(D, Q)
        sum_min_DQ = sum_func(min_DQ, axis=1)
        profile_areas = sum_min_DQ
        idx = np.argmin(profile_areas)    
        Q[:] = np.minimum(D[idx], Q)

        snippet_index = indices[idx]

    # check min_DQ in latest iteration
    a = min_DQ[1]  # corresponds to snippet index 10
    b = min_DQ[3]  # corresponds to snippet index 30

    print('Do a and b have the same elements? ', np.all(np.sort(a) == np.sort(b)))
    print('With math.fsum: SUM(a)==SUM(b)? ', math.fsum(a) == math.fsum(b))
    print('With np.sum: SUM(a)==SUM(b)? ', np.sum(a) == np.sum(b))

    return


def check_snippets_rare_case():
    seed = 1615
    np.random.seed(seed)
    T = np.random.uniform(-1000.0, 1000.0, 64)

    m = 10
    s = 3
    k = 3

    mpdist_T_subseq_isconstant = functools.partial(
        naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
    )

    percentage = 1.0
    mpdist_percentage = 0.05
    mpdist_k = None
    
    # case: naive
    print("=" * 50)
    print("naive")
    D = naive.get_all_mpdist_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )

    print_info(D, m, k, sum_func=np.sum)


    # case: performant
    print("=" * 50)
    print("performant")
    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )

    print_info(D, m, k, sum_func=np.sum)

    # case: performant with disabled JIT 
    print("=" * 50)
    print("performant with disabled JIT")
    numba.config.DISABLE_JIT = True
    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )

    print_info(D, m, k, sum_func=np.sum)

    return 


if __name__ == "__main__":
    check_snippets_rare_case()

seanlaw (Contributor) commented Apr 10, 2025

@NimaSarajpoor Maybe this is a stupid question/comment, but I noticed that you generate the distance profiles using naive.get_all_mpdist_profiles for the naive version and then _get_all_profiles for the performant and JIT-disabled versions. When I force all versions to use D = _get_all_profiles (i.e., I precompute D once and only once and then reuse it everywhere), the results are consistent:

import functools
import math

import naive
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles


def print_info(D, m, k, sum_func):
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    indices = np.arange(D.shape[0], dtype=np.int64) * m
    for i in range(k):
        min_DQ = np.minimum(D, Q)
        sum_min_DQ = sum_func(min_DQ, axis=1)
        profile_areas = sum_min_DQ
        idx = np.argmin(profile_areas)    
        Q[:] = np.minimum(D[idx], Q)

        snippet_index = indices[idx]

    # check min_DQ in latest iteration
    a = min_DQ[1]  # corresponds to snippet index 10
    b = min_DQ[3]  # corresponds to snippet index 30

    print('Do a and b have the same elements? ', np.all(np.sort(a) == np.sort(b)))
    print('With math.fsum: SUM(a)==SUM(b)? ', math.fsum(a) == math.fsum(b))
    print('With np.sum: SUM(a)==SUM(b)? ', np.sum(a) == np.sum(b))

    return


def check_snippets_rare_case():
    seed = 1615
    np.random.seed(seed)
    T = np.random.uniform(-1000.0, 1000.0, 64)

    m = 10
    s = 3
    k = 3

    mpdist_T_subseq_isconstant = functools.partial(
        naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
    )

    percentage = 1.0
    mpdist_percentage = 0.05
    mpdist_k = None

    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )
    
    # case: naive
    print("=" * 50)
    print("naive")
    print_info(D.copy(), m, k, sum_func=np.sum)


    # case: performant
    print("=" * 50)
    print("performant")
    print_info(D.copy(), m, k, sum_func=np.sum)

    # case: performant with disabled JIT 
    print("=" * 50)
    print("performant with disabled JIT")
    numba.config.DISABLE_JIT = True
    print_info(D.copy(), m, k, sum_func=np.sum)

    return 


if __name__ == "__main__":
    check_snippets_rare_case()

Output:

==================================================
naive
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True
==================================================
performant
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True
==================================================
performant with disabled JIT
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True

Based on this, I am inclined to believe that as long as we can make naive.get_all_mpdist_profiles match the output of _get_all_profiles, then we might be good?

Let me know what you think.

NimaSarajpoor (Collaborator, Author)

You are correct. This should resolve the MAIN root cause.

One approach is to break down the snippet function and create a callee that accepts D as input (as a side note: this logic could also mean a user can provide their own D to compute snippets??). Then we can compute D in both naive and performant, and if the values are all close, we pass only one of them to the callees in both the naive and performant versions. What do you suggest?
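
Concretely, the split could look something like the sketch below (illustrative only; _snippets_from_profiles is a hypothetical name, and the loop mirrors the print_info logic above):

import numpy as np

def _snippets_from_profiles(D, indices, k, sum_func=np.sum):
    # Hypothetical callee: select k snippet indices from a precomputed
    # distance-profile matrix D, mirroring the loop in print_info above.
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    snippet_indices = np.empty(k, dtype=np.int64)
    for i in range(k):
        profile_areas = sum_func(np.minimum(D, Q), axis=1)
        idx = np.argmin(profile_areas)
        snippet_indices[i] = indices[idx]
        Q[:] = np.minimum(D[idx], Q)
    return snippet_indices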

seanlaw (Contributor) commented Apr 10, 2025

I think that the correct thing to do is to understand why naive.get_all_mpdist_profiles and _get_all_profiles produce different results and then to fix the naive version. Then it shouldn't be a precision issue.

NimaSarajpoor (Collaborator, Author)

@seanlaw

I think that the correct thing to do is to understand why naive.get_all_mpdist_profiles and _get_all_profiles produce different results and then to fix the naive version. Then it shouldn't be a precision issue.

Let D_ref and D_comp denote, respectively, the outputs of naive.get_all_mpdist_profiles and _get_all_profiles.

IIUC, the goal is to understand the main cause of having different values in D_ref[i, j] and D_comp[i, j]; by resolving that, we can get EXACTLY the same output. Since the performant version not only has a different algorithm (implementation) but is also njit-decorated, I think I should follow these two steps:

(1) First, disable JIT and see if the values match (see the sketch after this list). If not, we can try to resolve it.
(2) Second, after resolving (1), enable JIT and see if the values match. If not, we can try to resolve it.
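
For step (1), a small harness along these lines might help (a sketch reusing the inputs and function names from the scripts above; whether the exact-match check passes is exactly what we want to find out):

import functools

import naive  # reference implementation from STUMPY's tests directory
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles

# Disable JIT before calling the performant function, as in the scripts above.
numba.config.DISABLE_JIT = True

np.random.seed(1615)
T = np.random.uniform(-1000.0, 1000.0, 64)
m, s = 10, 3
isconstant = functools.partial(
    naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
)

D_ref = naive.get_all_mpdist_profiles(
    T, m, 1.0, s, 0.05, None, mpdist_T_subseq_isconstant=isconstant
)
D_comp = _get_all_profiles(
    T, m, 1.0, s, 0.05, None, mpdist_T_subseq_isconstant=isconstant
)

# Exact equality (not np.allclose) is the target here: even one-ulp
# differences can flip the argmin that selects the snippet index.
print(np.array_equal(D_ref, D_comp))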

Please let me know if I misunderstood your point.

seanlaw (Contributor) commented Apr 20, 2025

Please let me know if I misunderstood your point.

I think you've captured it, though I wouldn't focus on changing the performant/NJIT version. If you recall, when we used to compare our performant stump algo with the naive version, the order of indices was different because, fundamentally, the order of operations was different enough in the naive version. Nonetheless, you eventually implemented a naive version that followed and ensured the same traversal order as the performant version.

My suspicion is that the difference between naive.get_all_mpdist_profiles and _get_all_profiles may be due to something similar (i.e., some slight algorithmic deviation in the naive version). Both the naive and performant functions also call an underlying mpdist_vect function, which may also produce different results, so we should fix the discrepancies at the root (by modifying the naive version so that all sub-functions indeed produce the same results).

Does that make sense?

NimaSarajpoor (Collaborator, Author)

Does that make sense?

YES!!

My suspicion is that the difference between naive.get_all_mpdist_profiles and _get_all_profiles may be due to something similar

Right. Let me dig into both and report back.
