
Fixed #1061 failure in snippet unit tests due to the instability of np.sum for arrays with many small floating-point numbers #1080


Open
NimaSarajpoor wants to merge 4 commits into main

Conversation

NimaSarajpoor (Collaborator)

See issue #1061

NimaSarajpoor requested a review from seanlaw as a code owner on April 9, 2025

codecov bot commented Apr 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.63%. Comparing base (423c679) to head (f8bf7b6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1080   +/-   ##
=======================================
  Coverage   96.62%   96.63%           
=======================================
  Files          93       93           
  Lines       15376    15410   +34     
=======================================
+ Hits        14857    14891   +34     
  Misses        519      519           


seanlaw (Contributor) left a comment

@NimaSarajpoor Should the unit tests be failing?


# This test raises an error if arithmetic operation in ...
# ... `gpu_stump._compute_and_update_PI_kernel` does not
# generates the same result if values of variable for mean and std
Suggested correction: "generate"

Suggested correction: "same values AS"

NimaSarajpoor (Collaborator, Author) commented Apr 9, 2025

@seanlaw

Should the unit tests be failing?

For the last two commits, I expected all tests to pass. My expectation was wrong, as I had tested it locally in a different environment.

I was able to reproduce the error raised by macOS latest, Python 3.9. We are facing the same issue, i.e. the loss of precision in np.sum, again; BUT, this time, it is in the performant version (we are using np.sum in the performant code).

seanlaw (Contributor) commented Apr 9, 2025

I was able to reproduce the error raised by macOS latest, Python 3.9. We are facing the same issue, i.e. the loss of precision in np.sum, again; BUT, this time, it is in the performant version (we are using np.sum in the performant code).

I don't understand. How is np.sum in the performant version different from np.sum being used in naive.py? If they are "different" then can we implement a naive naive_np_sum function that mimics the same calculations as the real np.sum?

At the end of the day, what matters is that the answers are the same and not that the performant results are of ultra-high precision.
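
For context, np.sum uses pairwise (cascade) summation for float arrays rather than a simple left-to-right loop. A rough sketch of what a naive_np_sum mimicking that strategy could look like (the block size and splitting details here are illustrative; the exact blocking/unrolling varies across NumPy versions, so this only approximates the true accumulation order):

def naive_np_sum(a, block=128):
    # Hypothetical sketch of pairwise (cascade) summation: small chunks
    # are summed sequentially; larger arrays are split in half and the
    # partial sums are combined, which bounds the rounding-error growth.
    n = len(a)
    if n <= block:
        total = 0.0
        for x in a:  # sequential accumulation for small chunks
            total += x
        return total
    half = n // 2
    return naive_np_sum(a[:half], block) + naive_np_sum(a[half:], block)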

NimaSarajpoor (Collaborator, Author) commented Apr 10, 2025

How is np.sum in the performant version different from np.sum being used in naive.py?

np.sum itself is not different. The array that is passed as an argument to np.sum is different. And, depending on that array, np.sum sometimes shows its bad behaviour and sometimes it does not.

A SIMPLE EXAMPLE TO BETTER UNDERSTAND THE PROBLEM
Suppose that, in the naive version, there are two arrays, A and A'. We want to compare SUM(A) and SUM(A'), and based on that choose the snippet index and its corresponding subsequence. For the sake of simplicity, say there is the list lst = [A, A'], and I want to choose the index that corresponds to the array with the lowest sum. In our particular rare case, the arrays A and A' have the same elements, just in a different order. So, the expectation is to get index 0 as the result (a tie, broken in favour of the first index). But we recently realized that np.sum can be unstable, and we may get np.sum(A) > np.sum(A'), which gives index 1 as the output. As you pointed out, we do not need to be worried as long as the performant version shows the same behaviour. Here is the tricky part though: the arrays A and A' in the performant version can be different from their naive counterparts. The performant pair still have the same elements as each other, but np.sum might happen to give the correct output for them, i.e. we might get np.sum(A) == np.sum(A') in the performant version.
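
A minimal sketch of that order dependence (nothing beyond NumPy and the standard library is assumed; whether np.sum actually differs for a given shuffle depends on the platform and array size):

import math

import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1000.0, 1000.0, 1000)
A_prime = A.copy()
rng.shuffle(A_prime)  # same elements, different order

# np.sum accumulates in an order-dependent way, so the two sums can
# disagree in the last bits even though the multisets of elements match.
print(np.sum(A) == np.sum(A_prime))        # may be False
# math.fsum computes the exactly rounded sum, so it is order-independent.
print(math.fsum(A) == math.fsum(A_prime))  # always True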

A REAL EXAMPLE
To better understand this, we can take a look at this example:

# input
seed = 1615
np.random.seed(seed)
T = np.random.uniform(-1000.0, 1000.0, 64)

m = 10
s = 3
k = 3

mpdist_T_subseq_isconstant = functools.partial(
    naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
)

# other params with default
percentage = 1.0
mpdist_percentage = 0.05
mpdist_k = None

I would like to check the 2D array np.minimum(D, Q) in the last iteration of this for-loop in the snippet algorithm. I would like to check the rows at indices 1 and 3 (let's call them arrays a and b) and answer the following questions:

(1) Do `a` and `b` have the same elements?
(2) With `math.fsum`: SUM(a)==SUM(b)?
(3) With `np.sum`: SUM(a)==SUM(b)?

And I want to answer those questions for three cases: naive, performant, and performant with disabled JIT. You can see the results below for two different environments, followed by the code.

# macOS
# env1: python 3.9, numpy 2.0.2, numba 0.60.0

==================================================
naive
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False
==================================================
performant
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False
==================================================
performant with disabled JIT
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False

and, for a different environment:

# macOS
# env2: python 3.13, numpy 2.1.3, numba 0.61.0

==================================================
naive
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  False
==================================================
performant
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True
==================================================
performant with disabled JIT
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True

Observation: In env1, np.sum shows its instability for both naive and performant. In env2, np.sum shows its instability only in the naive version. So, to avoid that, we can use math.fsum for naive (which is what we did in the latest commit). However, this now results in failure in env1. As shown above, we can use math.fsum in the performant version as well. I haven't checked yet to see how much it is slower than np.sum, and how much it affects the performance of snippet.
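
As a rough way to gauge that overhead (a sketch only; absolute numbers are machine-dependent): math.fsum is exactly rounded but runs as a Python-level loop, so it is typically much slower than NumPy's vectorized sum:

import math
import timeit

import numpy as np

a = np.random.uniform(-1000.0, 1000.0, 10_000)

# Compare the two summation strategies on the same data.
t_np = timeit.timeit(lambda: np.sum(a), number=1_000)
t_fsum = timeit.timeit(lambda: math.fsum(a), number=1_000)
print(f"np.sum:    {t_np:.4f} s")
print(f"math.fsum: {t_fsum:.4f} s")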

Code
import functools
import math

import naive  # reference implementation module from STUMPY's tests directory
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles


def print_info(D, m, k, sum_func):
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    indices = np.arange(D.shape[0], dtype=np.int64) * m
    for i in range(k):
        min_DQ = np.minimum(D, Q)
        sum_min_DQ = sum_func(min_DQ, axis=1)
        profile_areas = sum_min_DQ
        idx = np.argmin(profile_areas)    
        Q[:] = np.minimum(D[idx], Q)

        snippet_index = indices[idx]

    # check min_DQ in latest iteration
    a = min_DQ[1]  # corresponds to snippet index 10
    b = min_DQ[3]  # corresponds to snippet index 30

    print('Do a and b have the same elements? ', np.all(np.sort(a) == np.sort(b)))
    print('With math.fsum: SUM(a)==SUM(b)? ', math.fsum(a) == math.fsum(b))
    print('With np.sum: SUM(a)==SUM(b)? ', np.sum(a) == np.sum(b))

    return


def check_snippets_rare_case():
    seed = 1615
    np.random.seed(seed)
    T = np.random.uniform(-1000.0, 1000.0, 64)

    m = 10
    s = 3
    k = 3

    mpdist_T_subseq_isconstant = functools.partial(
        naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
    )

    percentage = 1.0
    mpdist_percentage = 0.05
    mpdist_k = None
    
    # case: naive
    print("=" * 50)
    print("naive")
    D = naive.get_all_mpdist_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )

    print_info(D, m, k, sum_func=np.sum)


    # case: performant
    print("=" * 50)
    print("performant")
    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )

    print_info(D, m, k, sum_func=np.sum)

    # case: performant with disabled JIT 
    print("=" * 50)
    print("performant with disabled JIT")
    numba.config.DISABLE_JIT = True
    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )

    print_info(D, m, k, sum_func=np.sum)

    return 


if __name__ == "__main__":
    check_snippets_rare_case()

seanlaw (Contributor) commented Apr 10, 2025

@NimaSarajpoor Maybe this is a stupid question/comment, but I noticed that you generate the distance profiles using naive.get_all_mpdist_profiles for the naive version and then _get_all_profiles for the performant and JIT-disabled versions. When I force all versions to use D = _get_all_profiles (i.e., I precompute D once and only once and then reuse it everywhere), the results are consistent:

import functools
import math

import naive
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles


def print_info(D, m, k, sum_func):
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    indices = np.arange(D.shape[0], dtype=np.int64) * m
    for i in range(k):
        min_DQ = np.minimum(D, Q)
        sum_min_DQ = sum_func(min_DQ, axis=1)
        profile_areas = sum_min_DQ
        idx = np.argmin(profile_areas)    
        Q[:] = np.minimum(D[idx], Q)

        snippet_index = indices[idx]

    # check min_DQ in latest iteration
    a = min_DQ[1]  # corresponds to snippet index 10
    b = min_DQ[3]  # corresponds to snippet index 30

    print('Do a and b have the same elements? ', np.all(np.sort(a) == np.sort(b)))
    print('With math.fsum: SUM(a)==SUM(b)? ', math.fsum(a) == math.fsum(b))
    print('With np.sum: SUM(a)==SUM(b)? ', np.sum(a) == np.sum(b))

    return


def check_snippets_rare_case():
    seed = 1615
    np.random.seed(seed)
    T = np.random.uniform(-1000.0, 1000.0, 64)

    m = 10
    s = 3
    k = 3

    mpdist_T_subseq_isconstant = functools.partial(
        naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
    )

    percentage = 1.0
    mpdist_percentage = 0.05
    mpdist_k = None

    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )
    
    # case: naive
    print("=" * 50)
    print("naive")
    print_info(D.copy(), m, k, sum_func=np.sum)


    # case: performant
    print("=" * 50)
    print("performant")
    print_info(D.copy(), m, k, sum_func=np.sum)

    # case: performant with disabled JIT 
    print("=" * 50)
    print("performant with disabled JIT")
    numba.config.DISABLE_JIT = True
    print_info(D.copy(), m, k, sum_func=np.sum)

    return 


if __name__ == "__main__":
    check_snippets_rare_case()

Output:

==================================================
naive
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True
==================================================
performant
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True
==================================================
performant with disabled JIT
Do a and b have the same elements?  True
With math.fsum: SUM(a)==SUM(b)?  True
With np.sum: SUM(a)==SUM(b)?  True

Based on this, I am inclined to believe that as long as we can make naive.get_all_mpdist_profiles match the output of _get_all_profiles, then we might be good?

Let me know what you think.

NimaSarajpoor (Collaborator, Author)

You are correct. This should resolve the MAIN root cause.

One approach is to break down the snippet function and create a callee that accepts D as input (as a side note: this logic could also mean a user can provide their own D to compute snippets??). Then we can compute D in both naive and performant, and if the values are all close, we pass only one of them to the callees in both the naive and performant versions. What do you suggest?
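
Concretely, the split could look something like the sketch below (illustrative only; _snippets_from_profiles is a hypothetical name, and the loop mirrors the print_info logic above):

import numpy as np

def _snippets_from_profiles(D, indices, k, sum_func=np.sum):
    # Hypothetical callee: select k snippet indices from a precomputed
    # distance-profile matrix D, mirroring the loop in print_info above.
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    snippet_indices = np.empty(k, dtype=np.int64)
    for i in range(k):
        profile_areas = sum_func(np.minimum(D, Q), axis=1)
        idx = np.argmin(profile_areas)
        snippet_indices[i] = indices[idx]
        Q[:] = np.minimum(D[idx], Q)
    return snippet_indices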

seanlaw (Contributor) commented Apr 10, 2025

I think that the correct thing to do is to understand why naive.get_all_mpdist_profiles and _get_all_profiles produce different results and then to fix the naive version. Then it shouldn't be a precision issue.

NimaSarajpoor (Collaborator, Author)

@seanlaw

I think that the correct thing to do is to understand why naive.get_all_mpdist_profiles and _get_all_profiles produce different results and then to fix the naive version. Then it shouldn't be a precision issue.

Let D_ref and D_comp denote, respectively, the outputs of naive.get_all_mpdist_profiles and _get_all_profiles.

IIUC, the goal is to understand the main cause of having different values in D_ref[i, j] and D_comp[i, j]; by resolving that, we can get EXACTLY the same output. Since the performant version not only has a different algorithm (implementation) but is also njit-decorated, I think I should follow these two steps:

(1) First, disable JIT and see if the values match (see the sketch after this list). If not, we can try to resolve it.
(2) Second, after resolving (1), enable JIT and see if the values match. If not, we can try to resolve it.
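
For step (1), a small harness along these lines might help (a sketch reusing the inputs and function names from the scripts above; whether the exact-match check passes is exactly what we want to find out):

import functools

import naive  # reference implementation from STUMPY's tests directory
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles

# Disable JIT before calling the performant function, as in the scripts above.
numba.config.DISABLE_JIT = True

np.random.seed(1615)
T = np.random.uniform(-1000.0, 1000.0, 64)
m, s = 10, 3
isconstant = functools.partial(
    naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
)

D_ref = naive.get_all_mpdist_profiles(
    T, m, 1.0, s, 0.05, None, mpdist_T_subseq_isconstant=isconstant
)
D_comp = _get_all_profiles(
    T, m, 1.0, s, 0.05, None, mpdist_T_subseq_isconstant=isconstant
)

# Exact equality (not np.allclose) is the target here: even one-ulp
# differences can flip the argmin that selects the snippet index.
print(np.array_equal(D_ref, D_comp))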

Please let me know if I misunderstood your point.

seanlaw (Contributor) commented Apr 20, 2025

Please let me know if I misunderstood your point.

I think you've captured it, though I wouldn't focus on changing the performant/NJIT version. If you recall, when we used to compare our performant stump algo with the naive version, the order of indices was different because, fundamentally, the order of operations was different enough in the naive version. Nonetheless, you eventually implemented a naive version that followed and ensured the same traversal order as the performant version.

My suspicion is that the difference between naive.get_all_mpdist_profiles and _get_all_profiles may be due to something similar (i.e., some slight algorithmic deviation in the naive version). Both the naive and performant functions also call an underlying mpdist_vect function, which may also produce different results, so we should fix the discrepancies at the root (by modifying the naive version so that all sub-functions indeed produce the same results).

Does that make sense?

NimaSarajpoor (Collaborator, Author)

Does that make sense?

YES!!

My suspicion is that the difference between naive.get_all_mpdist_profiles and _get_all_profiles may be due to something similar

Right. Let me dig into both and report back.
