Fixed #1061: failure in snippet unit tests due to the instability of `np.sum` for arrays with many small floating point numbers (#1080)
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1080   +/-   ##
=======================================
  Coverage   96.62%   96.63%
=======================================
  Files          93       93
  Lines       15376    15410      +34
=======================================
+ Hits        14857    14891      +34
  Misses        519      519
```
…ond attempt with disabled jit
@NimaSarajpoor Should the unit tests be failing?
```python
# This test raises an error if arithmetic operation in ...
# ... `gpu_stump._compute_and_update_PI_kernel` does not
# generates the same result if values of variable for mean and std
```
"generate"
"same values AS"
For the last two commits, I expected all tests to pass. My expectation was wrong, as I had tested it locally in a different environment. I was able to reproduce the error raised by macOS-latest, Python 3.9.

We are facing the same issue, i.e. the loss of precision in
I don't understand. How is

At the end of the day, what matters is that the answers are the same, not that the performant results are of ultra-high precision.
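To make the instability concrete: `np.sum` is order-dependent, so the same multiset of floats can round to different totals depending on element order, while `math.fsum` computes an exactly rounded, order-independent sum. A minimal sketch (the array values are arbitrary, chosen only for illustration):

```python
import math

import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(-1000.0, 1000.0, 1000)
b = np.sort(a)  # exact same multiset of values, different order

# np.sum uses pairwise summation, so the rounding of intermediate
# partial sums depends on element order; the two totals can differ
# in the last few bits.
print(np.sum(a), np.sum(b))

# math.fsum tracks exact partial sums, so it is order-independent.
print(math.fsum(a) == math.fsum(b))  # True
```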
A SIMPLE EXAMPLE TO BETTER UNDERSTAND THE PROBLEM

A REAL EXAMPLE

```python
# input
seed = 1615
np.random.seed(seed)
T = np.random.uniform(-1000.0, 1000.0, 64)
m = 10
s = 3
k = 3
mpdist_T_subseq_isconstant = functools.partial(
    naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
)

# other params with default
percentage = 1.0
mpdist_percentage = 0.05
mpdist_k = None
```

I would like to check the 2D array
And I want to answer those questions for three cases
and, for a different environment:
Observation:

Code:

```python
import functools
import math

import naive
import numba
import numpy as np

from stumpy.snippets import _get_all_profiles


def print_info(D, m, k, sum_func):
    Q = np.full(D.shape[-1], np.inf, dtype=np.float64)
    indices = np.arange(D.shape[0], dtype=np.int64) * m
    for i in range(k):
        min_DQ = np.minimum(D, Q)
        sum_min_DQ = sum_func(min_DQ, axis=1)
        profile_areas = sum_min_DQ
        idx = np.argmin(profile_areas)
        Q[:] = np.minimum(D[idx], Q)
        snippet_index = indices[idx]

    # check min_DQ in latest iteration
    a = min_DQ[1]  # corresponds to snippet index 10
    b = min_DQ[3]  # corresponds to snippet index 30
    print("Do a and b have the same elements? ", np.all(np.sort(a) == np.sort(b)))
    print("With math.fsum: SUM(a)==SUM(b)? ", math.fsum(a) == math.fsum(b))
    print("With np.sum: SUM(a)==SUM(b)? ", np.sum(a) == np.sum(b))

    return


def check_snippets_rare_case():
    seed = 1615
    np.random.seed(seed)
    T = np.random.uniform(-1000.0, 1000.0, 64)
    m = 10
    s = 3
    k = 3
    mpdist_T_subseq_isconstant = functools.partial(
        naive.isconstant_func_stddev_threshold, quantile_threshold=0.05
    )
    percentage = 1.0
    mpdist_percentage = 0.05
    mpdist_k = None

    # case: naive
    print("=" * 50)
    print("naive")
    D = naive.get_all_mpdist_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )
    print_info(D, m, k, sum_func=np.sum)

    # case: performant
    print("=" * 50)
    print("performant")
    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )
    print_info(D, m, k, sum_func=np.sum)

    # case: performant with disabled JIT
    print("=" * 50)
    print("performant with disabled JIT")
    numba.config.DISABLE_JIT = True
    D = _get_all_profiles(
        T,
        m,
        percentage,
        s,
        mpdist_percentage,
        mpdist_k,
        mpdist_T_subseq_isconstant=mpdist_T_subseq_isconstant,
    )
    print_info(D, m, k, sum_func=np.sum)

    return


if __name__ == "__main__":
    check_snippets_rare_case()
```
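The reason these last-bit differences matter downstream is that `np.argmin` over the profile areas is discontinuous: when two candidate rows have mathematically tied sums, a one-ulp rounding difference flips which index wins. A toy illustration with made-up sums (not actual `stumpy` output):

```python
import numpy as np

# Two environments compute the "same" profile areas, but rounding
# lands on opposite sides of a mathematical tie between rows 1 and 2.
areas_env1 = np.array([5.0, 3.0 + 1e-15, 3.0])
areas_env2 = np.array([5.0, 3.0, 3.0 + 1e-15])

# np.argmin returns the index of the smallest value, so the chosen
# snippet index differs between the two environments.
print(np.argmin(areas_env1))  # 2
print(np.argmin(areas_env2))  # 1
```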
@NimaSarajpoor Maybe this is a stupid question/comment, but I noticed that you generate the distance profiles using
Output:
Based on this, I am inclined to believe that as long as we can make

Let me know what you think.
You are correct. This should resolve the MAIN root cause. One approach is to break down the snippet function and create a callee that accepts `D` as input (as a side note: this logic can also mean a user can provide their own `D` to compute snippets??). Then, we can compute `D` in both the naive and performant versions, and if the values are all close, we pass only one of them to the callees in both the naive and performant versions. What do you suggest?
I think that the correct thing to do is to understand why the |
Let

IIUC, the goal is to understand the main cause of having different values in

(1) First, disable JIT and see if the values match. If not, we can try to resolve it.

Please let me know if I misunderstood your point.
I think you've captured it, though I wouldn't focus on changing the performant/NJIT version. If you recall, when we use to compare our performant

My suspicion is that the difference between

Does that make sense?
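One way to see why a tolerance-based comparison between the naive and performant `D` can pass while the chosen snippet indices still disagree: closeness is checked elementwise with a tolerance, but the subsequent `argmin` has zero tolerance. A contrived sketch (the arrays are made up, not actual `stumpy` output):

```python
import numpy as np

# Two versions of the "same" distance matrix, differing by ~1e-15.
D_naive = np.array([[1.0, 2.0], [2.0, 1.0 + 1e-15]])
D_perf = np.array([[1.0, 2.0], [2.0, 1.0 - 1e-15]])

# The elementwise comparison passes comfortably...
np.testing.assert_allclose(D_naive, D_perf)

# ...yet argmin over the row sums disagrees, because the true sums tie
# and rounding falls on opposite sides of the tie.
print(np.argmin(D_naive.sum(axis=1)))  # 0
print(np.argmin(D_perf.sum(axis=1)))   # 1
```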
YES!!
Right. Let me dig into both and report back.
See issue #1061