
Conversation

SwayamInSync
Member

This PR adds a patch to make smallest_subnormal have the same value on all platforms regardless of endianness, and fixes #140.

@SwayamInSync SwayamInSync marked this pull request as draft September 4, 2025 13:37
@ngoldbaum
Member

I think you might have found a thread safety issue in the Dragon4 float128 printing. Here's the traceback for the segfault if I remove the mutex you added:

* thread #42, stop reason = EXC_BAD_ACCESS (code=2, address=0x100e98000)
    frame #0: 0x0000000100e6dec0 _quaddtype_main.cpython-313t-darwin.so`Dragon4 [inlined] BigInt_Multiply(result=0x0000000100e91dd8, lhs=0x0000000100e92dd8, rhs=<unavailable>) at dragon4.c:534:24 [opt]
   531 	            } while (largeCur != large->blocks + large->length);
   532
   533 	            DEBUG_ASSERT(resultCur < result->blocks + maxResultLen);
-> 534 	            *resultCur = (npy_uint32)(carry & bitmask_u64(32));
   535 	        }
   536 	    }
   537
Target 0: (python) stopped.
warning: _quaddtype_main.cpython-313t-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #42, stop reason = EXC_BAD_ACCESS (code=2, address=0x100e98000)
  * frame #0: 0x0000000100e6dec0 _quaddtype_main.cpython-313t-darwin.so`Dragon4 [inlined] BigInt_Multiply(result=0x0000000100e91dd8, lhs=0x0000000100e92dd8, rhs=<unavailable>) at dragon4.c:534:24 [opt]
    frame #1: 0x0000000100e6de2c _quaddtype_main.cpython-313t-darwin.so`Dragon4 [inlined] BigInt_Pow10(result=<unavailable>, exponent=1, temp=<unavailable>) at dragon4.c:633:13 [opt]
    frame #2: 0x0000000100e6ddb4 _quaddtype_main.cpython-313t-darwin.so`Dragon4(bigints=0x0000000100e8cdd8, exponent=<unavailable>, mantissaBit=<unavailable>, hasUnequalMargins='\0', digitMode=<unavailable>, cutoffMode=<unavailable>, cutoff_max=<unavailable>, cutoff_min=<unavailable>, pOutBuffer=<unavailable>, bufferSize=16384, pOutExponent=0x0000000172e1eb60) at dragon4.c:1169:9 [opt]
    frame #3: 0x0000000100e6d218 _quaddtype_main.cpython-313t-darwin.so`Dragon4_PrintFloat_Sleef_quad [inlined] FormatScientific(buffer=<unavailable>, bufferSize=16384, mantissa=<unavailable>, exponent=<unavailable>, signbit=<unavailable>, mantissaBit=0, hasUnequalMargins='\0', digit_mode=DigitMode_Unique, precision=33, min_digits=0, trim_mode=TrimMode_LeaveOneZero, digits_left=<unavailable>, exp_digits=3) at dragon4.c:1694:17 [opt]
    frame #4: 0x0000000100e6d1cc _quaddtype_main.cpython-313t-darwin.so`Dragon4_PrintFloat_Sleef_quad [inlined] Format_floatbits(buffer=<unavailable>, bufferSize=16384, mantissa=<unavailable>, exponent=<unavailable>, signbit=<unavailable>, mantissaBit=0, hasUnequalMargins='\0', opt=<unavailable>) at dragon4.c:1840:16 [opt]
    frame #5: 0x0000000100e6d12c _quaddtype_main.cpython-313t-darwin.so`Dragon4_PrintFloat_Sleef_quad(value=<unavailable>, opt=<unavailable>) at dragon4.c:1914:12 [opt]
    frame #6: 0x0000000100e6ee84 _quaddtype_main.cpython-313t-darwin.so`Dragon4_Scientific_QuadDType [inlined] Dragon4_Scientific_QuadDType_opt(val=<unavailable>, opt=0x0000000172e1ebe8) at dragon4.c:1954:9 [opt]
    frame #7: 0x0000000100e6ee7c _quaddtype_main.cpython-313t-darwin.so`Dragon4_Scientific_QuadDType(val=<unavailable>, digit_mode=DigitMode_Unique, precision=<unavailable>, min_digits=<unavailable>, sign=<unavailable>, trim=<unavailable>, pad_left=<unavailable>, exp_digits=<unavailable>) at dragon4.c:1978:12 [opt]
    frame #8: 0x0000000100e6a764 _quaddtype_main.cpython-313t-darwin.so`QuadPrecision_repr_dragon4(self=0x000002401cc9e2c0) at scalar.c:0 [opt]
    frame #9: 0x000000010077b044 libpython3.13t.dylib`PyObject_Repr + 108
    frame #10: 0x0000000100775ec8 libpython3.13t.dylib`cfunction_vectorcall_O + 408
    frame #11: 0x00000001007116f8 libpython3.13t.dylib`PyObject_Vectorcall + 88
    frame #12: 0x0000000100855c0c libpython3.13t.dylib`_PyEval_EvalFrameDefault + 36992
    frame #13: 0x0000000100714560 libpython3.13t.dylib`method_vectorcall + 316
    frame #14: 0x0000000100940230 libpython3.13t.dylib`thread_run + 128
    frame #15: 0x00000001008d64d8 libpython3.13t.dylib`pythread_wrapper + 28
    frame #16: 0x000000019c4f3c0c libsystem_pthread.dylib`_pthread_start + 136

I'm going to try this again with a TSan build of Python to see where the first data race happens...

@ngoldbaum
Member

Here's the race:

WARNING: ThreadSanitizer: data race (pid=15439)
  Write of size 4 at 0x000164124ea0 by thread T391:
    #0 Dragon4_PrintFloat_Sleef_quad dragon4.c:1913 (_quaddtype_main.cpython-314t-darwin.so:arm64+0x100dc)
    #1 Dragon4_Scientific_QuadDType dragon4.c:1978 (_quaddtype_main.cpython-314t-darwin.so:arm64+0x122a4)
    #2 QuadPrecision_repr_dragon4 scalar.c (_quaddtype_main.cpython-314t-darwin.so:arm64+0xb288)
    #3 PyObject_Repr object.c:779 (libpython3.14t.dylib:arm64+0x1356bc)
    #4 builtin_repr bltinmodule.c:2571 (libpython3.14t.dylib:arm64+0x276b50)
    #5 cfunction_vectorcall_O methodobject.c:536 (libpython3.14t.dylib:arm64+0x12d044)
    #6 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8aea8)
    #7 _PyEval_EvalFrameDefault generated_cases.c.h:1619 (libpython3.14t.dylib:arm64+0x27f504)
    #8 _PyEval_Vector ceval.c:1965 (libpython3.14t.dylib:arm64+0x27b278)
    #9 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8b4fc)
    #10 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x8fa34)
    #11 context_run context.c:728 (libpython3.14t.dylib:arm64+0x2c9790)
    #12 _PyEval_EvalFrameDefault generated_cases.c.h:3744 (libpython3.14t.dylib:arm64+0x2859c4)
    #13 _PyEval_Vector ceval.c:1965 (libpython3.14t.dylib:arm64+0x27b278)
    #14 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8b4fc)
    #15 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x8fa34)
    #16 _PyObject_Call call.c:348 (libpython3.14t.dylib:arm64+0x8b160)
    #17 PyObject_Call call.c:373 (libpython3.14t.dylib:arm64+0x8b1d8)
    #18 thread_run _threadmodule.c:359 (libpython3.14t.dylib:arm64+0x41e014)
    #19 pythread_wrapper thread_pthread.h:242 (libpython3.14t.dylib:arm64+0x36e878)

  Previous write of size 4 at 0x000164124ea0 by thread T392:
    #0 Dragon4_PrintFloat_Sleef_quad dragon4.c:1913 (_quaddtype_main.cpython-314t-darwin.so:arm64+0x100dc)
    #1 Dragon4_Scientific_QuadDType dragon4.c:1978 (_quaddtype_main.cpython-314t-darwin.so:arm64+0x122a4)
    #2 QuadPrecision_repr_dragon4 scalar.c (_quaddtype_main.cpython-314t-darwin.so:arm64+0xb288)
    #3 PyObject_Repr object.c:779 (libpython3.14t.dylib:arm64+0x1356bc)
    #4 builtin_repr bltinmodule.c:2571 (libpython3.14t.dylib:arm64+0x276b50)
    #5 cfunction_vectorcall_O methodobject.c:536 (libpython3.14t.dylib:arm64+0x12d044)
    #6 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8aea8)
    #7 _PyEval_EvalFrameDefault generated_cases.c.h:1619 (libpython3.14t.dylib:arm64+0x27f504)
    #8 _PyEval_Vector ceval.c:1965 (libpython3.14t.dylib:arm64+0x27b278)
    #9 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8b4fc)
    #10 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x8fa34)
    #11 context_run context.c:728 (libpython3.14t.dylib:arm64+0x2c9790)
    #12 _PyEval_EvalFrameDefault generated_cases.c.h:3744 (libpython3.14t.dylib:arm64+0x2859c4)
    #13 _PyEval_Vector ceval.c:1965 (libpython3.14t.dylib:arm64+0x27b278)
    #14 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8b4fc)
    #15 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x8fa34)
    #16 _PyObject_Call call.c:348 (libpython3.14t.dylib:arm64+0x8b160)
    #17 PyObject_Call call.c:373 (libpython3.14t.dylib:arm64+0x8b1d8)
    #18 thread_run _threadmodule.c:359 (libpython3.14t.dylib:arm64+0x41e014)
    #19 pythread_wrapper thread_pthread.h:242 (libpython3.14t.dylib:arm64+0x36e878)

  Location is global '_bigint_static' at 0x000164124ea0 (_quaddtype_main.cpython-314t-darwin.so+0x40ea0)

And indeed _bigint_static seems like a likely name for a global variable!

It looks like it is declared with NPY_TLS, but I guess that isn't actually expanding to the correct incantation to make this variable thread-local. We might have to do something more complicated.

IMO adding a global mutex is not the right fix; we should instead fix the underlying issue of the thread-local variable not actually being thread-local.
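For context, here is a minimal sketch of what is at stake (the BigInt layout and array size are hypothetical stand-ins mirroring the TSan report, not the exact dragon4.c source): with a real TLS keyword each thread gets a private scratch buffer, and with an empty NPY_TLS the same declaration silently becomes one shared global.

```c
#include <stdint.h>

/* NPY_TLS really comes from numpy/npy_common.h; stubbed here for the sketch. */
#ifndef NPY_TLS
#define NPY_TLS _Thread_local  /* what it *should* expand to with TLS support */
#endif

/* Stand-in for dragon4.c's big-integer scratch type (hypothetical layout). */
typedef struct {
    int32_t length;
    uint32_t blocks[1 << 10];
} BigInt;

/* With a real TLS keyword each thread gets a private copy; if NPY_TLS
 * expands to nothing, this becomes one shared global, and concurrent
 * Dragon4 calls race on it -- exactly what TSan reported. */
static NPY_TLS BigInt _bigint_static;
```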

@ngoldbaum
Member

BTW, if you want to experiment with using TSan or want to set up TSan CI, we have docker images you can use: https://github.com/nascheme/cpython_sanity

@ngoldbaum
Member

And more docs on using TSan with Python here: https://py-free-threading.github.io/thread_sanitizer/

@ngoldbaum
Member

ngoldbaum commented Sep 4, 2025

I guess in order to actually use NPY_TLS in C, you also need these HAVE_ variables defined:

https://github.com/numpy/numpy/blob/908e468aff6e6ec00c1f4678dae428ee98a2291a/numpy/_core/include/numpy/npy_common.h#L128-L140
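Paraphrasing those lines (so this is roughly, not exactly, what npy_common.h does): NPY_TLS picks the first available TLS keyword and falls back to nothing when none of the HAVE_ macros is defined.

```c
/* Rough paraphrase of the linked npy_common.h block: */
#if defined(HAVE_THREAD_LOCAL)
    #define NPY_TLS thread_local
#elif defined(HAVE__THREAD_LOCAL)
    #define NPY_TLS _Thread_local
#elif defined(HAVE___THREAD)
    #define NPY_TLS __thread
#elif defined(HAVE___DECLSPEC_THREAD_)
    #define NPY_TLS __declspec(thread)
#else
    /* No keyword detected: NPY_TLS silently becomes an ordinary global. */
    #define NPY_TLS
#endif
```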

which require some compile checks in the meson configuration. Here's where that happens in NumPy's meson build:

https://github.com/numpy/numpy/blob/908e468aff6e6ec00c1f4678dae428ee98a2291a/numpy/_core/meson.build#L268-L292
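The gist of that meson logic, sketched (simplified and hypothetical; NumPy's actual build writes the result into its config header rather than passing -D flags):

```meson
# Probe each TLS keyword with a compile check and define the matching
# HAVE_ macro for the first one that works.
cc = meson.get_compiler('c')
foreach kw : ['thread_local', '_Thread_local', '__thread', '__declspec(thread)']
  if cc.compiles('@0@ int x = 42; int main(void) { return x; }'.format(kw),
                 name: 'TLS keyword ' + kw)
    # e.g. '_Thread_local' -> 'HAVE__THREAD_LOCAL'
    add_project_arguments('-DHAVE_' + kw.underscorify().to_upper(), language: 'c')
    break
  endif
endforeach
```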

Sorry for the trouble, I didn't realize there was this wrinkle around using NPY_TLS until now.

@SwayamInSync
Member Author

SwayamInSync commented Sep 4, 2025

> I guess in order to actually use NPY_TLS in C, you also need these HAVE_ variables defined:
> numpy/numpy@908e468/numpy/_core/include/numpy/npy_common.h#L128-L140

This makes sense; I was thinking my code editor was just being lazy when I couldn't jump to the definition of NPY_TLS.
But in that case NPY_TLS should already be defined, since we install NumPy and use its public includes, and it probably expands to nothing on CI (making it a normal global variable).

Okay, I can try changing the logic of dragon4.c to work without defining a global TLS variable.

@ngoldbaum
Member

> Okay, I can try changing the logic of dragon4.c to work without defining a global TLS variable.

No, please don't do that. We should just fix the meson configuration so it defines the thread-local annotation correctly.

@ngoldbaum
Member

I think we require C11, don't we? Maybe you can just use _Thread_local unconditionally?
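That is, reusing the BigInt sketch from above, the declaration could be spelled directly:

```c
/* C11 guarantees the _Thread_local keyword (MSVC aside, where
 * __declspec(thread) applies), so no feature probe would be needed: */
static _Thread_local BigInt _bigint_static;
```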

@SwayamInSync
Member Author

> I think we require C11, don't we? Maybe you can just use _Thread_local unconditionally?

I made the changes in meson and it works as expected locally, detecting the keyword and setting up the macro. Let's see how CI goes.

@SwayamInSync
Member Author

../numpy_quaddtype/src/dragon4.c:34:2: warning: "NPY_TLS Thread-local storage support detected." [-W#warnings]
 #warning "NPY_TLS Thread-local storage support detected."

Okay, so the macro seems to be expanding correctly, but the issue remains (after removing the mutex).
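For reference, a guard along these lines (a sketch, assuming the HAVE_ macros from npy_common.h) could be what was added at dragon4.c:34 to produce that compile-time confirmation:

```c
/* Temporary diagnostic: fail the build if no TLS keyword was detected,
 * otherwise emit the warning seen in the build log above.
 * (#warning is a GCC/Clang extension, standardized only in C23.) */
#if !defined(HAVE_THREAD_LOCAL) && !defined(HAVE__THREAD_LOCAL) && \
    !defined(HAVE___THREAD) && !defined(HAVE___DECLSPEC_THREAD_)
    #error "NPY_TLS would expand to nothing: no thread-local keyword found."
#else
    #warning "NPY_TLS Thread-local storage support detected."
#endif
```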

@ngoldbaum
Member

Nice, everything seems to be working with this latest push. I'd say go ahead and merge this. I would also do a bugfix release for this.

@SwayamInSync
Member Author

> I would also do a bugfix release for this.

Yeah, about that: we need to tweak meson or add things explicitly in the toml file. In the previous sdist the submodules, LICENSE, and sleef were not packaged, and this is causing some issues with the conda-forge integration.

I checked the meson docs, and we can declare the extra things we want to include in the toml file. So we can fix this part as well and then make the release.
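For instance, meson-python lets you forward extra arguments to `meson dist` from pyproject.toml (a sketch based on its documented options; whether this alone pulls in the git submodules and the SLEEF sources would need checking):

```toml
# pyproject.toml: pass extra flags through meson-python to `meson dist`
[tool.meson-python.args]
dist = ['--include-subprojects']
```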

@ngoldbaum
Member

Makes sense. I hope this has been a fun learning experience in Python Packaging for you 😀

@SwayamInSync SwayamInSync marked this pull request as ready for review September 4, 2025 19:24
@SwayamInSync
Member Author

I think it's ready to merge now @ngoldbaum

@ngoldbaum
Member

OK, cool, merging. I'm not sure if you all decided whether this is a bug in SLEEF. If you think it is, it would also be awesome if one of you could report a bug to the upstream SLEEF project.

@ngoldbaum ngoldbaum merged commit 380fb83 into numpy:main Sep 4, 2025
7 checks passed
@SwayamInSync
Member Author

> OK, cool, merging. I'm not sure if you all decided whether this is a bug in SLEEF. If you think it is, it would also be awesome if one of you could report a bug to the upstream SLEEF project.

At least in 3.8, the way they define SLEEF_QUAD_DENORM_MIN is incorrect.
I managed to get the exact value via sleef_q(+0x0000000000000LL, 0x0000000000000001ULL, -16383), so that seems to be the correct definition.

To check the current latest version I'd need to compile it and then verify whether it is the same or already fixed; SLEEF's code on GitHub is pretty unreadable since "everything dispatches" at build time.
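For anyone who wants to verify a given SLEEF build, a bytewise comparison avoids relying on quad printing entirely (a sketch; it assumes sleefquad.h is available and that Sleef_quad carries no padding bits):

```c
#include <stdio.h>
#include <string.h>
#include <sleefquad.h>

int main(void) {
    Sleef_quad from_header = SLEEF_QUAD_DENORM_MIN;
    /* sleef_q(hi, lo, exp) builds a quad from the mantissa halves and the
     * unbiased exponent; the smallest subnormal has mantissa 1, exp -16383. */
    Sleef_quad from_bits = sleef_q(+0x0000000000000LL, 0x0000000000000001ULL, -16383);
    printf("SLEEF_QUAD_DENORM_MIN matches the expected bit pattern: %s\n",
           memcmp(&from_header, &from_bits, sizeof(Sleef_quad)) == 0 ? "yes" : "no");
    return 0;
}
```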
