
Rework initialization of constants & class variables #15333

Merged
12 commits merged into crystal-lang:master on Jan 20, 2025

Conversation

@ysbaddaden (Contributor) commented Jan 9, 2025

Follow-up to and closes #15216 by @BlobCodes. Also incorporates the LLVM optimization from master...BlobCodes:crystal:perf/crystal-once-v2.

Similar to the original PR, this changes the flag to have 3 states instead of 2. This led to changing the signature of the __crystal_once[_init] functions, which now take an Int8 (i8) instead of a Bool (i1). In addition, it drops the "once state" that isn't needed anymore.

This requires a new compiler build to benefit from the improvement. The legacy versions of the __crystal_once[_init] methods are still supported by both the stdlib and the compiler to keep forward & backward compatibility (1.15- can build 1.16+ and 1.16+ can build 1.15-).

Unlike the original PR, this one doesn't use any atomics: the mutex already protects the flag, and the mutex itself is explicitly initialized by the main thread before any other thread is started (i.e. no parallelism issues).

A follow-up could leverage ReferenceStorage and .unsafe_construct to inline the Mutex instead of allocating it in the GC heap. Along with #15330, __crystal_once_init could then become allocation-free, which could prove useful for such a core, low-level feature.
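
For reference, a minimal sketch of the mechanism described above. It is a reconstruction for illustration only: the helper names (once_init, once), the Proc type, and the error handling are assumptions and simplifications, not the exact code in this PR.

module Crystal
  # three-state flag stored as an i8, one per lazily-initialized constant/class variable
  enum OnceState : Int8
    Processing    = -1
    Uninitialized =  0
    Initialized   =  1
  end

  @@once_mutex : Mutex?

  # called by the main thread before any other thread is started,
  # so creating the mutex itself needs no synchronization
  def self.once_init : Nil
    @@once_mutex = Mutex.new
  end

  # slow path: reached only when the flag isn't Initialized yet
  def self.once(flag : OnceState*, initializer : Proc(Nil)) : Nil
    @@once_mutex.not_nil!.synchronize do
      case flag.value
      in .uninitialized?
        flag.value = OnceState::Processing
        initializer.call
        flag.value = OnceState::Initialized
      in .processing?
        # reentered from the same initializer: truly recursive
        raise "BUG: recursive initialization" # the real code prints an error and exits
      in .initialized?
        # another thread finished the initialization while we waited on the mutex
      end
    end
  end
end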

@oprypin (Member) commented Jan 9, 2025

This should have a Co-authored-by: attribution somewhere in some commit

@ysbaddaden force-pushed the refactor/crystal-once branch from d1de2eb to ce6e9a2 on January 10, 2025 16:40
@ysbaddaden (Contributor Author)

Rebased to add the co-authored-by to the individual commits 👍

@oprypin (Member) commented Jan 10, 2025

I think it's not registering in the interface. Maybe it has to be on the last line 🤔

@straight-shoota (Member)

FWIW we could also add the co-author attribution in the squash commit when we merge this PR (though we'd need to remember that, so it's best to already add it to the original commits 😅)

@Blacksmoke16 (Member)

I'm pretty sure the easiest way would be to cherry-pick their commits into a branch, then push up your own. If there are commits from more than one author on a branch, GH should automatically add the co-authored-by to the squash commit message for all additional authors.

@ysbaddaden (Contributor Author) commented Jan 10, 2025

I added a commit to only enable the feature from crystal 1.16.0-dev. Let's merge after updating the version in the master branch.

EDIT: why is CI getting completely broken because I changed the version comparison from 1.15.0-dev to 1.16.0-dev?

@ysbaddaden (Contributor Author) commented Jan 10, 2025

While trying to work on #14905, I noticed that this is still concurrency-unsafe because we only enable the Mutex for preview_mt. The example thus fails with "recursive initialization", which is wrong because these are concurrent accesses.
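
A purely hypothetical reproduction of that situation (not the example from #14905; Thread is an internal API, used here only because plain fibers wouldn't run in parallel without preview_mt):

# The constant is initialized lazily, on first access. Without the mutex,
# a second thread can observe the flag in its Processing state and get
# reported as a "recursive initialization", even though the two accesses
# are merely concurrent.
GREETING = String.build do |io|
  1_000.times { io << "hello " } # not-quite-instant work to widen the race window
end

threads = Array.new(2) { Thread.new { GREETING.bytesize } }
threads.each(&.join)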

@ysbaddaden marked this pull request as draft on January 13, 2025 09:30
@ysbaddaden (Contributor Author)

Moving back to draft while I'm working on #14905. I might change the new __crystal_once signature to take a closure_data pointer, so we can pass any proc.

@ysbaddaden (Contributor Author)

Ready again. I won't need to change the fun signatures, so it's 👌

Co-authored-by: David Keller <[email protected]>

Based on the PR by @BlobCodes:
crystal-lang#15216

The performance improvement is two-fold:

1. the use of an i8 instead of an i1 boolean to have 3 states instead
   of 2, which permits quickly detecting recursive calls without an
   array;
2. inline tricks to optimize the fast and slow paths.

Unlike the PR:

1. Doesn't use atomics: it already uses a mutex that guarantees
   acquire/release memory ordering semantics, and __crystal_once_init is only
   ever called in the main thread before any other thread is started.
2. Removes the need for a state maintained by the compiler, yet keeps
   forward and backward compatibility (both signatures are supported).
Co-authored-by: David Keller <[email protected]>

@BlobCodes: I noticed that adding this code prevents LLVM from
re-running the once mechanism multiple times for the same variable.

Modified to avoid undefined behavior when the assumption doesn't hold;
it doubles as a safety net (print error + exit).
Co-authored-by: David Keller <[email protected]>

@BlobCodes: I think it would be better to print the bug message in
`Crystal.once` instead of `__crystal_once` to reduce complexity at the
callsite. The previous unreachable method can then be used in the
inlined `__crystal_once` so LLVM also knows it doesn't have to re-run
the method.

It's now even safe because `Crystal.once` would panic if it failed; it
should already be impossible, but let's err on the safe side.
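
Putting these commit messages together, the entry point has roughly the following shape (a sketch under the same assumptions as above; the body of Crystal.once_unreachable, i.e. however the patch actually tells LLVM the branch is dead, is deliberately left out):

fun __crystal_once(flag : Crystal::OnceState*, initializer : Void*) : Nil
  # fast path, checked at every access to a lazily-initialized constant/class variable
  return if flag.value.initialized?

  # slow path: takes the mutex, runs the initializer, or panics on recursion
  Crystal.once(flag, Proc(Nil).new(initializer, Pointer(Void).null))

  # Crystal.once either left the flag Initialized or never returned, so this
  # branch is dead code; the hint lets LLVM drop repeated __crystal_once calls
  # for the same global within a function
  Crystal.once_unreachable unless flag.value.initialized?
end
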
@ysbaddaden force-pushed the refactor/crystal-once branch from f7e50b7 to 2296069 on January 14, 2025 08:57
@ysbaddaden (Contributor Author) commented Jan 14, 2025

CI is finally all green.

src/crystal/once.cr (outdated review thread, resolved)
src/compiler/crystal/codegen/once.cr (outdated review thread, resolved)
Comment on lines 102 to 105
# tell LLVM that it can optimize away repeated `__crystal_once` calls for
# this global (e.g. repeated access to a constant in a single function);
# this is truly unreachable otherwise `Crystal.once` would have panicked
Crystal.once_unreachable unless flag.value.initialized?

Member:

praise: This is pretty darn clever 👏 Kudos to @BlobCodes

@ysbaddaden (Contributor Author):

I second that: thanks a lot @BlobCodes 🙇

src/crystal/once.cr (outdated review thread, resolved)
Comment on lines +32 to +36
enum OnceState : Int8
Processing = -1
Uninitialized = 0
Initialized = 1
end

@BlobCodes (Contributor) commented Jan 16, 2025:

This is very nit-picky, but in my v2 branch, I replaced this three-state enum with an Int8 so all comparisons are done against 0 (== Initialized becomes > 0).

Comparing with 0 results in fewer (or smaller) assembly instructions on most CPUs.
The only arch I'm really familiar with is RISC-V, so here's an example in RISC-V:

# enum comparison `return if state == Initialized`
  li t0, 1         # load 1 into t0
  bne t0, a0, init # initialize if needed
  ret              # early return (or normal program flow if inlined)
init:
  # stuff

# comparison `return if state <= 0`
  blez a0, init # initialize if needed
  ret           # early-return (or normal program flow if inlined)
init:
  # stuff

Of course this is a micro-optimization, but if this is inlined into every const access, it could be noticeable.
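
In Crystal terms the suggestion amounts to something like this (hypothetical helper names; the patch under review keeps the enum):

# current fast path: equality check against the Initialized member (compare with 1)
def fast_path_enum?(flag : Crystal::OnceState*) : Bool
  flag.value.initialized?
end

# suggested variant: plain Int8 where any positive value means "initialized",
# so the fast path becomes a signed comparison against 0
def fast_path_int8?(flag : Int8*) : Bool
  flag.value > 0
end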

@ysbaddaden (Contributor Author) commented Jan 16, 2025:

EDIT: I fixed the examples below... I stupidly used a UInt8 🤦

On x86_64 the only difference is in the jump instruction:

; flag.value == 1
cmpb   $0x1,(%rdi)
jne    39 <foo+0x9>

; flag.value > 0
cmpb   $0x0,(%rdi)
jle    39 <foo+0x9>

Same for ARM32:

; flag.value == 1
ldrb    r0, [r0]
cmp     r0, #1
moveq   pc, lr

; flag.value > 0
ldrb    r0, [r0]
cmp     r0, #1
movge   pc, lr

And the same for AArch64:

; flag.value == 1
ldrb    w8, [x0]
cmp     w8, #0x1
b.ne    40 <foo+0x10>

; flag.value > 0
ldrb    w8, [x0]
cmp     w8, #0x1
b.lt    40 <foo+0x10>

NOTE: I'm not fluent in the assembly of each arch. I compiled a tiny program with --cross-compile --target=..., then used objdump --disassemble from a cross-toolchain build of binutils to compare the LLVM-generated assembly.

Contributor:

; flag.value > 0
cmpb   $0x0,(%rdi)
je     39 <foo+0x9>

To me that looks like an unsigned comparison. So it only checks x != 0.

@ysbaddaden (Contributor Author):

I just wrote a commit, but didn't push it (yet): I'm wondering if we'd really benefit from the change in practice 🤔

@ysbaddaden (Contributor Author):

I tested again, it's actually a jle!

@ysbaddaden (Contributor Author):

@BlobCodes I ran my tests again and updated my comment above.

We can probably do better in manually written assembly, but LLVM actually generates the same assembly for all 3 architectures. The only difference is in the jump instruction 🤷

@ysbaddaden (Contributor Author):

Let's see in a follow-up if we can squeeze even more performance out of this idea.

@ysbaddaden added this to the 1.16.0 milestone on Jan 17, 2025
@oprypin (Member) commented Jan 17, 2025

Do not forget!! The Co-authored-by needs to be moved to the bottom.

@straight-shoota (Member)

@oprypin: GitHub seems to recognize it. It correctly populates the Co-Authored-By tag for the commit message:
[screenshot: the suggested commit message with the Co-Authored-By tag pre-filled]

@oprypin (Member) commented Jan 17, 2025

@straight-shoota That's not right. The suggested commit message that I get is huge, with all messages concatenated. Maybe you edited it yourself. Try to clear browser caches.

@straight-shoota (Member)

I'm using https://github.com/refined-github/refined-github which is probably adjusting the commit message from the default experience (or it's GitHub's New Merge Experience feature). Anyway, as a result I get the correct attribution pre-filled. So it must be discoverable somehow.

@oprypin (Member) commented Jan 17, 2025

Hmm well in that case it's sad that it completely loses the concatenated commit descriptions. That seems undesirable in general, not only here. Or maybe you consider it a positive, I don't know

@straight-shoota merged commit 8d02c8b into crystal-lang:master on Jan 20, 2025
70 checks passed
@ysbaddaden deleted the refactor/crystal-once branch on January 30, 2025 17:42