[Draft] Arancini Calling Convention #78
I cannot convert this to a draft because I lack the required permissions.
For the microbenchmark of virtual function calls, this speeds them up a bit:
We no longer return the cpu_state, since the caller always holds a reference to it. This change requires the latest version of our LLVM fork, which skips one return register for the Arancini calling convention. This should free up a register for certain optimizations when the CPU state is no longer accessed past some point in a function.

There is an issue with the call to internal_main_loop: if LLVM is allowed to tail-call-optimize it, the translated binary crashes. For now, we mark the call so that it is not tail-call-optimized. We should fix this properly at some point.
Function signatures now look like this:

define arancini { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } @func(ptr noalias nocapture noundef nonnull align 16 dereferenceable(2232) %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i64 %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, i64 %14, i64 %15, i64 %16, i64 %17)

We pass the CPU state + PC + 16 * GPR + FS + GS and return the same except for the CPU state, as the caller already has it. So that the returned values do not need to be shuffled between registers afterwards, we skip one return register in the calling convention.
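As a rough model of that shape in plain C++ (not the actual LLVM lowering): the CPU state pointer goes in, the guest register values go in and come back by value, and the pointer is not part of the return value. The names `CpuState`, `GuestRegs`, `translated_block`, and `run_two_blocks` are hypothetical, and the struct layout simply follows the "PC + 16 * GPR + FS + GS" description. A standard C++ ABI would return a struct this large through memory; keeping the values in host registers is what the arancini calling convention is meant to provide.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emulated CPU state that lives in memory.
struct CpuState {
    std::uint64_t flags = 0;
};

// Hypothetical bundle of the values that are passed and returned:
// PC, 16 GPRs, FS and GS.
struct GuestRegs {
    std::uint64_t pc = 0;
    std::array<std::uint64_t, 16> gpr{};
    std::uint64_t fs = 0;
    std::uint64_t gs = 0;
};

// Model of one translated function: it receives the state pointer plus the
// register values and returns only the updated register values.
GuestRegs translated_block(CpuState* state, GuestRegs regs) {
    assert(state != nullptr);
    regs.gpr[0] += 1;   // pretend guest code: increment the first GPR
    state->flags |= 1;  // anything not register-passed is reached via the pointer
    regs.pc += 4;       // advance the guest PC
    return regs;
}

// The caller already holds the state pointer, so it reuses it for the next
// call instead of receiving it back from the callee.
GuestRegs run_two_blocks(CpuState* state, GuestRegs regs) {
    regs = translated_block(state, regs);
    regs = translated_block(state, regs);
    return regs;
}
```

Because the pointer is an input but not an output, the inputs and outputs differ by one value, which is where the skipped return register comes in.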
This adds an arancini calling convention in LLVM that passes and returns all 16 GPRs + PC + FS + GS in host registers.
This should, in theory, decrease the number of loads and stores to and from the CPU state.
Untested so far.
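To illustrate the expected reduction in loads and stores, here is a toy C++ contrast between the two styles. It is only a sketch: `CpuState`, `block_via_memory`, `block_via_values`, and `run_chain_via_values` are hypothetical names, the single read on entry and single write-back on exit are an assumption made for the illustration, and plain C++ would still pass the array through memory rather than in host registers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Gprs = std::array<std::uint64_t, 16>;

// Hypothetical in-memory CPU state holding the guest GPRs.
struct CpuState {
    Gprs gpr{};
};

// Memory-based style: each translated function reads its inputs from the
// CPU state and writes its results back before returning.
void block_via_memory(CpuState* state) {
    assert(state != nullptr);
    std::uint64_t r0 = state->gpr[0];  // load from the CPU state
    r0 += 1;                           // pretend guest code
    state->gpr[0] = r0;                // store back to the CPU state
}

// Value-passing style: the guest GPRs arrive and leave by value, so the
// body does not touch the in-memory copy at all.
Gprs block_via_values(CpuState* state, Gprs gpr) {
    assert(state != nullptr);
    gpr[0] += 1;  // same pretend guest code, no load or store of the state
    return gpr;
}

// Chains n calls in the value-passing style. Assumption: the in-memory copy
// is read once on entry and written once on exit.
std::uint64_t run_chain_via_values(CpuState* state, int n) {
    assert(state != nullptr);
    Gprs gpr = state->gpr;  // single read of the in-memory registers
    for (int i = 0; i < n; ++i) {
        gpr = block_via_values(state, gpr);
    }
    state->gpr = gpr;       // single write-back
    return gpr[0];
}
```

Both styles compute the same guest-visible result; the difference is how often the in-memory state is touched along the way.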
Example for a small function with just three params/return values on AArch64.
lowers to the following:
There's some stuff in here that should not be merged. Also, the LLVM code for this currently lives at https://github.com/martin-fink/llvm-project/tree/arancini-calling-conv. We might want to move it here if that makes sense.