@martin-fink
Collaborator

@martin-fink martin-fink commented Feb 6, 2024

This adds an arancini calling convention to LLVM that passes and returns all 16 GPRs, plus PC, FS, and GS, in host registers.
In theory, this should reduce the number of loads and stores from and to the CPU state.
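
As a rough illustration of the idea (a C++ model, not code from this PR), the sketch below contrasts a block that goes through the CPU state in memory with one that takes and returns guest registers by value. The `CpuState` and `Regs` layouts, the function names, and the reduced three-register set are all hypothetical. Plain C++ does not guarantee that the values stay in host registers; that guarantee is what the custom calling convention is meant to provide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily reduced guest CPU state (not Arancini's real layout).
struct CpuState {
    std::uint64_t pc;
    std::uint64_t r0;
    std::uint64_t r1;
};

// Memory-based style: the translated block reads and writes guest registers
// through the state pointer, so each access is a load or a store.
void block_via_state(CpuState *state) {
    state->r0 += 1;
    state->r1 += 1;
    state->pc += 8;
}

// Register-passing style: guest registers are parameters and return values.
// With a suitable calling convention they could stay in host registers
// across the call; standard C++ ABIs make no such promise.
struct Regs {
    std::uint64_t pc;
    std::uint64_t r0;
    std::uint64_t r1;
};

Regs block_via_values(Regs in) {
    return Regs{in.pc + 8, in.r0 + 1, in.r1 + 1};
}

// Both styles compute the same guest-visible result.
bool styles_agree() {
    CpuState state{0x1000, 41, 99};
    block_via_state(&state);
    Regs out = block_via_values(Regs{0x1000, 41, 99});
    return state.pc == out.pc && state.r0 == out.r0 && state.r1 == out.r1;
}
```

The two variants are semantically equivalent; the difference is only in where the guest register values live between calls.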

Untested so far.

Example of a small function with just three parameters and return values, on AArch64:

define arancini { i64, i64, i64 } @func2(i64 %0, i64 %1, i64 %2) {
  %4 = add i64 %0, 1
  %5 = add i64 %1, 1
  %6 = add i64 %2, 1

  %ret_val = insertvalue { i64, i64, i64 } undef, i64 %4, 0
  %ret_val2 = insertvalue { i64, i64, i64 } %ret_val, i64 %5, 1
  %ret_val3 = insertvalue { i64, i64, i64 } %ret_val2, i64 %6, 2

  ret { i64, i64, i64 } %ret_val3
}

lowers to the following:

add	x0, x0, #1
add	x1, x1, #1
add	x2, x2, #1
ret

Some of the changes in here should not be merged. Also, the LLVM code for this currently lives at https://github.com/martin-fink/llvm-project/tree/arancini-calling-conv; we might want to move it here if this makes sense.

@martin-fink
Collaborator Author

I cannot convert this to a draft since I'm lacking some permissions.

@martin-fink martin-fink changed the title Arancini Calling Convention [Draft] Arancini Calling Convention Feb 6, 2024
@martin-fink martin-fink force-pushed the arancini-cc branch 2 times, most recently from 923d9da to 8a8c8e5 Compare February 14, 2024 09:49
@martin-fink martin-fink changed the base branch from sr/registers to mf/static-func-map February 28, 2024 15:26
@martin-fink martin-fink force-pushed the arancini-cc branch 2 times, most recently from 5cfb628 to de6ec95 Compare March 20, 2024 08:09
@martin-fink martin-fink changed the base branch from mf/static-func-map to sr/registers+link March 26, 2024 13:36
@martin-fink martin-fink force-pushed the arancini-cc branch 3 times, most recently from a2da007 to 6c75a9c Compare March 26, 2024 15:25
@martin-fink
Collaborator Author

In the microbenchmark of virtual function calls, this speeds things up a bit:

| baseline | arancini-cc | re-added switch | internal static lookup fn |
| --- | --- | --- | --- |
| 3,041.24 ms | 810.88 ms | 795.16 ms | 709.98 ms |
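
To put the four timings above in relative terms, a small sketch like the following computes each variant's speedup over the baseline. The labels assume the header splits into "baseline", "arancini-cc", "re-added switch", and "internal static lookup fn"; the names `Timing`, `speedup`, and `print_speedups` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>

// One benchmark configuration and its measured wall time in milliseconds.
struct Timing {
    const char *label;
    double ms;
};

// Speedup of a variant relative to the baseline (higher is faster).
constexpr double speedup(double baseline_ms, double variant_ms) {
    return baseline_ms / variant_ms;
}

// Timings as reported above; the label split is an assumption.
const Timing kTimings[] = {
    {"baseline", 3041.24},
    {"arancini-cc", 810.88},
    {"re-added switch", 795.16},
    {"internal static lookup fn", 709.98},
};

// Prints each configuration with its speedup over the first entry.
void print_speedups() {
    const double baseline_ms = kTimings[0].ms;
    for (const Timing &t : kTimings) {
        std::printf("%-26s %9.2f ms  %.2fx\n", t.label, t.ms,
                    speedup(baseline_ms, t.ms));
    }
}
```

With these numbers the variants come out at roughly 3.75x, 3.82x, and 4.28x over the baseline.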

@martin-fink martin-fink changed the base branch from sr/registers+link to main May 27, 2024 12:47
@martin-fink martin-fink force-pushed the arancini-cc branch 2 times, most recently from 4528a08 to 907eb6d Compare May 27, 2024 12:50
@martin-fink martin-fink force-pushed the arancini-cc branch 3 times, most recently from ee16c9f to f9f42e4 Compare July 2, 2024 10:45
We no longer return the cpu_state, as the caller always has a
reference to the CPU state. This change requires the latest
version of our LLVM fork, which skips one return register for the
Arancini calling convention.

This should free up one register for certain optimizations once the CPU
state is no longer accessed past some point in the function.

There is an issue with the call to internal_main_loop: if LLVM
is allowed to tail-call-optimize it, the translated binary
crashes. For now, we mark the call as not eligible for tail-call
optimization. At some point in the future, we should fix this.
@martin-fink
Collaborator Author

Function signatures now look like this:

define arancini { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } @func(ptr noalias nocapture noundef nonnull align 16 dereferenceable(2232) %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i64 %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, i64 %14, i64 %15, i64 %16, i64 %17)

We pass the CPU state + PC + 16 * GPR + FS + GS and return the same values except for the CPU state, since the caller already has it. So that the remaining values do not have to be shuffled between registers, the calling convention skips one return register.
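
The shape of this signature can be modeled in plain C++ as follows. This is only a sketch following the description above: `CpuState`, `GuestRegs`, `TranslatedFn`, `run_chain`, and the two example blocks are hypothetical names, and the real CPU state is much larger. It shows why the state pointer does not need to be returned: the caller keeps it and threads only the register values from one call into the next.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the CPU state; the real structure is larger.
struct CpuState {
    std::uint64_t scratch = 0;
};

// Guest register values that travel through parameters and return values.
struct GuestRegs {
    std::uint64_t pc = 0;
    std::array<std::uint64_t, 16> gpr{};
    std::uint64_t fs = 0;
    std::uint64_t gs = 0;
};

// A translated function receives the state pointer plus the register values
// and returns only the register values.
using TranslatedFn = GuestRegs (*)(CpuState *, GuestRegs);

// Example block that only touches register values.
GuestRegs inc_gpr0(CpuState *, GuestRegs regs) {
    regs.gpr[0] += 1;
    regs.pc += 1;
    return regs;
}

// Example block that also writes something into the CPU state.
GuestRegs spill_gpr0(CpuState *state, GuestRegs regs) {
    state->scratch = regs.gpr[0];
    regs.pc += 1;
    return regs;
}

// The caller owns the state pointer and reuses it for every call, so a
// callee never has to hand it back.
GuestRegs run_chain(CpuState *state, GuestRegs regs,
                    const TranslatedFn *fns, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        regs = fns[i](state, regs);
    }
    return regs;
}
```

In the actual convention the point of skipping a return register is that each returned value lands in the same host register it was passed in, which this C++ model cannot express.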

@martin-fink martin-fink marked this pull request as draft February 19, 2025 16:31