Skip to content

8336845: [lworld] Virtual threads don't support the value class calling convention #1399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 24 commits into
base: lworld
Choose a base branch
from

Conversation

TobiHartmann
Copy link
Member

@TobiHartmann TobiHartmann commented Mar 14, 2025

Background

The scalarized calling convention will pass value objects by-value by passing their field values instead of a reference. Now this is only done by C2 while the interpreter and C1 still use references to heap "buffers". When calling from the interpreter or C1 to C2 compiled code, we nee to "unpack" from the heap buffer. The difficulty is that unpacking of value type arguments in the nmethod entry point might require additional stack space. For example, let's assume we have a method that takes two value type arguments v1 and v2 of a type that has four int fields f1 - f4.

The register/stack layout after method entry for an un-scalarized call would look like this:

(1)

  rsi: v1
  rdx: v2
  rcx:
   r8:
   r9:
  rdi:
 0x18: ...    <- sp of caller
 0x10: return_address
 0x08: rbp
 0x00: locals

And for the scalarized call it would look like this:

(2)

  rsi: v1.f1
  rdx: v1.f2
  rcx: v1.f3
   r8: v1.f4
   r9: v2.f1
  rdi: v2.f2
 0x28: ...    <- sp of caller
 0x20: v2.f4
 0x18: v2.f3
 0x10: return_address
 0x08: rbp
 0x00: locals

In this case, the scalarized calling convention requires two additional stack slots to pass v2.f3 and v2.f4 because we don't have enough registers. Since we cannot overwrite stack slots in the caller frame, we need to extend the callee frame before unpacking.

To convert from (1) to (2), we have a [Verified Value Entry Point] that will extend the stack:

[Verified Value Entry Point]
 pop    %r13             ; save return address
 sub    $0x20,%rsp       ; extend stack (make sure stack remains aligned)
 push   %r13             ; copy return address to top of stack
 ...                     ; unpack value type arguments
 push   %rbp             ; save rbp
 sub    $0x10,%rsp       ; create frame (constant size)
 movq   $0x38,0x8(%rsp)  ; save stack increment 0x10 + size(rbp) + 0x20
 jmpq   [Entry]

This is implemented in MacroAssembler::extend_stack_for_inline_args:

int MacroAssembler::extend_stack_for_inline_args(int args_on_stack) {
// Two additional slots to account for return address
int sp_inc = (args_on_stack + 2) * VMRegImpl::stack_slot_size;
sp_inc = align_up(sp_inc, StackAlignmentInBytes);
// Save the return address, adjust the stack (make sure it is properly
// 16-byte aligned) and copy the return address to the new top of the stack.
// The stack will be repaired on return (see MacroAssembler::remove_frame).
assert(sp_inc > 0, "sanity");
pop(r13);
subptr(rsp, sp_inc);
push(r13);
return sp_inc;

In addition, we have a [Verified Entry Point] that will be used for scalarized to scalarized calls and will not extend the stack. It still needs to set the stack increment:

[Verified Entry Point]
 push   %rbp             ; save rbp
 sub    $0x10,%rsp       ; create frame (constant size)
 movq   $0x18,0x8(%rsp)  ; save stack increment 0x10 + size(rbp)

[Entry]
 ...                     ; method body
 mov    0x10(%rsp),%rbp  ; restore rbp
 add    0x8(%rsp),%rsp   ; repair stack
 retq

This is implemented in C2_MacroAssembler::verified_entry:

if (C->needs_stack_repair()) {
// Save stack increment just below the saved rbp (also account for fixed framesize and rbp)
assert((sp_inc & (StackAlignmentInBytes-1)) == 0, "stack increment not aligned");
movptr(Address(rsp, framesize - wordSize), sp_inc + framesize + wordSize);
}

This protocol allows for an efficient conversion between the calling conventions with minimal impact to scalarized code. The only difference is that the epilog that used to have a constant stack increment now has to repair the stack:

[...]
 add    $0x10,%rsp      ; repair stack
 pop    %rbp            ; restore rbp
 retq

This is implemented in MacroAssembler::remove_frame:

if (needs_stack_repair) {
movq(rbp, Address(rsp, initial_framesize));
// The stack increment resides just below the saved rbp
addq(rsp, Address(rsp, initial_framesize - wordSize));

Also, stack walking in frame::safe_for_sender and frame::sender_for_compiled_frame now needs to repair the stack to get a correct sender_sp if the current frame extended the stack. This is done in frame::repair_sender_sp() by adding the sp_inc if necessary:

// Check for a method with scalarized inline type arguments that needs
// a stack repair and return the repaired sender stack pointer.
intptr_t* frame::repair_sender_sp(intptr_t* sender_sp, intptr_t** saved_fp_addr) const {
nmethod* nm = _cb->as_nmethod_or_null();
if (nm != nullptr && nm->needs_stack_repair()) {
// The stack increment resides just below the saved rbp on the stack
// and does not account for the return address.
intptr_t* real_frame_size_addr = (intptr_t*) (saved_fp_addr - 1);
int real_frame_size = ((*real_frame_size_addr) + wordSize) / wordSize;
assert(real_frame_size >= _cb->frame_size() && real_frame_size <= 1000000, "invalid frame size");
sender_sp = unextended_sp() + real_frame_size;
}
return sender_sp;

Deoptimization uses the same stack walking code and is therefore not affected.

This PR

Code for freezing and thawing virtual threads walks stack frames and therefore needs to be aware of C2 compiled frames that have a dynamic size and might need a stack repair.

Thanks,
Tobias


Progress

  • Change must not contain extraneous whitespace

Issue

  • JDK-8336845: [lworld] Virtual threads don't support the value class calling convention (Bug - P2)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1399/head:pull/1399
$ git checkout pull/1399

Update a local copy of the PR:
$ git checkout pull/1399
$ git pull https://git.openjdk.org/valhalla.git pull/1399/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1399

View PR using the GUI difftool:
$ git pr show -t 1399

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1399.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 14, 2025

👋 Welcome back thartmann! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 14, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant