8336845: [lworld] Virtual threads don't support the value class calling convention #1399

TobiHartmann · 2025-03-14T15:07:49Z

Background

The scalarized calling convention will pass value objects by-value by passing their field values instead of a reference. Now this is only done by C2 while the interpreter and C1 still use references to heap "buffers". When calling from the interpreter or C1 to C2 compiled code, we need to "unpack" from the heap buffer. The difficulty is that unpacking of value type arguments in the nmethod entry point might require additional stack space. For example, let's assume we have a method that takes two value type arguments v1 and v2 of a type that has four int fields f1 - f4.

The register/stack layout after method entry for an un-scalarized call would look like this:

(1)

  rsi: v1
  rdx: v2
  rcx:
   r8:
   r9:
  rdi:
 0x18: ...    <- sp of caller
 0x10: return_address
 0x08: rbp
 0x00: locals

And for the scalarized call it would look like this:

(2)

  rsi: v1.f1
  rdx: v1.f2
  rcx: v1.f3
   r8: v1.f4
   r9: v2.f1
  rdi: v2.f2
 0x28: ...    <- sp of caller
 0x20: v2.f4
 0x18: v2.f3
 0x10: return_address
 0x08: rbp
 0x00: locals

In this case, the scalarized calling convention requires two additional stack slots to pass v2.f3 and v2.f4 because we don't have enough registers. Since we cannot overwrite stack slots in the caller frame, we need to extend the callee frame before unpacking.

To convert from (1) to (2), we have a [Verified Value Entry Point] that will extend the stack:

[Verified Value Entry Point]
 pop    %r13             ; save return address
 sub    $0x20,%rsp       ; extend stack (make sure stack remains aligned)
 push   %r13             ; copy return address to top of stack
 ...                     ; unpack value type arguments
 push   %rbp             ; save rbp
 sub    $0x10,%rsp       ; create frame (constant size)
 movq   $0x38,0x8(%rsp)  ; save stack increment 0x10 + size(rbp) + 0x20
 jmpq   [Entry]

This is implemented in MacroAssembler::extend_stack_for_inline_args:

valhalla/src/hotspot/cpu/x86/macroAssembler_x86.cpp

Lines 7028 to 7039 in 12e5dda

    
           int MacroAssembler::extend_stack_for_inline_args(int args_on_stack) { 
        
             // Two additional slots to account for return address 
        
             int sp_inc = (args_on_stack + 2) * VMRegImpl::stack_slot_size; 
        
             sp_inc = align_up(sp_inc, StackAlignmentInBytes); 
        
             // Save the return address, adjust the stack (make sure it is properly 
        
             // 16-byte aligned) and copy the return address to the new top of the stack. 
        
             // The stack will be repaired on return (see MacroAssembler::remove_frame). 
        
             assert(sp_inc > 0, "sanity"); 
        
             pop(r13); 
        
             subptr(rsp, sp_inc); 
        
             push(r13); 
        
             return sp_inc;

In addition, we have a [Verified Entry Point] that will be used for scalarized to scalarized calls and will not extend the stack. It still needs to set the stack increment:

[Verified Entry Point]
 push   %rbp             ; save rbp
 sub    $0x10,%rsp       ; create frame (constant size)
 movq   $0x18,0x8(%rsp)  ; save stack increment 0x10 + size(rbp)

[Entry]
 ...                     ; method body
 mov    0x10(%rsp),%rbp  ; restore rbp
 add    0x8(%rsp),%rsp   ; repair stack
 retq

This is implemented in C2_MacroAssembler::verified_entry:

valhalla/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

Lines 124 to 128 in 12e5dda

    
           if (C->needs_stack_repair()) { 
        
             // Save stack increment just below the saved rbp (also account for fixed framesize and rbp) 
        
             assert((sp_inc & (StackAlignmentInBytes-1)) == 0, "stack increment not aligned"); 
        
             movptr(Address(rsp, framesize - wordSize), sp_inc + framesize + wordSize); 
        
           }

This protocol allows for an efficient conversion between the calling conventions with minimal impact to scalarized code. The only difference is that the epilog that used to have a constant stack increment now has to repair the stack:

[...]
 add    $0x10,%rsp      ; repair stack
 pop    %rbp            ; restore rbp
 retq

This is implemented in MacroAssembler::remove_frame:

valhalla/src/hotspot/cpu/x86/macroAssembler_x86.cpp

Lines 7271 to 7274 in 12e5dda

    
           if (needs_stack_repair) { 
        
             movq(rbp, Address(rsp, initial_framesize)); 
        
             // The stack increment resides just below the saved rbp 
        
             addq(rsp, Address(rsp, initial_framesize - wordSize));

Also, stack walking in frame::safe_for_sender and frame::sender_for_compiled_frame now needs to repair the stack to get a correct sender_sp if the current frame extended the stack. This is done in frame::repair_sender_sp() by adding the sp_inc if necessary:

valhalla/src/hotspot/cpu/x86/frame_x86.cpp

Lines 702 to 714 in 12e5dda

    
           // Check for a method with scalarized inline type arguments that needs 
        
           // a stack repair and return the repaired sender stack pointer. 
        
           intptr_t* frame::repair_sender_sp(intptr_t* sender_sp, intptr_t** saved_fp_addr) const { 
        
             nmethod* nm = _cb->as_nmethod_or_null(); 
        
             if (nm != nullptr && nm->needs_stack_repair()) { 
        
               // The stack increment resides just below the saved rbp on the stack 
        
               // and does not account for the return address. 
        
               intptr_t* real_frame_size_addr = (intptr_t*) (saved_fp_addr - 1); 
        
               int real_frame_size = ((*real_frame_size_addr) + wordSize) / wordSize; 
        
               assert(real_frame_size >= _cb->frame_size() && real_frame_size <= 1000000, "invalid frame size"); 
        
               sender_sp = unextended_sp() + real_frame_size; 
        
             } 
        
             return sender_sp;

Deoptimization uses the same stack walking code and is therefore not affected.

This PR

Code for freezing and thawing virtual threads walks stack frames and therefore needs to be aware of C2 compiled frames that have a dynamic size and might need a stack repair.

Thanks,
Tobias

Progress

Change must not contain extraneous whitespace

Issue

JDK-8336845: [lworld] Virtual threads don't support the value class calling convention (Bug - P2)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1399/head:pull/1399
$ git checkout pull/1399

Update a local copy of the PR:
$ git checkout pull/1399
$ git pull https://git.openjdk.org/valhalla.git pull/1399/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1399

View PR using the GUI difftool:
$ git pr show -t 1399

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1399.diff

bridgekeeper · 2025-03-14T15:08:42Z

👋 Welcome back thartmann! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-03-14T15:09:11Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

bridgekeeper · 2025-05-09T19:31:21Z

@TobiHartmann This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper · 2025-07-05T01:01:10Z

@TobiHartmann This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

TobiHartmann added 24 commits August 29, 2024 14:17

First hacky prototype

e24b8a3

Added test

ce7a234

More fixes

677c2e0

More tests and some code cleanup

3909780

Cleanups

e589ebb

More cleanups

c05e3f4

more

069985a

All tests pass

87afebd

Re-enabled asserts

37240b1

jtreg test

0c69381

Refactoring

d4278fd

Merged with lworld

315f00e

More cleanups

80f4ed6

More fixes

fa8f570

More fixes

383c701

More fixes

b5c3992

More fixes

00cabbf

More fixes

88c9466

More fixes

bb3b87f

More fixes

26818be

More

70e1607

More tests

3bda1c3

Added garbage producer thread to test to stress GC

af869ef

Merge

f4e9861

bridgekeeper bot closed this Jul 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8336845: [lworld] Virtual threads don't support the value class calling convention #1399

8336845: [lworld] Virtual threads don't support the value class calling convention #1399

Uh oh!

TobiHartmann commented Mar 14, 2025 •

edited by openjdk bot

Loading

Uh oh!

bridgekeeper bot commented Mar 14, 2025

Uh oh!

openjdk bot commented Mar 14, 2025

Uh oh!

bridgekeeper bot commented May 9, 2025

Uh oh!

bridgekeeper bot commented Jul 5, 2025

Uh oh!

Uh oh!

	int MacroAssembler::extend_stack_for_inline_args(int args_on_stack) {
	// Two additional slots to account for return address
	int sp_inc = (args_on_stack + 2) * VMRegImpl::stack_slot_size;
	sp_inc = align_up(sp_inc, StackAlignmentInBytes);
	// Save the return address, adjust the stack (make sure it is properly
	// 16-byte aligned) and copy the return address to the new top of the stack.
	// The stack will be repaired on return (see MacroAssembler::remove_frame).
	assert(sp_inc > 0, "sanity");
	pop(r13);
	subptr(rsp, sp_inc);
	push(r13);
	return sp_inc;

	if (C->needs_stack_repair()) {
	// Save stack increment just below the saved rbp (also account for fixed framesize and rbp)
	assert((sp_inc & (StackAlignmentInBytes-1)) == 0, "stack increment not aligned");
	movptr(Address(rsp, framesize - wordSize), sp_inc + framesize + wordSize);
	}

	if (needs_stack_repair) {
	movq(rbp, Address(rsp, initial_framesize));
	// The stack increment resides just below the saved rbp
	addq(rsp, Address(rsp, initial_framesize - wordSize));

	// Check for a method with scalarized inline type arguments that needs
	// a stack repair and return the repaired sender stack pointer.
	intptr_t* frame::repair_sender_sp(intptr_t* sender_sp, intptr_t** saved_fp_addr) const {
	nmethod* nm = _cb->as_nmethod_or_null();
	if (nm != nullptr && nm->needs_stack_repair()) {
	// The stack increment resides just below the saved rbp on the stack
	// and does not account for the return address.
	intptr_t* real_frame_size_addr = (intptr_t*) (saved_fp_addr - 1);
	int real_frame_size = ((*real_frame_size_addr) + wordSize) / wordSize;
	assert(real_frame_size >= _cb->frame_size() && real_frame_size <= 1000000, "invalid frame size");
	sender_sp = unextended_sp() + real_frame_size;
	}
	return sender_sp;

8336845: [lworld] Virtual threads don't support the value class calling convention #1399

8336845: [lworld] Virtual threads don't support the value class calling convention #1399

Uh oh!

Conversation

TobiHartmann commented Mar 14, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

This PR

Progress

Issue

Reviewing

Uh oh!

bridgekeeper bot commented Mar 14, 2025

Uh oh!

openjdk bot commented Mar 14, 2025

Uh oh!

bridgekeeper bot commented May 9, 2025

Uh oh!

bridgekeeper bot commented Jul 5, 2025

Uh oh!

Uh oh!

TobiHartmann commented Mar 14, 2025 •

edited by openjdk bot

Loading