-
Notifications
You must be signed in to change notification settings - Fork 774
x86: Preserve ymm/zmm registers under option #22692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
x86: Preserve ymm/zmm registers under option #22692
Conversation
|
|
||
| ifdef({ASM_J9VM_ENV_DATA64},{ | ||
| pop r8 | ||
| forloop({REG_CTR}, 0, 31, {SAVE_ZMM_REG(REG_CTR, J9TR_cframe_jitFPRs+(REG_CTR*64))}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Linux, the System ABI specifies that all XMM, YMM, and ZMM registers are volatile.
On Windows, the x64 ABI specifies that XMM6-XMM15 are preserved. For YMM/ZMM6-15, the upper bits are volatile and the lower bits preserved. XMM/YMM/ZMM 0-5 and 16-31 are all volatile.
|
|
||
| ifdef({ASM_J9VM_ENV_DATA64},{ | ||
| pop r8 | ||
| forloop({REG_CTR}, 0, 15, {vmovdqu ymmword ptr J9TR_cframe_jitFPRs+(REG_CTR*32)[_rsp],ymm{}REG_CTR}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to zmm, which exactly registers do we need to save/restore?
| @@ -0,0 +1,126 @@ | |||
| dnl Copyright IBM Corp. and others 2023 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit was authored in 2021, so this should say 2021.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not completely sure because xvector.m4 did not exist at the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It never got committed before, so I suppose the date should be this year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devin worked on this and committed it no later than in 2021. I believe that should be the first copyright year.
|
|
||
| #if JAVA_SPEC_VERSION >= 17 | ||
| /* Extended vector register preservation (ymm/zmm/opmask registers) requires a VM option */ | ||
| if ((argIndex = FIND_AND_CONSUME_VMARG(EXACT_MATCH, VMOPT_PRESERVE_VECTORS, NULL)) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant >=, not > here; using -XPreserveExtendedRegs should preserve those registers, but as written will do so only if that option is not specified.
We should have a pair of options:
#define VMOPT_PRESERVE_VECTORS "-XX:+PreserveExtendedRegs"
#define VMOPT_NO_PRESERVE_VECTORS "-XX:-PreserveExtendedRegs"so that the command line can override environment variables.
This should then be:
argIndex = FIND_AND_CONSUME_VMARG(EXACT_MATCH, VMOPT_PRESERVE_VECTORS, NULL);
argIndex2 = FIND_AND_CONSUME_VMARG(EXACT_MATCH, VMOPT_NO_PRESERVE_VECTORS, NULL);
if (argIndex2 > argIndex) {
vm->extendedRuntimeFlags3 &= ~J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_256;
vm->extendedRuntimeFlags3 &= ~J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_512;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant >=, not > here; using -XPreserveExtendedRegs should preserve those registers, but as written will do so only if that option is not specified.
So extended preservation is supposed to be disabled unless explicitly enabled by an option. This code resets those flags if no option is specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still have a pair of options (one might appear in an environment variable), the other is needed to override that (on the command line).
d1b4280 to
3ced5a4
Compare
This commit restores eclipse-openj9#14632 and applies a number of changes. Preserving extended registers (ymm/zmm/opmask) registers is only done if the "-XPreserveExtendedRegs" vm option is provided. Two new VM flags are introduced. One flag marks AVX-256 register preservation, the other for AVX-512. Vector lengths 256/512 are preserved in an out-of-line sequence assigned by a function pointer if extended vector preservation is enabled. This is done to prevent speculative execution from causing CPU frequency regressions even when AVX-2 / AVX-512 code is never executed. Co-authored-by: Devin Nakamura <[email protected]> Co-authored-by: Bradley Wood <[email protected]>
3ced5a4 to
2724324
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me (once all of Keith's comments are addressed). Question: why the fence in the vector code?
| if (J9_ARE_ANY_BITS_SET(vm->extendedRuntimeFlags3, J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_512)) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting: braces:
if (J9_ARE_ANY_BITS_SET(vm->extendedRuntimeFlags3, J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_512)) {
jitConfig->saveVectorRegisters = (void *)jitSaveVectorRegistersAVX512;
jitConfig->restoreVectorRegisters = (void *)jitRestoreVectorRegistersAVX512;
} else if (J9_ARE_ANY_BITS_SET(vm->extendedRuntimeFlags3, J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_256)) {
jitConfig->saveVectorRegisters = (void *)jitSaveVectorRegistersAVX;
jitConfig->restoreVectorRegisters = (void *)jitRestoreVectorRegistersAVX;
}| @@ -0,0 +1,126 @@ | |||
| dnl Copyright IBM Corp. and others 2023 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devin worked on this and committed it no later than in 2021. I believe that should be the first copyright year.
| #define VMOPT_XCONCURRENTBACKGROUND "-Xconcurrentbackground" | ||
| #define VMOPT_XGCTHREADS "-Xgcthreads" | ||
| #define VMOPT_XGCMAXTHREADS "-Xgcmaxthreads" | ||
| #define VMOPT_PRESERVE_VECTORS "-XPreserveExtendedRegs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I repeat my suggestion that we should have both positive and negative versions of this option:
#define VMOPT_PRESERVE_VECTORS "-XX:+PreserveExtendedRegs"
#define VMOPT_NO_PRESERVE_VECTORS "-XX:-PreserveExtendedRegs"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - the convention should be followed.
| LABEL(skip_vzu{}VZU_COUNT): | ||
| }) | ||
|
|
||
| dnl for(<symbol>=<start>; <symbol> <= <end>; ++<symbol>) { <expr> } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space both sides of =, please.
| define({forloop}, | ||
| {define({$1}, {$2})$4 | ||
| ifelse({$2}, {$3}, {},{$0({$1}, incr({$2}), {$3}, {$4})})}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My earlier comments have not been addressed; see #22692.
|
|
||
| #if JAVA_SPEC_VERSION >= 17 | ||
| /* Extended vector register preservation (ymm/zmm/opmask registers) requires a VM option */ | ||
| if ((argIndex = FIND_AND_CONSUME_VMARG(EXACT_MATCH, VMOPT_PRESERVE_VECTORS, NULL)) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still have a pair of options (one might appear in an environment variable), the other is needed to override that (on the command line).
| vm->extendedRuntimeFlags3 &= ~J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_256; | ||
| vm->extendedRuntimeFlags3 &= ~J9_EXTENDED_RUNTIME3_USE_VECTOR_LENGTH_512; | ||
| } | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing comment:
#endif /* JAVA_SPEC_VERSION >= 17 */
This commit restores #14632 and applies a number of changes. Preserving extended registers (ymm/zmm/opmask) registers is only done if the "-XPreserveExtendedRegs" vm option is provided. Two new VM flags are introduced. One flag marks AVX-256 register preservation, the other for AVX-512. Vector lengths 256/512 are preserved in an out-of-line sequence assigned by a function pointer if extended vector preservation is enabled. This is done to prevent speculative execution from causing CPU frequency regressions even when AVX-2 / AVX-512 code is never executed.