8357551: RISC-V: support CMoveF/D #25341


Draft: wants to merge 13 commits into master

Hamlin-Li commented May 21, 2025

This patch enables vectorization of statements like fd_1 bop fd_2 ? res_1 : res_2 in a loop, where fd_1 and fd_2 are float or double values and bop is a comparison operator.

On other platforms, the current implementation vectorizes fd_1 bop fd_2 ? res_1 : res_2 in a loop only when fd and res have the same size. This constraint does not seem necessary, at least not on RISC-V, so I relax it on RISC-V. It may be possible to relax it on other platforms too, but for now I have only made it work on RISC-V.
In addition, I relax the constraint on transforming Op_CMoveI/L into Op_VectorBlend on RISC-V, which brings extra benefit when res is not of float or double type.
Both relaxations bring performance benefits through vectorization.
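For illustration, here is a minimal Java sketch of the two loop shapes involved. The method names are hypothetical, modeled on the benchmark names in the table below rather than copied from the benchmark source. The first loop compares and selects values of the same size; the second compares floats (4 bytes) but selects longs (8 bytes), which is the mixed-size case described above and also relies on the CMoveL to VectorBlend relaxation. Whether C2 actually vectorizes either loop depends on the platform, the patch, and flags such as -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally; the code only shows the source pattern.

```java
// Hypothetical loop shapes; names follow the benchmark naming scheme only.
class CMoveLoopSketch {
    // Same element size: float compare (4 bytes) selects a float (4 bytes).
    static void lessFloatResFloat(float[] a, float[] b, float[] r1, float[] r2, float[] res) {
        for (int i = 0; i < res.length; i++) {
            res[i] = (a[i] < b[i]) ? r1[i] : r2[i];
        }
    }

    // Different element size: float compare (4 bytes) selects a long (8 bytes).
    static void lessFloatResLong(float[] a, float[] b, long[] r1, long[] r2, long[] res) {
        for (int i = 0; i < res.length; i++) {
            res[i] = (a[i] < b[i]) ? r1[i] : r2[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 5f, Float.NaN, 2f};
        float[] b = {2f, 3f, 0f, 2f};
        long[] r1 = {10, 20, 30, 40};
        long[] r2 = {-1, -2, -3, -4};
        long[] res = new long[4];
        lessFloatResLong(a, b, r1, r2, res);
        System.out.println(java.util.Arrays.toString(res)); // [10, -2, -3, -4]
    }
}
```

A comparison involving NaN is false and therefore selects r2; a vectorized blend has to preserve exactly that behavior.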

Compared with the other runs (master; master with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally turned on; the patch without these flags), the patch with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally turned on is on average more than 2.1 times faster, and in the best case (isNanFloat, compared with the patch without flags) more than 4 times faster.
With -XX:-UseVectorCmov -XX:-UseCMoveUnconditionally (both flags turned off), the patch shows no regression on average.

Test

Performance

Test: o.o.b.j.l.FPComparison

Column names meanings:

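  • p: with patch
  • p+v: with patch, -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally turned on
  • m: without patch
  • m+v: without patch, -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally turned on
  • Opt (x/y): score of run x divided by score of run y; lower scores are better, so a value above 1 means y is faster than x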

Average improvement

| Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) |
|---|---|---|---|
| 1.022782609 | 2.198717391 | 2.162673913 | 2.199 |

Improvement

| Benchmark | p | m | p+v | m+v | Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) |
|---|---|---|---|---|---|---|---|---|
| equalDouble | 7256.157 | 7183.714 | 3377.111 | 7459.347 | 0.99 | 2.209 | 2.149 | 2.127 |
| equalDoubleResDouble | 7877.737 | 8646.54 | 6077.6 | 8691.099 | 1.098 | 1.43 | 1.296 | 1.423 |
| equalDoubleResFloat | 7181.564 | 8194.786 | 3409.252 | 8123.738 | 1.141 | 2.383 | 2.106 | 2.404 |
| equalDoubleResLong | 7806.422 | 8010.545 | 3335.97 | 7922.735 | 1.026 | 2.375 | 2.34 | 2.401 |
| equalFloat | 6802.995 | 6901.461 | 1789.033 | 7012.751 | 1.014 | 3.92 | 3.803 | 3.858 |
| equalFloatResDouble | 8371.707 | 8265.009 | 3431.889 | 8275.083 | 0.987 | 2.411 | 2.439 | 2.408 |
| equalFloatResFloat | 7148.96 | 8156.945 | 3233.043 | 8098.961 | 1.141 | 2.505 | 2.211 | 2.523 |
| equalFloatResLong | 7853.929 | 8003.017 | 3401.985 | 8097.994 | 1.019 | 2.38 | 2.309 | 2.352 |
| greaterDouble | 6941.015 | 6894.978 | 3416.193 | 6934.395 | 0.993 | 2.03 | 2.032 | 2.018 |
| greaterDoubleResDouble | 7882.554 | 7821.291 | 6124.731 | 7812.596 | 0.992 | 1.276 | 1.287 | 1.277 |
| greaterDoubleResFloat | 7358.43 | 7375.28 | 3411.382 | 7355.785 | 1.002 | 2.156 | 2.157 | 2.162 |
| greaterDoubleResLong | 7225.83 | 7165.23 | 3331.277 | 7373.934 | 0.992 | 2.214 | 2.169 | 2.151 |
| greaterEqualDouble | 6767.552 | 6737.533 | 3414.404 | 6720.414 | 0.996 | 1.968 | 1.982 | 1.973 |
| greaterEqualDoubleResDouble | 7255.272 | 8050.17 | 6074.58 | 8014.26 | 1.11 | 1.319 | 1.194 | 1.325 |
| greaterEqualDoubleResFloat | 6810.635 | 7588.857 | 3412.366 | 7724.462 | 1.114 | 2.264 | 1.996 | 2.224 |
| greaterEqualDoubleResLong | 7356.979 | 7273.975 | 3405.726 | 7202.324 | 0.989 | 2.115 | 2.16 | 2.136 |
| greaterEqualFloat | 6301.524 | 6250.825 | 1725.419 | 6190.227 | 0.992 | 3.588 | 3.652 | 3.623 |
| greaterEqualFloatResDouble | 7770.324 | 7619.463 | 3515.615 | 7652.038 | 0.981 | 2.177 | 2.21 | 2.167 |
| greaterEqualFloatResFloat | 6539.097 | 7433.364 | 3237.981 | 7459.479 | 1.137 | 2.304 | 2.019 | 2.296 |
| greaterEqualFloatResLong | 7282.165 | 7285.625 | 3408.542 | 7272.183 | 1 | 2.134 | 2.136 | 2.137 |
| greaterFloat | 6741.444 | 6775.978 | 1777.942 | 6609.607 | 1.005 | 3.718 | 3.792 | 3.811 |
| greaterFloatResDouble | 7376.615 | 7386.81 | 3451.468 | 7413.341 | 1.001 | 2.148 | 2.137 | 2.14 |
| greaterFloatResFloat | 7260.812 | 7227.177 | 3233.878 | 7194.408 | 0.995 | 2.225 | 2.245 | 2.235 |
| greaterFloatResLong | 7156.073 | 7218.269 | 3483.248 | 7395.894 | 1.009 | 2.123 | 2.054 | 2.072 |
| isFiniteDouble | 8383.339 | 8486.119 | 8520.461 | 8805.231 | 1.012 | 1.033 | 0.984 | 0.996 |
| isFiniteFloat | 8327.357 | 8469.08 | 8438.468 | 8458.09 | 1.017 | 1.002 | 0.987 | 1.004 |
| isInfiniteDouble | 8731.787 | 8403.307 | 8797.517 | 8559.53 | 0.962 | 0.973 | 0.993 | 0.955 |
| isInfiniteFloat | 8402.357 | 8311.963 | 8408.47 | 8445.983 | 0.989 | 1.004 | 0.999 | 0.989 |
| isNanDouble | 5603.906 | 6339.909 | 2708.193 | 5619.242 | 1.131 | 2.075 | 2.069 | 2.341 |
| isNanFloat | 6149.923 | 5421.851 | 1412.968 | 5415.815 | 0.882 | 3.833 | 4.352 | 3.837 |
| lessDouble | 6879.061 | 6891.171 | 3380.181 | 6881.82 | 1.002 | 2.036 | 2.035 | 2.039 |
| lessDoubleResDouble | 7809.712 | 7799.506 | 6116.715 | 7802.105 | 0.999 | 1.276 | 1.277 | 1.275 |
| lessDoubleResFloat | 7350.426 | 7379.593 | 3371.683 | 7349.37 | 1.004 | 2.18 | 2.18 | 2.189 |
| lessDoubleResLong | 7220.939 | 7160.987 | 3395.771 | 7572.061 | 0.992 | 2.23 | 2.126 | 2.109 |
| lessEqualDouble | 6782.899 | 6728.732 | 3431.742 | 6755.882 | 0.992 | 1.969 | 1.977 | 1.961 |
| lessEqualDoubleResDouble | 7147.814 | 8055.307 | 6075.177 | 7989.099 | 1.127 | 1.315 | 1.177 | 1.326 |
| lessEqualDoubleResFloat | 6915.612 | 7589.454 | 3412.457 | 7671.782 | 1.097 | 2.248 | 2.027 | 2.224 |
| lessEqualDoubleResLong | 7266.967 | 7214.049 | 3391.35 | 7222.03 | 0.993 | 2.13 | 2.143 | 2.127 |
| lessEqualFloat | 6240.432 | 6291.458 | 1768.777 | 6216.421 | 1.008 | 3.515 | 3.528 | 3.557 |
| lessEqualFloatResDouble | 7706.662 | 7725.626 | 3498.608 | 7677.536 | 1.002 | 2.194 | 2.203 | 2.208 |
| lessEqualFloatResFloat | 6592.504 | 7497.226 | 3214.976 | 7420.118 | 1.137 | 2.308 | 2.051 | 2.332 |
| lessEqualFloatResLong | 7256.94 | 7218.381 | 3393.99 | 7228.696 | 0.995 | 2.13 | 2.138 | 2.127 |
| lessFloat | 6766.048 | 6725.079 | 1733.222 | 6621.539 | 0.994 | 3.82 | 3.904 | 3.88 |
| lessFloatResDouble | 7397.894 | 7400.036 | 3402.64 | 7363.842 | 1 | 2.164 | 2.174 | 2.175 |
| lessFloatResFloat | 7242.137 | 7191.374 | 3240.398 | 7259.417 | 0.993 | 2.24 | 2.235 | 2.219 |
| lessFloatResLong | 7202.009 | 7172.072 | 3514.138 | 7357.007 | 0.996 | 2.094 | 2.049 | 2.041 |
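The Opt columns appear to be plain ratios of two scores. As a quick check of that reading (an assumption, not something the table states), the sketch below recomputes the four Opt values of the equalFloat row from its raw scores; the results agree with the table to three decimals (the table prints 3.920 as 3.92).

```java
import java.util.Locale;

// Recomputes the Opt columns of the equalFloat row.
// Assumption: Opt (x/y) = score(x) / score(y), and lower scores are better.
class OptRatio {
    static double opt(double x, double y) {
        return x / y;
    }

    public static void main(String[] args) {
        // Raw scores of the equalFloat row: p, m, p+v, m+v
        double p = 6802.995, m = 6901.461, pv = 1789.033, mv = 7012.751;
        System.out.printf(Locale.ROOT, "m/p=%.3f m+v/p+v=%.3f p/p+v=%.3f m/p+v=%.3f%n",
                opt(m, p), opt(mv, pv), opt(p, pv), opt(m, pv));
        // prints: m/p=1.014 m+v/p+v=3.920 p/p+v=3.803 m/p+v=3.858
    }
}
```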

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Warnings

 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/BorderLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/FlowLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridBagLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridBagLayout-2.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridLayout-2.gif)

Issues

  • JDK-8357551: RISC-V: support CMoveF/D (Enhancement - P4)
  • JDK-8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25341/head:pull/25341
$ git checkout pull/25341

Update a local copy of the PR:
$ git checkout pull/25341
$ git pull https://git.openjdk.org/jdk.git pull/25341/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25341

View PR using the GUI difftool:
$ git pr show -t 25341

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25341.diff

bridgekeeper bot commented May 21, 2025

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented May 21, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk bot commented May 21, 2025

@Hamlin-Li The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

Hamlin-Li changed the title from "Cmove fd" to "8357551: RISC-V: support CMoveF/D" on May 22, 2025
Hamlin-Li (Author)

/solves JDK-8357554

openjdk bot commented May 22, 2025

@Hamlin-Li
Adding additional issue to solves list: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv).

Hamlin-Li (Author)

Hi @eme64, would you mind having a look at the patch? Thanks!
It needs some changes to shared code in superword and vectornode to work (tracked by JDK-8357554 but addressed in this PR; I can also separate them from this PR if that is better for you). I also noticed that you have an umbrella bug (https://bugs.openjdk.org/browse/JDK-8317424) tracking related changes.

eme64 (Contributor) commented May 22, 2025

@Hamlin-Li The table in the PR description is a little hard to read, can you find a way to improve the formatting?

eme64 (Contributor) left a comment

@Hamlin-Li That looks like exciting work!

I hope to come back to CMove soon myself, there are a few things to improve there!

I left some initial comments below.

Generally, splitting is nice, especially if the patch is so large.
But we probably also don't want to just add backend instructions that are not yet used. Can the VectorAPI use those CMove instructions you are about to add?

Comment on lines 2356 to 2361
- return type2aelembytes(use_bt) == type2aelembytes(def_bt);
+ return (type2aelembytes(use_bt) == type2aelembytes(def_bt)) ||
+ (support_vectorize_cmovefd_bool_unconditionally() && use->is_CMove() && def->is_Bool());
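To make the relaxed check easier to follow, here is a hypothetical Java model of that predicate. It is not HotSpot code: node kinds and element sizes are passed in directly, and the boolean platformAllows stands in for support_vectorize_cmovefd_bool_unconditionally(). The model assumes a Bool node carries the element size of the values it compares, so a float compare feeding a long CMove is the mismatched 4-byte/8-byte case.

```java
// Hypothetical, simplified model of the element-size check; not HotSpot code.
class PackSizeCheck {
    enum Kind { BOOL, CMOVE, OTHER }

    // Original rule: use and def must have the same element size.
    static boolean sameSizeOnly(int useBytes, int defBytes) {
        return useBytes == defBytes;
    }

    // Relaxed rule: also accept a Bool feeding a CMove with a different
    // element size, but only when the platform opts in.
    static boolean relaxed(int useBytes, int defBytes, Kind use, Kind def, boolean platformAllows) {
        return useBytes == defBytes
                || (platformAllows && use == Kind.CMOVE && def == Kind.BOOL);
    }

    public static void main(String[] args) {
        // Float compare (4-byte elements) feeding a long CMove (8-byte elements).
        System.out.println(sameSizeOnly(8, 4));                          // false
        System.out.println(relaxed(8, 4, Kind.CMOVE, Kind.BOOL, true));  // true
        System.out.println(relaxed(8, 4, Kind.CMOVE, Kind.BOOL, false)); // false
    }
}
```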
eme64 (Contributor)

This will get us a merge conflict with #23413.

Also: our general approach is to query VectorNode::implemented and similar functions. Could we have some sort of check like that?

See what @jaskarth is doing in #23413:

Is that at all an option?

Hamlin-Li (Author)

I think it also works. I'll change it.

Comment on lines 90 to 93
case Op_CMoveI:
return ((SuperWord::support_vectorize_cmovefd_bool_unconditionally() && bt == T_INT) ? Op_VectorBlend : 0);
case Op_CMoveL:
return ((SuperWord::support_vectorize_cmovefd_bool_unconditionally() && bt == T_LONG) ? Op_VectorBlend : 0);
eme64 (Contributor)

Why not always return Op_VectorBlend? And then check elsewhere if that is actually implemented for the expected types?
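The suggestion separates two concerns: mapping a scalar opcode to a vector opcode, and asking whether the backend implements that vector opcode for a given element type. A hypothetical Java model of that split is sketched below; the enums, the capability table, and its contents are made up for illustration and do not describe any real platform.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: the mapping has no platform knowledge; a separate
// capability query decides support per element type. Names are illustrative.
class BlendMapping {
    enum ScalarOp { CMOVE_I, CMOVE_L, CMOVE_F, CMOVE_D, OTHER }
    enum VectorOp { NONE, VECTOR_BLEND }
    enum ElemType { INT, LONG, FLOAT, DOUBLE }

    // Step 1: unconditional mapping, no platform or flag checks.
    static VectorOp vectorOpcode(ScalarOp op) {
        switch (op) {
            case CMOVE_I:
            case CMOVE_L:
            case CMOVE_F:
            case CMOVE_D:
                return VectorOp.VECTOR_BLEND;
            default:
                return VectorOp.NONE;
        }
    }

    // Step 2: capability table of a made-up platform that supports blends
    // only for int and long elements.
    static final Map<VectorOp, Set<ElemType>> IMPLEMENTED = new EnumMap<>(VectorOp.class);
    static {
        IMPLEMENTED.put(VectorOp.VECTOR_BLEND, EnumSet.of(ElemType.INT, ElemType.LONG));
    }

    static boolean implemented(VectorOp op, ElemType t) {
        Set<ElemType> types = IMPLEMENTED.get(op);
        return types != null && types.contains(t);
    }

    // The vectorizer combines both steps.
    static boolean canVectorize(ScalarOp op, ElemType t) {
        return implemented(vectorOpcode(op), t);
    }

    public static void main(String[] args) {
        System.out.println(canVectorize(ScalarOp.CMOVE_L, ElemType.LONG));  // true
        System.out.println(canVectorize(ScalarOp.CMOVE_F, ElemType.FLOAT)); // false
    }
}
```

With this split, enabling a new case on one platform means extending its capability table, not adding platform conditions to the opcode mapping.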

Hamlin-Li (Author)

Let me check. That sounds better if we can do it, as I don't like the current approach either. :)

Hamlin-Li (Author) commented May 22, 2025

> @Hamlin-Li The table in the PR description is a little hard to read, can you find a way to improve the formatting?

Let me check, it's scrollable in preview mode.

Edit: I modified the data a bit, so it looks better now.

Hamlin-Li (Author) commented May 22, 2025

> @Hamlin-Li That looks like exciting work!

> I hope to come back to CMove soon myself, there are a few things to improve there!

Great!

> I left some initial comments below.

> Generally, splitting is nice, especially if the patch is so large.

I can do it.

> But we probably also don't want to just add backend instructions that are not yet used.

These instructions can be used by a plain a op b ? r1 : r2 statement, and TestVectorConditionalMove.java can test them before the loop is vectorized. It also means that a CPU without vector instructions, e.g. a RISC-V machine without RVV support, can still use these CMoveF/D instructions.

Edit: I might have missed the unsigned versions, and maybe P/N too. I'll check later.
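As a rough illustration of the scalar (non-loop) case mentioned above, assuming ordinary Java ternaries are the source shape in question, here is a small sketch; it is not taken from the patch or from TestVectorConditionalMove.java. Whether C2 compiles these methods to CMoveF/CMoveD nodes or to branches depends on its profitability heuristics and on flags such as -XX:+UseCMoveUnconditionally. The NaN case is included because any conditional-move lowering must keep Java's rule that comparisons involving NaN are false (except !=).

```java
// Scalar ternaries on float/double values; the source shape only.
class ScalarSelect {
    // Float compare selecting a float.
    static float pickF(float a, float b, float r1, float r2) {
        return (a < b) ? r1 : r2;
    }

    // Double compare selecting a double.
    static double pickD(double a, double b, double r1, double r2) {
        return (a == b) ? r1 : r2;
    }

    public static void main(String[] args) {
        System.out.println(pickF(1f, 2f, 10f, 20f));                 // 10.0
        System.out.println(pickD(Double.NaN, Double.NaN, 1.0, 2.0)); // 2.0, since NaN == NaN is false
    }
}
```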

> Can the VectorAPI use those CMove instructions you are about to add?

For this part, I'm not sure, but I'll have a look later.

Hamlin-Li (Author)

@eme64 I've split the shared code changes out into #25336.

Hamlin-Li (Author)

> Edit: I might have missed the unsigned version and maybe P/N too. I'll check it later.

I think the unsigned ones are also used by something like Integer.compareUnsigned or Long.compareUnsigned.
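For example, a select driven by Integer.compareUnsigned or Long.compareUnsigned looks like the sketch below. It only shows the Java source shape; whether C2 turns it into an unsigned-compare form of CMoveF/CMoveD on RISC-V is an assumption that would need to be checked against the generated code.

```java
// Unsigned compares feeding a float/double select; source shape only.
class UnsignedSelect {
    static float pickF(int a, int b, float r1, float r2) {
        return Integer.compareUnsigned(a, b) < 0 ? r1 : r2;
    }

    static double pickD(long a, long b, double r1, double r2) {
        return Long.compareUnsigned(a, b) < 0 ? r1 : r2;
    }

    public static void main(String[] args) {
        // As unsigned, -1 is 0xFFFFFFFF, which is larger than 1, so r2 is chosen.
        System.out.println(pickF(-1, 1, 1f, 2f));     // 2.0
        System.out.println(pickF(1, -1, 1f, 2f));     // 1.0
        System.out.println(pickD(-1L, 1L, 1.0, 2.0)); // 2.0
    }
}
```

A signed compare would pick the opposite values for these inputs, which is why the unsigned variants need their own matching rules.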

openjdk bot commented Jun 3, 2025

@Hamlin-Li this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request, you can run the following commands in the local repository for your personal fork:

git checkout cmove-fd
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk bot added the merge-conflict label (Pull request has merge conflict with target branch) on Jun 3, 2025
Labels: core-libs, hotspot, merge-conflict (Pull request has merge conflict with target branch)

2 participants