

@theRealAph
Contributor

@theRealAph theRealAph commented Nov 27, 2025

Please use this link to view the files changed.

Profile counters scale very badly.

The overhead for profiled code isn't too bad with one thread, but as the thread count increases, things go wrong very quickly.

For example, here's a benchmark from the OpenJDK test suite, run at tier 3 (C1 with full profiling), first with one thread and then with three threads:

Benchmark                        (randomized)  Mode  Cnt    Score   Error  Units
InterfaceCalls.test2ndInt5Types         false  avgt    4   27.468 ± 2.631  ns/op
InterfaceCalls.test2ndInt5Types         false  avgt    4  240.010 ± 6.329  ns/op

This roughly 8.7x slowdown is caused by heavy memory contention on the shared profile counters. Not only is this slow, it can also lose profile counts, because concurrent non-atomic increments can overwrite each other.
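To see how counts get lost: a plain counter increment is a separate load, add and store, so two threads that load the same value both write back value + 1 and one increment disappears. A minimal Python sketch that replays such interleavings deterministically (a toy model written for illustration, not HotSpot code; the `run_schedule` helper and its schedule format are hypothetical):

```python
def run_schedule(schedule):
    """Replay a fixed interleaving of non-atomic counter increments.

    Each step is (thread_id, op) where op is "load" or "store".
    A thread's "store" writes back the value it loaded plus one,
    just like an unsynchronized `counter++`.
    """
    counter = 0
    loaded = {}  # the value each thread read in its "load" step
    for tid, op in schedule:
        if op == "load":
            loaded[tid] = counter
        elif op == "store":
            counter = loaded[tid] + 1
        else:
            raise ValueError(f"unknown op: {op}")
    return counter


# Serialized: each thread finishes its increment before the other starts.
serial = [(1, "load"), (1, "store"), (2, "load"), (2, "store")]
# Interleaved: both threads read 0 before either one writes back.
racy = [(1, "load"), (2, "load"), (1, "store"), (2, "store")]

print(run_schedule(serial))  # 2
print(run_schedule(racy))    # 1: one increment was lost
```

Each extra thread makes such overlaps more likely, and every update also has to pull the counter's cache line over to the writing core, which is where the contention cost comes from.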

This patch is for C1 only. It'd be easy to randomize C1 counters as well in another PR, if anyone thinks it's worth doing.

One other thing to note is that randomized profile counters degrade very badly with small decimation ratios. For example, running a single thread with -XX:ProfileCaptureRatio=2 gives:

Benchmark                        (randomized)  Mode  Cnt   Score   Error  Units
InterfaceCalls.test2ndInt5Types         false  avgt    4  80.147 ± 9.991  ns/op

The problem is that the branch prediction rate drops sharply: with a ratio of 2 the update branch is taken at random half the time, so it is mispredicted very often. It only really makes sense to use higher decimation ratios, e.g. 64.
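Why small ratios hurt can be put in numbers. Assuming the sampling branch is taken independently with probability 1/R, there is no pattern for the branch predictor to learn, so even an ideal predictor mispredicts at least min(1/R, 1 - 1/R) of the time: 50% at R = 2, but only about 1.6% at R = 64. A small sketch of that bound (the function name is hypothetical):

```python
def best_case_mispredict_rate(ratio):
    """Lower bound on the misprediction rate of a branch that is taken
    independently at random with probability 1/ratio.

    With nothing to learn, the best a predictor can do is always guess
    the more likely outcome, so it misses min(p, 1 - p) of the time.
    """
    p_taken = 1.0 / ratio
    return min(p_taken, 1.0 - p_taken)


for ratio in (2, 8, 64):
    rate = best_case_mispredict_rate(ratio)
    print(f"ratio {ratio:3d}: at least {rate:.1%} of executions mispredicted")
```

This is only a lower bound on mispredictions; the measured cost at ratio 2 also includes running the out-of-line update far more often.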


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8372701: Randomized profile counters (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28541/head:pull/28541
$ git checkout pull/28541

Update a local copy of the PR:
$ git checkout pull/28541
$ git pull https://git.openjdk.org/jdk.git pull/28541/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28541

View PR using the GUI difftool:
$ git pr show -t 28541

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28541.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 27, 2025

👋 Welcome back aph! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 27, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented Nov 27, 2025

@theRealAph this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8134940
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk

openjdk bot commented Nov 27, 2025

⚠️ @theRealAph This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch hotspot hotspot-dev@openjdk.org labels Nov 27, 2025
@openjdk

openjdk bot commented Nov 27, 2025

@theRealAph The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 27, 2025
@mlbridge

mlbridge bot commented Nov 27, 2025

Webrevs

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Nov 27, 2025
@shipilev
Member

shipilev commented Nov 27, 2025

Impressive work.

Clashes a bit with #25305, which commons the type profile check and makes it more robust. It would be trivial to resolve, as that PR has only one place where the counter is updated. It also gives you some additional budget for more instructions in profiled code. So it would be nice if that PR (and probably its AArch64 version) landed first.

@theRealAph
Contributor Author

Thanks.

Sure, it can wait for that PR.

@theRealAph
Contributor Author

The inlined profile-update code is moved to an out-of-line stub, and in its place we emit:

  ubfx x8, rng, #26, #6  // extract bits 26..31 (the top 6 bits) of the 32-bit random value
  cbz x8, update         // if all 6 bits are zero (probability 1/64), jump to the stub that updates the profile counter
  next_random rng        // advance the random-number generator
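The fast path above can be modelled in a few lines of Python to sanity-check the sampling rate. This is a sketch under assumptions: `rng` is treated as a 32-bit value, and `next_random` is stood in for by Marsaglia's xorshift32 generator, since the snippet doesn't say which generator the patch uses. Only the rate at which the update path runs is modelled, not the stub itself:

```python
MASK32 = 0xFFFFFFFF


def next_random(x):
    """One step of Marsaglia's 32-bit xorshift generator (assumed stand-in PRNG)."""
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


def simulate(events, seed=2463534242, ratio_bits=6):
    """Count how often the update path would run over `events` executions.

    Mirrors the fast path: look at the top `ratio_bits` bits of the 32-bit
    value (bits 26..31 when ratio_bits=6, like `ubfx x8, rng, #26, #6`),
    take the update path when they are all zero, then advance the generator.
    """
    rng = seed
    shift = 32 - ratio_bits
    updates = 0
    for _ in range(events):
        if (rng >> shift) == 0:   # cbz x8, update
            updates += 1          # the out-of-line stub would update the counter here
        rng = next_random(rng)    # next_random rng
    return updates


events = 200_000
updates = simulate(events)
ratio = 1 << 6
print(updates, updates * ratio)  # sampled updates, and the scaled-up estimate of `events`
```

With 6 bits the update path runs about once per 64 executions, so multiplying the number of updates by 64 estimates the true count. Whether the patch scales at update time or when the profile is read isn't stated here.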

At the moment, several C2 IR tests fail with randomized profile counters because they are acutely sensitive to small changes in profile counts. I think this can probably be fixed.
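One way to see why tests keyed to exact profile counts become fragile: assuming each event is recorded independently with probability 1/R and the recorded count is scaled back up by R, the estimate is unbiased but its relative standard error is sqrt((R - 1)/n) for n events. At R = 64 that is roughly 25% after 1,000 events and still about 2.5% after 100,000. A sketch of that calculation (the function name is hypothetical):

```python
import math


def relative_std_error(events, ratio):
    """Relative standard error of a 1-in-`ratio` sampled count scaled back up by `ratio`.

    The number of recorded events is Binomial(events, 1/ratio). Multiplying it
    by `ratio` gives an unbiased estimate of `events` with variance
    events * (ratio - 1), hence a relative error of sqrt((ratio - 1) / events).
    """
    return math.sqrt((ratio - 1) / events)


for n in (100, 1_000, 100_000):
    print(f"{n:>7} events at ratio 64: +/- {relative_std_error(n, 64):.1%}")
```

So any test that checks a count, or a ratio between two counts, against a tight threshold will see noise of this size from run to run.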

Also, I believe there are some kinds of event that should never be missed, even when subsampling profile counters in this way. I'd like people to advise me which events these are.

@theRealAph
Contributor Author

I have only made the back-end changes to AArch64 and x86. The back-end changes are simple to make for other architectures, and will need to be done if this PR is to be merged into mainline.

@shipilev
Member

shipilev commented Nov 27, 2025

Also, I believe there are some kinds of event that should never be missed, even when subsampling profile counters in this way. I'd like people to advise me which events these are

One other thing that comes to mind: the initial swing from 0 -> 1 for a type counter is important, since 0 means "never seen the type at all", and >0 means "maybe the type is present, however rare". I would suspect that subsampling a small count to 0 would cause performance anomalies, especially if, say, this anomaly causes a deopt - reprofile - compile cycle. It would hurt doubly if the reprofile missed the type again. It's probably hard to do with an RNG, but maybe we should seed the initial counter on installation without consulting the RNG. I don't think the current patch does that, but maybe I am looking at the wrong place. It would be fairly trivial to do after #25305.
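The concern about 0 -> 1 can be quantified. If each occurrence is kept independently with probability 1/R, a type seen k times leaves its counter at zero with probability (1 - 1/R)^k; at R = 64 that is still about 50% after 44 occurrences. Below is a Python sketch of that calculation plus one possible form of the unconditional first update suggested above. `record` and its policy are hypothetical illustrations, not what the patch or #25305 does:

```python
import random


def prob_never_recorded(occurrences, ratio):
    """Chance that a type seen `occurrences` times is never sampled,
    when each occurrence is kept independently with probability 1/ratio."""
    return (1.0 - 1.0 / ratio) ** occurrences


def record(counter, ratio, rng):
    """Hypothetical update policy: the 0 -> 1 transition is always recorded,
    and only occurrences after that are subsampled 1-in-`ratio`.
    Counts here are in sampled units (no rescaling)."""
    if counter == 0:
        return 1                    # first sighting of this type: never dropped
    if rng.randrange(ratio) == 0:   # later sightings: keep about 1 in `ratio`
        return counter + 1
    return counter


# Without the special case, a type seen 44 times at ratio 64 still has
# roughly even odds of leaving its counter at zero.
print(prob_never_recorded(44, 64))

rng = random.Random(1)
print(record(0, 64, rng))  # 1: the first sighting is kept unconditionally
```

The special case costs one extra compare on the slow path only if it is folded into the installation of a new receiver row, which is presumably why it would be easier to add once the type profile update is commoned.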

@theRealAph
Copy link
Contributor Author

OK, all useful thoughts. I'll have a look.

@cl4es
Member

cl4es commented Nov 27, 2025

Happy to see a serious contender for resolving this long-standing issue. While it's a bit unclear how problematic it is in practice, we regularly see issues related to this in thread-heavy benchmarks (such as SPECjvm2008).

It'd be easy to randomize C1 counters as well in another PR, if anyone thinks it's worth doing.

I assume you mean interpreter counters?

@theRealAph
Contributor Author

Oops, yes, of course. Thanks!


Labels

hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review


3 participants