
Conversation

archiecobbs
Contributor

@archiecobbs archiecobbs commented Oct 6, 2025

The refactoring in JDK-8348611 caused a regression in compiler performance. A couple of simple optimizations were missed in that refactoring; when put in place, they seem to (mostly) address the performance issue.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8369039: JDK-8348611 caused regression in Javac-Hot-Generate (Bug - P2)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27651/head:pull/27651
$ git checkout pull/27651

Update a local copy of the PR:
$ git checkout pull/27651
$ git pull https://git.openjdk.org/jdk.git pull/27651/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27651

View PR using the GUI difftool:
$ git pr show -t 27651

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27651.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 6, 2025

👋 Welcome back acobbs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 6, 2025

@archiecobbs This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8369039: JDK-8348611 caused regression in Javac-Hot-Generate

Co-authored-by: Claes Redestad <[email protected]>
Reviewed-by: jlahoda, redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 139 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Oct 6, 2025

@archiecobbs The following label will be automatically applied to this pull request:

  • compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr (Pull request is ready for review) label Oct 6, 2025
@mlbridge

mlbridge bot commented Oct 6, 2025

Webrevs

@cl4es
Member

cl4es commented Oct 7, 2025

I haven't been able to run this through our wider performance lab yet due to some issues on my end, but in local testing I'm not seeing that much of an improvement in my diagnostic tests (the tests are a bit noisy).

Running the benchmark with -prof gc shows that the regression correlates with a 17% increase in allocation pressure. This PR only reduces allocation pressure by about 1% from the regressed state:

26-b11: 15.06 GB/op
26-b12: 17.61 GB/op
pr/27651: 17.46 GB/op

Looking at hot methods, I see one of the lambdas in LintMapper$FileInfo (allocated in the constructor) being relatively hot, and I think the abundant use of capturing lambdas in JDK-8348611 might be the cause of some unexpected allocation overhead.

For example desugaring the stream in the FileInfo constructor (which was the only thing I saw on jfr view hot-methods that seemed related to JDK-8348611):

            tree.defs.stream()
              .filter(this::isTopLevelDecl)
              .map(decl -> new Span(decl, tree.endPositions)) 
              .forEach(unmappedDecls::add);

to this:

            for (JCTree decl : tree.defs) {
                if (isTopLevelDecl(decl)) {
                    unmappedDecls.add(new Span(decl, tree.endPositions));
                }
            }

... reduces allocation to 16.68 GB/op (reducing the relative increase in allocations by about 1/3 compared to pr/27651).

@archiecobbs
Contributor Author

Hi @cl4es,

This PR only reduces allocation pressure by about 1% from the regressed state:

Now you're moving the goalposts :) I was looking at the performance benchmark, not allocation pressure...

FWIW this is what I saw on my laptop:

JDK 25

Benchmark                        (stopStage)  Mode  Cnt   Score   Error  Units
SingleJavacBenchmark.compileHot     Generate    ss   10  14.566 ± 0.868   s/op

JDK 26

Benchmark                        (stopStage)  Mode  Cnt   Score   Error  Units
SingleJavacBenchmark.compileHot     Generate    ss   10  16.403 ± 0.214   s/op

JDK 26 + this patch

Benchmark                        (stopStage)  Mode  Cnt   Score   Error  Units
SingleJavacBenchmark.compileHot     Generate    ss   10  14.807 ± 0.381   s/op

So that patch did seem to help resolve most of the benchmark difference. Are you not seeing the same thing?

But also that patch differs from the one currently committed, in that it includes the ArrayList to LinkedList changes that were later taken out. Without those changes, things get slower again:

Benchmark                        (stopStage)  Mode  Cnt   Score   Error  Units
SingleJavacBenchmark.compileHot     Generate    ss   10  15.832 ± 0.135   s/op

Surprisingly, they do have an effect, so I've put them back into the PR.

For example desugaring the stream in the FileInfo constructor ... reduces allocation

<side-note>

Argh, this is frustrating. Are we supposed to use the Stream API to help improve code clarity, or avoid it to prevent slowdowns? I've seen it argued both ways...

</side-note>

Anyway, unstreaming the loop you suggested, plus another one, plus the other stuff, yields this on my laptop:

Benchmark                        (stopStage)  Mode  Cnt   Score   Error  Units
SingleJavacBenchmark.compileHot     Generate    ss   10  14.932 ± 0.370   s/op

That is with this patch. Let me know what you see on the allocation pressure front... thanks.

@cl4es
Member

cl4es commented Oct 7, 2025

FYI I'm not a core maintainer/reviewer of javac but specialize in isolating and figuring out the causes of performance regressions; you're free to keep the streams in favor of perceived code clarity if you can make the case that the impact on performance is negligible.

Transient allocation pressure can be a non-issue on one system but cause a lot of GC pauses and slowdown on another, so it's one of those things I like to keep tabs on when diagnosing a regression. Ideally we shouldn't increase allocation pressure too much. On one system I was looking at, the Score was way worse after 26-b12 and didn't improve that much with this PR. On your system it seems the allocation pressure isn't a large contributor to Score. YMMV.

I'll look at the numbers on your latest version.

(FWIW I ran an experiment baselined on an earlier state of this PR and posted it here archiecobbs#1 -- that showed that desugaring all the streams in LintMapper got allocation pressure down below 26-b11 levels)

@archiecobbs
Contributor Author

I ran an experiment baselined on an earlier state of this PR

Thanks, I'll include that as well. See c7d5d30.

/contributor add cl4es

@openjdk

openjdk bot commented Oct 7, 2025

@archiecobbs cl4es was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <[email protected]>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

@archiecobbs
Contributor Author

/contributor add @cl4es

@openjdk

openjdk bot commented Oct 7, 2025

@archiecobbs
Contributor Claes Redestad <[email protected]> successfully added.

@lahodaj
Contributor

lahodaj commented Oct 7, 2025

I did a quick pass through the code, and it looks sensible so far, assuming it helps with resolving the problem. I'll go through the change in more detail tomorrow.

@cl4es
Member

cl4es commented Oct 8, 2025

The latest version looks good on both score and allocation pressure, e.g., on my M1 MacBook:

Name       Cnt   Base   Error    Test   Error Unit  Change
compileHot  10 15.915 ± 0.094  14.463 ± 0.235 s/op   1.10x (p = 0.000*)
  * = significant

Allocations: 16.9 GB/op -> 13.95 GB/op

26-b11 for reference on the same system: 14.285 ± 0.379 s/op
Allocations: 14.2 GB/op

So the score has more or less recovered, with slightly less allocation pressure to boot 👍


final LintRange rootRange; // the root LintRange (covering the entire source file)
final List<Span> unmappedDecls = new ArrayList<>(); // unmapped top-level declarations awaiting attribution
final List<Span> unmappedDecls = new LinkedList<>(); // unmapped top-level declarations awaiting attribution
Member

LinkedList has a ton of nodes, which translate to extra object headers, forward/backward pointers, and worse cache locality. To add N items, it requires O(N) allocations. In contrast, ArrayList requires O(log(N)) allocations (resizing) and is almost always better.
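
As a rough sketch of the allocation shapes being described (assuming the standard JDK ArrayList/LinkedList implementations; this is an illustration, not javac code, and the ArrayList growth accounting is only an approximation of its ~1.5x resizing policy):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Rough illustration only: growth counts are approximated, not measured with a profiler.
public class ListAllocationSketch {
    public static void main(String[] args) {
        int n = 1_000;

        // LinkedList allocates one Node object (header + item/prev/next fields) per add.
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < n; i++)
            linked.add(i);
        System.out.println("LinkedList node allocations: " + n);

        // ArrayList only reallocates its backing Object[] when it outgrows the current
        // capacity, growing by ~50% each time, i.e. roughly log(n) reallocations total.
        List<Integer> array = new ArrayList<>();
        int capacity = 10, growths = 0;          // default capacity after the first add
        for (int i = 0; i < n; i++) {
            array.add(i);
            if (array.size() > capacity) {       // approximate where a growth happens
                capacity += capacity >> 1;
                growths++;
            }
        }
        System.out.println("ArrayList backing-array growths (approx.): " + growths);
    }
}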

Member

@archiecobbs measured a win. I haven't seen it in profiles, but it could be that the code in afterAttr contributes to the case for LinkedList, as it is set up to remove a matching item from the list, likely at an early index. That is one of the few things a LinkedList actually does better than an ArrayList.

Contributor

While I agree ArrayLists are often better, @archiecobbs measured that the LinkedLists perform better here, as @cl4es says.

I am not quite clear why linked lists are faster, but it might be that many of the lists are either empty (all the children of the leaf LintRange instances will be empty lists, I think, and an empty LinkedList is, I think, cheaper than an empty ArrayList) or have only a few entries (like the unmappedDecls list here: AFAIK, this has one entry for each top-level declaration, and hence is highly unlikely to have more than 2 entries - one for the package clause, and one for the top-level class). If ArrayLists with a substantial number of entries are only a small minority, that might explain why the use of LinkedLists leads to better results.

Member

I think the right way to fix this is not to use LinkedList, but to update afterAttr to use List.removeIf - this was added back in JDK 8 to avoid the overhead of shifting from multiple removals.
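
A minimal sketch of that alternative, with a made-up Span record and matching predicate standing in for the real javac types (the point is the removal pattern, not the actual LintMapper/afterAttr logic):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: "Span" here is a stand-in record, not javac's LintMapper.Span,
// and the matching predicate is invented.
record Span(int start, int end) {
    boolean contains(int pos) {
        return pos >= start && pos < end;
    }
}

class RemoveIfSketch {

    // Removing matches one at a time shifts the ArrayList's tail on every hit.
    static void removeMatchesWithIterator(List<Span> unmappedDecls, int pos) {
        for (Iterator<Span> it = unmappedDecls.iterator(); it.hasNext(); ) {
            if (it.next().contains(pos))
                it.remove();                     // O(n) shift per removal on ArrayList
        }
    }

    // removeIf (since Java 8) compacts the list in a single pass,
    // so several removals cost only one round of shifting overall.
    static void removeMatchesWithRemoveIf(List<Span> unmappedDecls, int pos) {
        unmappedDecls.removeIf(span -> span.contains(pos));
    }

    public static void main(String[] args) {
        List<Span> decls = new ArrayList<>(List.of(new Span(0, 10), new Span(10, 20)));
        removeMatchesWithRemoveIf(decls, 5);
        System.out.println(decls);               // prints [Span[start=10, end=20]]
    }
}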

Member

On second look, my last analysis was wrong. This removes one element, which would probably be better done as an operation on a TreeSet, and that is okay. However, I don't see why we would use a LinkedList for LintRange.

Contributor Author

AFAIK, this has one entry for each top-level declaration, and hence is highly unlikely to have more than 2 entries - one for the package clause, and one for the top-level class

Actually it will most likely only have one entry - package declarations are not included because they can never contain @SuppressWarnings annotations. This may explain why LinkedList is faster on average.

I don't find why we would use a LinkedList for LintRange.

The same answer may apply here, but honestly I'm just guessing at this. I was also surprised that LinkedList was faster than ArrayList, but the numbers seem to be saying that.

In fact, as you point out, unmappedDecls could be a Set instead of a List. But whether that would actually help is not clear.

Contributor Author

Actually it will most likely only have one entry - package declarations are not included because they can never contain @SuppressWarnings annotations. This may explain why LinkedList is faster on average.

Argh, ignore that, I was looking at something else. So "2" is the right number on average.

Contributor Author

Argh, ignore that, I was looking at something else. So "2" is the right number on average.

But there's no reason for that! 99.99% of package declarations have no annotations and therefore do not need to wait for attribution. So we can get this number down to "1".

I was curious if that would make any difference (using this additional patch). When I try it on my laptop I get a slight improvement:

Benchmark                        (stopStage)  Mode  Cnt   Score   Error  Units
SingleJavacBenchmark.compileHot     Generate    ss   10  14.841 ± 0.618   s/op
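
For reference, the idea might look roughly like the following tweak to the FileInfo constructor loop shown earlier in the thread; this is only a sketch of the intent, not the actual "additional patch" linked above:

            for (JCTree decl : tree.defs) {
                // Hypothetical check: an unannotated package clause can never carry
                // @SuppressWarnings, so it never needs to wait for attribution.
                if (decl instanceof JCPackageDecl pkg && pkg.annotations.isEmpty())
                    continue;
                if (isTopLevelDecl(decl))
                    unmappedDecls.add(new Span(decl, tree.endPositions));
            }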

Member

I was curious if that would make any difference (using this additional patch). When I try it on my laptop I get a slight improvement:

Can't really establish any statistically significant difference either way here, on either score or allocations.

So just to understand this a bit better, I instrumented the code to see how common that condition is, and it seems that when tree.getTag() == Tree.PACKAGEDEF then tree.annotations is never empty. At least in this benchmark. Perhaps the case where there are no annotations is skipped earlier?

Contributor Author

So just to understand this a bit better, I instrumented the code to see how common that condition is, and it seems that when tree.getTag() == Tree.PACKAGEDEF then tree.annotations is never empty. At least in this benchmark. Perhaps the case where there are no annotations is skipped earlier?

Really? I'm not seeing that. When I apply this patch, I see size=0 unless there's actually an annotation there:

--- a/src/jdk.compiler/share/classes/com/sun/tools/javac/code/LintMapper.java
+++ b/src/jdk.compiler/share/classes/com/sun/tools/javac/code/LintMapper.java
@@ -179,6 +179,7 @@ private static class FileInfo {
         FileInfo(Lint rootLint, JCCompilationUnit tree) {
             rootRange = new LintRange(rootLint);
             for (JCTree decl : tree.defs) {
+if (decl instanceof JCPackageDecl p) System.out.println("package "+p.pid+": annotations.size="+p.annotations.size());
                 if (isTopLevelDecl(decl))
                     unmappedDecls.add(new Span(decl, tree.endPositions));
             }
         }

Contributor

@lahodaj lahodaj left a comment

Looks reasonable to me. (I haven't run tests so far, though.) Thanks!

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Oct 8, 2025
@archiecobbs
Contributor Author

@lahodaj thanks for the review!

Since the current version seems to address the performance issue as reported in the bug, I'll plan to integrate later this evening unless there are objections.

Member

@cl4es cl4es left a comment

This appears to resolve most or all of the regression - thanks!

@archiecobbs
Contributor Author

/integrate

@openjdk

openjdk bot commented Oct 9, 2025

Going to push as commit 5873c4b.
Since your change was applied there have been 139 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Oct 9, 2025
@openjdk openjdk bot closed this Oct 9, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Oct 9, 2025
@openjdk

openjdk bot commented Oct 9, 2025

@archiecobbs Pushed as commit 5873c4b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
