-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8369039: JDK-8348611 caused regression in Javac-Hot-Generate #27651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back acobbs! A progress list of the required criteria for merging this PR into |
@archiecobbs This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 139 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@archiecobbs The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
I haven't been able to run this through our wider performance lab yet due to some issues on my end, but in local testing I'm not seeing that much of an improvement in my diagnostic tests (the tests are a bit noisy). Running the benchmark with
Looking at hot methods I see one of the lambdas in For example desugaring the stream in the
to this:
.. reduces allocation to 16.68GB/op (reducing the relative increase in allocations by about 1/3 compared to pr/27651) |
Hi @cl4es,
Now you're moving the goalposts :) I was looking at the performance benchmark, not allocation pressure... FWIW this is what I saw on my laptop: JDK 25
JDK 26
JDK 26 + this patch
So that patch did seem to help resolve most of the benchmark difference. Are you not seeing the same thing? But also that patch differs from the one currently committed, in that it includes the
So surprisingly, they have an effect. So I've put them back into the PR.
Argh, this is frustrating. Are we supposed to use the
Anyway, unstreaming the loop you suggested plus another one, plus the other stuff, yeilds this on my laptop:
That is with this patch. Let me know what you see on the allocation pressure front... thanks. |
FYI I'm not a core maintainer/reviewer of javac but specialize on isolating and figuring out causes of performance regressions; you're free to keep the streams in in favor of perceived code clarity if you make the case that impact on performance is negligible. Transient allocation pressure can be a non-issue on one system but cause a lot of GC pauses and slowdown on another, so it's one of those things I like to keep tabs on when diagnosing a regression. Ideally we shouldn't increase allocation pressure too much. On one system I was looking at the Score was way worse after 26-b12 and didn't improve that much with this PR. On your system it seems the allocation pressure isn't a large contributor to Score. YMMV. I'll look at the numbers on your latest version. (FWIW I ran an experiment baselined on an earlier state of this PR and posted it here archiecobbs#1 -- that showed that desugaring all the streams in LintMapper got allocation pressure down below 26-b11 levels) |
Thanks, I'll include that as well. See c7d5d30. /contributor add cl4es |
@archiecobbs Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
/contributor add @cl4es |
@archiecobbs |
I did a quick pass through the code, and looks sensible so far. Assuming it helps with resolving the problem. I'll go through the change in more detail tomorrow. |
Latest look good on both score and allocation pressure, e.g., on my M1 MacBook:
Allocations: 16.9GB/op -> 13.95GB/op 26-b11 for reference on the same system: 14,285 ± 0,379 s/op So more or less recuperated on score and with slightly less allocation pressure to boot 👍 |
|
||
final LintRange rootRange; // the root LintRange (covering the entire source file) | ||
final List<Span> unmappedDecls = new ArrayList<>(); // unmapped top-level declarations awaiting attribution | ||
final List<Span> unmappedDecls = new LinkedList<>(); // unmapped top-level declarations awaiting attribution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LinkedList has a ton of nodes, which translate to extra object headers, forward/backward pointers, and worse cache locality. To add N items, it requires O(N) allocations. In contrast, ArrayList requires O(log(N)) allocations (resizing) and is almost always better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archiecobbs measured a win. I haven't seen it in profiles but it could be that the code in afterAttr
contribute to the case for LinkedList
as it is set up to remove matching item from the list, likely at an early index. Which is one of the few things a LinkedList
actually does better than an ArrayList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree ArrayList
s are often better, @archiecobbs measured the LinkedList
s perform better here, as @cl4es says.
I am not quite clear why linked lists are faster, but it might be many of the lists are either empty (all the children
of the leaf LintRange
instances will be empty lists, I think, and an empty LinkedList
is, I think, cheaper than an empty ArrayList
), and some of them will have only a few entries (like the unmappedDecls
list here: AFAIK, this has one entry for each top-level declaration, and hence is highly unlikely to have more than 2 entries - one for the package clause, and one for the top-level class). If ArrayList
s with substantial number of entries are only a small minority, it might explain why the use of LinkedList
s leads to better results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the right way to fix is not to use LinkedList, but to update afterAttr to use List.removeIf - this was added back in 8 to avoid the overhead of shifting from multiple removals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look my last analysis was wrong. This is removing one element which is probably better done as an operation on a TreeSet, which is okay. However, I don't find why we would use a LinkedList for LintRange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, this has one entry for each top-level declaration, and hence is highly unlikely to have more than 2 entries - one for the package clause, and one for the top-level class
Actually it will most likely only have one entry - package declarations are not included because they can never contain @SuppressWarnings
annotations. This may explain why LinkedList
is faster on average.
I don't find why we would use a LinkedList for LintRange.
The same answer may apply here, but honestly I'm just guessing at this. I was also surprised that LinkedList
was faster than ArrayList
, but the numbers seem to be saying that.
In fact, as you point out, unmappedDecls
could be a Set
instead of a List
. But whether that would actually help is not clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it will most likely only have one entry - package declarations are not included because they can never contain
@SuppressWarnings
annotations. This may explain whyLinkedList
is faster on average.
Argh, ignore that, I was looking at something else. So "2" is the right number on average.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, ignore that, I was looking at something else. So "2" is the right number on average.
But there's no reason for that! 99.99% of package declarations have no annotations and therefore do not need to wait for attribution. So we can get this number down to "1".
I was curious if that would make any difference (using this additional patch). When I try it on my laptop I get a slight improvement:
Benchmark (stopStage) Mode Cnt Score Error Units
SingleJavacBenchmark.compileHot Generate ss 10 14.841 ± 0.618 s/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious if that would make any difference (using this additional patch). When I try it on my laptop I get a slight improvement:
Can't reallt establish any statistically significant difference either way here, neither on score or allocations.
So just to understand this a bit better I instrumented to see how common that condition is and it seems that when tree.getTag() == Tree.PACKAGEDEF
then tree.annotations
is never empty. At least in this benchmark. Perhaps the case when there are no annotations are skipped earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to understand this a bit better I instrumented to see how common that condition is and it seems that when
tree.getTag() == Tree.PACKAGEDEF
thentree.annotations
is never empty. At least in this benchmark. Perhaps the case when there are no annotations are skipped earlier?
Really? I'm not seeing that. When I apply this patch, I see size=0
unless there's actually an annotation there:
--- a/src/jdk.compiler/share/classes/com/sun/tools/javac/code/LintMapper.java
+++ b/src/jdk.compiler/share/classes/com/sun/tools/javac/code/LintMapper.java
@@ -179,6 +179,7 @@ private static class FileInfo {
FileInfo(Lint rootLint, JCCompilationUnit tree) {
rootRange = new LintRange(rootLint);
for (JCTree decl : tree.defs) {
+if (decl instanceof JCPackageDecl p) System.out.println("package "+p.pid+": annotations.size="+p.annotations.size());
if (isTopLevelDecl(decl))
unmappedDecls.add(new Span(decl, tree.endPositions));
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. (I didn't run tests, though, so far.) Thanks!
@lahodaj thanks for the review! Since the current version seems to address the performance issue as reported in the bug, I'll plan to integrate later this evening unless there are objections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to resolve most or all of the regression - thanks!
/integrate |
Going to push as commit 5873c4b.
Your commit was automatically rebased without conflicts. |
@archiecobbs Pushed as commit 5873c4b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The refactoring in JDK-8348611 caused a regression in compiler performance. In the the refactoring there were a couple of simple optimizations that were missed. When put in place, these seem to (mostly) address the performance issue.
Progress
Issue
Reviewers
Contributors
<[email protected]>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27651/head:pull/27651
$ git checkout pull/27651
Update a local copy of the PR:
$ git checkout pull/27651
$ git pull https://git.openjdk.org/jdk.git pull/27651/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27651
View PR using the GUI difftool:
$ git pr show -t 27651
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27651.diff
Using Webrev
Link to Webrev Comment