8368465: [leyden] Improve precompiler method selection code #99
Conversation
javac test (1000 iterations trained, 50 iterations production)
Larger benchmarks all improve with 1-core tests: quarkus-getting-started, helidon-quickstart-se, micronaut-first-app, spring-boot-getting-started, spring-petclinic.
❗ This change is not yet ready to be integrated.
Webrevs
Ready for review, folks. The performance benefits of doing this are very apparent.
vnkozlov left a comment
Few questions
  default: fatal("%d", _search_level);
}
// Otherwise, break the tie by code size: largest methods go first.
What is the reason for larger methods going first? Could we use compile ID here instead?
So my logic was similar to the hot-methods case. If we have lost the game of "preload the AOT code before JIT got triggered", and JIT did get triggered, we want to prioritize larger methods, as they are more likely to take more time to JIT-compile. In other words, if you have lost to the JIT timing-wise, you want to preempt the fattest JIT compiles first. But it is only a bet. If we ever record compilation time in nmethods/profiles, we could use that to break the tie instead.
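The tie-breaking bet above can be sketched as a comparator. This is a minimal sketch with hypothetical names (`MethodRecord`, `hotness`, `code_size`); the actual HotSpot selection code uses its own data structures:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical descriptor for a method recorded during the training run.
struct MethodRecord {
  int hotness;    // e.g. combined invocation/backedge counters
  int code_size;  // size of the compiled code, in bytes
};

// Order hotter methods first; on a hotness tie, put larger methods first,
// betting that they take the longest to JIT-compile and are therefore the
// most profitable to have preloaded early.
static bool precompile_before(const MethodRecord& a, const MethodRecord& b) {
  if (a.hotness != b.hotness) {
    return a.hotness > b.hotness;
  }
  return a.code_size > b.code_size;
}

static void sort_for_precompilation(std::vector<MethodRecord>& methods) {
  std::sort(methods.begin(), methods.end(), precompile_before);
}
```

Note the comparator is a strict weak ordering (equal records compare false both ways), which `std::sort` requires.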
I am not sure I understand how a different order of pre-/AOT-compilation can affect performance of the production run.
We bulk-load all "Preload" AOT code, so ordering should not matter for it, even if we load in a selected order. It is one thread which does the loading, and it is blocking (I am actually playing with spreading this preload over all compiler threads; I did not see much effect on startup).
The only explanation is that preload happens only when C2 compiler threads are initialized ("Preload" AOT code is C2-compiled code), and it happens simultaneously with C1 thread initialization, which could become available for C1 compilation sooner than we finish preloading, especially on machines with a small number of cores. I did observe that we start loading A1 and A2 code first (even normal C1 compilations) before we start preloading AP4.
Is that what you are trying to solve here?
The invocation counters should be roughly the same for methods without loops (10000 to trigger C2 compilation). They could differ if code was deoptimized and ran in the interpreter. The only difference is the backedge counter. So in this sense you push methods with hot loops up front, as we talked about in the other comment. That may affect performance, but it would depend on the application.
I agree with ordering by size (or time spent in compilation), but only for methods which do not have A1 or A2 code. Which should not be the case: if we have AP4, we will have A1 and A2 for it. I am still not convinced by this.
Maybe we should try to move AOTCodeCache::preload_code() just after SystemDictionary::compute_java_loaders(), because it does not depend on training data. So we can have AP4 sooner. Mmapping directly into the CodeCache will also speed up preloading; that is on our to-do list.
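The counter argument above (invocation counters saturating near the C2 threshold, so ordering is effectively driven by backedge counts) can be illustrated with a small sketch. The `Profile` struct and the combined key are assumptions for illustration, not HotSpot code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters captured in a training-run profile.
struct Profile {
  uint32_t invocation_count;  // ~10000 for any method that hit the C2 threshold
  uint32_t backedge_count;    // dominated by hot loops
};

// If every C2-compiled method has roughly the same invocation count, a key
// combining both counters is effectively ordered by the backedge count:
// methods with hot loops sort ahead of straight-line methods.
static uint64_t hotness_key(const Profile& p) {
  return (uint64_t)p.invocation_count + (uint64_t)p.backedge_count;
}
```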
> We bulk-load all "Preload" AOT code, so ordering should not matter for it, even if we load in a selected order. It is one thread which does the loading, and it is blocking (I am actually playing with spreading this preload over all compiler threads; I did not see much effect on startup).
Um, I don't think that's the case? Preloading is asynchronous. See CompileBroker::compile_method:
bool is_blocking = ReplayCompiles ||
                   !directive->BackgroundCompilationOption ||
                   (PreloadBlocking && (compile_reason == CompileTask::Reason_Preload));
compile_method_base(method, osr_bci, comp_level, hot_count, compile_reason, requires_online_compilation, is_blocking, THREAD);
We have the option to make preload blocking (PreloadBlocking), but it is turned off by default. We know enabling +PreloadBlocking is counter-productive, because it could easily take hundreds of milliseconds.
So while compiler threads are working through preloading the code, the application runs and can trigger compilations. Sorting the preload methods allows loading the hottest code before that code transitions to normal compilation. This is the problem I am trying to mitigate.
Maybe the compilation policy should actually participate in preloading: i.e., if there is hot code that transitions from T0 to any other level, attempt the preload first, in case normal preloading is lagging behind. That would be more intrusive, though, so as the conservative approach I would like to prioritize the more profitable preload code first.
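The "policy participates in preloading" idea above can be sketched as follows. `PolicySketch` and its fields are purely illustrative, not HotSpot API: on a method's first transition out of the interpreter, the AOT preload cache is consulted before a normal JIT compile is scheduled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Illustrative sketch of a compilation policy that tries the AOT preload
// cache before falling back to normal JIT compilation.
struct PolicySketch {
  std::set<std::string> aot_cache;     // methods with "preload" AOT code available
  std::vector<std::string> jit_queue;  // methods sent to normal JIT compilation
  std::vector<std::string> aot_loaded; // methods satisfied from the AOT cache

  // Called when a method first transitions out of tier 0 (the interpreter).
  void on_first_transition(const std::string& method) {
    if (aot_cache.count(method)) {
      aot_loaded.push_back(method);    // install AOT code; no JIT compile needed,
                                       // and background preloading cannot lose the race
    } else {
      jit_queue.push_back(method);     // no AOT code: normal compilation path
    }
  }
};
```

The design point is that the hot method itself pulls its AOT code in on demand, so bulk preloading order only matters for methods that never get hot enough to trigger this path.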
I think we have a small performance "issue" in how we replace existing JITed code with new code, to which AOT code loading could be more sensitive. We deoptimize the old code before the new code is set under a lock. If the lock is held by another thread, we may deoptimize the previous code and drop into the interpreter before the new code is set for use. This is present in mainline, but with normal JIT compilation replacement it may not be noticeable.
Another suggestion for this concurrent preloading would be to split the A4 preload code. One set is the current one, which needs to wait
@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a
Getting back to this...
Forked from JDK-8366681: there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
Additional testing:
runtime/cdsProgress
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
$ git checkout pull/99
Update a local copy of the PR:
$ git checkout pull/99
$ git pull https://git.openjdk.org/leyden.git pull/99/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 99
View PR using the GUI difftool:
$ git pr show -t 99
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/99.diff
Using Webrev
Link to Webrev Comment