8359110: Log accumulated GC and process CPU time upon VM exit #25779

Open
wants to merge 12 commits into master

Conversation

Contributor

@JonasNorlinder JonasNorlinder commented Jun 12, 2025

Add support to log the CPU cost of GC during VM exit with -Xlog:gc.

[1.500s][info ][gc] GC CPU cost: 1.75%

Additionally, detailed information may be retrieved with -Xlog:gc=trace:

[1.500s][trace][gc] Process CPU time: 4.945370s
[1.500s][trace][gc] GC CPU time: 0.086382s
[1.500s][info ][gc] GC CPU cost: 1.75%
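
For example, a minimal invocation to see the new output (the program name MyApp is a placeholder; intermediate per-GC log lines elided):

$ java -Xlog:gc=trace MyApp
...
[1.500s][trace][gc] Process CPU time: 4.945370s
[1.500s][trace][gc] GC CPU time: 0.086382s
[1.500s][info ][gc] GC CPU cost: 1.75%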

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8359110: Log accumulated GC and process CPU time upon VM exit (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25779/head:pull/25779
$ git checkout pull/25779

Update a local copy of the PR:
$ git checkout pull/25779
$ git pull https://git.openjdk.org/jdk.git pull/25779/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25779

View PR using the GUI difftool:
$ git pr show -t 25779

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25779.diff

Using Webrev

Link to Webrev Comment

@JonasNorlinder
Contributor Author

/label add hotspot-gc hotspot-runtime

@bridgekeeper

bridgekeeper bot commented Jun 12, 2025

👋 Welcome back JonasNorlinder! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 12, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented Jun 12, 2025

@JonasNorlinder
The hotspot-gc label was successfully added.

The hotspot-runtime label was successfully added.

@JonasNorlinder JonasNorlinder changed the title from "8359110: Log GC CPU cost upon VM exit" to "8359110: Log accumulated GC and process CPU time upon VM exit" Jun 12, 2025
@JonasNorlinder JonasNorlinder marked this pull request as ready for review June 12, 2025 12:28
@JonasNorlinder
Contributor Author

This PR is ready for review

@openjdk openjdk bot added the rfr (pull request is ready for review) label Jun 12, 2025
@JonasNorlinder
Contributor Author

Refactored code per @kstefanj's suggestions

@tschatzl
Contributor

Fwiw, I would prefer to have one message containing all the information, and add the exit tag. This decreases clutter (timestamp and tags), and allows direct selection of that message.

Also reduces the amount of parsing needed in scripts etc. (i.e. three regexps vs. one). These three values are not really too much to digest for human readers.

Another problem seems to be the large number of digits after the decimal point for the times; maybe use a different time scale (ms/us).
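
To illustrate the parsing point (the combined single-line format below is hypothetical, not from the patch): one line carrying all three values can be matched with a single pattern, e.g.

$ grep -E "Process CPU time: .+ GC CPU time: .+ GC CPU cost: .+" gc.log

whereas the three-line format needs a separate match per value.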

@JonasNorlinder
Contributor Author

Fwiw, I would prefer to have one message containing all the information, and add the exit tag. This decreases clutter (timestamp and tags), and allows direct selection of that message.

Thank you for sharing your concern. I'm OK with putting the CPU times currently in trace under the exit tag, but I strongly believe we should keep
[1.500s][info ][gc] GC CPU cost: 1.75% as is. I discussed the exit tag option with @kstefanj, but I thought that hiding the nominal values at the trace level would suffice; I can change that if we think the exit tag is preferable. I think that exposing this number at the -Xlog:gc level may be an important tool for users to understand that they may be running with too small a heap. We expose a lot of ways for users to understand the memory footprint, but not much about the CPU footprint. Putting this number behind exit may increase the risk that the typical user will not discover it. Adding one line at the end, while at the same time logging information about each GC cycle, does not clutter the message log IMO.

Another option could be to not log the nominal values at all. If one has the percentage and measures process CPU time with e.g. perf, one can calculate the nominal values anyway. What do you think about that?
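
For illustration, using the sample output above: 0.086382 s / 4.945370 s ≈ 0.0175, i.e. the logged GC CPU cost of 1.75%; conversely, the nominal GC CPU time can be recovered as 1.75% of the externally measured process CPU time.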

Another problem seems to be the large number of digits after the decimal point for the times; maybe use a different time scale (ms/us).

Thanks for pointing that out, I will fix that.

@JonasNorlinder JonasNorlinder requested a review from tschatzl June 12, 2025 15:21
@JonasNorlinder
Contributor Author

FYI: I removed the nominal logging

Contributor

@tschatzl tschatzl left a comment

The string deduplication thread, which is to some degree also a GC helper thread, is not considered here. I am not sure if it should be; I am slightly tending towards adding it.

Comment on lines +119 to +123
  if (thread->is_ConcurrentGC_thread() ||
      strstr(thread->name(), "ZWorker") != nullptr) {
    Atomic::add(&_vtime, os::thread_cpu_time(thread));
  }
}
Contributor

Why does this exclude threads like the ZDirector and other ZGC background threads? Those threads seem clearly relevant to ZGC operation, and excluding them would make the measurement incomplete.
The change does not exclude e.g. some random G1 "director" threads either, even though they do not contribute much to the result.

Contributor Author

I believe it does not exclude ZDirector, etc. Adding printf("%s\n", thread->name()); to prove my point results in:

ZDirector
ZDriverMajor
ZDriverMinor
ZStat
ZUncommitter#0
ZWorkerYoung#0
ZWorkerYoung#1
ZWorkerYoung#2
ZWorkerYoung#3
ZWorkerOld#0
ZWorkerOld#1
ZWorkerOld#2
ZWorkerOld#3

This code is working around the fact that ZCollectedHeap::gc_threads_do also calls _runtime_workers.threads_do, and I believe those workers do not participate in GC-related work.
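
For context, a hedged sketch of how the quoted condition sits inside a thread closure (the closure name and the usage line are assumptions, not the literal patch):

#include "gc/shared/collectedHeap.hpp"
#include "memory/iterator.hpp"
#include "memory/universe.hpp"
#include "runtime/atomic.hpp"
#include "runtime/os.hpp"

// Sketch only: sum the CPU time of the threads visited by gc_threads_do,
// counting concurrent GC threads and ZGC worker threads; this skips ZGC's
// _runtime_workers, whose names do not contain "ZWorker".
class GCVTimeClosure : public ThreadClosure {
  volatile jlong _vtime;
public:
  GCVTimeClosure() : _vtime(0) {}
  virtual void do_thread(Thread* thread) {
    if (thread->is_ConcurrentGC_thread() ||
        strstr(thread->name(), "ZWorker") != nullptr) {
      Atomic::add(&_vtime, os::thread_cpu_time(thread));
    }
  }
  jlong vtime() const { return _vtime; }
};

// Usage (assumed): GCVTimeClosure cl; Universe::heap()->gc_threads_do(&cl);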

Member

Both Parallel and G1 reuse the same GC worker threads in safepoint_workers() for non-GC work, just FYI.

Contributor Author

FWIW, I confirmed with @stefank that _runtime_workers should not be counted towards GC CPU time for ZGC.

Contributor

But then runtime tasks performed by the GC worker threads, when they are used for runtime work, are counted differently towards those GCs that do this sharing/shadowing.

I looked a bit at what they are doing; after JDK-8329488 they are only used by heap inspection and heap dumping, which seem to be solely GC-related tasks, so I kind of think they should be counted against GC.
At least make the accounting uniform across collectors.

So one option is duplicating these workers in G1/Parallel too, and fixing https://bugs.openjdk.org/browse/JDK-8277394. Since we could then no longer share GC workers and these "runtime workers" after this change, the safepoint workers management should probably be moved to CollectedHeap, and they should not be advertised as general-purpose workers everyone can hook into.

Or just let ZGC's _runtime_threads also count towards GC time. After all, both of these VM operations are VM_GC_Operations.

@tschatzl
Contributor

FYI: I removed the nominal logging

Okay, these can be re-added if needed. I also see your point that this is just one message at VM exit, so we do not need the "exit" label. I would prefer if it had it, but I won't insist on it given that others do not mind either. It would fit the purpose of the exit label perfectly though.

The argument that "the user could forget specifying it" is somewhat weak imo - in that case one could ask why those labels exist at all, and I kind of doubt that GC CPU usage only at the end is that important to have for everyone every time.

I.e. if there is a need to monitor it, only printing it at the end seems insufficient, as that kind of monitoring is continuous. It helps benchmarking though.

@JonasNorlinder
Contributor Author

The argument that "the user could forget specifying it" is somewhat weak imo - in that case one could ask why those labels exist at all, and I kind of doubt that GC CPU usage only at the end is that important to have for everyone every time.

I disagree; it is equally important as reporting pre- and post-compaction heap usage, as we do now with -Xlog:gc. Users who are not GC experts may underestimate the CPU cost of GC at a given max heap size. Even experts in academia tend to run with too small a heap. I maintain my position that adding it at the end is crucial.

@JonasNorlinder
Contributor Author

Additionally, if we want to, we can also add capabilities to track it continuously with JFR and/or MXBeans. But that may introduce a performance penalty, as sampling may not be free, so I still want to keep logging it at the end as a base case.

@tschatzl
Contributor

The argument that "the user could forget specifying it" is somewhat weak imo - in that case one could ask why those labels exist at all, and I kind of doubt that GC CPU usage only at the end is that important to have for everyone every time.

I disagree; it is equally important as reporting pre- and post-compaction heap usage, as we do now with -Xlog:gc. Users who are not GC experts may underestimate the CPU cost of GC at a given max heap size. Even experts in academia tend to run with too small a heap. I maintain my position that adding it at the end is crucial.

I do not disagree about the usefulness of the message (it is useful - I even liked the nominal output); I only somewhat disagree with a message that is purposefully printed at VM exit, to state the CPU usage at exit, not having the "exit" label.
(Regarding the nominal logging: my points were that I thought it was a waste to use three messages, and about the presentation format.)

We do not need to agree on everything 100%.

Additionally, if we want to, we can also add capabilities to track it continuously with JFR and/or MXBeans. But that may introduce a performance penalty, as sampling may not be free, so I still want to keep logging it at the end as a base case.

That's fine.

Fwiw, there has even been interest from me (https://bugs.openjdk.org/browse/JDK-8349476) in regularly printing these statistics at even higher detail.

However, as far as I understand, there are already jstat/perf counters for that, and they are in use (in industry). There is also the jcmd command (available even before that), which has printed per-thread CPU usage for some time now - one could filter out the interesting threads manually...
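
For example (the PID and figures are placeholders; the cpu= and elapsed= fields are what Thread.print reports per thread):

$ jcmd <pid> Thread.print | grep "GC Thread"
"GC Thread#0" os_prio=0 cpu=123.45ms elapsed=67.89s tid=... nid=... runnable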

@JonasNorlinder
Contributor Author

The string deduplication thread, which is to some degree also a GC helper thread, is not considered here. I am not sure if it should be; I am slightly tending towards adding it.

I can see arguments both ways, but do not have a strong opinion. Unless someone objects, I will add it.

Member

@dholmes-ora dholmes-ora left a comment

This seems reasonable, though I'm unclear on some details. Could you give a high-level description of what times we capture, for what threads and when, and the calculations involved? Thanks.

Some minor style nits.

inline VTimeScope::VTimeScope(VMThread *thread, bool operation_is_gc)
  : _start(0), _enabled(os::is_thread_cpu_time_supported()),
    _gcLogging(operation_is_gc && (log_is_enabled(Info, gc) || log_is_enabled(Info, gc, cpu))),
    _thread((Thread*)thread) {
Member

The cast should not be necessary.

Contributor Author

Not casting requires adding #include "runtime/vmThread.hpp". I will add that, since that seems to be preferred.
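
For background, a minimal standalone illustration (generic C++, not JDK code) of why the include removes the need for the cast: the implicit derived-to-base conversion requires the complete definition of VMThread to be visible, whereas the C-style cast compiles even against a forward declaration, just unchecked:

class Thread { };
class VMThread : public Thread { };  // complete definition visible

Thread* upcast(VMThread* t) {
  return t;  // implicit, type-checked upcast: no (Thread*) cast required
}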

@@ -271,14 +272,14 @@ static void post_vm_operation_event(EventExecuteVMOperation* event, VM_Operation

void VMThread::evaluate_operation(VM_Operation* op) {
  ResourceMark rm;

  {
    PerfTraceTime vm_op_timer(perf_accumulated_vm_operation_time());
    HOTSPOT_VMOPS_BEGIN(
      (char *) op->name(), strlen(op->name()),
Contributor Author

@dholmes-ora given that I'm editing VMThread::evaluate_operation and your comments about code style - do you want me to also change this type cast to (char*) to align with the preferred style?

@JonasNorlinder
Contributor Author

This seems reasonable, though I'm unclear on some details. Could you give a high-level description of what times we capture, for what threads and when, and the calculations involved? Thanks.

What threads:

  • All GC threads, e.g. director threads, sample threads, worker threads, and driver
    threads (except for ZGC, where I currently exclude the "runtime" threads, which
    we may decide to change; see @tschatzl's comments above)

  • The VM thread, but only including VM operations related to GC

  • The string deduplication thread

What times:

  • CPU time for all the threads above

  • CPU time for the entire process

When:

  • VM thread:

    • If the VM operation is a GC operation and -Xlog:gc or more is enabled, sample start and end CPU
      time for the GC operation. Add end - start to gc_vm_vtime.
  • GC threads:

    • If -Xlog:gc or more is enabled, sample all GC threads and store the CPU
      time sum to gc_threads_vtime during VM exit.
  • String deduplication thread:

    • If string deduplication is enabled and -Xlog:gc or more is enabled,
      sample CPU time during VM exit for that thread and store it in string_deduplication_vtime
  • Process:

    • If -Xlog:gc or more is enabled, sample the CPU time for the entire process

Log the accumulated CPU time gc_threads_vtime + gc_vm_vtime + string_deduplication_vtime
as a percentage of the total process CPU time. We sample the total process CPU
time with the new method os::elapsed_process_vtime.
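
Putting this together, a hedged sketch of the exit-time calculation (os::elapsed_process_vtime is the new method mentioned above; the variable names follow this description rather than the literal patch):

// Sketch of the final calculation described above (not the literal patch).
double process_vtime = os::elapsed_process_vtime();   // whole-process CPU time
double gc_vtime = gc_threads_vtime                    // GC threads, sampled at exit
                + gc_vm_vtime                         // GC share of the VM thread
                + string_deduplication_vtime;         // string dedup thread, if enabled
log_info(gc)("GC CPU cost: %.2f%%", 100.0 * gc_vtime / process_vtime);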

It should be noted from the above that calling elapsed_gc_vtime assumes that
-Xlog:gc or more is enabled. If that assumption is broken, the GC CPU time will
not include the VM thread and may be horribly wrong (e.g. if one runs Serial).

@JonasNorlinder
Contributor Author

After all both of these VM operations are VM_GC_Operations.

@tschatzl Thanks for pointing that out, that's an excellent point!

Your comment caused me to reflect on the fact that I have currently defined all operations as GC operations if they inherit from VM_GC_Sync_Operation. However, the following should probably not be strictly counted as GC activity: VM_GC_HeapInspection, VM_PopulateDynamicDumpSharedSpace, VM_Verify, VM_PopulateDumpSharedSpace. I will update the PR.
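
A hedged sketch of that refinement (hypothetical helper; the operation types are the ones named above, but the actual patch may structure this differently):

// Hypothetical helper, not the actual patch: exclude inspection/dump/verify
// operations from GC CPU accounting even though they derive from
// VM_GC_Sync_Operation.
static bool counts_towards_gc_vtime(const VM_Operation* op) {
  switch (op->type()) {
    case VM_Operation::VMOp_GC_HeapInspection:              // heap inspection
    case VM_Operation::VMOp_PopulateDumpSharedSpace:        // static CDS dump
    case VM_Operation::VMOp_PopulateDynamicDumpSharedSpace: // dynamic CDS dump
    case VM_Operation::VMOp_Verify:                         // heap verification
      return false;
    default:
      return true;  // remaining VM_GC_Sync_Operation subclasses count as GC work
  }
}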

So one option is duplicating these workers in G1/Parallel too, and fixing https://bugs.openjdk.org/browse/JDK-8277394.

Thanks for the suggestion, I agree that we should do that to fix the root issue.
