8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention #1933

Open: dtmhuang wants to merge 1 commit into base: master


@dtmhuang dtmhuang commented Jun 30, 2025

Backport of the fix for the ShenandoahLock performance regression. The fix sleeps for a very short duration after every 3 yields; the number of yields was picked through manual testing.
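
To illustrate the yield-then-sleep idea described above, here is a minimal Java sketch of a spin lock that yields on contention and parks briefly after every 3 yields. This is not the HotSpot code (ShenandoahLock is implemented in C++ inside the VM); the class name `BackoffSpinLock`, the helper `runDemo`, and the 10 microsecond sleep are assumptions made only for illustration. Only the "every 3 yields" count comes from the description.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration of yield-then-sleep backoff; not the actual ShenandoahLock.
public class BackoffSpinLock {
    // Taken from the PR description: sleep after every 3 yields.
    private static final int YIELDS_BEFORE_SLEEP = 3;
    // Assumed value: the description only says "a very short duration".
    private static final long SLEEP_NANOS = 10_000L;

    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        int yields = 0;
        while (!locked.compareAndSet(false, true)) {
            if (yields < YIELDS_BEFORE_SLEEP) {
                // Cheap backoff first: give other runnable threads a chance.
                Thread.yield();
                yields++;
            } else {
                // Under heavy contention, get off the CPU for a short time.
                LockSupport.parkNanos(SLEEP_NANOS);
                yields = 0;
            }
        }
    }

    public void unlock() {
        locked.set(false);
    }

    // Increments a shared counter from several threads under the lock.
    public static int runDemo(int threads, int iterations) {
        BackoffSpinLock lock = new BackoffSpinLock();
        int[] counter = new int[1];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("counter = " + runDemo(8, 10_000));
    }
}
```

The intent of the sleep step is that waiters stop competing for CPU time with the thread that holds the lock, which plain yielding may not achieve when there are far more threads than cores.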

This is a clean backport. I ran the GHA sanity checks and locally tested tier1, tier2, and hotspot_gc_shenandoah. test/jdk/java/nio/channels/FileChannel/directio/DirectIOTest.java sometimes fails locally, but it also sometimes failed before the backport.
test/jdk/java/nio/channels/DatagramChannel/SendReceiveMaxSize.java fails locally, but it also failed locally before the backport.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • JDK-8350285 needs maintainer approval

Issue

  • JDK-8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk21u-dev.git pull/1933/head:pull/1933
$ git checkout pull/1933

Update a local copy of the PR:
$ git checkout pull/1933
$ git pull https://git.openjdk.org/jdk21u-dev.git pull/1933/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1933

View PR using the GUI difftool:
$ git pr show -t 1933

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk21u-dev/pull/1933.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 30, 2025

👋 Welcome back dtmhuang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 30, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title from "Backport bd8ad309b59bceb3073a8d6411cca74e73508885" to "8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention" Jun 30, 2025
@openjdk

openjdk bot commented Jun 30, 2025

This backport pull request has now been updated with issue from the original commit.

@openjdk openjdk bot added the backport (Port of a pull request already in a different code base) and clean (Identical backport; no merge resolution required) labels Jun 30, 2025
@dtmhuang dtmhuang marked this pull request as ready for review June 30, 2025 22:38
@openjdk

openjdk bot commented Jun 30, 2025

⚠️ @dtmhuang This change is now ready for you to apply for maintainer approval. This can be done directly in each associated issue or by using the /approval command.

@openjdk openjdk bot added the rfr (Pull request is ready for review) label Jun 30, 2025
@dtmhuang
Author

/approval request for backport of JDK-8350285 Shenandoah: Regression caused by ShenandoahLock under extreme contention

Motivation: Without this change, ShenandoahLock shows a performance regression under extreme contention.

Risk: Low, since the change has been present in tip since February 2025. I ran the GHA sanity checks, and tier1 and tier2 tests locally. The patch is clean.

@openjdk

openjdk bot commented Jun 30, 2025

@dtmhuang
8350285: The approval request has been created successfully.

@openjdk openjdk bot added the approval (Requires approval; will be removed when approval is received) label Jun 30, 2025
@mlbridge

mlbridge bot commented Jun 30, 2025

Webrevs

@GoeLin
Member

GoeLin commented Jul 2, 2025

Hi @dtmhuang
Besides running the tier tests, can you please run some large applications that cause contention on the problematic code?

@dtmhuang
Author

dtmhuang commented Jul 2, 2025

Sure!

The original fix had code to reproduce the bug:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

public class Alloc {
    static final CountDownLatch startSignal = new CountDownLatch(1);
    static final Semaphore semaphore = new Semaphore(128);
    static final int THREADS = 1024; //64 threads per CPU core, 16 cores
    static final Object[] sinks = new Object[64 * THREADS];
    static volatile boolean start;
    static volatile boolean stop;

    private static void waitOnStartSignal() {
        try {
            startSignal.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String... args) throws Throwable {
        for (int t = 0; t < THREADS; t++) {
            int ft = t;
            new Thread(() -> work(ft * 64)).start();
        }

        Thread.sleep(1000);
        startSignal.countDown();
        Thread.sleep(30_000);
        stop = true;
    }

    public static void work(int idx) {
        waitOnStartSignal();
        while (!stop) {
            semaphore.acquireUninterruptibly();
            try {
                sinks[idx] = new byte[128];
            } catch (Throwable ex) {
                throw new RuntimeException(ex);
            } finally {
                semaphore.release();
            }
        }
    }
}

I ran this on the command line with
.build/linux-x86_64-server-release/jdk/bin/java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr
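
For anyone who prefers not to use the grep/sort pipeline, the same extraction can be sketched in Java. The class name `SafepointTimes` and the sample log lines in `main` are made up for illustration; the only thing taken from the command above is the `At safepoint: <digits> ns` pattern and the descending numeric sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring: grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr
public class SafepointTimes {
    private static final Pattern AT_SAFEPOINT = Pattern.compile("At safepoint: (\\d+) ns");

    // Returns every at-safepoint time found in the lines, largest first.
    public static List<Long> extract(List<String> logLines) {
        List<Long> times = new ArrayList<>();
        for (String line : logLines) {
            Matcher matcher = AT_SAFEPOINT.matcher(line);
            while (matcher.find()) {
                times.add(Long.parseLong(matcher.group(1)));
            }
        }
        times.sort(Comparator.reverseOrder());
        return times;
    }

    public static void main(String[] args) {
        // Illustrative lines only; real -Xlog:safepoint output has more fields.
        List<String> sample = List.of(
                "... At safepoint: 381562 ns ...",
                "a line without a safepoint time",
                "... At safepoint: 853206 ns ...");
        for (long nanos : extract(sample)) {
            System.out.println(nanos);
        }
    }
}
```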

Running this without the fix gives these at-safepoint times (in ns, sorted in descending order):

22273444
11615507
11297887
10424031
10117190
9789552
9754920
9599965
7477300
6897913

Running with the backported fix gives these at-safepoint times (in ns, sorted in descending order):

15667088
8279113
3800276
853206
464314
399752
387322
381562
378641
358231
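
As a rough way to compare the two runs, the sketch below computes the median of the ten values listed for each. The class name `SafepointSummary` is hypothetical, and the median is just one possible summary; it assumes the ten values shown are representative of each run.

```java
import java.util.Arrays;

// Hypothetical summary of the two lists of at-safepoint times quoted above (ns).
public class SafepointSummary {
    static final long[] WITHOUT_FIX = {
        22273444L, 11615507L, 11297887L, 10424031L, 10117190L,
        9789552L, 9754920L, 9599965L, 7477300L, 6897913L
    };
    static final long[] WITH_FIX = {
        15667088L, 8279113L, 3800276L, 853206L, 464314L,
        399752L, 387322L, 381562L, 378641L, 358231L
    };

    // Median of a non-empty array; averages the two middle values for even lengths.
    static double median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n % 2 == 1) {
            return sorted[n / 2];
        }
        return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double before = median(WITHOUT_FIX);
        double after = median(WITH_FIX);
        System.out.printf("median without fix: %.0f ns%n", before);
        System.out.printf("median with fix:    %.0f ns%n", after);
        System.out.printf("ratio: %.1fx%n", before / after);
    }
}
```

For these numbers the medians are 9953371 ns without the fix and 432033 ns with it, although the largest value in each run differs much less (22273444 ns versus 15667088 ns).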

@GoeLin
Member

GoeLin commented Jul 4, 2025

Hi @dtmhuang
Well, that shows that the change does what it intends to do, but that's not my point.
I want to make sure you cause no regressions of any other kind. JDK 25, which you mention with regard to testing, is not live yet.
I don't see why we should take any risk for a performance optimization that helps only in rare situations.

@openjdk openjdk bot removed the approval (Requires approval; will be removed when approval is received) label Jul 9, 2025
Labels
backport (Port of a pull request already in a different code base), clean (Identical backport; no merge resolution required), rfr (Pull request is ready for review)
2 participants