| 1 | +--- |
| 2 | +title: Benchmarking via JMH |
| 3 | +weight: 6 |
| 4 | + |
| 5 | +### FIXED, DO NOT MODIFY |
| 6 | +layout: learningpathall |
| 7 | +--- |
| 8 | + |
| 9 | +Now that you’ve built and run the Tomcat-like response example, you can use it to test JVM performance with JMH. You can also use it to compare the performance of Cobalt 100 instances with similar x86_64-based D-series instances. |
| 10 | +## Run the performance tests using JMH |
| 11 | + |
| 12 | +JMH (Java Microbenchmark Harness) is a Java benchmarking framework developed by the JVM team at Oracle to measure the performance of small code snippets with high precision. It accounts for JVM optimizations such as JIT compilation and warm-up so that results are accurate and reproducible. It can measure throughput, average time per operation, or execution time. The following steps benchmark the Tomcat-like operation: |
| 13 | + |
| 14 | + |
| 15 | +Install Maven: |
| 16 | + |
| 17 | +```console |
| 18 | +sudo apt install maven -y |
| 19 | +``` |
| 20 | +Create Benchmark Project: |
| 21 | + |
| 22 | +```console |
| 23 | +mvn archetype:generate \ |
| 24 | + -DinteractiveMode=false \ |
| 25 | + -DarchetypeGroupId=org.openjdk.jmh \ |
| 26 | + -DarchetypeArtifactId=jmh-java-benchmark-archetype \ |
| 27 | + -DarchetypeVersion=1.37 \ |
| 28 | + -DgroupId=com.example \ |
| 29 | + -DartifactId=jmh-benchmark \ |
| 30 | + -Dversion=1.0 |
| 31 | +cd jmh-benchmark |
| 32 | +``` |
| 33 | + |
| 34 | +Edit the `src/main/java/com/example/MyBenchmark.java` file and replace its contents with the following code: |
| 35 | + |
| 36 | +```java |
| 37 | +package com.example; |
| 38 | + |
| 39 | +import org.openjdk.jmh.annotations.Benchmark; |
| 40 | + |
| 41 | +public class MyBenchmark { |
| 42 | + |
| 43 | + @Benchmark |
| 44 | + public void benchmarkHttpResponse() { |
| 45 | + String body = "Benchmarking a Tomcat-like operation"; |
| 46 | + StringBuilder sb = new StringBuilder(); |
| 47 | + sb.append("HTTP/1.1 200 OK\r\n"); |
| 48 | + sb.append("Content-Type: text/plain\r\n"); |
| 49 | + sb.append("Content-Length: ").append(body.length()).append("\r\n\r\n"); |
| 50 | + sb.append(body); |
| 51 | + |
| 52 | + // Prevent dead-code elimination |
| 53 | + if (sb.length() == 0) { |
| 54 | + throw new RuntimeException(); |
| 55 | + } |
| 56 | + } |
| 57 | +} |
| 58 | +``` |
| 59 | +This simulates HTTP response generation similar to Tomcat. |
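
To check what the benchmark method actually builds, the same string construction can be run once outside JMH. The following is a minimal sketch; the class name `ResponsePreview` and the `buildResponse` helper are hypothetical and are not part of the generated project.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JMH archetype output.
class ResponsePreview {

    // Builds the same response string as benchmarkHttpResponse().
    static String buildResponse(String body) {
        StringBuilder sb = new StringBuilder();
        sb.append("HTTP/1.1 200 OK\r\n");
        sb.append("Content-Type: text/plain\r\n");
        sb.append("Content-Length: ").append(body.length()).append("\r\n\r\n");
        sb.append(body);
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "Benchmarking a Tomcat-like operation";
        System.out.println(buildResponse(body));
        // String.length() counts UTF-16 code units. For this ASCII body it
        // matches the UTF-8 byte count that Content-Length is meant to carry.
        System.out.println("chars=" + body.length()
                + " bytes=" + body.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For a body containing non-ASCII characters the two counts can differ, so a real server would use the byte length rather than `body.length()`.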
| 60 | + |
| 61 | +Build the Benchmark: |
| 62 | + |
| 63 | +```console |
| 64 | +mvn clean install |
| 65 | +``` |
| 66 | + |
| 67 | +After the build completes, the JMH benchmark JAR (`benchmarks.jar`) is in the `target/` directory. |
| 68 | + |
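| 69 | +Run the benchmark. With the default settings (5 forks, each with 5 warmup and 5 measurement iterations of 10 seconds), the run takes a little over 8 minutes: |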
| 70 | + |
| 71 | +```console |
| 72 | +java -jar target/benchmarks.jar |
| 73 | +``` |
| 74 | + |
| 75 | +You should see output similar to: |
| 76 | +```output |
| 77 | +# JMH version: 1.37 |
| 78 | +# VM version: JDK 21.0.8, OpenJDK 64-Bit Server VM, 21.0.8+9-Ubuntu-0ubuntu124.04.1 |
| 79 | +# VM invoker: /usr/lib/jvm/java-21-openjdk-arm64/bin/java |
| 80 | +# VM options: <none> |
| 81 | +# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) |
| 82 | +# Warmup: 5 iterations, 10 s each |
| 83 | +# Measurement: 5 iterations, 10 s each |
| 84 | +# Timeout: 10 min per iteration |
| 85 | +# Threads: 1 thread, will synchronize iterations |
| 86 | +# Benchmark mode: Throughput, ops/time |
| 87 | +# Benchmark: com.example.MyBenchmark.benchmarkHttpResponse |
| 88 | +
|
| 89 | +# Run progress: 0.00% complete, ETA 00:08:20 |
| 90 | +# Fork: 1 of 5 |
| 91 | +# Warmup Iteration 1: 33509694.060 ops/s |
| 92 | +# Warmup Iteration 2: 36783933.354 ops/s |
| 93 | +# Warmup Iteration 3: 35202103.615 ops/s |
| 94 | +# Warmup Iteration 4: 36493073.361 ops/s |
| 95 | +# Warmup Iteration 5: 36470050.153 ops/s |
| 96 | +Iteration 1: 35188405.658 ops/s |
| 97 | +Iteration 2: 35011856.616 ops/s |
| 98 | +Iteration 3: 36282916.441 ops/s |
| 99 | +Iteration 4: 34558682.952 ops/s |
| 100 | +Iteration 5: 34878375.325 ops/s |
| 101 | +
|
| 102 | +# Run progress: 20.00% complete, ETA 00:06:41 |
| 103 | +# Fork: 2 of 5 |
| 104 | +# Warmup Iteration 1: 33055148.091 ops/s |
| 105 | +# Warmup Iteration 2: 36374390.556 ops/s |
| 106 | +# Warmup Iteration 3: 35020852.850 ops/s |
| 107 | +# Warmup Iteration 4: 36463924.398 ops/s |
| 108 | +# Warmup Iteration 5: 35116009.523 ops/s |
| 109 | +Iteration 1: 36604427.854 ops/s |
| 110 | +Iteration 2: 35151064.855 ops/s |
| 111 | +Iteration 3: 35171529.012 ops/s |
| 112 | +Iteration 4: 35092144.416 ops/s |
| 113 | +Iteration 5: 36670199.634 ops/s |
| 114 | +
|
| 115 | +# Run progress: 40.00% complete, ETA 00:05:00 |
| 116 | +# Fork: 3 of 5 |
| 117 | +# Warmup Iteration 1: 34021525.130 ops/s |
| 118 | +# Warmup Iteration 2: 35796028.914 ops/s |
| 119 | +# Warmup Iteration 3: 36813541.649 ops/s |
| 120 | +# Warmup Iteration 4: 34424554.094 ops/s |
| 121 | +# Warmup Iteration 5: 35100074.155 ops/s |
| 122 | +Iteration 1: 33533209.090 ops/s |
| 123 | +Iteration 2: 34755031.947 ops/s |
| 124 | +Iteration 3: 36463135.748 ops/s |
| 125 | +Iteration 4: 34961009.997 ops/s |
| 126 | +Iteration 5: 36496001.612 ops/s |
| 127 | +
|
| 128 | +# Run progress: 60.00% complete, ETA 00:03:20 |
| 129 | +# Fork: 4 of 5 |
| 130 | +# Warmup Iteration 1: 33393091.940 ops/s |
| 131 | +# Warmup Iteration 2: 35235407.288 ops/s |
| 132 | +# Warmup Iteration 3: 36203077.665 ops/s |
| 133 | +# Warmup Iteration 4: 34580888.238 ops/s |
| 134 | +# Warmup Iteration 5: 35984836.776 ops/s |
| 135 | +Iteration 1: 34896194.779 ops/s |
| 136 | +Iteration 2: 36479405.215 ops/s |
| 137 | +Iteration 3: 35010049.135 ops/s |
| 138 | +Iteration 4: 36277296.075 ops/s |
| 139 | +Iteration 5: 36340953.266 ops/s |
| 140 | +
|
| 141 | +# Run progress: 80.00% complete, ETA 00:01:40 |
| 142 | +# Fork: 5 of 5 |
| 143 | +# Warmup Iteration 1: 35482444.435 ops/s |
| 144 | +# Warmup Iteration 2: 37116032.766 ops/s |
| 145 | +# Warmup Iteration 3: 35389871.716 ops/s |
| 146 | +# Warmup Iteration 4: 36814888.849 ops/s |
| 147 | +# Warmup Iteration 5: 35462220.484 ops/s |
| 148 | +Iteration 1: 36896452.473 ops/s |
| 149 | +Iteration 2: 35362724.405 ops/s |
| 150 | +Iteration 3: 36992383.389 ops/s |
| 151 | +Iteration 4: 35535471.437 ops/s |
| 152 | +Iteration 5: 36881529.760 ops/s |
| 153 | +
|
| 154 | +
|
| 155 | +Result "com.example.MyBenchmark.benchmarkHttpResponse": |
| 156 | + 35659618.044 ±(99.9%) 686946.011 ops/s [Average] |
| 157 | + (min, avg, max) = (33533209.090, 35659618.044, 36992383.389), stdev = 917053.272 |
| 158 | + CI (99.9%): [34972672.032, 36346564.055] (assumes normal distribution) |
| 159 | +
|
| 160 | +
|
| 161 | +# Run complete. Total time: 00:08:21 |
| 162 | +
|
| 163 | +REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on |
| 164 | +why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial |
| 165 | +experiments, perform baseline and negative tests that provide experimental control, make sure |
| 166 | +the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. |
| 167 | +Do not assume the numbers tell you what you want them to tell. |
| 168 | +
|
| 169 | +NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise |
| 170 | +extra caution when trusting the results, look into the generated code to check the benchmark still |
| 171 | +works, and factor in a small probability of new VM bugs. Additionally, while comparisons between |
| 172 | +different JVMs are already problematic, the performance difference caused by different Blackhole |
| 173 | +modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons. |
| 174 | +
|
| 175 | +Benchmark Mode Cnt Score Error Units |
| 176 | +MyBenchmark.benchmarkHttpResponse thrpt 25 35659618.044 ± 686946.011 ops/s |
| 177 | +``` |
| 178 | + |
| 179 | +### Benchmark Metrics Explained |
| 180 | + |
| 181 | +- **Run Count**: The number of measurement iterations across all forks (the `Cnt` column; here 5 forks × 5 iterations = 25). Warmup iterations are not counted. A higher run count increases statistical reliability and reduces the effect of outliers. |
| 182 | +- **Average Throughput**: The mean number of operations executed per second across all measurement iterations. This metric represents the overall sustained performance of the benchmarked workload. |
| 183 | +- **Standard Deviation**: The amount of variation between iterations around the average throughput. A smaller standard deviation means more consistent performance. |
| 184 | +- **Confidence Interval (99.9%)**: The range expected to contain the true average throughput at a 99.9% confidence level. It equals the average plus or minus the reported error. Narrower intervals imply more reliable results. |
| 185 | +- **Min Throughput**: The lowest throughput observed in any single measurement iteration, reflecting the worst-case result. |
| 186 | +- **Max Throughput**: The highest throughput observed in any single measurement iteration, reflecting the best-case result. |
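
The relationship between these metrics can be checked with a little arithmetic. The sketch below recomputes the error margin from the standard deviation and run count in the sample output. It assumes a Student's t interval with a sample standard deviation; the class name `ThroughputStats` is hypothetical, and the critical value 3.745 (two-sided 99.9%, 24 degrees of freedom) is taken from a standard t-table rather than from the JMH output.

```java
// Hypothetical helper, shown only to illustrate how the summary figures relate.
class ThroughputStats {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double stdev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Half-width of the confidence interval: t * s / sqrt(n).
    static double meanError(double stdev, int n, double tCritical) {
        return tCritical * stdev / Math.sqrt(n);
    }

    public static void main(String[] args) {
        // Values copied from the sample output.
        double avg = 35659618.044;
        double sd = 917053.272;
        int n = 25;
        // Assumption: t critical value for 99.9% confidence, n - 1 = 24.
        double t = 3.745;
        double err = meanError(sd, n, t);
        System.out.printf("error ~ %.0f ops/s%n", err);
        System.out.printf("CI ~ [%.0f, %.0f] ops/s%n", avg - err, avg + err);
    }
}
```

The computed error lands within about 0.1% of the reported `± 686946.011`, which is consistent with the interval being derived from the standard deviation and the 25 measurement iterations.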
| 187 | + |
| 188 | +### Benchmark summary on Arm64 |
| 189 | + |
| 190 | +Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pro 24.04 LTS virtual machine**. |
| 191 | +| Metric | Value | |
| 192 | +|--------------------------------|---------------------------| |
| 193 | +| **Java Version** | OpenJDK 21.0.8 | |
| 194 | +| **Run Count** | 25 iterations | |
| 195 | +| **Average Throughput** | 35.66M ops/sec | |
| 196 | +| **Standard Deviation** | ±0.92M ops/sec | |
| 197 | +| **Confidence Interval (99.9%)**| [34.97M, 36.35M] ops/sec | |
| 198 | +| **Min Throughput** | 33.53M ops/sec | |
| 199 | +| **Max Throughput** | 36.99M ops/sec | |
| 200 | + |
| 201 | +### Benchmark summary on x86 |
| 202 | + |
| 203 | +Here is a summary of benchmark results collected on an x86_64 **D4s_v6 Ubuntu Pro 24.04 LTS virtual machine**. |
| 204 | + |
| 205 | +| Metric | Value | |
| 206 | +|--------------------------------|---------------------------| |
| 207 | +| **Java Version** | OpenJDK 21.0.8 | |
| 208 | +| **Run Count** | 25 iterations | |
| 209 | +| **Average Throughput** | 16.78M ops/sec | |
| 210 | +| **Standard Deviation** | ±0.06M ops/sec | |
| 211 | +| **Confidence Interval (99.9%)**| [16.74M, 16.83M] ops/sec | |
| 212 | +| **Min Throughput** | 16.64M ops/sec | |
| 213 | +| **Max Throughput** | 16.88M ops/sec | |
| 214 | + |
| 215 | + |
| 216 | +### Benchmark comparison insights |
| 217 | +Comparing the Arm64 and x86_64 results for this benchmark: |
| 218 | + |
| 219 | +- **Higher throughput:** Arm64 averaged **35.66M ops/sec**, peaking at **36.99M ops/sec**, against **16.78M ops/sec** on x86_64. That is roughly 2.1 times the throughput. |
| 220 | +- **Run-to-run variation:** The Arm64 standard deviation was **±0.92M ops/sec** (about 2.6% of the mean), with a 99.9% confidence interval of **[34.97M, 36.35M]**. The x86_64 run varied less, at **±0.06M ops/sec**. |
| 221 | +- **Sustained performance:** The two confidence intervals do not overlap, so the throughput difference is well outside measurement noise. Arm64 sustained high throughput for this Java workload on Azure Ubuntu Pro. |
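
The comparison can be reproduced from the two summary tables. The following sketch uses the rounded table values (in millions of ops/sec); the class name `ArchComparison` and its helpers are hypothetical.

```java
// Hypothetical helper; inputs are the rounded figures from the two summary tables.
class ArchComparison {

    static double ratio(double a, double b) {
        return a / b;
    }

    // Coefficient of variation: standard deviation as a percentage of the mean.
    static double cvPercent(double stdev, double mean) {
        return 100.0 * stdev / mean;
    }

    public static void main(String[] args) {
        double armAvg = 35.66, armSd = 0.92; // M ops/sec, Arm64 table
        double x86Avg = 16.78, x86Sd = 0.06; // M ops/sec, x86 table
        System.out.printf("Arm64 / x86_64 throughput ratio: %.2f%n", ratio(armAvg, x86Avg));
        System.out.printf("Arm64 variation: %.1f%% of mean%n", cvPercent(armSd, armAvg));
        System.out.printf("x86_64 variation: %.1f%% of mean%n", cvPercent(x86Sd, x86Avg));
    }
}
```

Expressing the standard deviation as a percentage of the mean makes the variation of the two runs comparable even though their absolute throughput differs.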
| 222 | + |
| 223 | +You have now benchmarked Java on an Azure Cobalt 100 Arm64 virtual machine and compared results with x86_64. |