Commit a866b5e

Merge pull request #2225 from iJobsYuYing/main
add tune-network-workloads-on-bare-metal in servers-and-cloud-computing
2 parents 386c5ed + 80e3918 commit a866b5e

10 files changed

+704
-0
lines changed

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
---
title: Tomcat benchmark set up
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Overview

There are numerous client-server and network-based workloads. Apache Tomcat is a typical example: it provides services over HTTP/HTTPS network requests.

In this section, you'll set up a benchmark environment using Apache Tomcat and `wrk2` to simulate HTTP load and evaluate performance on an Arm-based bare-metal server (**NVIDIA Grace**).

## Set up the Tomcat benchmark server on **NVIDIA Grace**
[Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that runs Java web applications, handles HTTP requests, and serves dynamic content. It supports technologies such as Servlet, JSP, and WebSocket.

## Install the Java Development Kit (JDK)

Install OpenJDK 21 on your Arm-based bare-metal server running Ubuntu 24:

```bash
sudo apt update
sudo apt install -y openjdk-21-jdk
```

## Install Tomcat

Download and extract Tomcat:

```bash
wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.9/bin/apache-tomcat-11.0.9.tar.gz
tar xzf apache-tomcat-11.0.9.tar.gz
```
Alternatively, you can build Tomcat [from source](https://github.com/apache/tomcat).

## Enable access to Tomcat examples

To access the built-in examples from your local network or an external IP address, use a text editor to modify the `context.xml` file of the examples application, updating the `RemoteAddrValve` configuration to allow all IP addresses.

The file is at:
```bash
apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
```

```xml
<!-- Before -->
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />

<!-- After -->
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow=".*" />
```
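
The edit above can also be scripted. The following is a minimal sketch, not part of the original instructions: `allow_all_remote_addrs` is a hypothetical helper name, it assumes GNU `sed`, and it assumes the `Valve` element sits on a single line as in the "Before" snippet. It keeps a `.bak` copy of the file so the change is easy to revert.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): rewrite the allow
# pattern of the RemoteAddrValve so that it matches every client address.
# Assumes GNU sed and a single-line <Valve .../> element.
allow_all_remote_addrs() {
  local context_xml="$1"
  # Keep a backup so the original access restriction can be restored.
  cp "$context_xml" "$context_xml.bak"
  # Only touch lines that mention RemoteAddrValve.
  sed -i -E '/RemoteAddrValve/ s|allow="[^"]*"|allow=".*"|' "$context_xml"
}
```

For example, `allow_all_remote_addrs apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml`, followed by `grep RemoteAddrValve` on the same file to confirm the result. If the element spans several lines in your copy of the file, edit it by hand instead.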

## Start the Tomcat server
{{% notice Note %}}
For maximum Tomcat performance, raise the limit on the number of file descriptors that a single process can open. Each client connection consumes one file descriptor.
{{% /notice %}}

Start the server:

```bash
ulimit -n 65535 && ./apache-tomcat-11.0.9/bin/startup.sh
```

You should see output like:

```output
Using CATALINA_BASE:   /home/ubuntu/apache-tomcat-11.0.9
Using CATALINA_HOME:   /home/ubuntu/apache-tomcat-11.0.9
Using CATALINA_TMPDIR: /home/ubuntu/apache-tomcat-11.0.9/temp
Using JRE_HOME:        /usr
Using CLASSPATH:       /home/ubuntu/apache-tomcat-11.0.9/bin/bootstrap.jar:/home/ubuntu/apache-tomcat-11.0.9/bin/tomcat-juli.jar
Using CATALINA_OPTS:
Tomcat started.
```
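
Before starting Tomcat, you may want to confirm that the open-file limit in the current shell is high enough. The sketch below is illustrative only: `nofile_ok` is a hypothetical helper name, and 65535 is the value used in the start command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): succeed when the
# given soft limit is at least the required number of file descriptors.
nofile_ok() {
  local current="$1" required="$2"
  # "ulimit -n" prints "unlimited" when no limit is set.
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  [ "$current" -ge "$required" ]
}

required=65535
if nofile_ok "$(ulimit -n)" "$required"; then
  echo "open-file limit OK: $(ulimit -n)"
else
  echo "open-file limit too low: $(ulimit -n) < ${required}" >&2
fi
```

Note that `ulimit -n` only affects the current shell and the processes it starts, which is why the guide runs it in the same command line as `startup.sh`.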

## Confirm server access

In your browser, open: `http://${tomcat_ip}:8080/examples`, replacing `${tomcat_ip}` with the IP address of the Tomcat server.

You should see the Tomcat welcome page and examples, as shown below:

![Screenshot of the Tomcat homepage showing version and welcome panel alt-text#center](./_images/lp-tomcat-homepage.png "Apache Tomcat homepage")

![Screenshot of the Tomcat examples page showing servlet and JSP demo links alt-text#center](./_images/lp-tomcat-examples.png "Apache Tomcat examples")

{{% notice Note %}}Make sure port 8080 is open in the firewall or security group that protects your Arm-based Linux machine.{{% /notice %}}
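
If no browser is available, the same check can be done from a terminal. This is a minimal sketch under stated assumptions: `examples_url` and `check_tomcat` are hypothetical helper names, `curl` is assumed to be installed, and port 8080 is the default from this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not from the original guide).

# Build the URL of the Tomcat examples application.
examples_url() {
  local host="$1" port="${2:-8080}"
  printf 'http://%s:%s/examples\n' "$host" "$port"
}

# Succeed when the examples page answers with HTTP 200.
# -L follows redirects, -s silences progress output, -o discards the body,
# and -w prints only the final status code.
check_tomcat() {
  local code
  code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 5 "$(examples_url "$@")")
  [ "$code" = "200" ]
}
```

For example, `check_tomcat "${tomcat_ip}" && echo reachable`. A failure can mean that Tomcat is not running, that port 8080 is blocked, or that the `RemoteAddrValve` still rejects your client address.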

## Set up the benchmarking client using wrk2
[`wrk2`](https://github.com/giltene/wrk2) is a high-performance HTTP benchmarking tool that generates load at a constant throughput and measures latency percentiles for web services. It is an enhanced version of `wrk` that provides accurate latency statistics under a controlled request rate, which makes it well suited to performance testing of HTTP servers.

{{% notice Note %}}
Currently `wrk2` is only supported on x86 machines. Run the benchmark client steps below on an `x86_64` server running Ubuntu 24.
{{% /notice %}}

## Install dependencies

Install the required packages:

```bash
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev git zlib1g-dev
```

## Clone and build wrk2

Clone the repository and compile the tool. Root privileges are not needed for these steps:

```bash
git clone https://github.com/giltene/wrk2.git
cd wrk2
make
```

Copy the binary to a directory in your system's PATH:

```bash
sudo cp wrk /usr/local/bin
```

## Run the benchmark
{{% notice Note %}}
For maximum `wrk2` performance, raise the limit on the number of file descriptors that a single process can open. Each benchmark connection consumes one file descriptor.
{{% /notice %}}

Use the following command to benchmark the HelloWorld servlet running on Tomcat:

```bash
ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```
You should see output similar to:

```console
Running 1m test @ http://172.26.203.139:8080/examples/servlets/servlet/HelloWorldExample
  16 threads and 32 connections
  Thread calibration: mean lat.: 0.986ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.999ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.994ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.983ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.989ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.993ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.985ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.987ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.978ms, rate sampling interval: 10ms
  Thread calibration: mean lat.: 0.976ms, rate sampling interval: 10ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.00ms  454.90us   5.09ms   63.98%
    Req/Sec     3.31k   241.68     4.89k    63.83%
  2999817 requests in 1.00m, 1.56GB read
Requests/sec:  49997.08
Transfer/sec:     26.57MB
```
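
The `Requests/sec` line is the figure compared across runs in the rest of this learning path. As a rough sketch, it can be pulled out of saved `wrk` output with `awk`; `wrk_rps` is a hypothetical helper name and `wrk.log` in the usage note is an assumed file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): read wrk output on
# standard input and print the value of the "Requests/sec:" line.
wrk_rps() {
  awk '$1 == "Requests/sec:" { print $2 }'
}
```

For example, append `| tee wrk.log` to the benchmark command above and then run `wrk_rps < wrk.log`. For the sample output shown here the reported rate is consistent with the request count: 2999817 requests over 60 seconds is roughly 49997 requests per second.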
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
---
title: Optimal baseline before tuning
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

{{% notice Note %}}
For maximum performance, run `ulimit -n 65535` on both the server and the client.
{{% /notice %}}

## Optimal baseline before tuning
This section measures three baselines:
- Baseline on Grace bare-metal (default configuration)
- Baseline on Grace bare-metal (access logging disabled)
- Baseline on Grace bare-metal (optimal thread count)

### Baseline on Grace bare-metal (default configuration)
{{% notice Note %}}
To align with a typical Tomcat deployment scenario, keep 8 cores online and take all other cores offline.
{{% /notice %}}

1. Take CPU cores 8-143 offline using the following command:
```bash
for no in {8..143}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
```
2. Use the following command to verify that cores 0-7 are online and the remaining cores are offline:
```bash
lscpu
```
The output should include the following information:
```bash
Architecture:          aarch64
CPU op-mode(s):        64-bit
Byte Order:            Little Endian
CPU(s):                144
On-line CPU(s) list:   0-7
Off-line CPU(s) list:  8-143
Vendor ID:             ARM
Model name:            Neoverse-V2
...
```

3. Restart Tomcat on the Grace bare-metal server:
```bash
~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
```

4. On the `x86_64` bare-metal server where `wrk2` is installed, set `tomcat_ip` to the IP address of your Tomcat server and run the benchmark:
```bash
tomcat_ip=10.169.226.181
```
```bash
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```
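
The core check from step 2 can also be scripted instead of reading the `lscpu` output by eye. The following is a minimal sketch: `count_cpus` is a hypothetical helper name, and it parses the CPU list format (for example `0-7` or `0-3,8-11`) that Linux exposes in `/sys/devices/system/cpu/online`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): count the CPUs in a
# kernel CPU list such as "0-7" or "0-3,8-11".
count_cpus() {
  local list="$1" total=0 range start end
  local IFS=','
  for range in $list; do
    start="${range%-*}"   # text before the dash, or the whole entry
    end="${range#*-}"     # text after the dash, or the whole entry
    total=$(( total + end - start + 1 ))
  done
  echo "$total"
}

# Report the number of online CPUs on this machine, if sysfs is available.
if [ -r /sys/devices/system/cpu/online ]; then
  echo "online CPUs: $(count_cpus "$(cat /sys/devices/system/cpu/online)")"
fi
```

On the Grace server prepared as described above, the online list `0-7` gives 8 online CPUs, and the offline list `8-143` gives 136.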

The result with the default configuration is:
```bash
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    13.29s     3.25s   19.07s    57.79%
    Req/Sec   347.59    430.94     0.97k    66.67%
  3035300 requests in 1.00m, 1.58GB read
  Socket errors: connect 1280, read 0, write 0, timeout 21760
Requests/sec:  50517.09
Transfer/sec:     26.84MB
```

### Baseline on Grace bare-metal (access logging disabled)
To disable access logging, use a text editor to modify the `server.xml` file, commenting out or removing the **`org.apache.catalina.valves.AccessLogValve`** configuration.

Open the file:
```bash
vi ~/apache-tomcat-11.0.9/conf/server.xml
```

The configuration is near the end of the file. Comment it out, as shown below, or remove it:
```xml
<!--
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log" suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b" />
-->
```

1. Restart Tomcat on the Grace bare-metal server:
```bash
~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
```

2. Run the benchmark on the `x86_64` bare-metal server where `wrk2` is installed:
```bash
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```

The result with access logging disabled is:
```bash
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.66s     3.05s   17.87s    57.47%
    Req/Sec   433.69    524.91     1.18k    66.67%
  3572200 requests in 1.00m, 1.85GB read
  Socket errors: connect 1280, read 0, write 0, timeout 21760
Requests/sec:  59451.85
Transfer/sec:     31.59MB
```

### Baseline on Grace bare-metal (optimal thread count)
To minimize resource contention between threads and the overhead of thread context switching, align the number of CPU-intensive threads in Tomcat with the number of CPU cores.

1. While `wrk` is load-testing Tomcat, inspect the per-thread CPU usage of the Java process on the Grace bare-metal server:
```bash
top -H -p$(pgrep java)
```

You should see output similar to:
```bash
top - 12:12:45 up 1 day, 7:04, 5 users, load average: 7.22, 3.46, 1.75
Threads: 79 total, 8 running, 71 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.4 us, 1.9 sy, 0.0 ni, 94.1 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
MiB Mem : 964975.5 total, 602205.6 free, 12189.5 used, 356708.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 952786.0 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
53254 yinyu01 20 0 38.0g 1.4g 28288 R 96.7 0.1 2:30.70 http-nio-8080-e
53255 yinyu01 20 0 38.0g 1.4g 28288 R 96.7 0.1 2:30.62 http-nio-8080-e
53256 yinyu01 20 0 38.0g 1.4g 28288 R 96.7 0.1 2:30.64 http-nio-8080-e
53258 yinyu01 20 0 38.0g 1.4g 28288 R 96.7 0.1 2:30.62 http-nio-8080-e
53260 yinyu01 20 0 38.0g 1.4g 28288 R 96.7 0.1 2:30.69 http-nio-8080-e
53257 yinyu01 20 0 38.0g 1.4g 28288 R 96.3 0.1 2:30.59 http-nio-8080-e
53259 yinyu01 20 0 38.0g 1.4g 28288 R 96.3 0.1 2:30.63 http-nio-8080-e
53309 yinyu01 20 0 38.0g 1.4g 28288 R 95.3 0.1 2:29.69 http-nio-8080-P
53231 yinyu01 20 0 38.0g 1.4g 28288 S 0.3 0.1 0:00.10 VM Thread
53262 yinyu01 20 0 38.0g 1.4g 28288 S 0.3 0.1 0:00.12 GC Thread#2
```

The output shows that the **`http-nio-8080-e`** and **`http-nio-8080-P`** threads are CPU-intensive. `top` truncates the thread names: these are the `http-nio-8080-exec-N` request-processing threads and the `http-nio-8080-Poller` thread.
Because the current version of Tomcat runs exactly one `http-nio-8080-P` thread and 8 CPU cores are online, set the `http-nio-8080-e` thread count to 7.

To configure the `http-nio-8080-e` thread count, use a text editor to modify the `server.xml` file, updating the `<Connector port="8080" protocol="HTTP/1.1"` configuration.

Open the file:
```bash
vi ~/apache-tomcat-11.0.9/conf/server.xml
```


```xml
<!-- Before -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
```

```xml
<!-- After -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           minSpareThreads="7"
           maxThreads="7"
           maxKeepAliveRequests="500000"
           maxConnections="100000"
           />
```

2. Restart Tomcat on the Grace bare-metal server:
```bash
~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
```

3. Run the benchmark on the `x86_64` bare-metal server where `wrk2` is installed:
```bash
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```
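
The thread-count rule used in this section (online cores minus the single poller thread) can be written down as a small calculation, which is handy if you keep a different number of cores online. This is an illustrative sketch: `tomcat_exec_threads` is a hypothetical helper name, and the default of one poller thread comes from the observation in this section, not from a Tomcat setting read at run time.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): number of
# request-processing threads that leaves one core per poller thread.
tomcat_exec_threads() {
  local cores="$1" pollers="${2:-1}"
  local threads=$(( cores - pollers ))
  # Never suggest fewer than one request-processing thread.
  if [ "$threads" -lt 1 ]; then
    threads=1
  fi
  echo "$threads"
}

# nproc reports the number of processing units available to this shell.
echo "suggested maxThreads: $(tomcat_exec_threads "$(nproc)")"
```

With 8 cores online this gives 7, the value used for `minSpareThreads` and `maxThreads` in the `Connector` configuration.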

The result with the optimal thread count is:
```bash
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.34s     9.91s   41.81s    57.77%
    Req/Sec     1.22k     4.29     1.23k    71.09%
  9255672 requests in 1.00m, 4.80GB read
Requests/sec: 154479.07
Transfer/sec:     82.06MB
```
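
To put the three baselines side by side, the throughput figures reported above can be expressed relative to the default configuration. The sketch below only does arithmetic on the `Requests/sec` values from this page; `speedup` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): ratio of a new
# throughput figure to a baseline figure, rounded to two decimals.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

baseline=50517.09   # Requests/sec with the default configuration
echo "access logging disabled: $(speedup "$baseline" 59451.85)x"
echo "optimal thread count:    $(speedup "$baseline" 154479.07)x"
```

This prints 1.18x for the run with access logging disabled and 3.06x for the run with the optimal thread count.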
