ArmDeveloperEcosystem
diff --git a/‎content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md
Lines changed: 159 additions & 0 deletions b/‎content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md
diff --git a/‎content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md
Lines changed: 186 additions & 0 deletions b/‎content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md
@@ -0,0 +1,159 @@
+---
+title: Tomcat benchmark set up
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+
+## Overview 
+
+There are numerous client-server and network-based workloads. Tomcat is a typical example: it provides services by handling HTTP/HTTPS network requests.
+
+In this section, you'll set up a benchmark environment using Apache Tomcat and `wrk2` to simulate HTTP load and evaluate performance on an Arm-based bare-metal server (NVIDIA Grace).
+
+## Set up the Tomcat benchmark server on NVIDIA Grace
+[Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that runs Java web applications, handles HTTP requests, and serves dynamic content. It supports technologies such as Servlet, JSP, and WebSocket.
+
+## Install the Java Development Kit (JDK)
+
+Install OpenJDK 21 on your Arm-based bare-metal server running Ubuntu 24:
+
+```bash
+sudo apt update
+sudo apt install -y openjdk-21-jdk
+```
+
+## Install Tomcat 
+
+Download and extract Tomcat:
+
+```bash
+wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.9/bin/apache-tomcat-11.0.9.tar.gz
+tar xzf apache-tomcat-11.0.9.tar.gz
+```
+Alternatively, you can build Tomcat [from source](https://github.com/apache/tomcat).
+
+## Enable access to Tomcat examples
+
+To access the built-in examples from your local network or external IP, use a text editor to modify the `context.xml` file by updating the `RemoteAddrValve` configuration to allow all IP addresses.
+
+The file is at:
+```bash
+apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
+```
+
+```xml
+<!-- Before -->
+<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />
+
+<!-- After -->
+<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow=".*" />
+```
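
The `allow` attribute is a regular expression that is compared against the client address. As a rough illustration of what each pattern admits, the sketch below uses `grep -E` as a stand-in for Tomcat's Java regex matching. The helper name `addr_allowed` is hypothetical, `\d` is written as `[0-9]` because POSIX extended regular expressions have no `\d`, and the `^(...)$` anchors are an assumption that approximates a whole-string match.

```shell
# Hypothetical helper: succeed if ADDR matches PATTERN as a whole string.
# grep -E stands in for Java regex matching here (an assumption of this sketch).
addr_allowed() {
  pattern="$1"
  addr="$2"
  printf '%s\n' "$addr" | grep -Eq "^(${pattern})\$"
}

# The default pattern from context.xml, with \d rewritten as [0-9].
default_allow='127\.[0-9]+\.[0-9]+\.[0-9]+|::1|0:0:0:0:0:0:0:1'

addr_allowed "$default_allow" 127.0.0.1 && echo "127.0.0.1: allowed by the default pattern"
addr_allowed "$default_allow" 10.0.0.5 || echo "10.0.0.5: rejected by the default pattern"
addr_allowed '.*' 10.0.0.5 && echo "10.0.0.5: allowed by .*"
```

Allowing every address is convenient on an isolated benchmark network. Elsewhere, a narrower pattern that covers only the client machine may be a more conservative choice.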
+
+## Start the Tomcat server
+{{% notice Note %}}
+For best Tomcat performance, make sure the maximum number of file descriptors that a single process can open is large enough, because every open connection consumes a file descriptor.
+{{% /notice %}}
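
Before starting the server, it can help to confirm that the limit is actually high enough. The sketch below is one way to check. The function name `nofile_ok` is hypothetical, the 65535 threshold is the value used by the start command that follows, and the optional second argument exists only so the check can be exercised with a known value.

```shell
# Hypothetical helper: succeed if the open-file limit is at least MIN.
# Usage: nofile_ok MIN [CURRENT]; CURRENT defaults to the shell's soft limit.
nofile_ok() {
  min="$1"
  cur="${2:-$(ulimit -n)}"
  [ "$cur" = "unlimited" ] || [ "$cur" -ge "$min" ]
}

if nofile_ok 65535; then
  echo "open-file limit is sufficient: $(ulimit -n)"
else
  echo "open-file limit is $(ulimit -n); raise it with: ulimit -n 65535"
fi
```

If `ulimit -n 65535` is rejected, the hard limit for the account is probably lower than 65535 and has to be raised first.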
+
+Start the server:
+
+```bash
+ulimit -n 65535 && ./apache-tomcat-11.0.9/bin/startup.sh
+```
+
+You should see output like:
+
+```output
+Using CATALINA_BASE:   /home/ubuntu/apache-tomcat-11.0.9
+Using CATALINA_HOME:   /home/ubuntu/apache-tomcat-11.0.9
+Using CATALINA_TMPDIR: /home/ubuntu/apache-tomcat-11.0.9/temp
+Using JRE_HOME:        /usr
+Using CLASSPATH:       /home/ubuntu/apache-tomcat-11.0.9/bin/bootstrap.jar:/home/ubuntu/apache-tomcat-11.0.9/bin/tomcat-juli.jar
+Using CATALINA_OPTS:
+Tomcat started.
+```
+
+## Confirm server access
+
+In your browser, open `http://${tomcat_ip}:8080/examples`, replacing `${tomcat_ip}` with the IP address of your Tomcat server.
+
+You should see the Tomcat welcome page and examples, as shown below:
+
+![Screenshot of the Tomcat homepage showing version and welcome panel alt-text#center](./_images/lp-tomcat-homepage.png "Apache Tomcat homepage")
+
+![Screenshot of the Tomcat examples page showing servlet and JSP demo links alt-text#center](./_images/lp-tomcat-examples.png "Apache Tomcat examples")
+
+{{% notice Note %}}Make sure port 8080 is open in the security group or firewall rules that apply to your Arm-based Linux machine.{{% /notice%}}
+
+## Set up the benchmarking client using wrk2
+[wrk2](https://github.com/giltene/wrk2) is a high-performance HTTP benchmarking tool designed to generate constant-throughput load and measure latency percentiles for web services. `wrk2` is an enhanced version of `wrk` that provides accurate latency statistics under controlled request rates, which makes it well suited to performance testing of HTTP servers.
+
+{{% notice Note %}}
+Currently `wrk2` is only supported on x86 machines. Run the benchmark client steps below on an `x86_64` server running Ubuntu 24.
+{{% /notice %}}
+
+## Install dependencies 
+
+Install the required packages:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y build-essential libssl-dev git zlib1g-dev
+```
+
+## Clone and build wrk2
+
+Clone the repository and compile the tool:
+
+```bash
+git clone https://github.com/giltene/wrk2.git
+cd wrk2
+make
+```
+
+Copy the binary to a directory in your system’s PATH:
+ 
+```bash
+sudo cp wrk /usr/local/bin
+```
+
+## Run the benchmark
+{{% notice Note %}}
+For best `wrk2` performance, make sure the maximum number of file descriptors that a single process can open is large enough, because every open connection consumes a file descriptor.
+{{% /notice %}}
+
+Use the following command to benchmark the HelloWorld servlet running on Tomcat:
+
+```bash
+ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
+```
+You should see output similar to:
+
+```console
+Running 1m test @ http://172.26.203.139:8080/examples/servlets/servlet/HelloWorldExample
+  16 threads and 32 connections
+  Thread calibration: mean lat.: 0.986ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.999ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.994ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.983ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.989ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.993ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.985ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.987ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.978ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 0.976ms, rate sampling interval: 10ms
+  Thread Stats   Avg      Stdev     Max   +/- Stdev
+    Latency     1.00ms  454.90us   5.09ms   63.98%
+    Req/Sec     3.31k   241.68     4.89k    63.83%
+  2999817 requests in 1.00m, 1.56GB read
+Requests/sec:  49997.08
+Transfer/sec:     26.57MB
+```
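
When comparing runs, it is convenient to pull the headline figure out of the report instead of reading it by eye. The following sketch extracts the `Requests/sec` value from `wrk` output with `awk`. The function name `wrk_rps` is hypothetical, and the parsing assumes the report format shown above.

```shell
# Hypothetical helper: print the Requests/sec figure from wrk output on stdin.
wrk_rps() {
  awk '/^Requests\/sec:/ { print $2 }'
}

# Example using the last two lines of the report shown above.
printf 'Requests/sec:  49997.08\nTransfer/sec:     26.57MB\n' | wrk_rps

# With a live run, the benchmark output could be piped in the same way:
# wrk -c32 -t16 -R50000 -d60 "http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample" | wrk_rps
```

For the run shown above this prints `49997.08`, which is close to the 50,000 requests per second requested with `-R50000`.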
@@ -0,0 +1,186 @@
+---
+title: Optimal baseline before tuning
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+{{% notice Note %}}
+To achieve maximum performance, run `ulimit -n 65535` on both the server and the client.
+{{% /notice %}}
+
+## Optimal baseline before tuning
+- Baseline on Grace bare-metal (default configuration)
+- Baseline on Grace bare-metal (access logging disabled)
+- Baseline on Grace bare-metal (optimal thread count)
+
+### Baseline on Grace bare-metal (default configuration)
+{{% notice Note %}}
+To align with a typical Tomcat deployment scenario, keep 8 cores online and take all other cores offline.
+{{% /notice %}}
+
+1. Take CPU cores 8 through 143 offline:
+```bash
+for no in {8..143}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
+```
+2. Verify that cores 0-7 are online and the remaining cores are offline:
+```bash
+lscpu
+```
+The output should look similar to:
+```output
+Architecture:             aarch64
+  CPU op-mode(s):         64-bit
+  Byte Order:             Little Endian
+CPU(s):                   144
+  On-line CPU(s) list:    0-7
+  Off-line CPU(s) list:   8-143
+Vendor ID:                ARM
+  Model name:             Neoverse-V2
+...
+```
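
The loop in step 1 hard-codes the range 8..143 for a 144-CPU Grace system. As a sketch of how the same range could be derived for a different CPU count, the helper below lists the CPU IDs to take offline. The name `cpus_to_offline` is hypothetical, and the commented lines only suggest how it might drive the sysfs writes.

```shell
# Hypothetical helper: print the CPU IDs to take offline, one per line,
# keeping CPUs 0..KEEP-1 online out of TOTAL CPUs.
cpus_to_offline() {
  keep="$1"
  total="$2"
  [ "$keep" -lt "$total" ] || return 0   # nothing to take offline
  seq "$keep" "$((total - 1))"
}

# Summary for the system in this section: 8 cores kept out of 144.
echo "first: $(cpus_to_offline 8 144 | head -n 1), last: $(cpus_to_offline 8 144 | tail -n 1)"

# Applying it (needs root, so it is left commented out in this sketch):
# for no in $(cpus_to_offline 8 144); do
#   echo 0 | sudo tee "/sys/devices/system/cpu/cpu${no}/online" > /dev/null
# done
# To bring the cores back online after benchmarking, write 1 instead of 0.
```

For 8 cores kept out of 144, the summary line prints `first: 8, last: 143`, matching the range used in step 1.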
+
+3. Restart Tomcat on the Grace bare-metal server:
+```bash
+~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+```
+
+4. On the `x86_64` bare-metal client where `wrk2` is installed, set `tomcat_ip` to the IP address of your Tomcat server and run the benchmark:
+```bash
+tomcat_ip=10.169.226.181
+```
+```bash
+ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
+```
+
+The result with the default configuration is:
+```output
+  Thread Stats   Avg      Stdev     Max   +/- Stdev
+    Latency    13.29s     3.25s   19.07s    57.79%
+    Req/Sec   347.59    430.94     0.97k    66.67%
+  3035300 requests in 1.00m, 1.58GB read
+  Socket errors: connect 1280, read 0, write 0, timeout 21760
+Requests/sec:  50517.09
+Transfer/sec:     26.84MB
+```
+
+### Baseline on Grace bare-metal (access logging disabled)
+To disable the access logging, use a text editor to modify the `server.xml` file by commenting out or removing the **`org.apache.catalina.valves.AccessLogValve`** configuration.
+
+Open the file:
+```bash
+vi ~/apache-tomcat-11.0.9/conf/server.xml
+```
+
+The configuration is near the end of the file. Comment it out or remove it:
+```xml
+<!-- 
+    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
+        prefix="localhost_access_log" suffix=".txt"
+        pattern="%h %l %u %t &quot;%r&quot; %s %b" />
+-->
+```
+
+1. Restart Tomcat on the Grace bare-metal server:
+```bash
+~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+```
+
+2. Run the benchmark on the `x86_64` bare-metal client where `wrk2` is installed:
+```bash
+ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
+```
+
+The result with access logging disabled is:
+```output
+  Thread Stats   Avg      Stdev     Max   +/- Stdev
+    Latency    12.66s     3.05s   17.87s    57.47%
+    Req/Sec   433.69    524.91     1.18k    66.67%
+  3572200 requests in 1.00m, 1.85GB read
+  Socket errors: connect 1280, read 0, write 0, timeout 21760
+Requests/sec:  59451.85
+Transfer/sec:     31.59MB
+```
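
To put the two runs side by side, the throughput change can be computed from the `Requests/sec` lines. The following is a small sketch that uses `awk` for the floating-point arithmetic. The function name `pct_change` is hypothetical.

```shell
# Hypothetical helper: percentage change from BASE to NEW, one decimal place.
pct_change() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) / b * 100 }'
}

# Default configuration vs. access logging disabled, using the figures above.
pct_change 50517.09 59451.85
```

With these figures the result is `17.7`, so disabling access logging raised throughput by roughly 17.7% in this run. The multi-second latencies and the socket errors in both reports show that the server still cannot keep up with the requested 500,000 requests per second.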
+
+### Baseline on Grace bare-metal (optimal thread count)
+To minimize resource contention between threads and overhead from thread context switching, the number of CPU-intensive threads in Tomcat should be aligned with the number of CPU cores.
+
+1. While `wrk` is load testing Tomcat, check the per-thread CPU usage of the Java process on the Grace server:
+```bash
+top -H -p$(pgrep java)
+```
+
+The output should look similar to:
+```output
+top - 12:12:45 up 1 day,  7:04,  5 users,  load average: 7.22, 3.46, 1.75
+Threads:  79 total,   8 running,  71 sleeping,   0 stopped,   0 zombie
+%Cpu(s):  3.4 us,  1.9 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st
+MiB Mem : 964975.5 total, 602205.6 free,  12189.5 used, 356708.3 buff/cache
+MiB Swap:      0.0 total,      0.0 free,      0.0 used. 952786.0 avail Mem
+
+    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
+  53254 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.70 http-nio-8080-e
+  53255 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.62 http-nio-8080-e
+  53256 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.64 http-nio-8080-e
+  53258 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.62 http-nio-8080-e
+  53260 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.69 http-nio-8080-e
+  53257 yinyu01   20   0   38.0g   1.4g  28288 R  96.3   0.1   2:30.59 http-nio-8080-e
+  53259 yinyu01   20   0   38.0g   1.4g  28288 R  96.3   0.1   2:30.63 http-nio-8080-e
+  53309 yinyu01   20   0   38.0g   1.4g  28288 R  95.3   0.1   2:29.69 http-nio-8080-P
+  53231 yinyu01   20   0   38.0g   1.4g  28288 S   0.3   0.1   0:00.10 VM Thread
+  53262 yinyu01   20   0   38.0g   1.4g  28288 S   0.3   0.1   0:00.12 GC Thread#2
+```
+
+The **`http-nio-8080-e`** and **`http-nio-8080-P`** threads are CPU-intensive. `top` truncates thread names, so these are the `http-nio-8080-exec-N` worker threads and the `http-nio-8080-Poller` thread.
+Because the current version of Tomcat runs a single `http-nio-8080-P` thread and 8 CPU cores are online, set the number of `http-nio-8080-e` threads to 7.
+
+To configure the `http-nio-8080-e` thread count, use a text editor to modify the `server.xml` file by updating the `<Connector port="8080" protocol="HTTP/1.1"` configuration.
+
+Open the file:
+```bash
+vi ~/apache-tomcat-11.0.9/conf/server.xml
+```
+
+
+```xml
+<!-- Before -->
+    <Connector port="8080" protocol="HTTP/1.1"
+               connectionTimeout="20000"
+               redirectPort="8443" />
+```
+
+```xml
+<!-- After -->
+    <Connector port="8080" protocol="HTTP/1.1"
+               connectionTimeout="20000"
+               redirectPort="8443"
+               minSpareThreads="7"
+               maxThreads="7"
+               maxKeepAliveRequests="500000"
+               maxConnections="100000"
+    />
+```
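
The sizing rule used here (online cores minus the single poller thread) can be written as a small calculation. The helper name `exec_thread_count` is hypothetical, and the rule is the one stated in this section rather than a general Tomcat recommendation.

```shell
# Hypothetical helper: number of http-nio-8080-exec threads for CORES online
# cores, leaving one core for the single poller thread. Never returns less than 1.
exec_thread_count() {
  cores="$1"
  n=$((cores - 1))
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# For the 8 online cores used in this section:
exec_thread_count 8

# On the server itself, the online core count could be taken from nproc:
# exec_thread_count "$(nproc)"
```

For 8 online cores this prints `7`, the value used for `minSpareThreads` and `maxThreads` in the connector configuration.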
+
+2. Restart Tomcat on the Grace bare-metal server:
+```bash
+~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+```
+
+3. Run the benchmark on the `x86_64` bare-metal client where `wrk2` is installed:
+```bash
+ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
+```
+
+The result with the optimal thread count is:
+```output
+  Thread Stats   Avg      Stdev     Max   +/- Stdev
+    Latency    24.34s     9.91s   41.81s    57.77%
+    Req/Sec     1.22k     4.29     1.23k    71.09%
+  9255672 requests in 1.00m, 4.80GB read
+Requests/sec: 154479.07
+Transfer/sec:     82.06MB
+```