Add AWS nested virtualization test harness#100

Closed
rgarcia wants to merge 2 commits into main from rgarcia/aws-nested-virt-test

Conversation


@rgarcia rgarcia commented Feb 13, 2026

Summary

  • Adds a Go test program at tests/aws/ that launches c8id instances with AWS's new nested virtualization feature (CpuOptions.NestedVirtualization=enabled), installs hypeman, and benchmarks VM spin-up times and CoreMark performance
  • Supports both regular nested-virt instances and bare metal for comparison, with auto-detection of Debian AMIs, security groups, and subnets
  • Runs smoke tests with both Cloud Hypervisor and QEMU hypervisors, plus CoreMark CPU benchmarks on host and inside VMs

Results

| Metric | Bare Metal (c8id.metal-48xl) | Nested Virt (c8id.4xlarge) |
| --- | --- | --- |
| Boot to Running | 18s | 18s |
| Boot to SSH | 1m 29s | 48s |
| Hypeman Install | 4m 19s | 2m 9s |
| Cloud Hypervisor Launch | 250ms | 191ms |
| QEMU Launch | 410ms | 318ms |
| CoreMark Host | 33,300 iter/s | 32,806 iter/s |
| CoreMark VM | 32,854 iter/s (1.3% overhead) | N/A (L2 VMs crash) |

Key Findings

  1. Nested virt works for L1: Both Cloud Hypervisor and QEMU VMs launch and run successfully inside nested-virt instances. Host CPU performance is within ~1.5% of bare metal.
  2. L2 VMs (VM-inside-VM) crash immediately: Both Cloud Hypervisor (enters Shutdown state) and QEMU (socket connection refused; the process exits) fail when launching a VM inside a VM on nested-virt instances.
  3. Faster time-to-first-VM: Nested virt instances reach SSH ~40s faster than bare metal and install hypeman ~2 min faster, likely because bare metal instances spend extra time initializing hardware at boot.
  4. Same per-vCPU pricing: No premium for nested virtualization, but you can use smaller instance sizes (e.g., c8id.4xlarge at ~$0.88/hr vs c8id.metal-48xl at ~$10.56/hr) if you only need a few VMs.
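The 1.3% overhead figure in the table comes straight from the CoreMark scores; a quick sketch of the arithmetic:

```go
package main

import "fmt"

// coremarkOverhead returns the percentage slowdown of a VM score
// relative to the host score.
func coremarkOverhead(host, vm float64) float64 {
	return (host - vm) / host * 100
}

func main() {
	// Bare-metal numbers from the results table above.
	host := 33300.0 // CoreMark on the host, iter/s
	vm := 32854.0   // CoreMark inside the L1 VM, iter/s
	fmt.Printf("L1 VM overhead: %.1f%%\n", coremarkOverhead(host, vm)) // → 1.3%
}
```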

Usage

```sh
cd tests/aws
go run main.go \
  --instance-type c8id.4xlarge \
  --key-name <your-key> \
  --key-path <path-to-pem> \
  --profile <aws-profile>
```

Test plan

  • Tested on c8id.4xlarge (nested virt) — smoke tests pass, CoreMark host runs, VM benchmark gracefully reports N/A
  • Tested on c8id.metal-48xl (bare metal) — smoke tests pass, CoreMark host + VM both run, 1.3% overhead measured
  • Verified cleanup: instances terminated and security groups deleted after each run

🤖 Generated with Claude Code


Note

Medium Risk
Adds new code that programmatically creates and tears down AWS resources (instances/security groups) and executes remote commands, so misconfiguration can lead to unexpected cost or security exposure if run improperly, though it’s isolated to tests/aws/.

Overview
Adds a standalone Go-based AWS test harness under tests/aws/ that provisions an EC2 instance (optionally enabling CpuOptions.NestedVirtualization), installs hypeman via cloud-init, and validates /dev/kvm + service health over SSH.

The harness can auto-resolve a Debian 12 AMI and default subnet, create a temporary SSH-only security group, run smoke tests for both cloud-hypervisor and QEMU (using hypeman exec for verification), and optionally run VM launch latency and CoreMark benchmarks before cleaning up resources (or keeping the instance via --keep).

Written by Cursor Bugbot for commit 9a1f936. This will update automatically on new commits.

Go program that launches a c8id instance with nested virtualization
(CpuOptions.NestedVirtualization=enabled), installs hypeman, and runs
smoke tests with both Cloud Hypervisor and QEMU, plus CoreMark benchmarks.

Key findings from testing:
- c8id.4xlarge: 18s boot, 48s SSH, ~2m install, CH 191ms, QEMU 318ms
- c8id.metal-48xl: 18s boot, 1m29s SSH, ~4m install, CH 250ms, QEMU 410ms
- Host CoreMark: ~32,800-33,300 iter/s (bare metal vs nested nearly identical)
- L1 VM CoreMark on bare metal: 32,854 iter/s (1.3% overhead)
- L2 VMs (VM-inside-VM) crash immediately on nested virt instances
  (both Cloud Hypervisor and QEMU — the QEMU process exits, socket refused)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```go
// Treat as warning if we at least got the host score.
if hostScore > 0 {
	logf("CoreMark VM benchmark failed (host score available): %v", err)
} else {
```

VM benchmark failures are silently ignored

Medium Severity

run() treats any runCoreMark VM-side failure as a warning whenever hostScore > 0, so the program exits successfully even when the VM benchmark is broken. This hides real regressions in hypeman/hypervisor behavior and can report misleading benchmark results from tests/aws/main.go.



The rmmod/modprobe approach for disabling APICv corrupts VMX state and
makes VM crashes significantly more frequent. Replace with modprobe.d
config that takes effect on reboot.

Through extensive testing, identified the root cause of nested virt VM
crashes: TAP networking triggers a Nitro hypervisor bug where VMCS
VM-Exit interrupt info is set to 0xffffffff. VMs without TAP (user-mode
networking, vsock-only) work fine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
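The modprobe.d change described in the commit above might look like the following (the filename is illustrative; `enable_apicv` is the kvm_intel module parameter that controls APICv, and the option only takes effect when the module is loaded at boot):

```sh
# /etc/modprobe.d/kvm-nested-virt.conf (illustrative path)
# Disable APICv at kvm_intel load time rather than rmmod/modprobe
# on a live system, which corrupts VMX state.
options kvm_intel enable_apicv=0
```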

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.


```go
if len(chTimes) == 0 && len(qemuTimes) == 0 {
	return nil, nil, fmt.Errorf("all launch benchmark iterations failed")
}
```

Partial benchmark failures misreported as valid stats

Medium Severity

runLaunchBenchmark only returns an error when both result slices are empty. If one hypervisor has zero successful iterations, computeStats gets an empty slice and returns zero durations, and the summary prints 0s metrics as if they were real benchmark results.


```go
	continue
}
if strings.Contains(full, "ready") {
	return nil
```

Image readiness check can return too early

Low Severity

waitForImageReady verifies that `alpine` appears in the `hypeman image list -q` output, but then returns success if the full image list contains "ready" anywhere. A different image being ready can satisfy this check while alpine is still not ready.


```go
	GroupId: aws.String(createdSGID),
}); err != nil {
	logf("Warning: failed to delete security group: %v", err)
}
```

Keep mode leaks temporary security groups

Low Severity

When --keep is set, the code keeps the instance but still tries to delete createdSGID. AWS rejects deleting a security group attached to a running instance, so the temporary group is left behind. This causes predictable resource leakage in keep-mode runs.



cursor bot commented Feb 13, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: Partial benchmark failures misreported as valid stats
    • Added guards to only compute and display stats for hypervisors with non-empty result slices, showing explicit failure messages instead of misleading 0s values.
  • ✅ Fixed: Image readiness check can return too early
    • Changed the readiness check to verify that the specific line containing 'alpine' also contains 'ready', instead of checking for 'ready' anywhere in the full image list.
  • ✅ Fixed: Keep mode leaks temporary security groups
    • When --keep is set and the instance is kept running, the security group deletion is now skipped entirely since the running instance still references it.


Or push these changes by commenting:

@cursor push f75d0b7a3c
Preview (f75d0b7a3c)
diff --git a/tests/aws/main.go b/tests/aws/main.go
--- a/tests/aws/main.go
+++ b/tests/aws/main.go
@@ -99,19 +99,25 @@
 			logf("Keeping instance %s (--keep flag set)", instanceID)
 		}
 		if createdSGID != "" {
-			if instanceID != "" && !*keep {
-				logf("Waiting for instance to terminate before deleting security group...")
-				w := ec2.NewInstanceTerminatedWaiter(svc)
-				_ = w.Wait(cleanCtx, &ec2.DescribeInstancesInput{
-					InstanceIds: []string{instanceID},
-				}, 5*time.Minute)
+			if instanceID != "" && *keep {
+				// Instance is kept running and still references the SG;
+				// attempting to delete it would fail, so skip cleanup.
+				logf("Keeping security group %s (attached to kept instance %s)", createdSGID, instanceID)
+			} else {
+				if instanceID != "" {
+					logf("Waiting for instance to terminate before deleting security group...")
+					w := ec2.NewInstanceTerminatedWaiter(svc)
+					_ = w.Wait(cleanCtx, &ec2.DescribeInstancesInput{
+						InstanceIds: []string{instanceID},
+					}, 5*time.Minute)
+				}
+				logf("Deleting security group %s...", createdSGID)
+				if _, err := svc.DeleteSecurityGroup(cleanCtx, &ec2.DeleteSecurityGroupInput{
+					GroupId: aws.String(createdSGID),
+				}); err != nil {
+					logf("Warning: failed to delete security group: %v", err)
+				}
 			}
-			logf("Deleting security group %s...", createdSGID)
-			if _, err := svc.DeleteSecurityGroup(cleanCtx, &ec2.DeleteSecurityGroupInput{
-				GroupId: aws.String(createdSGID),
-			}); err != nil {
-				logf("Warning: failed to delete security group: %v", err)
-			}
 		}
 	}()
 
@@ -403,31 +409,39 @@
 			logf("Launch benchmark failed: %v", err)
 			return 1
 		}
-		chStats := computeStats(chLaunchTimes)
-		qemuStats := computeStats(qemuLaunchTimes)
 
 		fmt.Println()
 		logf("VM Launch Benchmark (50 iterations):")
-		logf("  Cloud Hypervisor: median=%s avg=%s min=%s max=%s p95=%s",
-			chStats.median.Round(time.Millisecond), chStats.avg.Round(time.Millisecond),
-			chStats.min.Round(time.Millisecond), chStats.max.Round(time.Millisecond),
-			chStats.p95.Round(time.Millisecond))
-		logf("  QEMU:             median=%s avg=%s min=%s max=%s p95=%s",
-			qemuStats.median.Round(time.Millisecond), qemuStats.avg.Round(time.Millisecond),
-			qemuStats.min.Round(time.Millisecond), qemuStats.max.Round(time.Millisecond),
-			qemuStats.p95.Round(time.Millisecond))
 
-		// Store stats for final summary.
-		timings["ch_median"] = chStats.median
-		timings["ch_avg"] = chStats.avg
-		timings["ch_min"] = chStats.min
-		timings["ch_max"] = chStats.max
-		timings["ch_p95"] = chStats.p95
-		timings["qemu_median"] = qemuStats.median
-		timings["qemu_avg"] = qemuStats.avg
-		timings["qemu_min"] = qemuStats.min
-		timings["qemu_max"] = qemuStats.max
-		timings["qemu_p95"] = qemuStats.p95
+		if len(chLaunchTimes) > 0 {
+			chStats := computeStats(chLaunchTimes)
+			logf("  Cloud Hypervisor: median=%s avg=%s min=%s max=%s p95=%s",
+				chStats.median.Round(time.Millisecond), chStats.avg.Round(time.Millisecond),
+				chStats.min.Round(time.Millisecond), chStats.max.Round(time.Millisecond),
+				chStats.p95.Round(time.Millisecond))
+			timings["ch_median"] = chStats.median
+			timings["ch_avg"] = chStats.avg
+			timings["ch_min"] = chStats.min
+			timings["ch_max"] = chStats.max
+			timings["ch_p95"] = chStats.p95
+		} else {
+			logf("  Cloud Hypervisor: all iterations failed — no stats available")
+		}
+
+		if len(qemuLaunchTimes) > 0 {
+			qemuStats := computeStats(qemuLaunchTimes)
+			logf("  QEMU:             median=%s avg=%s min=%s max=%s p95=%s",
+				qemuStats.median.Round(time.Millisecond), qemuStats.avg.Round(time.Millisecond),
+				qemuStats.min.Round(time.Millisecond), qemuStats.max.Round(time.Millisecond),
+				qemuStats.p95.Round(time.Millisecond))
+			timings["qemu_median"] = qemuStats.median
+			timings["qemu_avg"] = qemuStats.avg
+			timings["qemu_min"] = qemuStats.min
+			timings["qemu_max"] = qemuStats.max
+			timings["qemu_p95"] = qemuStats.p95
+		} else {
+			logf("  QEMU:             all iterations failed — no stats available")
+		}
 	} else {
 		logf("Skipping smoke test (--skip-smoke)")
 	}
@@ -466,23 +480,33 @@
 	if _, ok := timings["smoke"]; ok {
 		fmt.Printf("  Launch -> Smoke Test:   %s\n", timings["smoke"].Round(time.Second))
 	}
-	if _, ok := timings["ch_median"]; ok {
+	_, hasCH := timings["ch_median"]
+	_, hasQEMU := timings["qemu_median"]
+	if hasCH || hasQEMU {
 		fmt.Println()
 		fmt.Println("VM Launch Latency (50 iterations):")
-		fmt.Println("  Cloud Hypervisor:")
-		fmt.Printf("    Median: %s  Avg: %s  Min: %s  Max: %s  P95: %s\n",
-			timings["ch_median"].Round(time.Millisecond),
-			timings["ch_avg"].Round(time.Millisecond),
-			timings["ch_min"].Round(time.Millisecond),
-			timings["ch_max"].Round(time.Millisecond),
-			timings["ch_p95"].Round(time.Millisecond))
-		fmt.Println("  QEMU:")
-		fmt.Printf("    Median: %s  Avg: %s  Min: %s  Max: %s  P95: %s\n",
-			timings["qemu_median"].Round(time.Millisecond),
-			timings["qemu_avg"].Round(time.Millisecond),
-			timings["qemu_min"].Round(time.Millisecond),
-			timings["qemu_max"].Round(time.Millisecond),
-			timings["qemu_p95"].Round(time.Millisecond))
+		if hasCH {
+			fmt.Println("  Cloud Hypervisor:")
+			fmt.Printf("    Median: %s  Avg: %s  Min: %s  Max: %s  P95: %s\n",
+				timings["ch_median"].Round(time.Millisecond),
+				timings["ch_avg"].Round(time.Millisecond),
+				timings["ch_min"].Round(time.Millisecond),
+				timings["ch_max"].Round(time.Millisecond),
+				timings["ch_p95"].Round(time.Millisecond))
+		} else {
+			fmt.Println("  Cloud Hypervisor: all iterations failed")
+		}
+		if hasQEMU {
+			fmt.Println("  QEMU:")
+			fmt.Printf("    Median: %s  Avg: %s  Min: %s  Max: %s  P95: %s\n",
+				timings["qemu_median"].Round(time.Millisecond),
+				timings["qemu_avg"].Round(time.Millisecond),
+				timings["qemu_min"].Round(time.Millisecond),
+				timings["qemu_max"].Round(time.Millisecond),
+				timings["qemu_p95"].Round(time.Millisecond))
+		} else {
+			fmt.Println("  QEMU: all iterations failed")
+		}
 	}
 	if !*skipBenchmark && hostScore > 0 {
 		fmt.Println()
@@ -849,7 +873,16 @@
 				if err != nil {
 					continue
 				}
-				if strings.Contains(full, "ready") {
+				// Check that the specific alpine line shows "ready",
+				// not just any image in the list.
+				alpineReady := false
+				for _, line := range strings.Split(full, "\n") {
+					if strings.Contains(line, "alpine") && strings.Contains(line, "ready") {
+						alpineReady = true
+						break
+					}
+				}
+				if alpineReady {
 					return nil
 				}
 				logf("Image status: %s", strings.TrimSpace(full))

@rgarcia rgarcia closed this Feb 15, 2026