Conversation
Go program that launches a c8id instance with nested virtualization (CpuOptions.NestedVirtualization=enabled), installs hypeman, and runs smoke tests with both Cloud Hypervisor and QEMU, plus CoreMark benchmarks.

Key findings from testing:

- c8id.4xlarge: 18s boot, 48s SSH, ~2m install, CH 191ms, QEMU 318ms
- c8id.metal-48xl: 18s boot, 1m29s SSH, ~4m install, CH 250ms, QEMU 410ms
- Host CoreMark: ~32,800-33,300 iter/s (bare metal vs nested nearly identical)
- L1 VM CoreMark on bare metal: 32,854 iter/s (1.3% overhead)
- L2 VMs (VM-inside-VM) crash immediately on nested virt instances (both Cloud Hypervisor and QEMU: the QEMU process exits, socket refused)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// Treat as warning if we at least got the host score.
if hostScore > 0 {
	logf("CoreMark VM benchmark failed (host score available): %v", err)
} else {
VM benchmark failures are silently ignored
Medium Severity
run() treats any runCoreMark VM-side failure as a warning whenever hostScore > 0, so the program exits successfully even when the VM benchmark is broken. This hides real regressions in hypeman/hypervisor behavior and can report misleading benchmark results from tests/aws/main.go.
The rmmod/modprobe approach for disabling APICv corrupts VMX state and makes VM crashes significantly more frequent. Replace with a modprobe.d config that takes effect on reboot.

Through extensive testing, identified the root cause of nested virt VM crashes: TAP networking triggers a Nitro hypervisor bug where VMCS VM-Exit interrupt info is set to 0xffffffff. VMs without TAP (user-mode networking, vsock-only) work fine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
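A persistent modprobe.d fragment for this might look like the sketch below; the file name is arbitrary, and enable_apicv is the kvm_intel parameter on current kernels (confirm with modinfo kvm_intel before relying on it):

```
# /etc/modprobe.d/kvm-apicv.conf (file name is arbitrary)
# Disable APICv when kvm_intel loads at boot, instead of toggling it
# with rmmod/modprobe on a live host; takes effect after reboot.
options kvm_intel enable_apicv=0
```

With this, the module is only ever loaded once, at boot, with the option already applied, so VMX state is never torn down on a running host.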
if len(chTimes) == 0 && len(qemuTimes) == 0 {
	return nil, nil, fmt.Errorf("all launch benchmark iterations failed")
}
Partial benchmark failures misreported as valid stats
Medium Severity
runLaunchBenchmark only returns an error when both result slices are empty. If one hypervisor has zero successful iterations, computeStats gets an empty slice and returns zero durations, and the summary prints 0s metrics as if they were real benchmark results.
Additional Locations (1)
	continue
}
if strings.Contains(full, "ready") {
	return nil
GroupId: aws.String(createdSGID),
}); err != nil {
	logf("Warning: failed to delete security group: %v", err)
}
Keep mode leaks temporary security groups
Low Severity
When --keep is set, the code keeps the instance but still tries to delete createdSGID. AWS rejects deleting a security group attached to a running instance, so the temporary group is left behind. This causes predictable resource leakage in keep-mode runs.
Additional Locations (1)
Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.
Or push these changes by commenting: Preview (f75d0b7a3c)

diff --git a/tests/aws/main.go b/tests/aws/main.go
--- a/tests/aws/main.go
+++ b/tests/aws/main.go
@@ -99,19 +99,25 @@
logf("Keeping instance %s (--keep flag set)", instanceID)
}
if createdSGID != "" {
- if instanceID != "" && !*keep {
- logf("Waiting for instance to terminate before deleting security group...")
- w := ec2.NewInstanceTerminatedWaiter(svc)
- _ = w.Wait(cleanCtx, &ec2.DescribeInstancesInput{
- InstanceIds: []string{instanceID},
- }, 5*time.Minute)
+ if instanceID != "" && *keep {
+ // Instance is kept running and still references the SG;
+ // attempting to delete it would fail, so skip cleanup.
+ logf("Keeping security group %s (attached to kept instance %s)", createdSGID, instanceID)
+ } else {
+ if instanceID != "" {
+ logf("Waiting for instance to terminate before deleting security group...")
+ w := ec2.NewInstanceTerminatedWaiter(svc)
+ _ = w.Wait(cleanCtx, &ec2.DescribeInstancesInput{
+ InstanceIds: []string{instanceID},
+ }, 5*time.Minute)
+ }
+ logf("Deleting security group %s...", createdSGID)
+ if _, err := svc.DeleteSecurityGroup(cleanCtx, &ec2.DeleteSecurityGroupInput{
+ GroupId: aws.String(createdSGID),
+ }); err != nil {
+ logf("Warning: failed to delete security group: %v", err)
+ }
}
- logf("Deleting security group %s...", createdSGID)
- if _, err := svc.DeleteSecurityGroup(cleanCtx, &ec2.DeleteSecurityGroupInput{
- GroupId: aws.String(createdSGID),
- }); err != nil {
- logf("Warning: failed to delete security group: %v", err)
- }
}
}()
@@ -403,31 +409,39 @@
logf("Launch benchmark failed: %v", err)
return 1
}
- chStats := computeStats(chLaunchTimes)
- qemuStats := computeStats(qemuLaunchTimes)
fmt.Println()
logf("VM Launch Benchmark (50 iterations):")
- logf(" Cloud Hypervisor: median=%s avg=%s min=%s max=%s p95=%s",
- chStats.median.Round(time.Millisecond), chStats.avg.Round(time.Millisecond),
- chStats.min.Round(time.Millisecond), chStats.max.Round(time.Millisecond),
- chStats.p95.Round(time.Millisecond))
- logf(" QEMU: median=%s avg=%s min=%s max=%s p95=%s",
- qemuStats.median.Round(time.Millisecond), qemuStats.avg.Round(time.Millisecond),
- qemuStats.min.Round(time.Millisecond), qemuStats.max.Round(time.Millisecond),
- qemuStats.p95.Round(time.Millisecond))
- // Store stats for final summary.
- timings["ch_median"] = chStats.median
- timings["ch_avg"] = chStats.avg
- timings["ch_min"] = chStats.min
- timings["ch_max"] = chStats.max
- timings["ch_p95"] = chStats.p95
- timings["qemu_median"] = qemuStats.median
- timings["qemu_avg"] = qemuStats.avg
- timings["qemu_min"] = qemuStats.min
- timings["qemu_max"] = qemuStats.max
- timings["qemu_p95"] = qemuStats.p95
+ if len(chLaunchTimes) > 0 {
+ chStats := computeStats(chLaunchTimes)
+ logf(" Cloud Hypervisor: median=%s avg=%s min=%s max=%s p95=%s",
+ chStats.median.Round(time.Millisecond), chStats.avg.Round(time.Millisecond),
+ chStats.min.Round(time.Millisecond), chStats.max.Round(time.Millisecond),
+ chStats.p95.Round(time.Millisecond))
+ timings["ch_median"] = chStats.median
+ timings["ch_avg"] = chStats.avg
+ timings["ch_min"] = chStats.min
+ timings["ch_max"] = chStats.max
+ timings["ch_p95"] = chStats.p95
+ } else {
+ logf(" Cloud Hypervisor: all iterations failed — no stats available")
+ }
+
+ if len(qemuLaunchTimes) > 0 {
+ qemuStats := computeStats(qemuLaunchTimes)
+ logf(" QEMU: median=%s avg=%s min=%s max=%s p95=%s",
+ qemuStats.median.Round(time.Millisecond), qemuStats.avg.Round(time.Millisecond),
+ qemuStats.min.Round(time.Millisecond), qemuStats.max.Round(time.Millisecond),
+ qemuStats.p95.Round(time.Millisecond))
+ timings["qemu_median"] = qemuStats.median
+ timings["qemu_avg"] = qemuStats.avg
+ timings["qemu_min"] = qemuStats.min
+ timings["qemu_max"] = qemuStats.max
+ timings["qemu_p95"] = qemuStats.p95
+ } else {
+ logf(" QEMU: all iterations failed — no stats available")
+ }
} else {
logf("Skipping smoke test (--skip-smoke)")
}
@@ -466,23 +480,33 @@
if _, ok := timings["smoke"]; ok {
fmt.Printf(" Launch -> Smoke Test: %s\n", timings["smoke"].Round(time.Second))
}
- if _, ok := timings["ch_median"]; ok {
+ _, hasCH := timings["ch_median"]
+ _, hasQEMU := timings["qemu_median"]
+ if hasCH || hasQEMU {
fmt.Println()
fmt.Println("VM Launch Latency (50 iterations):")
- fmt.Println(" Cloud Hypervisor:")
- fmt.Printf(" Median: %s Avg: %s Min: %s Max: %s P95: %s\n",
- timings["ch_median"].Round(time.Millisecond),
- timings["ch_avg"].Round(time.Millisecond),
- timings["ch_min"].Round(time.Millisecond),
- timings["ch_max"].Round(time.Millisecond),
- timings["ch_p95"].Round(time.Millisecond))
- fmt.Println(" QEMU:")
- fmt.Printf(" Median: %s Avg: %s Min: %s Max: %s P95: %s\n",
- timings["qemu_median"].Round(time.Millisecond),
- timings["qemu_avg"].Round(time.Millisecond),
- timings["qemu_min"].Round(time.Millisecond),
- timings["qemu_max"].Round(time.Millisecond),
- timings["qemu_p95"].Round(time.Millisecond))
+ if hasCH {
+ fmt.Println(" Cloud Hypervisor:")
+ fmt.Printf(" Median: %s Avg: %s Min: %s Max: %s P95: %s\n",
+ timings["ch_median"].Round(time.Millisecond),
+ timings["ch_avg"].Round(time.Millisecond),
+ timings["ch_min"].Round(time.Millisecond),
+ timings["ch_max"].Round(time.Millisecond),
+ timings["ch_p95"].Round(time.Millisecond))
+ } else {
+ fmt.Println(" Cloud Hypervisor: all iterations failed")
+ }
+ if hasQEMU {
+ fmt.Println(" QEMU:")
+ fmt.Printf(" Median: %s Avg: %s Min: %s Max: %s P95: %s\n",
+ timings["qemu_median"].Round(time.Millisecond),
+ timings["qemu_avg"].Round(time.Millisecond),
+ timings["qemu_min"].Round(time.Millisecond),
+ timings["qemu_max"].Round(time.Millisecond),
+ timings["qemu_p95"].Round(time.Millisecond))
+ } else {
+ fmt.Println(" QEMU: all iterations failed")
+ }
}
if !*skipBenchmark && hostScore > 0 {
fmt.Println()
@@ -849,7 +873,16 @@
if err != nil {
continue
}
- if strings.Contains(full, "ready") {
+ // Check that the specific alpine line shows "ready",
+ // not just any image in the list.
+ alpineReady := false
+ for _, line := range strings.Split(full, "\n") {
+ if strings.Contains(line, "alpine") && strings.Contains(line, "ready") {
+ alpineReady = true
+ break
+ }
+ }
+ if alpineReady {
return nil
}
 logf("Image status: %s", strings.TrimSpace(full))



Summary
Adds tests/aws/ that launches c8id instances with AWS's new nested virtualization feature (CpuOptions.NestedVirtualization=enabled), installs hypeman, and benchmarks VM spin-up times and CoreMark performance.

Results
Key Findings
Usage
Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Adds new code that programmatically creates and tears down AWS resources (instances/security groups) and executes remote commands, so misconfiguration can lead to unexpected cost or security exposure if run improperly, though it's isolated to tests/aws/.

Overview
Adds a standalone Go-based AWS test harness under tests/aws/ that provisions an EC2 instance (optionally enabling CpuOptions.NestedVirtualization), installs hypeman via cloud-init, and validates /dev/kvm + service health over SSH. The harness can auto-resolve a Debian 12 AMI and default subnet, create a temporary SSH-only security group, run smoke tests for both cloud-hypervisor and QEMU (using hypeman exec for verification), and optionally run VM launch latency and CoreMark benchmarks before cleaning up resources (or keeping the instance via --keep).

Written by Cursor Bugbot for commit 9a1f936. This will update automatically on new commits.