qemu_guest_agent: Add new api support 'guest-get-load' #4309

6-dehan · 2025-05-14T14:07:57Z

Add a new guest agent command 'guest-get-load' to
get cpu load average info of Guest.

ID: 3534
Signed-off-by: Dehan Meng [email protected]

Summary by CodeRabbit

New Features
- Added guest load monitoring to the QEMU Guest Agent tests for Linux and Windows to measure and validate CPU load under stress.
Tests
- New test compares guest-reported vs agent-reported 1/5/15-minute load values, verifies increases during CPU stress and decreases afterward.
- Improved logging of pre-, during-, and post-stress load values for clearer diagnostics.
Chores
- Updated configuration to enable the new load-check workflow.

6-dehan · 2025-05-15T10:14:55Z

for linux guest:
(1/1) Host_RHEL.m9.u6.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.7.0.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_get_load.q35: PASS (165.03 s)

6-dehan · 2025-05-16T04:47:26Z

For Window VM:
(1/1) Host_RHEL.m9.u6.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win2025.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_get_load.q35: PASS (143.49 s)

6-dehan · 2025-05-16T04:47:48Z

@leidwang Could you please help to review this when you're free?

qemu/tests/qemu_guest_agent.py

6-dehan · 2025-09-01T08:08:52Z

Linux could be passed normally, Windows still has to adjust a little.
(3/4) Host_RHEL.m10.u1.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_get_load.q35: PASS (136.85 s)

coderabbitai · 2025-09-26T05:27:22Z

Walkthrough

Adds a new QGA test gagent_check_get_load that measures guest and QGA load averages, runs platform-specific CPU stress, and verifies load increase/decrease; also adds a check_get_load block to qemu_guest_agent.cfg with commands to retrieve loads and run stress tools.

Changes

Cohort / File(s)	Summary of edits
QGA test logic: get-load `qemu/tests/qemu_guest_agent.py`	Adds `gagent_check_get_load(self, test, params, env)` and helper methods `_get_load_stats`, `_verify_load_values`, `_log_load_values`. Implements Linux and Windows flows to capture baseline loads, trigger CPU stress, compare guest vs QGA loads, and verify increase/decrease across phases.
QGA config: get-load commands `qemu/tests/cfg/qemu_guest_agent.cfg`	Adds `check_get_load` block under `gagent_debugview_VSS_DLL` defining `gagent_check_type = get_load`, `cmd_get_load`, `cmd_install_stressng`, `cmd_run_stress` and a Windows PowerShell stress variant.
Docstring formatting changes `qemu/tests/qemu_guest_agent.py` (multiple locations)	Minor docstring wording/formatting adjustments (e.g., “:param env: Dictionary with test environment.” → “:param env: Dictionary with the test environment”). No runtime behavior changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant TR as Test Runner
  participant T as QemuGuestAgentTest
  participant G as QGA
  participant S as Guest Session
  participant ST as Stress Tool

  TR->>T: gagent_check_get_load(test, params, env)
  T->>S: open session
  alt Linux
    T->>S: cmd_get_load (baseline)
    S-->>T: guest_load_baseline
    T->>G: gagent.get_load (baseline)
    G-->>T: qga_load_baseline
    T->>S: cmd_install_stressng
    T->>S: cmd_run_stress (start ST async)
    S->>ST: start workload (async)
    T->>S: cmd_get_load (under load)
    S-->>T: guest_load_stress
    T->>G: gagent.get_load (under load)
    G-->>T: qga_load_stress
    T->>T: verify increase
    ST-->>S: complete
    T->>S: cmd_get_load (post)
    S-->>T: guest_load_post
    T->>G: gagent.get_load (post)
    G-->>T: qga_load_post
    T->>T: verify decrease
  else Windows
    T->>G: gagent.get_load (baseline)
    G-->>T: qga_load_baseline
    T->>S: start PowerShell stress job (async)
    S->>ST: start workload (async)
    T->>G: gagent.get_load (under load)
    G-->>T: qga_load_stress
    T->>T: verify increase
    ST-->>S: complete
    T->>G: gagent.get_load (post)
    G-->>T: qga_load_post
  end
  T->>S: close session
  T-->>TR: test result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers, tap the keys,
I coax the CPU up from its ease.
One, five, fifteen — the numbers play,
I watch them climb, then fade away.
Hoppity hops for tests today! 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly identifies the qemu_guest_agent component and succinctly describes the primary addition of support for the new ‘guest-get-load’ API command, which aligns directly with the pull request’s main objective. It is concise, specific, and free of extraneous detail, ensuring that reviewers can immediately understand the key change.
Docstring Coverage	✅ Passed	Docstring coverage is 93.42% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 870c1ea and 6afbcbf.

📒 Files selected for processing (2)

qemu/tests/cfg/qemu_guest_agent.cfg (1 hunks)
qemu/tests/qemu_guest_agent.py (10 hunks)

🔇 Additional comments (1)

qemu/tests/cfg/qemu_guest_agent.cfg (1)

675-675: Confirm cleanup of Windows stress jobs

This PowerShell snippet spawns 12 background jobs that loop forever. If they aren't explicitly stopped later, they'll keep chewing CPU after the test finishes and bleed into subsequent cases. Please confirm the test logic tears them down (e.g. Get-Job | Stop-Job -Force; Get-Job | Remove-Job); otherwise, add a cleanup step.

qemu/tests/cfg/qemu_guest_agent.cfg

qemu/tests/qemu_guest_agent.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

qemu/tests/qemu_guest_agent.py (3)
2109-2146: Make prev_values explicit in _verify_load_values.

The helper reads prev_values from closure, which is implicit and brittle. Pass it as a parameter for clarity and easier testing.

Apply this diff and update call sites:
-def _verify_load_values(qga_vals, guest_vals, check_type="match"):
+def _verify_load_values(qga_vals, guest_vals, check_type="match", prev_values=None):
@@
-            if check_type != "match" and prev_values:
+            if check_type != "match" and prev_values:
And later:
-            if errors := _verify_load_values(qga_vals, guest_vals):
+            if errors := _verify_load_values(qga_vals, guest_vals, prev_values=prev_values):
@@
-            if errors := _verify_load_values(qga_vals, guest_vals, "increase"):
+            if errors := _verify_load_values(qga_vals, guest_vals, "increase", prev_values=prev_values):
@@
-            if errors := _verify_load_values(qga_vals, guest_vals, "decrease"):
+            if errors := _verify_load_values(qga_vals, guest_vals, "decrease", prev_values=prev_values):
2208-2214: Prefer package helper or presence check before installing stress-ng.

Hard-failing on install can be flaky across distros/repos. Consider utils_package.package_install("stress-ng", session) or first checking if stress-ng is already present, then installing with a distro-agnostic helper.

2212-2227: Sleep-based timing is brittle; poll until condition or timeout.

Fixed sleeps (25s/60s) can be too short or long. Prefer polling for load crossing a threshold or for stress-ng to exit (e.g., utils_misc.wait_for with a predicate).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c264cc and c6efd4e.

📒 Files selected for processing (2)

qemu/tests/cfg/qemu_guest_agent.cfg (1 hunks)
qemu/tests/qemu_guest_agent.py (9 hunks)

🔇 Additional comments (1)

qemu/tests/cfg/qemu_guest_agent.cfg (1)
672-672: Fix stress-ng installation for non-DNF guests (fallback to yum).

RHEL 7 doesn't have dnf; current command will fail there. Use a simple dnf→yum fallback.

Apply this diff:
-            cmd_install_stressng = "dnf -y install stress-ng"
+            cmd_install_stressng = "if command -v dnf; then dnf -y install stress-ng; else yum -y install stress-ng; fi"

qemu/tests/cfg/qemu_guest_agent.cfg

qemu/tests/qemu_guest_agent.py

6-dehan · 2025-09-26T08:00:45Z

Hi @leidwang, Linux always passes. Currently, Windows has a product issue that I'm discussing with Dev. so the automation code is fine I think.

(1/2) Host_RHEL.m10.u1.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_get_load.q35: PASS (151.52 s)

arka-majumd

typos

qemu/tests/qemu_guest_agent.py

1. Add a new guest agent command 'guest-get-load' to get cpu load average info of Guest. 2. Randomly sorted out non-standard comments Signed-off-by: Dehan Meng <[email protected]>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

qemu/tests/qemu_guest_agent.py (1)

2160-2195: Windows stress jobs persist; add cleanup and verify post-stress decrease

Start-Job workers are never stopped and the test doesn’t check that load falls after stress. Add explicit cleanup and a decrease check; also run cleanup in finally. This avoids leaving the VM pegged and flaking follow-on tests.

Apply:

@@
-        if params.get("os_type") == "windows":
+        if params.get("os_type") == "windows":
             error_context.context("Get load info for Windows", LOG_JOB.info)
-            try:
+            # Stop any leftover stress jobs by command pattern used in cmd_run_stress
+            cleanup_cmd = (
+                'powershell "$jobs = Get-Job | Where-Object { $_.Command -like '
+                "''*sqrt(999999)*'' }; if ($jobs) { $jobs | Stop-Job -Force; "
+                '$jobs | Remove-Job }"'
+            )
+            try:
                 # Get initial load values
                 load_info = self.gagent.get_load()
@@
                 LOG_JOB.info(
                     "Load value increased as expected: before=%s, after=%s",
                     initial_load,
                     stress_load,
                 )
+                # Stop stress jobs and verify load decreases
+                session.cmd(cleanup_cmd)
+                time.sleep(20)
+                after_info = self.gagent.get_load()
+                after_load = after_info["load1m"]
+                LOG_JOB.info("Load info after stopping stress: %s", after_info)
+                if after_load >= stress_load:
+                    test.fail(
+                        f"Load value did not decrease after stopping stress:"
+                        f" stress={stress_load}, after={after_load}"
+                    )
             except guest_agent.VAgentCmdError as e:
                 test.fail(f"guest-get-load command failed: {e}")
+            finally:
+                # Best-effort cleanup even on failure
+                session.cmd(cleanup_cmd)

To verify locally, rerun the Windows variant and confirm jobs are gone after the test and load1m drops below the stressed value.

🧹 Nitpick comments (3)

qemu/tests/qemu_guest_agent.py (3)

2109-2146: Make prev_values an explicit parameter of _verify_load_values

The helper implicitly closes over prev_values. Pass it explicitly to avoid hidden state and misuse.

-def _verify_load_values(qga_vals, guest_vals, check_type="match"):
+def _verify_load_values(qga_vals, guest_vals, check_type="match", prev_values=None):
@@
-    if check_type != "match" and prev_values:
+    if check_type != "match":
+        if not prev_values:
+            return ["prev_values is required when check_type != 'match'"]
         qga_1m = qga_vals[0]
         guest_1m = guest_vals[0]
         prev_qga_1m = prev_values["qga"][0]
         prev_guest_1m = prev_values["guest"][0]

And pass it at call sites:

- if errors := _verify_load_values(qga_vals, guest_vals, "increase"):
+ if errors := _verify_load_values(qga_vals, guest_vals, "increase", prev_values):
@@
- if errors := _verify_load_values(qga_vals, guest_vals, "decrease"):
+ if errors := _verify_load_values(qga_vals, guest_vals, "decrease", prev_values):

Also applies to: 2220-2221, 2232-2233

2095-2101: Harden guest load parsing to tolerate formats like ‘uptime’

If cmd_get_load returns “load average: 0.10, 0.05, 0.01”, split() will leave commas and raise ValueError. Use a regex to extract the first three numbers.

-                try:
-                    loads = session.cmd_output(params["cmd_get_load"]).strip().split()
-                    return tuple(round(float(x), 2) for x in loads[:3])
+                try:
+                    out = session.cmd_output(params["cmd_get_load"]).strip()
+                    # capture floats/ints, ignore commas/labels
+                    nums = re.findall(r"[-+]?(?:\d*\.\d+|\d+)", out)
+                    return tuple(round(float(x), 2) for x in nums[:3])

2172-2176: Reduce fixed sleeps; prefer polling with a deadline

Replace sleeps with utils_misc.wait_for to reduce flakiness and overall wall time. For example, wait until load1m exceeds baseline by a delta, then proceed; later wait until it drops below a threshold.

Example pattern:

# Wait up to 30s for increase of >= 0.5
utils_misc.wait_for(lambda: self.gagent.get_load()["load1m"] >= prev_values["qga"][0] + 0.5, 30, 2)

# After stopping stress, wait up to 60s for decrease
utils_misc.wait_for(lambda: self.gagent.get_load()["load1m"] <= stress_load - 0.3, 60, 3)

Please confirm acceptable deltas for your environments.

Also applies to: 2208-2214, 2225-2230

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6efd4e and a5d7ca3.

📒 Files selected for processing (2)

qemu/tests/cfg/qemu_guest_agent.cfg (1 hunks)
qemu/tests/qemu_guest_agent.py (56 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

qemu/tests/cfg/qemu_guest_agent.cfg

🔇 Additional comments (1)

qemu/tests/qemu_guest_agent.py (1)

457-457: Docstring/env param wording updates

Spelling/consistency fixes to “:param env: Dictionary with the test environment” across methods look good. No further action.

Also applies to: 484-484, 512-512, 528-528, 665-665, 685-685, 729-729, 779-779, 821-821, 960-960, 997-997, 1085-1085, 1174-1174, 1369-1369, 1655-1655, 1732-1732, 1905-1905, 1971-1971, 2069-2071, 2242-2242, 2381-2381, 2493-2493, 2625-2625, 2759-2759, 2920-2920, 3057-3057, 3089-3089, 3090-3090, 3182-3182, 3242-3242, 3261-3261, 3368-3369, 3555-3556, 3850-3850, 3880-3880, 3921-3921, 4049-4053, 4071-4071, 4121-4121, 4182-4182, 4561-4561, 4688-4689, 4790-4790, 4956-4957, 5030-5031, 5142-5143, 5184-5185, 5257-5258

coderabbitai · 2025-09-30T15:57:58Z

qemu/tests/qemu_guest_agent.py

+            error_context.context("Starting CPU stress test", LOG_JOB.info)
+            s, o = session.cmd_status_output(params["cmd_install_stressng"])
+            if s != 0:
+                test.error(f"Failed to install stress-ng: {o}")
+            session.cmd(params["cmd_run_stress"])
+            time.sleep(25)
+
+            guest_vals = _get_load_stats(session)
+            qga_vals = _get_load_stats(session, False)
+
+            _log_load_values(guest_vals, qga_vals, "Under stress")
+
+            if errors := _verify_load_values(qga_vals, guest_vals, "increase"):
+                test.fail("Stress test load check failed:\n" + "\n".join(errors))
+
+            prev_values = {"guest": guest_vals, "qga": qga_vals}
+
+            # sleep (60) wait for the stress-ng terminated.
+            time.sleep(60)
+            guest_vals = _get_load_stats(session)
+            qga_vals = _get_load_stats(session, False)
+
+            _log_load_values(guest_vals, qga_vals, "After stress")
+
+            if errors := _verify_load_values(qga_vals, guest_vals, "decrease"):
+                test.fail("Post-stress load check failed:\n" + "\n".join(errors))
+


🛠️ Refactor suggestion | 🟠 Major

Best-effort Linux stress cleanup to avoid polluting subsequent tests

If stress-ng doesn’t terminate cleanly, it may bias later readings. Add a fallback stop command after the 60s wait (configurable via params).

- # sleep (60) wait for the stress-ng terminated. - time.sleep(60) + # Wait for stress to end, then ensure it's stopped. + time.sleep(60) + stop_cmd = params.get("cmd_stop_stress", "pkill -f stress-ng || true") + session.cmd_status_output(stop_cmd)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

error_context.context("Starting CPU stress test", LOG_JOB.info)

s, o = session.cmd_status_output(params["cmd_install_stressng"])

if s != 0:

test.error(f"Failed to install stress-ng: {o}")

session.cmd(params["cmd_run_stress"])

time.sleep(25)

guest_vals = _get_load_stats(session)

qga_vals = _get_load_stats(session, False)

_log_load_values(guest_vals, qga_vals, "Under stress")

if errors := _verify_load_values(qga_vals, guest_vals, "increase"):

test.fail("Stress test load check failed:\n" + "\n".join(errors))

prev_values = {"guest": guest_vals, "qga": qga_vals}

# sleep (60) wait for the stress-ng terminated.

time.sleep(60)

guest_vals = _get_load_stats(session)

qga_vals = _get_load_stats(session, False)

_log_load_values(guest_vals, qga_vals, "After stress")

if errors := _verify_load_values(qga_vals, guest_vals, "decrease"):

test.fail("Post-stress load check failed:\n" + "\n".join(errors))

prev_values = {"guest": guest_vals, "qga": qga_vals}

# Wait for stress to end, then ensure it's stopped.

time.sleep(60)

stop_cmd = params.get("cmd_stop_stress", "pkill -f stress-ng || true")

session.cmd_status_output(stop_cmd)

guest_vals = _get_load_stats(session)

qga_vals = _get_load_stats(session, False)

_log_load_values(guest_vals, qga_vals, "After stress")

if errors := _verify_load_values(qga_vals, guest_vals, "decrease"):

test.fail("Post-stress load check failed:\n" + "\n".join(errors))

6-dehan force-pushed the KVMAUTOMA-3534-cpuload branch 7 times, most recently from 1a6487d to c9269f4 Compare May 15, 2025 08:02

6-dehan force-pushed the KVMAUTOMA-3534-cpuload branch 3 times, most recently from 5ca1445 to fd0cfe1 Compare May 16, 2025 04:45

leidwang reviewed May 27, 2025

View reviewed changes

qemu/tests/qemu_guest_agent.py Show resolved Hide resolved

leidwang reviewed Aug 1, 2025

View reviewed changes

qemu/tests/qemu_guest_agent.py Outdated Show resolved Hide resolved

6-dehan force-pushed the KVMAUTOMA-3534-cpuload branch from 7a7c8c3 to f8a677b Compare August 21, 2025 03:44

6-dehan force-pushed the KVMAUTOMA-3534-cpuload branch from f8a677b to 6afbcbf Compare September 26, 2025 05:27

coderabbitai bot reviewed Sep 26, 2025

View reviewed changes

qemu/tests/cfg/qemu_guest_agent.cfg Show resolved Hide resolved

qemu/tests/qemu_guest_agent.py Show resolved Hide resolved

6-dehan force-pushed the KVMAUTOMA-3534-cpuload branch 2 times, most recently from 0c264cc to c6efd4e Compare September 26, 2025 05:51

coderabbitai bot reviewed Sep 26, 2025

View reviewed changes

qemu/tests/cfg/qemu_guest_agent.cfg Show resolved Hide resolved

qemu/tests/qemu_guest_agent.py Show resolved Hide resolved

arka-majumd reviewed Sep 28, 2025

View reviewed changes

qemu/tests/qemu_guest_agent.py Show resolved Hide resolved

qemu/tests/qemu_guest_agent.py Show resolved Hide resolved

qemu_guest_agent: Add new api support 'guest-get-load'

a5d7ca3

1. Add a new guest agent command 'guest-get-load' to get cpu load average info of Guest. 2. Randomly sorted out non-standard comments Signed-off-by: Dehan Meng <[email protected]>

6-dehan force-pushed the KVMAUTOMA-3534-cpuload branch from c6efd4e to a5d7ca3 Compare September 30, 2025 15:51

coderabbitai bot reviewed Sep 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

qemu_guest_agent: Add new api support 'guest-get-load' #4309

qemu_guest_agent: Add new api support 'guest-get-load' #4309

Uh oh!

6-dehan commented May 14, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

6-dehan commented May 15, 2025

Uh oh!

6-dehan commented May 16, 2025

Uh oh!

6-dehan commented May 16, 2025

Uh oh!

Uh oh!

Uh oh!

6-dehan commented Sep 1, 2025

Uh oh!

coderabbitai bot commented Sep 26, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

6-dehan commented Sep 26, 2025

Uh oh!

arka-majumd left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 30, 2025

Uh oh!

Uh oh!

qemu_guest_agent: Add new api support 'guest-get-load' #4309

Are you sure you want to change the base?

qemu_guest_agent: Add new api support 'guest-get-load' #4309

Uh oh!

Conversation

6-dehan commented May 14, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

6-dehan commented May 15, 2025

Uh oh!

6-dehan commented May 16, 2025

Uh oh!

6-dehan commented May 16, 2025

Uh oh!

Uh oh!

Uh oh!

6-dehan commented Sep 1, 2025

Uh oh!

coderabbitai bot commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

6-dehan commented Sep 26, 2025

Uh oh!

arka-majumd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

6-dehan commented May 14, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 26, 2025 •

edited

Loading