-
Notifications
You must be signed in to change notification settings - Fork 182
qemu_guest_agent: Add new api support 'guest-get-load' #4309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
1a6487d
to
c9269f4
Compare
for linux guest: |
5ca1445
to
fd0cfe1
Compare
For Window VM: |
@leidwang Could you please help to review this when you're free? |
7a7c8c3
to
f8a677b
Compare
Linux could be passed normally, Windows still has to adjust a little. |
f8a677b
to
6afbcbf
Compare
WalkthroughAdds a new QGA test gagent_check_get_load that measures guest and QGA load averages, runs platform-specific CPU stress, and verifies load increase/decrease; also adds a Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant TR as Test Runner
participant T as QemuGuestAgentTest
participant G as QGA
participant S as Guest Session
participant ST as Stress Tool
TR->>T: gagent_check_get_load(test, params, env)
T->>S: open session
alt Linux
T->>S: cmd_get_load (baseline)
S-->>T: guest_load_baseline
T->>G: gagent.get_load (baseline)
G-->>T: qga_load_baseline
T->>S: cmd_install_stressng
T->>S: cmd_run_stress (start ST async)
S->>ST: start workload (async)
T->>S: cmd_get_load (under load)
S-->>T: guest_load_stress
T->>G: gagent.get_load (under load)
G-->>T: qga_load_stress
T->>T: verify increase
ST-->>S: complete
T->>S: cmd_get_load (post)
S-->>T: guest_load_post
T->>G: gagent.get_load (post)
G-->>T: qga_load_post
T->>T: verify decrease
else Windows
T->>G: gagent.get_load (baseline)
G-->>T: qga_load_baseline
T->>S: start PowerShell stress job (async)
S->>ST: start workload (async)
T->>G: gagent.get_load (under load)
G-->>T: qga_load_stress
T->>T: verify increase
ST-->>S: complete
T->>G: gagent.get_load (post)
G-->>T: qga_load_post
end
T->>S: close session
T-->>TR: test result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
qemu/tests/cfg/qemu_guest_agent.cfg
(1 hunks)qemu/tests/qemu_guest_agent.py
(10 hunks)
🔇 Additional comments (1)
qemu/tests/cfg/qemu_guest_agent.cfg (1)
675-675
: Confirm cleanup of Windows stress jobsThis PowerShell snippet spawns 12 background jobs that loop forever. If they aren't explicitly stopped later, they'll keep chewing CPU after the test finishes and bleed into subsequent cases. Please confirm the test logic tears them down (e.g.
Get-Job | Stop-Job -Force; Get-Job | Remove-Job
); otherwise, add a cleanup step.
0c264cc
to
c6efd4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
qemu/tests/qemu_guest_agent.py (3)
2109-2146
: Make prev_values explicit in _verify_load_values.The helper reads prev_values from closure, which is implicit and brittle. Pass it as a parameter for clarity and easier testing.
Apply this diff and update call sites:
-def _verify_load_values(qga_vals, guest_vals, check_type="match"): +def _verify_load_values(qga_vals, guest_vals, check_type="match", prev_values=None): @@ - if check_type != "match" and prev_values: + if check_type != "match" and prev_values:And later:
- if errors := _verify_load_values(qga_vals, guest_vals): + if errors := _verify_load_values(qga_vals, guest_vals, prev_values=prev_values): @@ - if errors := _verify_load_values(qga_vals, guest_vals, "increase"): + if errors := _verify_load_values(qga_vals, guest_vals, "increase", prev_values=prev_values): @@ - if errors := _verify_load_values(qga_vals, guest_vals, "decrease"): + if errors := _verify_load_values(qga_vals, guest_vals, "decrease", prev_values=prev_values):
2208-2214
: Prefer package helper or presence check before installing stress-ng.Hard-failing on install can be flaky across distros/repos. Consider utils_package.package_install("stress-ng", session) or first checking if stress-ng is already present, then installing with a distro-agnostic helper.
2212-2227
: Sleep-based timing is brittle; poll until condition or timeout.Fixed sleeps (25s/60s) can be too short or long. Prefer polling for load crossing a threshold or for stress-ng to exit (e.g., utils_misc.wait_for with a predicate).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
qemu/tests/cfg/qemu_guest_agent.cfg
(1 hunks)qemu/tests/qemu_guest_agent.py
(9 hunks)
🔇 Additional comments (1)
qemu/tests/cfg/qemu_guest_agent.cfg (1)
672-672
: Fix stress-ng installation for non-DNF guests (fallback to yum).RHEL 7 doesn't have dnf; current command will fail there. Use a simple dnf→yum fallback.
Apply this diff:
- cmd_install_stressng = "dnf -y install stress-ng" + cmd_install_stressng = "if command -v dnf; then dnf -y install stress-ng; else yum -y install stress-ng; fi"
Hi @leidwang, Linux always passes. Currently, Windows has a product issue that I'm discussing with Dev. so the automation code is fine I think. (1/2) Host_RHEL.m10.u1.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_get_load.q35: PASS (151.52 s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos
1. Add a new guest agent command 'guest-get-load' to get cpu load average info of Guest. 2. Randomly sorted out non-standard comments Signed-off-by: Dehan Meng <[email protected]>
c6efd4e
to
a5d7ca3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
qemu/tests/qemu_guest_agent.py (1)
2160-2195
: Windows stress jobs persist; add cleanup and verify post-stress decreaseStart-Job workers are never stopped and the test doesn’t check that load falls after stress. Add explicit cleanup and a decrease check; also run cleanup in finally. This avoids leaving the VM pegged and flaking follow-on tests.
Apply:
@@ - if params.get("os_type") == "windows": + if params.get("os_type") == "windows": error_context.context("Get load info for Windows", LOG_JOB.info) - try: + # Stop any leftover stress jobs by command pattern used in cmd_run_stress + cleanup_cmd = ( + 'powershell "$jobs = Get-Job | Where-Object { $_.Command -like ' + "''*sqrt(999999)*'' }; if ($jobs) { $jobs | Stop-Job -Force; " + '$jobs | Remove-Job }"' + ) + try: # Get initial load values load_info = self.gagent.get_load() @@ LOG_JOB.info( "Load value increased as expected: before=%s, after=%s", initial_load, stress_load, ) + # Stop stress jobs and verify load decreases + session.cmd(cleanup_cmd) + time.sleep(20) + after_info = self.gagent.get_load() + after_load = after_info["load1m"] + LOG_JOB.info("Load info after stopping stress: %s", after_info) + if after_load >= stress_load: + test.fail( + f"Load value did not decrease after stopping stress:" + f" stress={stress_load}, after={after_load}" + ) except guest_agent.VAgentCmdError as e: test.fail(f"guest-get-load command failed: {e}") + finally: + # Best-effort cleanup even on failure + session.cmd(cleanup_cmd)To verify locally, rerun the Windows variant and confirm jobs are gone after the test and load1m drops below the stressed value.
🧹 Nitpick comments (3)
qemu/tests/qemu_guest_agent.py (3)
2109-2146
: Make prev_values an explicit parameter of _verify_load_valuesThe helper implicitly closes over prev_values. Pass it explicitly to avoid hidden state and misuse.
-def _verify_load_values(qga_vals, guest_vals, check_type="match"): +def _verify_load_values(qga_vals, guest_vals, check_type="match", prev_values=None): @@ - if check_type != "match" and prev_values: + if check_type != "match": + if not prev_values: + return ["prev_values is required when check_type != 'match'"] qga_1m = qga_vals[0] guest_1m = guest_vals[0] prev_qga_1m = prev_values["qga"][0] prev_guest_1m = prev_values["guest"][0]And pass it at call sites:
- if errors := _verify_load_values(qga_vals, guest_vals, "increase"): + if errors := _verify_load_values(qga_vals, guest_vals, "increase", prev_values): @@ - if errors := _verify_load_values(qga_vals, guest_vals, "decrease"): + if errors := _verify_load_values(qga_vals, guest_vals, "decrease", prev_values):Also applies to: 2220-2221, 2232-2233
2095-2101
: Harden guest load parsing to tolerate formats like ‘uptime’If cmd_get_load returns “load average: 0.10, 0.05, 0.01”, split() will leave commas and raise ValueError. Use a regex to extract the first three numbers.
- try: - loads = session.cmd_output(params["cmd_get_load"]).strip().split() - return tuple(round(float(x), 2) for x in loads[:3]) + try: + out = session.cmd_output(params["cmd_get_load"]).strip() + # capture floats/ints, ignore commas/labels + nums = re.findall(r"[-+]?(?:\d*\.\d+|\d+)", out) + return tuple(round(float(x), 2) for x in nums[:3])
2172-2176
: Reduce fixed sleeps; prefer polling with a deadlineReplace sleeps with utils_misc.wait_for to reduce flakiness and overall wall time. For example, wait until load1m exceeds baseline by a delta, then proceed; later wait until it drops below a threshold.
Example pattern:
# Wait up to 30s for increase of >= 0.5 utils_misc.wait_for(lambda: self.gagent.get_load()["load1m"] >= prev_values["qga"][0] + 0.5, 30, 2) # After stopping stress, wait up to 60s for decrease utils_misc.wait_for(lambda: self.gagent.get_load()["load1m"] <= stress_load - 0.3, 60, 3)Please confirm acceptable deltas for your environments.
Also applies to: 2208-2214, 2225-2230
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
qemu/tests/cfg/qemu_guest_agent.cfg
(1 hunks)qemu/tests/qemu_guest_agent.py
(56 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- qemu/tests/cfg/qemu_guest_agent.cfg
🔇 Additional comments (1)
qemu/tests/qemu_guest_agent.py (1)
457-457
: Docstring/env param wording updatesSpelling/consistency fixes to “:param env: Dictionary with the test environment” across methods look good. No further action.
Also applies to: 484-484, 512-512, 528-528, 665-665, 685-685, 729-729, 779-779, 821-821, 960-960, 997-997, 1085-1085, 1174-1174, 1369-1369, 1655-1655, 1732-1732, 1905-1905, 1971-1971, 2069-2071, 2242-2242, 2381-2381, 2493-2493, 2625-2625, 2759-2759, 2920-2920, 3057-3057, 3089-3089, 3090-3090, 3182-3182, 3242-3242, 3261-3261, 3368-3369, 3555-3556, 3850-3850, 3880-3880, 3921-3921, 4049-4053, 4071-4071, 4121-4121, 4182-4182, 4561-4561, 4688-4689, 4790-4790, 4956-4957, 5030-5031, 5142-5143, 5184-5185, 5257-5258
error_context.context("Starting CPU stress test", LOG_JOB.info) | ||
s, o = session.cmd_status_output(params["cmd_install_stressng"]) | ||
if s != 0: | ||
test.error(f"Failed to install stress-ng: {o}") | ||
session.cmd(params["cmd_run_stress"]) | ||
time.sleep(25) | ||
|
||
guest_vals = _get_load_stats(session) | ||
qga_vals = _get_load_stats(session, False) | ||
|
||
_log_load_values(guest_vals, qga_vals, "Under stress") | ||
|
||
if errors := _verify_load_values(qga_vals, guest_vals, "increase"): | ||
test.fail("Stress test load check failed:\n" + "\n".join(errors)) | ||
|
||
prev_values = {"guest": guest_vals, "qga": qga_vals} | ||
|
||
# sleep (60) wait for the stress-ng terminated. | ||
time.sleep(60) | ||
guest_vals = _get_load_stats(session) | ||
qga_vals = _get_load_stats(session, False) | ||
|
||
_log_load_values(guest_vals, qga_vals, "After stress") | ||
|
||
if errors := _verify_load_values(qga_vals, guest_vals, "decrease"): | ||
test.fail("Post-stress load check failed:\n" + "\n".join(errors)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Best-effort Linux stress cleanup to avoid polluting subsequent tests
If stress-ng doesn’t terminate cleanly, it may bias later readings. Add a fallback stop command after the 60s wait (configurable via params).
- # sleep (60) wait for the stress-ng terminated.
- time.sleep(60)
+ # Wait for stress to end, then ensure it's stopped.
+ time.sleep(60)
+ stop_cmd = params.get("cmd_stop_stress", "pkill -f stress-ng || true")
+ session.cmd_status_output(stop_cmd)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
error_context.context("Starting CPU stress test", LOG_JOB.info) | |
s, o = session.cmd_status_output(params["cmd_install_stressng"]) | |
if s != 0: | |
test.error(f"Failed to install stress-ng: {o}") | |
session.cmd(params["cmd_run_stress"]) | |
time.sleep(25) | |
guest_vals = _get_load_stats(session) | |
qga_vals = _get_load_stats(session, False) | |
_log_load_values(guest_vals, qga_vals, "Under stress") | |
if errors := _verify_load_values(qga_vals, guest_vals, "increase"): | |
test.fail("Stress test load check failed:\n" + "\n".join(errors)) | |
prev_values = {"guest": guest_vals, "qga": qga_vals} | |
# sleep (60) wait for the stress-ng terminated. | |
time.sleep(60) | |
guest_vals = _get_load_stats(session) | |
qga_vals = _get_load_stats(session, False) | |
_log_load_values(guest_vals, qga_vals, "After stress") | |
if errors := _verify_load_values(qga_vals, guest_vals, "decrease"): | |
test.fail("Post-stress load check failed:\n" + "\n".join(errors)) | |
prev_values = {"guest": guest_vals, "qga": qga_vals} | |
# Wait for stress to end, then ensure it's stopped. | |
time.sleep(60) | |
stop_cmd = params.get("cmd_stop_stress", "pkill -f stress-ng || true") | |
session.cmd_status_output(stop_cmd) | |
guest_vals = _get_load_stats(session) | |
qga_vals = _get_load_stats(session, False) | |
_log_load_values(guest_vals, qga_vals, "After stress") | |
if errors := _verify_load_values(qga_vals, guest_vals, "decrease"): | |
test.fail("Post-stress load check failed:\n" + "\n".join(errors)) |
Add a new guest agent command 'guest-get-load' to
get cpu load average info of Guest.
ID: 3534
Signed-off-by: Dehan Meng [email protected]
Summary by CodeRabbit
New Features
Tests
Chores