
Test/enumerate gpus coverage #223

Merged
nikopueringer merged 4 commits into nikopueringer:main from DaniSalam4eto:test/enumerate-gpus-coverage
Apr 17, 2026

Conversation

@DaniSalam4eto
Contributor

@DaniSalam4eto DaniSalam4eto commented Apr 13, 2026

What does this change?

How was it tested?

Checklist

  • uv run pytest passes
  • uv run ruff check passes
  • uv run ruff format --check passes

@DaniSalam4eto
Contributor Author

This PR adds GPU-free unit tests for enumerate_gpus() / _enumerate_nvidia() / _enumerate_amd() (follow-up to #211). All subprocess and CUDA access is mocked. Please see CI for ruff + pytest; I’ll address any failures promptly.

@apekshik

One of the new tests, test_empty_nvidia_result_is_returned_as_is, appears to lock in what looks like a bug introduced in commit d836296.

enumerate_gpus currently does:

if gpus is not None:
    return gpus

That treats an empty list returned from _enumerate_nvidia as a terminal success, which means it skips the AMD / torch fallback path even though [] only means "the probe ran and found nothing."

That can happen in real environments where nvidia-smi is installed but no NVIDIA GPU is actually present, such as:

  • stale driver setups
  • CI runners
  • VMs

The dispatch should instead be:

if gpus:
    return gpus

That way, [] falls through to the next probe.
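With that change, the dispatch reduces to a first-truthy-wins walk. A minimal sketch (simplified: `probes` stands in for the real `_enumerate_nvidia` / `_enumerate_amd` / torch helpers, which each return a list on success, `[]` for "ran but found nothing", or `None` for "tool unavailable"):

```python
def enumerate_gpus_sketch(probes):
    """First-truthy-wins dispatch: both [] and None fall through
    to the next probe; only a non-empty list is terminal."""
    for probe in probes:
        gpus = probe()
        if gpus:  # truthiness, not `is not None`
            return gpus
    return []  # nothing found by any backend

# [] from the first probe no longer short-circuits the walk
result = enumerate_gpus_sketch([lambda: [], lambda: ["amd0"]])
```

Here `result` is `["amd0"]`, whereas the `is not None` check would have returned `[]` from the first probe.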

The test should also be inverted so it asserts that fallthrough happens, rather than enforcing the current early-return behavior.

@Raiden129
Contributor

I opened a pull request to fix the issue in device_utils, so you just need to update the test to reflect the new behavior.

DaniSalam4eto added a commit to DaniSalam4eto/CorridorKey that referenced this pull request Apr 16, 2026
enumerate_gpus() now uses truthiness (if gpus) instead of is not None
so [] from nvidia-smi does not skip AMD/torch fallback.

Updates test_empty_nvidia_result to assert fallthrough to AMD (apekshik
review on PR nikopueringer#223).

Made-with: Cursor
@DaniSalam4eto
Contributor Author

Thanks for the catch, @apekshik, and for the fix, @Raiden129.

Updated in this push:

device_utils.py: changed if gpus is not None: to if gpus: for both the NVIDIA and AMD dispatch checks in enumerate_gpus(). An empty list from _enumerate_nvidia() (stale driver, CI runner, VM) now correctly falls through to AMD / torch instead of being treated as terminal success.
test_device_utils.py: renamed test_empty_nvidia_result_is_returned_as_is to test_empty_nvidia_result_falls_through_to_amd and inverted the assertion so it enforces the fallthrough contract. NVIDIA returns [], AMD is consulted and its result is returned.
@Raiden129 I included the device_utils.py fix here as well so the test passes in isolation; happy to drop it if your PR lands first and I rebase on top.

@nikopueringer
Owner

@DaniSalam4eto Hey there! Was reading through the comments here, and wanted to let you know that I merged @Raiden129 's fix. Since I merged that first, I saw your comment about rebasing, and wanted to let you know. Give me the green light when you're ready, and I will merge this PR. Thanks!

@DaniSalam4eto DaniSalam4eto force-pushed the test/enumerate-gpus-coverage branch from 2bdcb22 to 4146977 on April 17, 2026 at 15:56
PR nikopueringer#211 added enumerate_gpus() and its platform-specific helpers to
device_utils.py without unit tests. This commit is the first of three
that close that gap; it introduces the shared test scaffolding and
covers the NVIDIA path.

Scaffolding added at module level in tests/test_device_utils.py:

- _fake_completed(stdout, returncode): builds a MagicMock shaped like
  subprocess.CompletedProcess so monkeypatched subprocess.run() calls
  return something the helpers can read .stdout / .returncode off of.

- _SubprocessRouter: callable class keyed on cmd[0] so one monkeypatch
  can satisfy both the amd-smi and rocm-smi subprocess.run() calls
  inside _enumerate_amd() with different responses per binary. Missing
  entries raise FileNotFoundError to mimic the OS behavior when a CLI
  is absent, which is the exact signal the helpers use to fall through
  to the next backend.

New test class TestEnumerateNvidia covers every branch of
_enumerate_nvidia():

- test_parses_single_gpu — happy path, including the MiB -> GiB
  conversion (nvidia-smi reports MiB with --format=csv,nounits and the
  helper divides by 1024).
- test_parses_multi_gpu — exercises the CSV row loop with two GPUs.
- test_nonzero_returncode_returns_none — early return when nvidia-smi
  exits non-zero (e.g. driver not loaded).
- test_binary_missing_returns_none — FileNotFoundError path when
  nvidia-smi isn't installed at all.
- test_timeout_returns_none — subprocess.TimeoutExpired path when the
  5-second query wedges.
- test_malformed_rows_are_skipped — verifies the `len(parts) >= 4`
  guard so partial/garbage rows don't crash the parser.

Imports updated: add `subprocess` (for TimeoutExpired) and
`_enumerate_nvidia` from device_utils. Module docstring expanded to
mention enumerate_gpus() alongside the existing coverage areas.

All tests run GPU-free via monkeypatch and fit the existing
monkeypatch style of TestDetectBestDevice / TestResolveDevice in the
same file. No conftest or fixture changes required.

Second of three commits backfilling tests for PR nikopueringer#211. Covers every
branch of _enumerate_amd(), which walks a three-stage fallback:
amd-smi (ROCm 6.0+, JSON) -> rocm-smi (legacy, CSV) -> Windows
registry. All five tests run GPU-free on any platform.

New test class TestEnumerateAmd:

- test_amdsmi_success — happy path with a realistic amd-smi payload
  containing {"asic": {"market_name": ...}, "vram": {"size": {"value": MiB}}}.
  Verifies MiB -> GiB conversion and that vram_free_gb is populated from
  vram_total_gb (amd-smi's static subcommand reports no free-memory field).

- test_amdsmi_malformed_entry_skipped — two-entry payload where the
  second has a non-numeric VRAM size ("garbage"). Locks in the inner
  try/except that PR nikopueringer#211 added after review feedback: one bad GPU
  entry must not crash the whole enumeration. ValueError from
  float("garbage") is the specific path exercised.

- test_amdsmi_invalid_json_falls_through_to_rocmsmi — amd-smi returns
  "not-json". json.JSONDecodeError is caught by the outer handler and
  rocm-smi runs next. Mock returns 16 GiB total / 1 GiB used in bytes;
  assertions verify the bytes -> GiB conversion (divide by 1024**3) and
  that vram_free_gb = (total - used) / 1024**3 = 15.0.

- test_amdsmi_missing_falls_through_to_rocmsmi — router has no entry
  for amd-smi, so FileNotFoundError is raised and caught. Second
  subprocess.run call hits rocm-smi and succeeds. Verifies the
  amd-smi-not-installed path independently from the JSONDecodeError
  path above.

- test_no_tools_on_non_windows_returns_none — neither amd-smi nor
  rocm-smi present, sys.platform forced to "linux" so the Windows
  winreg branch is skipped. Verifies the function returns None
  (the "unavailable" sentinel) rather than an empty list.

Imports updated: add `json` (for json.dumps payload construction) and
`sys` (for monkeypatching sys.platform in the last test), plus
`_enumerate_amd` from device_utils. The _SubprocessRouter helper
introduced in the previous commit is reused here.

Third and final commit backfilling PR nikopueringer#211 tests. The previous two
commits covered the helpers (_enumerate_nvidia, _enumerate_amd) in
isolation; this one locks in the public entry point enumerate_gpus(),
which walks NVIDIA -> AMD -> torch.cuda -> [] and decides which source
to trust.

New module-level helper:

- _amd_must_not_run: raises AssertionError when called. Used as a
  monkeypatch target in the tests that must prove AMD enumeration is
  never invoked when NVIDIA already answered (first-match-wins).
  Module-scoped so ruff's blank-line rules around nested defs stay
  happy.

New test class TestEnumerateGpus:

- test_prefers_nvidia — NVIDIA helper returns a GPUInfo list; AMD
  helper is wired to _amd_must_not_run. Verifies dispatch stops at
  the first successful backend.

- test_falls_back_to_amd_when_nvidia_unavailable — NVIDIA returns
  None (unavailable sentinel) and AMD returns a populated list. The
  AMD result is returned unchanged.

- test_empty_nvidia_result_is_returned_as_is — the subtle one.
  nvidia-smi succeeding with zero GPUs returns []. That is NOT the
  same as None, and enumerate_gpus() must return [] without falling
  through to AMD. Written with _amd_must_not_run so this contract is
  enforced as an assertion, not an inference. Prevents a future
  refactor from accidentally collapsing `is not None` into truthiness
  and changing behavior.

- test_torch_fallback_when_smi_tools_absent — both smi helpers return
  None, torch.cuda.is_available() is True with 2 devices. Mocks
  get_device_properties() to return MagicMock objects with
  .total_memory and .name set so the helper's bytes -> GiB conversion
  (divide by 1024**3) can be verified for both GPUs. Also asserts
  vram_free_gb mirrors vram_total_gb (torch.cuda has no free-memory
  accessor in this code path).

- test_returns_empty_when_nothing_available — all three backends
  produce nothing (smi helpers return None, torch.cuda.is_available
  returns False). The function must return an empty list rather than
  raise, so CLI/UI code can simply check `if not gpus`.

Imports updated: add `GPUInfo` and `enumerate_gpus` from device_utils.

End state: 16 new tests across three commits, covering every branch
of the GPU enumeration code added in PR nikopueringer#211. All tests mock
subprocess and torch.cuda via monkeypatch, run GPU-free in CI, and
match the existing monkeypatch style used elsewhere in the file.

enumerate_gpus() now uses truthiness (if gpus) instead of is not None
so [] from nvidia-smi does not skip AMD/torch fallback.

Updates test_empty_nvidia_result to assert fallthrough to AMD (apekshik
review on PR nikopueringer#223).

Made-with: Cursor
@DaniSalam4eto
Contributor Author

@nikopueringer Rebased on top of main now that #228 is in. Git dropped the duplicate device_utils.py change during the rebase, so this PR is now tests-only: the renamed test_empty_nvidia_result_falls_through_to_amd locks in the fallthrough contract from @Raiden129's fix. Green light from my side, ready to merge.

@nikopueringer nikopueringer merged commit 207b879 into nikopueringer:main Apr 17, 2026
7 checks passed
@nikopueringer
Owner

Merged! Thank you!
