fix(ci): split coverage step to avoid Linux hang by JerrettDavis · Pull Request #94 · JerrettDavis/ExperimentFramework

JerrettDavis · 2026-04-29T00:43:10Z

Problem

The CI workflow's combined "Test with coverage" step was hanging on Ubuntu Linux runners for up to 55–60 minutes before hitting the job timeout. Tests themselves were always passing — the hang occurred during coverage data collection, not test execution.

Root cause: When dotnet test runs the full solution with --collect:'XPlat Code Coverage', the test host for ExperimentFramework.Tests (2,094 tests, many project refs) crashes during AfterTestRunEnd coverage collection. On Linux this manifests as a silent hang — the socket doesn't reset on crash, so the runner waits indefinitely until the 60-minute job timeout fires.

A secondary cause: three Distributed.Redis test classes in ExperimentFramework.Tests start Docker containers via Testcontainers but had no [Trait("Category","integration")] marker, so they ran during the standard CI filter and blocked the test host waiting for Docker.
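Why an untagged class slips through a "not integration" filter can be shown with a simplified Python model of a not-equals trait filter. This is not VSTest's real filter engine. It assumes the standard CI filter has the form Category!=integration (the PR does not show the exact expression), and the test class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveredTest:
    """A test as seen by the runner: a name plus its [Trait] key/value pairs."""
    name: str
    traits: dict[str, str] = field(default_factory=dict)


def included_by_not_equal(test: DiscoveredTest, key: str, value: str) -> bool:
    """Simplified model of a key!=value filter.

    A test is dropped only when it carries the trait with exactly that value;
    a test that has no such trait at all is still selected and runs.
    """
    return test.traits.get(key) != value


def selected_names(tests: list[DiscoveredTest],
                   key: str = "Category",
                   value: str = "integration") -> list[str]:
    # ASSUMPTION: Category!=integration stands in for the real CI filter.
    return [t.name for t in tests if included_by_not_equal(t, key, value)]


# Hypothetical class names, for illustration only.
before_fix = [
    DiscoveredTest("RedisStoreTests"),       # starts Docker, but untagged
    DiscoveredTest("InMemoryStoreTests"),    # ordinary unit tests
]
after_fix = [
    DiscoveredTest("RedisStoreTests", {"Category": "integration"}),  # tagged
    DiscoveredTest("InMemoryStoreTests"),
]

print(selected_names(before_fix))  # ['RedisStoreTests', 'InMemoryStoreTests']
print(selected_names(after_fix))   # ['InMemoryStoreTests']
```

The model ignores details such as multi-valued traits, but it captures the point above: exclusion only works for tests that actually carry the marker.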

Fix

- Split the combined step into two separate steps:
  - Test (verify correctness): runs the full solution without coverage collection; it does not hang and proves the tests pass.
  - Collect coverage (per-project): runs each project individually with --collect; isolated test hosts avoid the multi-session crash. The resulting coverage.cobertura.xml files are picked up by the existing reportgenerator step unchanged.
- Add --no-build to the test step to prevent a redundant rebuild race condition after the Build (Release) step.
- Add missing packages.lock.json files for Governance.Persistence.Tests and Governance.Persistence.Redis.Tests so the NuGet cache key covers all test projects.
- Quarantine the Testcontainers Redis tests: add [Trait("Category","integration")] to the three Distributed.Redis test classes so the standard CI filter correctly excludes them.
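The per-project coverage step might look roughly like the following Python driver. It is a sketch, not the workflow's actual YAML: the tests/ layout, the *.Tests.csproj naming, the coverage/ output folder, and the Category!=integration filter are all assumptions. Each project gets its own dotnet test invocation, so each run starts a separate test host, which is the isolation the split relies on.

```python
import subprocess
from pathlib import Path

# ASSUMPTION: the exact CI filter expression is not shown in the PR.
CI_FILTER = "Category!=integration"


def coverage_command(project: Path, results_root: Path) -> list[str]:
    """Build one dotnet test invocation for a single test project."""
    return [
        "dotnet", "test", str(project),
        "--configuration", "Release",
        "--no-build",                        # reuse binaries from Build (Release)
        "--filter", CI_FILTER,
        "--collect", "XPlat Code Coverage",  # coverlet emits coverage.cobertura.xml
        "--results-directory", str(results_root / project.stem),
    ]


def collect_coverage(projects, results_root: Path, run=subprocess.run) -> list[int]:
    """Run projects one at a time so each gets its own test host process."""
    exit_codes = []
    for project in projects:
        completed = run(coverage_command(project, results_root))
        exit_codes.append(completed.returncode)
    return exit_codes


def main() -> int:
    # Hypothetical layout: test projects under tests/, named *.Tests.csproj.
    projects = sorted(Path("tests").rglob("*.Tests.csproj"))
    codes = collect_coverage(projects, Path("coverage"))
    return 0 if all(code == 0 for code in codes) else 1
```

Because the runner is injected through the run parameter, the loop can be exercised without the .NET SDK; a CI script would call sys.exit(main()). Running projects sequentially trades wall-clock time for the per-host isolation described above, and reportgenerator can still glob the per-project coverage.cobertura.xml files afterwards.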

What was not changed

- Test logic is unchanged; no tests were deleted or skipped.
- Coverage thresholds and reporting are unchanged.
- The same fix was applied to the release job's test step.

🤖 Generated with Claude Code

The dotnet test step was re-building multi-target source projects (e.g. ExperimentFramework.Audit targets net8/9/10) immediately before launching the net10.0 test host, which caused the Audit.Tests host to hang indefinitely during coverlet data-collector initialisation. Adding --no-build skips the redundant rebuild (the Build (Release) step already produced all required binaries) and removes the timing window that triggered the hang.

Also adds missing packages.lock.json for Governance.Persistence.Tests and Governance.Persistence.Redis.Tests so the NuGet cache key covers all test projects and the --use-lock-file restore does not regenerate lock files during the test step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… hang

When dotnet test runs the full solution with --collect:'XPlat Code Coverage', the test host process for ExperimentFramework.Tests (2094 tests, many project refs) crashes during AfterTestRunEnd coverage data collection. On Linux CI this manifests as a 55-minute silent hang (socket doesn't RST on crash), consuming the entire 60-minute job timeout.

Fix: split the combined 'Test with coverage' step into two steps:
1. 'Test (verify correctness)' - full solution run, no coverage, no hang
2. 'Collect coverage (per-project)' - each project tested individually; isolated test hosts avoid the multi-session crash. The resulting coverage.cobertura.xml files are picked up by the existing reportgenerator step unchanged.

Applied the same split to the release job's test step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The three Distributed.Redis test classes start real Docker containers via Testcontainers but had no [Trait("Category","integration")] marker, so they ran in the standard CI filter and blocked the test host waiting for Docker, causing the hang that appeared after Dashboard.UI.Tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-29T00:43:20Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 85b1650.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

GitHub Copilot and others added 3 commits April 28, 2026 19:42

github-actions Bot added ci/cd configuration area: tests dependencies labels Apr 29, 2026

JerrettDavis merged commit 9bb2da6 into main Apr 29, 2026
5 checks passed

JerrettDavis merged 3 commits into main from fix/ci-coverage-hang