fix: detect and fail fast on multi-threaded test execution by siphonite · Pull Request #36 · siphonite/first

siphonite · 2026-03-03T02:25:47Z

Summary

This PR fixes issue #30: crash_point() SIGKILL terminates entire test harness.

FIRST uses SIGKILL to simulate power loss, which terminates the entire process. When multiple tests share the same process (the default in cargo test), one test hitting a crash point kills all other tests.

Changes

Added detection for RUST_TEST_THREADS environment variable (set by cargo test)
Added detection for --test-threads command-line argument
Fail fast with a clear error message explaining the issue and solution
Allow bypassing with FIRST_SKIP_THREAD_CHECK=1 for advanced use cases

Testing

Verified detection works with RUST_TEST_THREADS=4 → fails with clear error
Verified detection works with --test-threads=4 → fails with clear error
Verified tests pass with --test-threads=1
Verified FIRST_SKIP_THREAD_CHECK=1 allows bypassing the check

Fixes #30

Summary by cubic

Detects multi-threaded test runs and fails fast with a clear message to prevent a SIGKILL from killing the entire test harness. Also isolates per-run work directories to avoid collisions and now propagates child stderr so errors are visible.

Bug Fixes
- Detects RUST_TEST_THREADS and --test-threads in Execution; exits if >1 with guidance; override via FIRST_SKIP_THREAD_CHECK=1.
- Generates a unique base dir per orchestrator run (PID + UUID), passes FIRST_WORK_DIR to children, and derives run_N paths to prevent collisions; adds a repro test.
- Propagates stderr from failed child processes so thread-check and other errors are shown.
- Fixes clippy “collapsible if” warnings.
Migration
- Run tests single-threaded: cargo test -- --test-threads=1 (or set RUST_TEST_THREADS=1).
- Only set FIRST_SKIP_THREAD_CHECK=1 if each test runs in its own process.
- Work artifacts now live under $TMPDIR/first/first--/run_N.

^{Written for commit 3f7ae36. Summary will update on new commits.}

Add Integration Constraints section to limitations.md: - One first::test() per #[test] - Async tests not supported - Not thread-safe - No nested workspaces Add limitations summary to lib.rs crate docs. Closes #18

- Rewrite README for conciseness and professional formatting - Switch license from MIT to Apache 2.0 - Update project logo - Rewrite all architecture docs (~60% reduction) - Rewrite limitations.md and proof/reference_wal.md - Remove empty invariants.md

- Generate unique base directory per orchestrator run using UUID + PID - Pass base directory to child processes via FIRST_WORK_DIR - Derive leaf run_N directory in child process from base + target - Update extract_test_name to use thread name for better isolation - Add reproduction test case (tests/repro_issue_29.rs) - Closes #29

FIRST uses SIGKILL to simulate power loss, which terminates the entire process. When multiple tests share the same process (the default in cargo test), one test hitting a crash point kills all other tests. This change implements the acceptance criteria from issue #30: - Detect RUST_TEST_THREADS environment variable (set by cargo test) - Detect --test-threads command-line argument - Fail fast with a clear error message explaining the issue and solution - Allow bypassing with FIRST_SKIP_THREAD_CHECK=1 for advanced use cases Fixes #30

cubic-dev-ai

5 issues found across 16 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/orchestrator.rs">

<violation number="1" location="src/orchestrator.rs:49">
P2: The new UUID-scoped `base_dir` is never cleaned up, so successful runs leak a unique temp directory each execution.</violation>
</file>

<file name="tests/repro_issue_29.rs">

<violation number="1" location="tests/repro_issue_29.rs:24">
P2: This test can fail nondeterministically because it asserts persisted file contents after an intentional SIGKILL without fsyncing the write first.</violation>
</file>

<file name="src/test.rs">

<violation number="1" location="src/test.rs:120">
P3: Duplicate target/work-dir derivation was introduced in both Execution and Verify paths; extract it to a shared helper to avoid drift between phases.</violation>
</file>

<file name="docs/proof/reference_wal.md">

<violation number="1" location="docs/proof/reference_wal.md:36">
P2: The documented `commit` diff uses `?` in a function signature that doesn't return `Result`, making the snippet invalid and inconsistent with the actual code.</violation>
</file>

<file name="README.md">

<violation number="1" location="README.md:2">
P2: README references a non-existent image file (`assets/image.jpg`), causing a broken banner image.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-03-03T02:30:45Z

src/orchestrator.rs

+    let pid = std::process::id();
+    let base_dir = std::env::temp_dir()
+        .join("first")
+        .join(format!("first-{}-{}", pid, run_uuid));


P2: The new UUID-scoped base_dir is never cleaned up, so successful runs leak a unique temp directory each execution.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/orchestrator.rs, line 49: <comment>The new UUID-scoped `base_dir` is never cleaned up, so successful runs leak a unique temp directory each execution.</comment> <file context> @@ -42,8 +40,17 @@ where + let pid = std::process::id(); + let base_dir = std::env::temp_dir() + .join("first") + .join(format!("first-{}-{}", pid, run_uuid)); + loop { </file context>

cubic-dev-ai · 2026-03-03T02:30:45Z

tests/repro_issue_29.rs

+            // Sleep to increase chance of overlap
+            thread::sleep(Duration::from_millis(100));
+
+            first::crash_point("point_a");


P2: This test can fail nondeterministically because it asserts persisted file contents after an intentional SIGKILL without fsyncing the write first.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At tests/repro_issue_29.rs, line 24: <comment>This test can fail nondeterministically because it asserts persisted file contents after an intentional SIGKILL without fsyncing the write first.</comment> <file context> @@ -0,0 +1,65 @@ + // Sleep to increase chance of overlap + thread::sleep(Duration::from_millis(100)); + + first::crash_point("point_a"); + }) + .verify(|env, _| { </file context>

cubic-dev-ai · 2026-03-03T02:30:45Z

docs/proof/reference_wal.md

-         crash_point("after_commit_write");
-         self.file.sync_all().expect("failed to fsync after COMMIT");
+  pub fn commit(&mut self, txid: TxId) {
+     self.file.sync_all()?;  // ← ensure records durable FIRST


P2: The documented commit diff uses ? in a function signature that doesn't return Result, making the snippet invalid and inconsistent with the actual code.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At docs/proof/reference_wal.md, line 36: <comment>The documented `commit` diff uses `?` in a function signature that doesn't return `Result`, making the snippet invalid and inconsistent with the actual code.</comment> <file context> @@ -1,148 +1,54 @@ - crash_point("after_commit_write"); - self.file.sync_all().expect("failed to fsync after COMMIT"); + pub fn commit(&mut self, txid: TxId) { ++ self.file.sync_all()?; // ← ensure records durable FIRST + writeln!(self.file, "COMMIT {}", txid)?; + self.file.sync_all()?; </file context>

cubic-dev-ai · 2026-03-03T02:30:45Z

README.md

@@ -1,501 +1,157 @@
 <p align="center">
-  <img src="assets/logo.svg" alt="FIRST logo" width="240">
+  <img src="assets/image.jpg" alt="FIRST logo" width="520">


P2: README references a non-existent image file (assets/image.jpg), causing a broken banner image.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At README.md, line 2: <comment>README references a non-existent image file (`assets/image.jpg`), causing a broken banner image.</comment> <file context> @@ -1,501 +1,157 @@ <p align="center"> - <img src="assets/logo.svg" alt="FIRST logo" width="240"> + <img src="assets/image.jpg" alt="FIRST logo" width="520"> </p> </file context>

Suggested change

<img src="assets/image.jpg" alt="FIRST logo" width="520">

<img src="assets/logo.svg" alt="FIRST logo" width="520">

cubic-dev-ai · 2026-03-03T02:30:45Z

src/test.rs

            Phase::Execution => {
                if let Some(run_fn) = self.run_fn {
+                    // Derive leaf directory from base + target
+                    let target = std::env::var("FIRST_CRASH_TARGET")


P3: Duplicate target/work-dir derivation was introduced in both Execution and Verify paths; extract it to a shared helper to avoid drift between phases.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/test.rs, line 120: <comment>Duplicate target/work-dir derivation was introduced in both Execution and Verify paths; extract it to a shared helper to avoid drift between phases.</comment> <file context> @@ -116,12 +116,26 @@ where Phase::Execution => { if let Some(run_fn) = self.run_fn { + // Derive leaf directory from base + target + let target = std::env::var("FIRST_CRASH_TARGET") + .ok() + .and_then(|s| s.parse::<usize>().ok()) </file context>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e47322a48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-03T02:32:48Z

src/rt.rs

+                eprintln!(
+                    "ERROR: FIRST detected multi-threaded test execution (RUST_TEST_THREADS={})",
+                    thread_count
+                );
+                eprintln!();
+                eprintln!("FIRST uses SIGKILL to simulate power loss, which terminates the ENTIRE process.");
+                eprintln!("When multiple tests share the same process, one crash kills all tests.");
+                eprintln!();
+                eprintln!("SOLUTION: Run tests with --test-threads=1:");
+                eprintln!("  cargo test -- --test-threads=1");
+                eprintln!();
+                eprintln!("Or set FIRST_SKIP_THREAD_CHECK=1 at your own risk.");
+                std::process::exit(1);


Preserve thread-check diagnostics instead of dropping stderr

detect_multi_threaded_test() emits a detailed remediation message and then exits, but this runs in the EXECUTION child where spawn_child() pipes stderr through parse_crash_metadata() and discards non-crash lines, so users only see a generic execution failure and lose the new guidance. In runs that hit this branch (e.g., RUST_TEST_THREADS/--test-threads configurations), the fail-fast behavior works but the actionable error text is effectively hidden.

Useful? React with 👍 / 👎.

When a child process in EXECUTION phase fails (e.g., thread check fail-fast), the stderr output was being captured but discarded unless it contained crash metadata. This made the detailed thread-check error message invisible to users. Now we capture and print the child's stderr output when the child fails, so users can see the full error message. Also fixes clippy warning about collapsible if statements.

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/orchestrator.rs">

<violation number="1" location="src/orchestrator.rs:294">
P2: Avoid unbounded buffering of child stderr; cap captured output to prevent excessive memory use on verbose failures.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-03-03T02:39:29Z

src/orchestrator.rs

+            Ok(0) => break, // EOF
+            Ok(_) => {
+                let trimmed = line.trim();
+                full_output.push_str(trimmed);


P2: Avoid unbounded buffering of child stderr; cap captured output to prevent excessive memory use on verbose failures.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/orchestrator.rs, line 294: <comment>Avoid unbounded buffering of child stderr; cap captured output to prevent excessive memory use on verbose failures.</comment> <file context> @@ -272,18 +280,31 @@ fn spawn_child_with_crash_info( + Ok(0) => break, // EOF + Ok(_) => { + let trimmed = line.trim(); + full_output.push_str(trimmed); + full_output.push('\n'); + // Look for JSON crash metadata </file context>

siphonite added 6 commits February 6, 2026 20:28

docs: document re-entrancy and integration constraints

1dfff05

Add Integration Constraints section to limitations.md: - One first::test() per #[test] - Async tests not supported - Not thread-safe - No nested workspaces Add limitations summary to lib.rs crate docs. Closes #18

docs: resolve merge conflict in limitations.md

20d60f0

fix(orchestrator): move uuid import and remove orphaned doc comment

6d959d2

cubic-dev-ai bot reviewed Mar 3, 2026

View reviewed changes

fix: address clippy warnings for collapsible if statements

67f45c4

chatgpt-codex-connector bot reviewed Mar 3, 2026

View reviewed changes

cubic-dev-ai bot reviewed Mar 3, 2026

View reviewed changes

siphonite merged commit 8067142 into main Mar 3, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: detect and fail fast on multi-threaded test execution#36

fix: detect and fail fast on multi-threaded test execution#36
siphonite merged 8 commits intomainfrom
fix/issue-30-sigkill-test-harness

siphonite commented Mar 3, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	<img src="assets/image.jpg" alt="FIRST logo" width="520">
	<img src="assets/logo.svg" alt="FIRST logo" width="520">

Conversation

siphonite commented Mar 3, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Summary by cubic

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

siphonite commented Mar 3, 2026 •

edited by cubic-dev-ai bot

Loading

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading

cubic-dev-ai bot Mar 3, 2026 •

edited

Loading