fix: tolerate transient NotFound from concurrent build scripts #3
Open
willpartcl wants to merge 1 commit into
wait_for_other_builds() walks the entire build dir looking for stable
state. Cargo runs build scripts in parallel; sibling crates create and
delete tempfiles inside the same dir while we walk. WalkDir surfaces
those as Err(NotFound) at iteration time, and entry.unwrap() panics.
The fix: treat WalkDir errors as evidence the directory is still in
flux, so loop again instead of crashing. The function's invariant is
'wait until directory is stable' — a NotFound mid-walk is precisely
the case where it isn't yet.
locate_manifest_paths() has the same TOCTOU pattern between .exists()
and .read_to_string().expect(). Replaced with match-on-read so a
disappeared file becomes a graceful skip instead of a panic.
Observed via partcl CI:
thread 'main' panicked at inwelling-0.5.5/src/lib.rs:207:31:
called `Result::unwrap()` on an `Err` value:
Error { depth: 2, inner: Io { path: Some(".../partser-.../rmetanFrez5"),
err: Os { code: 2, kind: NotFound, ... }}}
Affects all consumers; the race is non-deterministic and depends on
filesystem timing.
`wait_for_other_builds` walks the build directory looking for stable state. Cargo runs build scripts in parallel; sibling crates create and delete tempfiles inside the same dir while we walk. WalkDir surfaces those as `Err(NotFound)` and `entry.unwrap()` panics.
The fix: treat a WalkDir error as evidence the directory is still in flux — set `waiting = true` and continue. The function's invariant is "wait until directory stops changing"; a NotFound mid-walk is exactly the case where it hasn't yet.
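A minimal sketch of the retry-on-error idea, not inwelling's actual code. It assumes a std-only recursive walker as a stand-in for WalkDir, and it assumes "stable" means two consecutive walks that both succeed and see the same paths and modification times. `snapshot`, `wait_until_stable`, and `max_polls` are hypothetical names, and the mtime comparison is an assumed criterion.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, SystemTime};

// Hypothetical: one observation of the tree (every path plus its mtime), sorted.
type Snapshot = Vec<(PathBuf, SystemTime)>;

// Std-only stand-in for walkdir::WalkDir: any I/O error aborts the whole walk.
fn snapshot(dir: &Path) -> io::Result<Snapshot> {
    let mut out = Vec::new();
    let mut stack = vec![dir.to_path_buf()];
    while let Some(d) = stack.pop() {
        for entry in fs::read_dir(&d)? {
            let entry = entry?;
            // A sibling build script may delete this entry between readdir and
            // stat, which surfaces here as ErrorKind::NotFound.
            let meta = entry.metadata()?;
            let path = entry.path();
            if meta.is_dir() {
                stack.push(path.clone());
            }
            out.push((path, meta.modified()?));
        }
    }
    out.sort();
    Ok(out)
}

// Hypothetical stability loop: returns true once two consecutive walks succeed
// and agree, or false if that never happens within `max_polls` attempts.
fn wait_until_stable(dir: &Path, interval: Duration, max_polls: usize) -> bool {
    let mut previous: Option<Snapshot> = None;
    for _ in 0..max_polls {
        let waiting = match snapshot(dir) {
            // Walk error (e.g. NotFound mid-walk): the tree is still in flux.
            // Forget the last snapshot so an interrupted walk never counts.
            Err(_) => {
                previous = None;
                true
            }
            Ok(current) => {
                let changed = previous.as_ref() != Some(&current);
                previous = Some(current);
                changed
            }
        };
        if !waiting {
            return true;
        }
        thread::sleep(interval);
    }
    false
}
```

Resetting `previous` on an error means a walk that hit NotFound can never count toward stability: the directory has to produce two clean, identical walks in a row before the loop exits.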
`locate_manifest_paths` has the same TOCTOU pattern: `.exists()` followed by `.read_to_string().expect()` panics if the file disappears between check and read. Replaced with `if let Ok(contents) = read_to_string(...)` so the disappearance becomes a graceful skip.
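The before/after shape of that change, sketched under assumptions: `read_manifests_racy` and `read_manifests` are hypothetical helpers that read a list of files, not the real `locate_manifest_paths`, whose surrounding logic is not shown here.

```rust
use std::fs;
use std::path::PathBuf;

// Racy pattern (hypothetical name): the file can disappear between the
// exists() check and the read, and expect() then panics.
#[allow(dead_code)]
fn read_manifests_racy(paths: &[PathBuf]) -> Vec<String> {
    let mut out = Vec::new();
    for p in paths {
        if p.exists() {
            out.push(fs::read_to_string(p).expect("manifest vanished"));
        }
    }
    out
}

// Fixed pattern (hypothetical name): a single read attempt, where any error,
// including NotFound from a concurrent delete, becomes a skip.
fn read_manifests(paths: &[PathBuf]) -> Vec<String> {
    let mut out = Vec::new();
    for p in paths {
        if let Ok(contents) = fs::read_to_string(p) {
            out.push(contents);
        }
    }
    out
}
```

The sketch drops the `exists()` check entirely, since the read itself is the only check that cannot go stale. A stricter variant could match on `e.kind() == std::io::ErrorKind::NotFound` and still surface other errors, such as permission failures, instead of skipping them.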
Observed
In a multi-crate workspace where two crates use `inwelling` (one via `to()`, one via `collect_downstream()`), heavy parallel builds reproduce this:
```
thread 'main' panicked at inwelling-0.5.5/src/lib.rs:207:31:
called `Result::unwrap()` on an `Err` value:
Error { depth: 2, inner: Io { path: Some(".../partser-.../rmetanFrez5"),
err: Os { code: 2, kind: NotFound, ... } } }
```
The failure is non-deterministic: it depends on filesystem timing and on which sibling build script finishes first.