FIXES

Cleaned backlog from the full source audit.

Structure:

Quick wins first: smaller, high-signal fixes that improve correctness, UX, and trust quickly.
Architectural issues second: deeper system changes that need broader design work.

Quick Wins First

CLI And Artifact Correctness

✅ Fix trace metadata drift so recorded traces always embed the real final tracePath.
Why: Recorded traces can be written successfully while the summary inside the trace omits or misstates the final output path.
Impact: This creates misleading artifact metadata and can confuse downstream reporting and tooling.
Evidence: src/runtime/engine.rs (three references), src/runtime/tracefile.rs
Done when:
- ✅ run --record writes a trace whose embedded summary path matches the actual file written.
- ✅ test --record does the same in both overwrite and append modes.
- ✅ Regression tests cover append-mode collision handling.
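
One way to keep the embedded path and the written path from drifting is to resolve the final path, including any collision suffix, before the summary is built. The sketch below is illustrative only: `CollisionPolicy`, `resolve_trace_path`, `Summary`, `finalize`, and the numbered-sibling naming scheme are hypothetical and are not taken from the Fozzy source.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical collision policy. The names mirror the "overwrite" and
/// "append" modes mentioned above, but the semantics here are assumed.
#[derive(Clone, Copy)]
pub enum CollisionPolicy {
    Overwrite,
    /// Assumed meaning: keep the existing file and write a numbered sibling.
    Append,
}

/// Decide the final output path before the summary is built.
/// `exists` is injected so the logic can be tested without touching disk.
pub fn resolve_trace_path(
    requested: &Path,
    policy: CollisionPolicy,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    match policy {
        CollisionPolicy::Overwrite => requested.to_path_buf(),
        CollisionPolicy::Append => {
            if !exists(requested) {
                return requested.to_path_buf();
            }
            let stem = requested.file_stem().and_then(|s| s.to_str()).unwrap_or("trace");
            let ext = requested.extension().and_then(|s| s.to_str()).unwrap_or("fozzy");
            // Try <stem>.1.<ext>, <stem>.2.<ext>, ... until a free name is found.
            (1u32..)
                .map(|n| requested.with_file_name(format!("{stem}.{n}.{ext}")))
                .find(|candidate| !exists(candidate.as_path()))
                .expect("an unused suffix exists")
        }
    }
}

/// Stand-in for the summary embedded in the trace.
pub struct Summary {
    pub trace_path: PathBuf,
}

/// Build the summary from the resolved path, so the embedded value and the
/// file that is actually written come from the same source.
pub fn finalize(
    requested: &Path,
    policy: CollisionPolicy,
    exists: impl Fn(&Path) -> bool,
) -> Summary {
    Summary { trace_path: resolve_trace_path(requested, policy, exists) }
}
```

A caller would write the trace to `summary.trace_path` rather than to the originally requested path, so there is a single source for both.
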
✅ Make fozzy test fail clearly when the caller explicitly supplies a nonexistent scenario path.
Why: Today a mixed invocation can pass if one input exists and another explicit file path is wrong.
Impact: This silently narrows the executed test set and can produce a false-green result in CI or release gating.
Evidence: Discovery and empty-match handling live in src/runtime/engine.rs, with file matching in src/platform/fsutil.rs.
Done when:
- ✅ Explicit missing paths cause a hard failure with a clear error.
- ✅ Glob patterns still preserve normal glob semantics.
- ✅ Mixed literal-path and glob invocations are covered by tests.
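
The literal-versus-glob distinction above can be sketched as a preflight check that runs before discovery. This is a minimal illustration under stated assumptions: the function names and the set of glob metacharacters are hypothetical, and the real matcher in src/platform/fsutil.rs may classify inputs differently.

```rust
/// Treat an input as a glob if it contains a common glob metacharacter.
/// The character set is an assumption, not necessarily what Fozzy uses.
pub fn looks_like_glob(input: &str) -> bool {
    input.contains(|c: char| matches!(c, '*' | '?' | '[' | '{'))
}

/// Return every literal (non-glob) input that does not exist.
/// Globs are skipped here, so an empty glob match keeps its normal semantics.
pub fn missing_literal_paths<'a>(
    inputs: &[&'a str],
    exists: impl Fn(&str) -> bool,
) -> Vec<&'a str> {
    inputs
        .iter()
        .copied()
        .filter(|&input| !looks_like_glob(input) && !exists(input))
        .collect()
}

/// Fail the whole invocation if any explicitly named path is missing.
pub fn check_inputs(inputs: &[&str], exists: impl Fn(&str) -> bool) -> Result<(), String> {
    let missing = missing_literal_paths(inputs, exists);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("scenario path(s) not found: {}", missing.join(", ")))
    }
}
```

Injecting `exists` keeps the check testable without a filesystem; production code could pass `|p| std::path::Path::new(p).exists()`.
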
✅ Make fozzy init honor --config <path> instead of always writing fozzy.toml.
Why: The CLI exposes a custom config path but initialization still writes the default filename.
Impact: This breaks user expectations and makes scripted bootstrapping unreliable.
Evidence: src/runtime/engine.rs, src/main.rs
Done when:
- ✅ Default init still creates fozzy.toml.
- ✅ --config custom.toml init writes custom.toml.
- ✅ Force/non-force behavior is tested for custom config paths.
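
A small sketch of the target selection and the force/non-force write behavior, assuming the flag arrives as an optional path. `init_target` and `write_scaffold` are hypothetical names, and the actual force semantics in Fozzy may differ.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Pick the file `init` should write: the `--config` value if given,
/// otherwise the default `fozzy.toml`.
pub fn init_target(config_flag: Option<&Path>) -> PathBuf {
    config_flag
        .map(Path::to_path_buf)
        .unwrap_or_else(|| PathBuf::from("fozzy.toml"))
}

/// Write the scaffold to `target`. Without `force`, an existing file is an
/// error (`create_new` fails with `AlreadyExists`); with `force`, the file
/// is replaced.
pub fn write_scaffold(target: &Path, contents: &str, force: bool) -> io::Result<()> {
    let mut opts = OpenOptions::new();
    opts.write(true);
    if force {
        opts.create(true).truncate(true);
    } else {
        opts.create_new(true);
    }
    opts.open(target)?.write_all(contents.as_bytes())
}
```

Using `create_new` makes the non-force check and the file creation a single step, so there is no separate existence check to race against.
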
✅ Stop default fozzy test runs from silently skipping distributed scenarios while still reporting overall success.
Why: The default test discovery can find distributed scenarios, but the runner skips them and can still return status=pass.
Impact: This is easy to misread as “all discovered tests passed” when some were never executed.
Evidence: src/runtime/engine.rs (three references)
Done when:
- ✅ Mixed regular/distributed discovery no longer produces a misleading false-green result.
- ✅ The final contract is explicit: fail, opt-in skip, or separate discovery domains.
- ✅ Docs and init scaffolding match the final behavior.
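
To illustrate the "fail or opt-in skip" contract, the sketch below aggregates per-scenario outcomes so that a skipped scenario can never fold into a plain pass. The types and the `allow_skips` switch are hypothetical; they show one possible contract, not the one Fozzy settled on.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    Fail,
    Skipped,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RunStatus {
    Pass,
    Fail,
    /// Only reachable when the caller explicitly opted in to skips.
    PassWithSkips { skipped: usize },
}

/// Aggregate outcomes so that skipped scenarios are always visible:
/// they either fail the run or are reported as an explicit opt-in skip.
pub fn aggregate(outcomes: &[Outcome], allow_skips: bool) -> RunStatus {
    if outcomes.contains(&Outcome::Fail) {
        return RunStatus::Fail;
    }
    let skipped = outcomes.iter().filter(|o| **o == Outcome::Skipped).count();
    match (skipped, allow_skips) {
        (0, _) => RunStatus::Pass,
        (n, true) => RunStatus::PassWithSkips { skipped: n },
        (_, false) => RunStatus::Fail,
    }
}
```

The distinct `PassWithSkips` variant is the design point: downstream reporting has to handle it explicitly instead of reading a bare pass.
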

Validation And Contract Drift

✅ Make distributed-scenario validation consistent across fozzy validate and fozzy explore.
Why: A distributed scenario can be rejected by validation and still execute successfully through explore.
Impact: “Valid” and “runnable” do not currently mean the same thing, which weakens trust in both commands.
Evidence: Validator in src/model/scenario.rs, validation call site in src/main.rs, explore loading in src/modes/explore.rs
Done when:
- ✅ A scenario rejected by validate is also rejected by explore, unless an explicit permissive mode exists.
- ✅ Missing topology declarations are not silently synthesized during execution.
- ✅ Regression tests cover malformed distributed scenarios.
✅ Make scenario validation recurse into nested assert_throws and assert_rejects blocks.
Why: Top-level validation does not currently validate the nested step programs inside these wrappers.
Impact: Malformed nested DSL can slip through preflight and then be treated as an “expected” failure, producing a false-green scenario result.
Evidence: Top-level validation in src/model/scenario.rs, nested execution in src/runtime/engine.rs
Done when:
- ✅ Nested invalid steps fail validation before runtime.
- ✅ The same validation rules apply at top level and in nested blocks.
- ✅ Tests cover nested invalid durations and invalid field combinations.
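
The recursion described above can be sketched with a toy step model. Everything here is a simplified assumption: the `Step` enum, the duration grammar (digits followed by `ms` or `s`), and the error-path format are hypothetical and far smaller than the real scenario DSL in src/model/scenario.rs.

```rust
/// Toy step model: one leaf step plus the two wrapper steps named above.
#[derive(Debug)]
pub enum Step {
    Sleep { duration: String },
    AssertThrows { steps: Vec<Step> },
    AssertRejects { steps: Vec<Step> },
}

/// Assumed duration grammar: one or more ASCII digits, then "ms" or "s".
fn duration_is_valid(text: &str) -> bool {
    let digits_end = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    digits_end > 0 && matches!(&text[digits_end..], "ms" | "s")
}

/// Validate a step list, descending into wrapper steps so nested programs
/// are held to the same rules as top-level ones.
pub fn validate(steps: &[Step], path: &str, errors: &mut Vec<String>) {
    for (i, step) in steps.iter().enumerate() {
        let here = format!("{path}[{i}]");
        match step {
            Step::Sleep { duration } => {
                if !duration_is_valid(duration) {
                    errors.push(format!("{here}: invalid duration {duration:?}"));
                }
            }
            Step::AssertThrows { steps } | Step::AssertRejects { steps } => {
                validate(steps, &format!("{here}.steps"), errors);
            }
        }
    }
}
```

Collecting errors with their nested path (for example `steps[1].steps[0]`) keeps a preflight failure easy to tell apart from an expected runtime throw.
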
✅ Stop silently discarding source/scenario read failures in topology mapping.
Why: fozzy map currently skips unreadable files and dropped scenarios without surfacing that the report is incomplete.
Impact: The command can overstate coverage confidence and under-report risk.
Evidence: src/cmd/map_cmd.rs (three references)
Done when:
- ✅ Unreadable source files are reported explicitly.
- ✅ Invalid or unreadable scenario files are reported explicitly.
- ✅ JSON output includes structured degraded-confidence metadata.

SDK And Local Developer Experience

✅ Harden the TypeScript SDK stream() path so spawn failures become normal SDK errors.
Why: stream() does not currently install an error handler on the child process.
Impact: Missing binaries or spawn failures can crash the consumer’s Node process instead of surfacing a catchable SDK error.
Evidence: sdk-ts/src/index.ts, reference behavior in sdk-ts/src/index.ts
Done when:
- ✅ Missing binary errors are catchable.
- ✅ Permission-denied spawn errors are catchable.
- ✅ Normal streaming behavior still works.
✅ Consolidate duplicated run/test summary finalization and artifact-writing logic.
Why: Similar mechanics are implemented in multiple places with slightly different behavior.
Impact: This is how metadata drift and subtle CLI inconsistency happen.
Evidence: src/runtime/engine.rs (three references)
Done when:
- ✅ Run/test/replay/shrink flows use the same summary and artifact conventions where applicable.
- ✅ Collision-policy handling is consistent.
- ✅ Fresh recorded trace paths normalize consistently across trace verify, replay, and ci.
- ✅ Direct trace selectors normalize consistently across artifacts/report/memory/profile follow-up commands.
- ✅ Regression coverage exists for the shared logic.
✅ Clean up checked-in runtime artifacts and profiling outputs at repo root.
Why: The repo currently contains many trace/profile outputs alongside source files.
Impact: This adds noise to audits and reviews and can interfere with tooling signal quality.
Done when:
- ✅ Incidental runtime outputs are ignored or moved out of the repo root.
- ✅ Intentional fixtures remain documented and clearly separated.
✅ Clarify or remove legacy config-loading pathways that no longer match CLI behavior.
Why: The CLI now exits on config parse/read errors, but a fallback helper still exists that warns and silently defaults.
Impact: This can confuse future contributors about the intended config contract.
Evidence: src/platform/config.rs, src/main.rs
Done when:
- ✅ The intended config-loading contract is explicit.
- ✅ Library and CLI behavior are documented or unified.

Deeper Architectural Issues Second

Runtime Safety And Resource Control

✅ Make host-backed runtime operations respect Fozzy timeouts while the host call is actually in flight.
Why: Host HTTP and host process steps block inside the step itself, while timeout checks happen only before and after step execution.
Impact: A hung host process or slow endpoint can stall a run indefinitely despite --timeout.
Evidence: src/runtime/engine.rs (four references)
Done when:
- ✅ Hung host proc calls time out promptly.
- ✅ Hung host HTTP calls time out promptly.
- ✅ Timeout behavior is recorded and replayed coherently.
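
A generic way to enforce a deadline while a blocking call is in flight is to run it on a worker thread and wait on a channel with a timeout. The sketch below shows only that pattern. `run_with_timeout` and `HostCallError` are hypothetical names, and on timeout the worker is detached rather than cancelled, so a real host backend would additionally need to kill the child process or abort the HTTP request.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum HostCallError {
    /// The deadline passed while the operation was still running.
    TimedOut,
    /// The worker ended without sending a result (for example, it panicked).
    WorkerFailed,
}

/// Run a blocking operation on a worker thread and wait for its result
/// with a deadline, instead of checking the clock only before and after.
pub fn run_with_timeout<T, F>(op: F, timeout: Duration) -> Result<T, HostCallError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out; ignore that.
        let _ = tx.send(op());
    });
    rx.recv_timeout(timeout).map_err(|err| match err {
        mpsc::RecvTimeoutError::Timeout => HostCallError::TimedOut,
        mpsc::RecvTimeoutError::Disconnected => HostCallError::WorkerFailed,
    })
}
```

Returning a distinct `TimedOut` value, rather than a generic failure, is what would let a timeout be recorded in a trace and reproduced on replay.
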
✅ Enforce host stdout/stderr and HTTP body limits during streaming, not after full buffering.
Why: Current size checks happen after the whole payload has already been loaded into memory.
Impact: The limits look protective but do not actually prevent memory spikes.
Evidence: src/runtime/engine.rs (two references)
Done when:
- ✅ Oversized host proc output is cut off safely during read.
- ✅ Oversized host HTTP bodies are aborted during read.
- ✅ The failure mode remains debuggable.
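
Enforcing a limit during the read, rather than after buffering, can be sketched with `Read::take` from the standard library. `read_capped` and `Capped` are hypothetical names; the technique is to read at most `limit + 1` bytes so that memory use stays bounded and the extra byte signals overflow.

```rust
use std::io::{self, Read};

#[derive(Debug, PartialEq, Eq)]
pub enum Capped {
    /// The source ended at or before the limit.
    Complete(Vec<u8>),
    /// The source had more than `limit` bytes; only the first `limit` are kept.
    Truncated(Vec<u8>),
}

/// Read from `source` without ever buffering more than `limit + 1` bytes.
pub fn read_capped<R: Read>(source: R, limit: usize) -> io::Result<Capped> {
    let mut buf = Vec::new();
    // The one extra byte is only used to detect that the limit was exceeded.
    source
        .take((limit as u64).saturating_add(1))
        .read_to_end(&mut buf)?;
    if buf.len() > limit {
        buf.truncate(limit);
        Ok(Capped::Truncated(buf))
    } else {
        Ok(Capped::Complete(buf))
    }
}
```

Keeping the truncated prefix, instead of discarding everything, is one way to keep the failure debuggable: the error report can still show the first bytes of the oversized output.
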

Caching And Lifecycle Semantics

✅ Rework process-global scenario caches so they do not serve stale content forever and do not grow without bound.
Why: Parsed scenarios and compiled fuzz targets are cached globally for the lifetime of the process.
Impact: Long-lived embeddings can observe stale scenarios and unbounded cache growth.
Evidence: src/model/scenario.rs, src/modes/fuzz.rs
Done when:
- ✅ Cache lifecycle is explicit.
- ✅ Scenario edits can be observed correctly, or the cache semantics are deliberately bounded and documented.
- ✅ Repeated unique temp paths do not cause unbounded growth.
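
One possible shape for a cache that is both bounded and staleness-aware is sketched below. It is an assumption-laden illustration, not the redesign that landed: `BoundedCache` is a hypothetical type, the freshness stamp `S` could be a file modification time or content hash, and eviction is simple insertion-order (FIFO) rather than LRU.

```rust
use std::collections::{HashMap, VecDeque};
use std::path::{Path, PathBuf};

/// Path-keyed cache with a fixed capacity and a per-entry freshness stamp.
pub struct BoundedCache<S, V> {
    capacity: usize,
    entries: HashMap<PathBuf, (S, V)>,
    /// Keys in insertion order, oldest first, used for eviction.
    order: VecDeque<PathBuf>,
}

impl<S: PartialEq, V: Clone> BoundedCache<S, V> {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity: capacity.max(1),
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Return the cached value if its stamp still matches; otherwise call
    /// `load`, store the fresh value, and evict the oldest entry if needed.
    pub fn get_or_load(&mut self, path: &Path, stamp: S, load: impl FnOnce() -> V) -> V {
        if let Some((cached_stamp, value)) = self.entries.get(path) {
            if *cached_stamp == stamp {
                return value.clone();
            }
        }
        let value = load();
        let previous = self.entries.insert(path.to_path_buf(), (stamp, value.clone()));
        if previous.is_none() {
            // New key: track it and enforce the capacity bound.
            self.order.push_back(path.to_path_buf());
            if self.order.len() > self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
        }
        value
    }
}
```

With this shape, an edited scenario is reloaded because its stamp changes, and a stream of unique temp paths cannot grow the cache past `capacity`.
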

Codebase Structure

✅ Break up oversized control-center modules.
Why: Several core files are extremely large and mix unrelated responsibilities.
Impact: This increases review cost, onboarding cost, and the likelihood that fixes land in one path but not its sibling path.
Primary hotspots:
- src/runtime/engine.rs
- src/main.rs
- src/cmd/profile_cmd.rs
Done when:
- ✅ Host backend logic is separated cleanly.
- ✅ Trace/summary finalization helpers are separated cleanly.
- ✅ CLI dispatch has clearer module boundaries.
- ✅ Profile subcommands have clearer module boundaries.
- ✅ Profile render/export helpers are separated cleanly.
- ✅ Profile diff/explain/metric analysis helpers are separated cleanly.
- ✅ Profile trace/timeline/profile builders are separated cleanly.
- ✅ Profile loading/doctor/support helpers are separated cleanly.
- ✅ Profile shared types and schema shapes are separated cleanly.
- ✅ Profile tests are separated cleanly from production command wiring.
- ✅ Full/gate CLI workflows are separated cleanly.
- ✅ CLI bootstrap/strict/error helpers are separated cleanly.
- ✅ Runtime init/scaffolding helpers are separated cleanly.
- ✅ Runtime doctor/preflight helpers are separated cleanly.
- ✅ Runtime test aggregation and trace-writing helpers are separated cleanly.
- ✅ Runtime run/replay/shrink orchestration helpers are separated cleanly.
- ✅ Runtime engine is now focused on execution core responsibilities.
- ✅ Behavior stays stable under regression tests.

Suggested Order

First Pass

✅ Trace metadata consistency
✅ Explicit missing-path failures in fozzy test
✅ init --config path handling
✅ False-green distributed-scenario handling in fozzy test
✅ Distributed validation parity
✅ Recursive nested-step validation
✅ Topology mapper degraded-read reporting
✅ SDK stream() error handling
✅ Shared summary/artifact finalization cleanup
✅ Repo artifact cleanup
✅ Config-loading contract cleanup

Second Pass

✅ Scenario cache lifecycle redesign
✅ Host timeout enforcement
✅ Streaming resource limits for host I/O
✅ Large-module refactors

Validation Expectations

✅ New behavior is covered by focused regression tests.
✅ Runtime-impacting fixes are validated with Fozzy-first flows:
- fozzy doctor --deep --scenario <scenario> --runs 5 --seed <seed> --json
- fozzy test --det --strict <scenarios...> --json
- fozzy run ... --det --record <trace.fozzy> --json
- fozzy trace verify <trace.fozzy> --strict --json
- fozzy replay <trace.fozzy> --json
- fozzy ci <trace.fozzy> --json
