
feat(replay): operational integrity scripts (verify, find-gaps, reconcile)#351

Open
ctol3r wants to merge 1 commit into wave/replay-survivability-w14 from wave/replay-integrity-scripts



ctol3r (Owner) commented May 12, 2026

Summary

Stacked on #344 (Wave 14 survivability suite). Adds three read-only CLI tools under scripts/replay/ that operators can run against arbitrary evidence inputs (JSON fixtures or, in future, exported snapshot manifests) to verify the canonical replay-identity contract from PR #343 + #344.

Where #344 pins survivability inside an in-process jest run, these scripts let operators verify the contract against real production evidence after a restart or deploy, or as part of CI integrity gates.

Files

| File | Role |
|---|---|
| scripts/replay/_lib.ts | Shared helpers: parseArgs, normalizeEvidence, loadEvidenceFromFile, chronologyKey, hoursBetween, JSON-line emitters |
| scripts/replay/verify-replay-integrity.ts | Recomputes lineageKey + runId and asserts determinism, cosmetic-invariance, sensitivity, optional expected-match |
| scripts/replay/find-replay-gaps.ts | Scans chronological snapshots for continuity gaps above --max-gap-hours |
| scripts/replay/reconcile-lineage.ts | Compares two evidence candidates → same-lineage-same-snapshot / same-lineage-different-snapshot / different-lineage |
| apps/web/__tests__/replay-scripts.test.ts | 28 vitest cases covering every exported pure function + _lib helpers |
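The find-replay-gaps scan can be pictured as a small pure function: sort snapshots by timestamp, then flag every adjacent pair that is further apart than --max-gap-hours. This is a sketch under assumptions, not the script's code: SnapshotRef, Gap and findGapsSketch are hypothetical names, and only the 720-hour default is taken from the PR.

```typescript
// Hypothetical snapshot shape: the scan only needs an id and a timestamp.
interface SnapshotRef {
  snapshotId: string;
  capturedAt: string; // ISO-8601
}

interface Gap {
  from: string;
  to: string;
  hours: number;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Sort by timestamp, then flag each adjacent pair whose spacing exceeds
// maxGapHours (720h = 30 days by default).
function findGapsSketch(snapshots: readonly SnapshotRef[], maxGapHours = 720): Gap[] {
  const timed = snapshots.map((s) => {
    const t = Date.parse(s.capturedAt);
    // An invalid timestamp is bad input, not a silent NaN comparison.
    if (Number.isNaN(t)) throw new Error(`invalid capturedAt on ${s.snapshotId}`);
    return { id: s.snapshotId, t };
  });
  timed.sort((a, b) => a.t - b.t);

  const gaps: Gap[] = [];
  for (let i = 1; i < timed.length; i++) {
    const hours = (timed[i].t - timed[i - 1].t) / MS_PER_HOUR;
    if (hours > maxGapHours) gaps.push({ from: timed[i - 1].id, to: timed[i].id, hours });
  }
  return gaps;
}
```

A CLI wrapper around this would exit 4 when `gaps.length > 0` and 0 otherwise.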

Exit-code taxonomy

| Script | 0 | 2 | 3 / 4 / 5 / 6 |
|---|---|---|---|
| verify-replay-integrity | integrity verified | bad input | 3 = integrity FAILED |
| find-replay-gaps | continuous | bad input | 4 = gap(s) detected |
| reconcile-lineage | same subject + same snapshot | bad input | 5 = same lineage, different snapshot; 6 = different lineages (verifier action required) |

Distinct exit codes let a CI pipeline react to each outcome separately: gap detection is informational (exit 4), while a different-lineage result on a receipt reconciliation is a hard alarm (exit 6).
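A CI step could route on those exit codes roughly as follows. This is a hypothetical sketch: routeReplayExit and the pass/warn/fail/alarm actions are illustrative, and treating exit 5 as a warning is an assumption, since the PR only classifies 4 as informational and 6 as a hard alarm.

```typescript
// Hypothetical CI-side routing of the replay scripts' exit codes.
// Codes 0/2/3/4/5/6 come from the exit-code taxonomy; the action names
// and the choice to treat 5 as a warning are illustrative assumptions.
type CiAction = "pass" | "warn" | "fail" | "alarm";

const REPLAY_EXIT_ACTIONS: Record<number, CiAction> = {
  0: "pass",  // verified / continuous / same lineage + same snapshot
  2: "fail",  // bad input: the check could not run at all
  3: "fail",  // verify-replay-integrity: integrity FAILED
  4: "warn",  // find-replay-gaps: gap(s) detected (informational)
  5: "warn",  // reconcile-lineage: same lineage, different snapshot
  6: "alarm", // reconcile-lineage: different lineages
};

function routeReplayExit(status: number | null): CiAction {
  // null (child killed by a signal) and unknown codes fail closed.
  if (status === null) return "fail";
  return REPLAY_EXIT_ACTIONS[status] ?? "fail";
}
```

With `spawnSync` from `node:child_process`, the routed value would be `routeReplayExit(result.status)`; `status` is `null` when the child was terminated by a signal, which the sketch treats as a failure rather than a pass.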

Survivability matrix coverage

The scripts collectively cover all six runtime-turbulence scenarios from docs/architecture/replay-survivability-matrix.md (shipped in #344), plus audit-chain gap detection and cross-subject reconciliation:

| Scenario | Script |
|---|---|
| Deploy replacement | verify-replay-integrity (determinism + expected-match) |
| Replay corruption attempt | verify-replay-integrity (sensitivity + expected-match) |
| Degraded restoration | verify-replay-integrity on a degraded fixture |
| Runtime restart | verify-replay-integrity (determinism over 25 iterations) |
| Partial persistence outage | reconcile-lineage (surfaces the evidence delta) |
| Stale replay recovery | verify-replay-integrity (wall-clock-independent ids) |
| Audit-chain gap detection | find-replay-gaps |
| Cross-subject reconciliation | reconcile-lineage |
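For reconcile-lineage, the three-way verdict and its exit codes (0, 5, 6) can be sketched as below, assuming lineageKey and runId have already been recomputed for both sides. The Side shape, the reconcileSketch name, and the convention that "added" means present on the right but not the left are assumptions; the delta field names (artifactsAdded, artifactsRemoved, lastCheckedAtChanged, channelChanged) are the ones listed in the commit message.

```typescript
type Verdict =
  | "same-lineage-same-snapshot"
  | "same-lineage-different-snapshot"
  | "different-lineage";

// Hypothetical per-side input: ids already recomputed from the evidence.
interface Side {
  lineageKey: string;
  runId: string;
  artifacts: string[];
  lastCheckedAt: string | null;
  channel: string;
}

interface EvidenceDelta {
  artifactsAdded: string[];   // assumption: on the right but not the left
  artifactsRemoved: string[]; // assumption: on the left but not the right
  lastCheckedAtChanged: boolean;
  channelChanged: boolean;
}

interface Reconciliation {
  verdict: Verdict;
  exitCode: 0 | 5 | 6;
  delta?: EvidenceDelta;
}

function reconcileSketch(left: Side, right: Side): Reconciliation {
  // Different subjects: nothing meaningful to diff, a verifier must act.
  if (left.lineageKey !== right.lineageKey) {
    return { verdict: "different-lineage", exitCode: 6 };
  }
  if (left.runId === right.runId) {
    return { verdict: "same-lineage-same-snapshot", exitCode: 0 };
  }
  // Same subject, different snapshot: surface what changed.
  const l = new Set(left.artifacts);
  const r = new Set(right.artifacts);
  return {
    verdict: "same-lineage-different-snapshot",
    exitCode: 5,
    delta: {
      artifactsAdded: right.artifacts.filter((a) => !l.has(a)).sort(),
      artifactsRemoved: left.artifacts.filter((a) => !r.has(a)).sort(),
      lastCheckedAtChanged: left.lastCheckedAt !== right.lastCheckedAt,
      channelChanged: left.channel !== right.channel,
    },
  };
}
```

Checking lineage before runId matters: a runId mismatch across different subjects would otherwise be misreported as a mere evidence delta.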

Truth rules

  • Banned-strings scan: CLEAN
  • No new product claims; scripts are read-only operational tooling

Validation

  • Targeted vitest: 28/28 passing
  • Full web build: pnpm turbo run build --filter @vitalcv/web → 13/13 tasks
  • No new dependencies; scripts use only Node built-ins + existing replayIdentity module
  • Tests live under apps/web/__tests__/ so they run via the existing vitest config; scripts themselves remain at repo root per the brief

Usage

```sh
pnpm exec tsx scripts/replay/verify-replay-integrity.ts \
  --evidence path/to/evidence.json \
  [--expected-run-id run_v1_...]

pnpm exec tsx scripts/replay/find-replay-gaps.ts \
  --snapshots path/to/snapshots.json \
  [--max-gap-hours 720]

pnpm exec tsx scripts/replay/reconcile-lineage.ts \
  --left path/to/left.json \
  --right path/to/right.json
```
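As an alternative to the parseArgs helper in _lib.ts (whose implementation is not shown here), Node's built-in `util.parseArgs` from `node:util` can parse these flags without adding a dependency. The sketch below covers the find-replay-gaps flags; parseGapFlags and GapFlags are hypothetical names, and falling back to 720 when --max-gap-hours is omitted follows the documented default.

```typescript
import { parseArgs } from "node:util";

interface GapFlags {
  snapshots: string;
  maxGapHours: number;
}

// Parse find-replay-gaps flags with Node's built-in util.parseArgs.
// Any thrown error should surface as "bad input" (exit 2) in the caller.
function parseGapFlags(args: string[]): GapFlags {
  const { values } = parseArgs({
    args,
    options: {
      snapshots: { type: "string" },
      "max-gap-hours": { type: "string" },
    },
    strict: true, // unknown flags and stray positionals throw
  });

  const snapshots = values.snapshots;
  if (typeof snapshots !== "string" || snapshots === "") {
    throw new Error("--snapshots is required");
  }

  const rawMax = values["max-gap-hours"];
  const maxGapHours = typeof rawMax === "string" ? Number(rawMax) : 720;
  if (!Number.isFinite(maxGapHours) || maxGapHours <= 0) {
    throw new Error("--max-gap-hours must be a positive number");
  }
  return { snapshots, maxGapHours };
}
```

In a script entrypoint this would be called as `parseGapFlags(process.argv.slice(2))`.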

Out of scope (explicit follow-ups)

  • Direct Prisma integration (read evidence from the live DB without JSON fixtures) — needs a thin connector + a per-NPI evidence fetcher; bounded follow-up
  • A scripts/runtime/verify-runtime-truth.ts (the runtime-coherence gate mentioned in a separate brief) — different concern, different worktree


Stacked on #344 (Wave 14 survivability suite). Adds three read-only
CLI tools under scripts/replay/ that exercise the canonical
replay-identity contract from PR #343 + #344 against arbitrary
evidence inputs (JSON fixtures or, in future, Prisma snapshots).
Operationally these are the offline / CI equivalents of the
survivability properties already pinned by the in-process jest suite
in apps/api/backend/src/services/replay/__tests__/.

New files:

  scripts/replay/_lib.ts
    parseArgs, normalizeEvidence, loadEvidenceFromFile, chronologyKey,
    hoursBetween, emitJsonLine / emitHumanLine — dependency-light
    helpers shared across all three scripts.

  scripts/replay/verify-replay-integrity.ts
    Recomputes lineageKey + runId for an evidence set and verifies:
      1. Determinism — 25 recomputations collapse to one runId
      2. Cosmetic-input invariance — whitespace/order changes are absorbed
      3. Sensitivity — tampered inputs always diverge
      4. (Optional) expected-match — runId matches an externally-provided
         expected value (audit-corruption signal)
    Exit codes: 0 ok / 2 bad input / 3 integrity FAILED

  scripts/replay/find-replay-gaps.ts
    Scans a chronological list of snapshots (same lineageKey) and
    reports continuity gaps longer than --max-gap-hours (default 720h).
    Exit codes: 0 continuous / 2 bad input / 4 gap(s) detected.

  scripts/replay/reconcile-lineage.ts
    Reconciles two candidate evidence sets:
      same-lineage-same-snapshot (exit 0) — perfectly reconciled
      same-lineage-different-snapshot (exit 5) — lineage continuous,
        evidence delta surfaced (artifactsAdded/Removed,
        lastCheckedAtChanged, channelChanged)
      different-lineage (exit 6) — DIFFERENT SUBJECTS, verifier
        action required

  apps/web/__tests__/replay-scripts.test.ts
    28 vitest cases — every exported pure function exercised, plus
    _lib helpers (parseArgs, normalizeEvidence, chronologyKey,
    hoursBetween).
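The three verify-replay-integrity checks listed above (determinism, cosmetic-input invariance, sensitivity) can be illustrated with a self-contained sketch. The real computeRunId lives in the replayIdentity module and is not shown in this PR, so computeRunIdSketch is a hypothetical stand-in (trim strings, sort artifacts, SHA-256 over the canonical JSON), and the Evidence fields are assumptions loosely based on the delta fields reconcile-lineage reports. Only the `run_v1_` prefix comes from the PR.

```typescript
import { createHash } from "node:crypto";

// Hypothetical evidence shape; the real type is not shown in this PR.
interface Evidence {
  subjectId: string;
  channel: string;
  artifacts: string[];
  lastCheckedAt: string | null;
}

// Hypothetical stand-in for computeRunId: canonicalize, then hash.
// Trimming and sorting are what make cosmetic edits invisible to the id.
function computeRunIdSketch(e: Evidence): string {
  const canonical = JSON.stringify({
    subjectId: e.subjectId.trim(),
    channel: e.channel.trim(),
    artifacts: e.artifacts.map((a) => a.trim()).sort(),
    lastCheckedAt: e.lastCheckedAt,
  });
  return "run_v1_" + createHash("sha256").update(canonical).digest("hex").slice(0, 32);
}

interface IntegrityReport {
  deterministic: boolean;
  cosmeticInvariant: boolean;
  sensitive: boolean;
}

function verifySketch(e: Evidence, iterations = 25): IntegrityReport {
  const base = computeRunIdSketch(e);

  // 1. Determinism: repeated recomputation collapses to a single id.
  const ids = new Set<string>();
  for (let i = 0; i < iterations; i++) ids.add(computeRunIdSketch(e));

  // 2. Cosmetic invariance: padded whitespace and reversed order are absorbed.
  const cosmetic: Evidence = {
    ...e,
    channel: `  ${e.channel}  `,
    artifacts: [...e.artifacts].reverse().map((a) => ` ${a} `),
  };

  // 3. Sensitivity: a substantive change (an extra artifact) must change the id.
  const tampered: Evidence = { ...e, artifacts: [...e.artifacts, "tampered-artifact"] };

  return {
    deterministic: ids.size === 1 && ids.has(base),
    cosmeticInvariant: computeRunIdSketch(cosmetic) === base,
    sensitive: computeRunIdSketch(tampered) !== base,
  };
}
```

Any `false` in the report would map to exit 3 (integrity FAILED) in the taxonomy above.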

The three pure functions (verifyReplayIntegrity, findReplayGaps,
reconcileLineage) are exported separately from the CLI main()
wrappers so they're directly unit-testable without process spawning
or filesystem fixtures.
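The same pure-core / thin-shell split can be shown in miniature for the optional expected-match check. checkExpectedRunId and runCheck are hypothetical names, not the PR's exports: the core returns data only, and the shell owns input validation and output and returns an exit code instead of calling process.exit, so both can be exercised in-process.

```typescript
interface ExpectedMatch {
  ok: boolean;
  detail: string;
}

// Pure core (hypothetical): compares a recomputed runId with an optional
// externally supplied expected value. No I/O, so tests call it directly.
function checkExpectedRunId(actual: string, expected: string | undefined): ExpectedMatch {
  if (expected === undefined) return { ok: true, detail: "no expected run id supplied" };
  return actual === expected
    ? { ok: true, detail: "runId matches expected" }
    : { ok: false, detail: `runId ${actual} != expected ${expected}` };
}

// Thin shell (hypothetical): validates input, emits one JSON line, and
// returns an exit code (0 ok, 2 bad input, 3 integrity failed).
function runCheck(args: { actual?: string; expected?: string }, write: (line: string) => void): number {
  if (!args.actual) {
    write(JSON.stringify({ status: "bad-input", detail: "missing runId" }));
    return 2;
  }
  const result = checkExpectedRunId(args.actual, args.expected);
  write(JSON.stringify({ status: result.ok ? "verified" : "failed", detail: result.detail }));
  return result.ok ? 0 : 3;
}

// A real entrypoint would then be roughly:
//   process.exitCode = runCheck(parsedArgs, console.log);
```

Setting `process.exitCode` rather than calling `process.exit()` lets pending stdout writes flush before the process ends.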

Survivability properties enforced by these scripts complement #344's
in-process suite: where #344 pins the contract inside a jest run, these
scripts let operators verify the contract against real production
evidence (e.g. an exported snapshot manifest) post-restart, post-deploy,
or as part of CI integrity gates.

Validation: 28/28 vitest passing; pnpm turbo run build --filter
@vitalcv/web → 13/13 tasks; truth-strings CLEAN.
vercel (bot) commented May 12, 2026

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
|---|---|---|
| vcv-web | Ready | May 12, 2026 11:32pm |
| vitalcv | Ready | May 12, 2026 11:32pm |

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bf9cce270


Comment on lines +161 to +162 (scripts/replay/verify-replay-integrity.ts)

```ts
if (typeof require !== 'undefined' && require.main === module) {
  main();
}
```

P1: Use ESM-safe entrypoint detection for CLI main()

This file is executed as an ES module (#!/usr/bin/env -S node --import=tsx/esm), but the require.main === module guard is CommonJS-only. In ESM, require is undefined, so this condition is false and main() never runs, causing the script to exit without performing any checks; the same guard pattern is also present in scripts/replay/find-replay-gaps.ts and scripts/replay/reconcile-lineage.ts.
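One ESM-safe alternative is to compare `import.meta.url` with the file URL of `process.argv[1]`. A minimal sketch, where isMainModule is a hypothetical helper name; it assumes tsx leaves `process.argv[1]` pointing at the script path (worth confirming), and if scripts may be launched through symlinks, comparing `fs.realpathSync` results is more robust.

```typescript
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

// ESM-safe stand-in for `require.main === module`: this module is the
// entrypoint when its URL equals the file URL of the script Node was
// asked to run (process.argv[1]).
function isMainModule(moduleUrl: string, argv1: string | undefined): boolean {
  if (!argv1) return false; // e.g. in the REPL, where there is no script path
  return pathToFileURL(resolve(argv1)).href === moduleUrl;
}

// Inside each ESM script, the guard would then read:
//   if (isMainModule(import.meta.url, process.argv[1])) main();
```

Because the helper takes the URL and path as arguments instead of reading `import.meta` itself, it can live in _lib.ts and be unit-tested without spawning a process.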


```ts
const tamperedCheckedAt = computeRunId({
  ...evidence,
  lastCheckedAt: evidence.lastCheckedAt
    ? new Date(new Date(evidence.lastCheckedAt).getTime() + 1000).toISOString()
```

P2: Handle invalid lastCheckedAt before calling toISOString

normalizeEvidence accepts any non-empty string for lastCheckedAt, but this sensitivity check immediately does new Date(...).toISOString(). For invalid timestamp strings, getTime() becomes NaN and toISOString() throws RangeError, so bad input crashes the script instead of producing the documented bad-input behavior.
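One way to get the documented bad-input behavior is to reject unparseable timestamps before any date arithmetic, ideally inside normalizeEvidence so the script exits 2. A sketch, where shiftLastCheckedAt and the Shifted result type are hypothetical names:

```typescript
type Shifted = { ok: true; value: string } | { ok: false; reason: string };

// Shift lastCheckedAt by `ms` for the sensitivity check, returning a
// failure result instead of throwing RangeError on an invalid timestamp.
function shiftLastCheckedAt(lastCheckedAt: string, ms = 1000): Shifted {
  const t = Date.parse(lastCheckedAt);
  if (Number.isNaN(t)) {
    return { ok: false, reason: `lastCheckedAt is not a valid timestamp: ${JSON.stringify(lastCheckedAt)}` };
  }
  const shifted = new Date(t + ms);
  // Guard the edge of the Date range too: toISOString() throws there as well.
  if (Number.isNaN(shifted.getTime())) {
    return { ok: false, reason: "shifted lastCheckedAt is outside the Date range" };
  }
  return { ok: true, value: shifted.toISOString() };
}
```

The caller (or normalizeEvidence itself) would map `ok: false` to the bad-input path rather than letting the error escape as a crash.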

