feat(footage): RFC-11 Phase-0 — edit real footage as text (transcribe → select → cut)#42

Open
xne998808-ai wants to merge 1 commit into main from feat/footage-edit

@xne998808-ai

What

A second ingest pipeline alongside synthesis: turn a folder of raw takes into a cut, with every edit decision living as diffable JSON. It productizes the core loop from Thariq Shihipar's deck "How Fable Edited Its Own Launch Video": transcribe → pick the cleanest take (with written reasons) → frame-accurate cut → stitch → re-transcribe to self-verify.

This is Phase-0 of RFC-11. The full north-star pipeline (subtitles, HTML/Remotion animated overlays composited onto the footage, motion config, and grade) is specced in research/2026-06-11-spec-11-footage-edit.md v0.2 for later phases.

Changes

  • content-graph: new footage NodeKind (clipAssetId/in/out/firstWords/candidateClipAssetIds/selectionRationale). An EDL is just a content-graph of footage nodes — no separate format; reuses sequence edges + topoSort, so footage and synthesized nodes mix in one timeline.
  • core: SourceAdapter/Transcript/TranscriptWord types + SourceRegistry (the ingest-side counterpart to EngineAdapter); footage.ts with selectTake/buildFootageGraph (pure, unit-tested) + cutClip/concatClips (frame-accurate cut + concat-filter, reusing the mixed-engine PTS lesson).
  • adapter-whisper (new package): whisper-local SourceAdapter — on-device word-level transcription via whisper.cpp, friendly missing-tool hints, default ingest back-end (no upload, no fee).
  • cli: footage-edit --takes <dir> [--scenes <json>] [--model <ggml>] — runs the whole loop, writes final-edit.json (the EDL).
  • tests: 6 unit tests for the selection logic (no whisper/ffmpeg dependency). Also fixes core's test script: node --test test/ was broken on Node 26, so it now uses a glob, matching the other packages.
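
To make the footage node and the selection step concrete, here is a minimal sketch. The field names on `FootageNode` come from the list above; their types, the `Take` shape, the filler list, the "fewest fillers wins" rule, and the name `selectTakeSketch` are all assumptions for illustration and not the PR's actual `selectTake`.

```typescript
// Hypothetical shapes. Field names follow the PR description; types are assumed.
interface TranscriptWord {
  text: string;
  start: number; // seconds from the start of the clip
  end: number;   // seconds from the start of the clip
}

interface Take {
  clipAssetId: string;
  words: TranscriptWord[];
}

interface FootageNode {
  kind: "footage";
  clipAssetId: string;
  in: number;
  out: number;
  firstWords: string;
  candidateClipAssetIds: string[];
  selectionRationale: string;
}

// Assumed filler vocabulary; the real selection criteria may differ.
const FILLERS = new Set(["um", "uh", "er", "erm"]);

function countFillers(words: TranscriptWord[]): number {
  return words.filter((w) =>
    FILLERS.has(w.text.toLowerCase().replace(/[^a-z]/g, ""))
  ).length;
}

// Picks the take with the fewest filler words (ties go to the earliest take)
// and records why, so the decision is reviewable as plain JSON.
function selectTakeSketch(takes: Take[]): FootageNode {
  let best: Take | undefined;
  let bestFillers = Infinity;
  let usable = 0;
  for (const take of takes) {
    if (take.words.length === 0) continue; // nothing transcribed, skip
    usable += 1;
    const fillers = countFillers(take.words);
    if (fillers < bestFillers) {
      best = take;
      bestFillers = fillers;
    }
  }
  if (!best) throw new Error("no take contains any transcribed words");
  const first = best.words[0];
  const last = best.words[best.words.length - 1];
  if (!first || !last) throw new Error("selected take has no words");
  return {
    kind: "footage",
    clipAssetId: best.clipAssetId,
    in: first.start,
    out: last.end,
    firstWords: best.words.slice(0, 3).map((w) => w.text).join(" "),
    candidateClipAssetIds: takes.map((t) => t.clipAssetId),
    selectionRationale: `fewest filler words (${bestFillers}) among ${usable} usable take(s)`,
  };
}
```

Because the function is pure, it can be unit-tested without whisper or ffmpeg, in the same spirit as the tests described above.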

Verification

  • pnpm -r build + pnpm -r typecheck: all 9 packages green
  • pnpm --filter @html-video/core test: 6/6 pass
  • End-to-end on synthetic + real footage: transcribe → select (zero-filler takes, warm-up trimmed) → cut → concat → re-transcribe; the self-check returns clean: true, and the output MP4 fully decodes cleanly (single v+a stream). Findings, including a real whisper language/model hallucination that was caught and the within-clip span-cut gap, are in research/2026-06-11-spec-11-poc-findings.md.
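
One way the re-transcribe self-check could work is sketched below. It assumes the check compares the transcript of the stitched output against each node's `firstWords`, in timeline order; the function names, the `VerifyResult` shape beyond the `clean` flag, and the matching rule are hypothetical and may not match the PR's implementation.

```typescript
interface VerifyResult {
  clean: boolean;
  missing: string[]; // expected phrases that were not heard, or heard out of order
}

// Lowercases and strips punctuation. ASCII-only, which is enough for a sketch.
function normalizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Returns the index of the first occurrence of `needle` in `haystack`
// at or after `from`, or -1 if it does not occur.
function indexOfSequence(haystack: string[], needle: string[], from: number): number {
  if (needle.length === 0) return from;
  for (let i = from; i + needle.length <= haystack.length; i++) {
    if (needle.every((w, j) => haystack[i + j] === w)) return i;
  }
  return -1;
}

// Checks that every expected opening phrase appears in the re-transcribed
// output, in the same order as the timeline.
function verifyCutSketch(expectedFirstWords: string[], retranscribed: string): VerifyResult {
  const heard = normalizeWords(retranscribed);
  const missing: string[] = [];
  let cursor = 0;
  for (const phrase of expectedFirstWords) {
    const target = normalizeWords(phrase);
    const at = indexOfSequence(heard, target, cursor);
    if (at === -1) {
      missing.push(phrase);
    } else {
      cursor = at + target.length;
    }
  }
  return { clean: missing.length === 0, missing };
}
```

A check like this catches dropped or reordered segments; it would not by itself detect a transcription hallucination that happens to contain the expected phrases.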

Usage

export HTMLVIDEO_WHISPER_MODEL=/path/to/ggml-base.en.bin   # brew install whisper-cpp
html-video footage-edit --takes ./takes --scenes ./scenes.json --out final.mp4
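
For readers unfamiliar with the cut and stitch steps this command runs, the sketch below builds plausible ffmpeg argument lists for them. The ffmpeg options shown (`-ss`, `-t`, `-filter_complex` with the `concat` filter, `-map`) are standard, but the function names, the codec choices, and the assumption that the PR's `cutClip`/`concatClips` invoke ffmpeg this way are illustrative only.

```typescript
// Re-encoding cut: seeking before -i combined with transcoding lets ffmpeg
// decode from the preceding keyframe and drop frames up to the in-point,
// so the cut is not limited to keyframe boundaries.
function cutArgsSketch(src: string, inSec: number, outSec: number, dest: string): string[] {
  if (outSec <= inSec) throw new Error("out must be greater than in");
  return [
    "-y",
    "-ss", inSec.toFixed(3),
    "-i", src,
    "-t", (outSec - inSec).toFixed(3),
    "-c:v", "libx264",
    "-c:a", "aac",
    dest,
  ];
}

// Stitch with the concat filter, which re-times each segment so the output
// has one video stream and one audio stream.
function concatArgsSketch(clips: string[], dest: string): string[] {
  if (clips.length === 0) throw new Error("nothing to concatenate");
  const args: string[] = ["-y"];
  let pads = "";
  clips.forEach((clip, i) => {
    args.push("-i", clip);
    pads += `[${i}:v][${i}:a]`;
  });
  const filter = `${pads}concat=n=${clips.length}:v=1:a=1[v][a]`;
  args.push("-filter_complex", filter, "-map", "[v]", "-map", "[a]", dest);
  return args;
}
```

Returning argument arrays rather than spawning ffmpeg keeps the sketch testable; a caller would pass them to a process runner of its choice.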

Out of scope (later RFC-11 phases)

Subtitles, transcript-timed HTML/Remotion overlay compositing onto footage, global motion config, color grade, studio UI, within-clip span cutting.

🤖 Generated with Claude Code

… → select → cut)

Adds a second ingest pipeline alongside synthesis: turn raw takes into a cut,
with every edit decision living as diffable JSON. Reproduces the core loop from
Thariq Shihipar's "How Fable Edited Its Own Launch Video" deck, productized.

- content-graph: new `footage` NodeKind (clipAssetId/in/out/rationale/candidates)
  — an EDL is just a content-graph of footage nodes, no separate format
- core: SourceAdapter/Transcript types + SourceRegistry (ingest-side counterpart
  to EngineAdapter); footage.ts with selectTake/buildFootageGraph (pure, tested)
  + cutClip/concatClips (frame-accurate cut + concat-filter, reusing the
  mixed-engine PTS lesson)
- adapter-whisper: new package — whisper.cpp local SourceAdapter (word-level
  timestamps, on-device, friendly missing-tool hints)
- cli: `footage-edit` command — transcribe → select → cut → concat → re-transcribe
  verify; writes final-edit.json (the EDL)
- 6 unit tests for selection logic (no whisper/ffmpeg dep)

Verified end-to-end on real footage. RFC-11 v0.2 (research/) is the full-pipeline
north star (Phase 1+: subtitles + HTML/Remotion overlay compositing onto footage,
motion config, grade); this commit is Phase-0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lefarcen requested a review from PerishCode on June 11, 2026 13:21
lefarcen added the labels size/XL (700-1499 LOC), risk/high, and type/feature on Jun 11, 2026