feat(footage): RFC-11 Phase-0 — edit real footage as text (transcribe → select → cut) by xne998808-ai · Pull Request #42 · nexu-io/html-video

xne998808-ai · 2026-06-11T13:17:31Z

What

A second ingest pipeline alongside synthesis: turn a folder of raw takes into a cut, with every edit decision living as diffable JSON. Productizes the core loop from Thariq Shihipar's deck "How Fable Edited Its Own Launch Video" — transcribe → pick the cleanest take (with written reasons) → frame-accurate cut → stitch → re-transcribe to self-verify.

This is Phase-0 of RFC-11. The full north-star pipeline (subtitles, HTML/Remotion animated overlays composited onto the footage, motion config, and grade) is specced for later phases in research/2026-06-11-spec-11-footage-edit.md v0.2.

Changes

content-graph: new footage NodeKind (clipAssetId/in/out/firstWords/candidateClipAssetIds/selectionRationale). An EDL is just a content-graph of footage nodes — no separate format; reuses sequence edges + topoSort, so footage and synthesized nodes mix in one timeline.
core: SourceAdapter/Transcript/TranscriptWord types + SourceRegistry (the ingest-side counterpart to EngineAdapter); footage.ts with selectTake/buildFootageGraph (pure, unit-tested) + cutClip/concatClips (frame-accurate cut + concat-filter, reusing the mixed-engine PTS lesson).
adapter-whisper (new package): whisper-local SourceAdapter — on-device word-level transcription via whisper.cpp, friendly missing-tool hints, default ingest back-end (no upload, no fee).
cli: footage-edit --takes <dir> [--scenes <json>] [--model <ggml>] — runs the whole loop, writes final-edit.json (the EDL).
tests: 6 unit tests for the selection logic (no whisper/ffmpeg dependency). Also fixes core's test script (node --test test/ was broken on Node 26 → glob, matching the other packages).
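
To make the footage node and the take-selection step more concrete, here is a minimal sketch. The field names on `FootageNode` come from the list above; everything else (the `Take` shape, the filler list, the tie-break rule, and the name `selectTakeSketch`) is an assumption for illustration and is not taken from the PR's footage.ts.

```typescript
// Hypothetical shapes: field names follow the PR text, the types are guesses.
interface TranscriptWord {
  text: string;
  start: number; // seconds from the start of the clip
  end: number;   // seconds from the start of the clip
}

interface Take {
  clipAssetId: string;
  words: TranscriptWord[];
}

interface FootageNode {
  kind: "footage";
  clipAssetId: string;
  in: number;
  out: number;
  firstWords: string;
  candidateClipAssetIds: string[];
  selectionRationale: string;
}

// Assumed filler list; the real selection criteria may differ.
const FILLERS = new Set(["um", "uh", "er", "hmm"]);

function isFiller(word: TranscriptWord): boolean {
  // Strip punctuation so "Um," and "um" count the same.
  return FILLERS.has(word.text.toLowerCase().replace(/[^a-z]/g, ""));
}

function countFillers(words: TranscriptWord[]): number {
  return words.filter(isFiller).length;
}

// Pick the take with the fewest filler words. Ties go to the later take,
// on the assumption that later takes tend to be more rehearsed.
function selectTakeSketch(takes: Take[]): FootageNode {
  let best: Take | undefined;
  let bestFillers = Infinity;
  for (const take of takes) {
    const n = countFillers(take.words);
    if (n <= bestFillers) {
      best = take;
      bestFillers = n;
    }
  }
  if (best === undefined) throw new Error("no takes to choose from");

  // Trim warm-up and trailing fillers: cut from the first real word to the last.
  const spoken = best.words.filter((w) => !isFiller(w));
  const first = spoken[0];
  const last = spoken[spoken.length - 1];

  return {
    kind: "footage",
    clipAssetId: best.clipAssetId,
    in: first ? first.start : 0,
    out: last ? last.end : 0,
    firstWords: spoken.slice(0, 3).map((w) => w.text).join(" "),
    candidateClipAssetIds: takes.map((t) => t.clipAssetId),
    selectionRationale: `${bestFillers} filler word(s), fewest of ${takes.length} take(s)`,
  };
}
```

Because the function is pure (transcripts in, node out), it can be unit-tested without whisper or ffmpeg, which matches the testing approach described above. The rationale string is what makes the decision reviewable in a JSON diff.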

Verification

pnpm -r build + pnpm -r typecheck: all 9 packages green
pnpm --filter @html-video/core test: 6/6 pass
End-to-end on synthetic + real footage: transcribe → select (zero-filler takes, warm-up trimmed) → cut → concat → re-transcribe self-check returns clean: true; output MP4 full-decodes clean (single v+a stream). Findings (incl. a real whisper language/model hallucination caught + the within-clip span-cut gap) in research/2026-06-11-spec-11-poc-findings.md.
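
The re-transcribe self-check can be pictured roughly as follows. This is only a sketch under stated assumptions: the PR confirms that the check reports `clean: true`, but the `VerifyResult` shape beyond that field, the function name `verifyCutSketch`, and the comparison rules (word-by-word match after normalization, plus a filler scan) are hypothetical.

```typescript
// Hypothetical result shape; only the `clean` flag is mentioned in the PR.
interface VerifyResult {
  clean: boolean;
  issues: string[];
}

// Assumed filler list for the post-cut scan.
const KNOWN_FILLERS = new Set(["um", "uh", "er", "hmm"]);

// Lowercase and drop punctuation so "World." and "world" compare equal.
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Compare the words the EDL says should be in the cut against the words
// heard when the rendered output is transcribed again.
function verifyCutSketch(expected: string[], retranscribed: string[]): VerifyResult {
  const want = expected.map(normalizeWord).filter((w) => w.length > 0);
  const got = retranscribed.map(normalizeWord).filter((w) => w.length > 0);
  const issues: string[] = [];

  for (const word of got) {
    if (KNOWN_FILLERS.has(word)) issues.push(`filler "${word}" survived the cut`);
  }

  // Report only the first divergence; later mismatches are usually knock-on.
  const n = Math.max(want.length, got.length);
  for (let i = 0; i < n; i++) {
    if (want[i] !== got[i]) {
      issues.push(`word ${i}: expected "${want[i] ?? "(nothing)"}", heard "${got[i] ?? "(nothing)"}"`);
      break;
    }
  }

  return { clean: issues.length === 0, issues };
}
```

An exact word-by-word comparison like this is strict; a real check would likely need some tolerance for transcription noise, such as the language/model hallucination noted in the findings.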

Usage

export HTMLVIDEO_WHISPER_MODEL=/path/to/ggml-base.en.bin   # brew install whisper-cpp
html-video footage-edit --takes ./takes --scenes ./scenes.json --out final.mp4
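
For orientation, the cut and stitch steps behind this command could map onto ffmpeg arguments roughly as below. The PR does not show its actual invocations, so this is a sketch: `cutArgsSketch` and `concatArgsSketch` are hypothetical helpers, and the codec choices (libx264, aac) are assumptions. The ffmpeg options themselves (`-ss`, `-t`, `-i`, `-filter_complex` with the `concat` filter, `-map`) are standard.

```typescript
// Build ffmpeg arguments for a re-encoded cut. Re-encoding, rather than
// stream copy, lets the cut land on an arbitrary frame instead of the
// nearest keyframe.
function cutArgsSketch(input: string, inSec: number, outSec: number, output: string): string[] {
  if (!(outSec > inSec)) throw new Error("out must be greater than in");
  const duration = (outSec - inSec).toFixed(3);
  return [
    "-y",
    "-ss", inSec.toFixed(3), // seek to the in-point
    "-i", input,
    "-t", duration,          // keep (out - in) seconds
    "-c:v", "libx264",       // assumed video codec
    "-c:a", "aac",           // assumed audio codec
    output,
  ];
}

// Build ffmpeg arguments that stitch clips with the concat filter, producing
// one video stream and one audio stream.
function concatArgsSketch(inputs: string[], output: string): string[] {
  if (inputs.length === 0) throw new Error("need at least one clip");
  const args: string[] = ["-y"];
  for (const file of inputs) args.push("-i", file);

  // e.g. "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]"
  const pads = inputs.map((_, i) => `[${i}:v][${i}:a]`).join("");
  const filter = `${pads}concat=n=${inputs.length}:v=1:a=1[v][a]`;

  args.push("-filter_complex", filter, "-map", "[v]", "-map", "[a]", output);
  return args;
}
```

The arrays are meant to be passed to a process spawner as-is, so the bracketed labels need no shell quoting. The concat filter expects every clip to have matching stream layout and resolution, which holds when all clips come from the same re-encoding cut step.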

Out of scope (later RFC-11 phases)

Subtitles, transcript-timed HTML/Remotion overlay compositing onto footage, global motion config, color grade, studio UI, within-clip span cutting.

🤖 Generated with Claude Code

… → select → cut)

Adds a second ingest pipeline alongside synthesis: turn raw takes into a cut, with every edit decision living as diffable JSON. Reproduces the core loop from Thariq Shihipar's "How Fable Edited Its Own Launch Video" deck, productized.

- content-graph: new `footage` NodeKind (clipAssetId/in/out/rationale/candidates); an EDL is just a content-graph of footage nodes, no separate format
- core: SourceAdapter/Transcript types + SourceRegistry (ingest-side counterpart to EngineAdapter); footage.ts with selectTake/buildFootageGraph (pure, tested) + cutClip/concatClips (frame-accurate cut + concat-filter, reusing the mixed-engine PTS lesson)
- adapter-whisper: new package, a whisper.cpp local SourceAdapter (word-level timestamps, on-device, friendly missing-tool hints)
- cli: `footage-edit` command (transcribe → select → cut → concat → re-transcribe verify); writes final-edit.json (the EDL)
- 6 unit tests for selection logic (no whisper/ffmpeg dep)

Verified end-to-end on real footage. RFC-11 v0.2 (research/) is the full-pipeline north star (Phase 1+: subtitles + HTML/Remotion overlay compositing onto footage, motion config, grade); this commit is Phase-0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lefarcen requested a review from PerishCode June 11, 2026 13:21

lefarcen added the size/XL (700-1499 LOC), risk/high, and type/feature labels Jun 11, 2026

xne998808-ai wants to merge 1 commit into main from feat/footage-edit
