feat(footage): RFC-11 Phase-0 — edit real footage as text (transcribe → select → cut)#42
Open
xne998808-ai wants to merge 1 commit into
… → select → cut)
Adds a second ingest pipeline alongside synthesis: turn raw takes into a cut, with every edit decision living as diffable JSON. Reproduces the core loop from Thariq Shihipar's "How Fable Edited Its Own Launch Video" deck, productized.
- content-graph: new `footage` NodeKind (clipAssetId/in/out/rationale/candidates). An EDL is just a content-graph of footage nodes, no separate format
- core: SourceAdapter/Transcript types + SourceRegistry (ingest-side counterpart to EngineAdapter); footage.ts with selectTake/buildFootageGraph (pure, tested) + cutClip/concatClips (frame-accurate cut + concat-filter, reusing the mixed-engine PTS lesson)
- adapter-whisper: new package, a whisper.cpp local SourceAdapter (word-level timestamps, on-device, friendly missing-tool hints)
- cli: `footage-edit` command (transcribe → select → cut → concat → re-transcribe verify); writes final-edit.json (the EDL)
- 6 unit tests for selection logic (no whisper/ffmpeg dep)
Verified end-to-end on real footage. RFC-11 v0.2 (research/) is the full-pipeline north star (Phase 1+: subtitles + HTML/Remotion overlay compositing onto footage, motion config, grade); this commit is Phase-0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
A second ingest pipeline alongside synthesis: turn a folder of raw takes into a cut, with every edit decision living as diffable JSON. Productizes the core loop from Thariq Shihipar's deck "How Fable Edited Its Own Launch Video" — transcribe → pick the cleanest take (with written reasons) → frame-accurate cut → stitch → re-transcribe to self-verify.
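The "pick the cleanest take (with written reasons)" step can be illustrated with a minimal sketch. The field names on the node follow this PR's description of the `footage` NodeKind; everything else here (the `Take` shape, the `kind` discriminator, the filler-word scoring rule, the tie-break) is an assumption made for illustration and is not the `selectTake` shipped in `footage.ts`.

```typescript
// Hypothetical shapes. Only the FootageNode field names come from the PR text.
interface TranscriptWord { text: string; start: number; end: number } // times in seconds
interface Take { clipAssetId: string; words: TranscriptWord[] }

interface FootageNode {
  kind: "footage";                 // assumed discriminator name
  clipAssetId: string;
  in: number;
  out: number;
  firstWords: string;
  candidateClipAssetIds: string[];
  selectionRationale: string;
}

// Assumed notion of "clean": fewer filler words is better.
const FILLERS = new Set(["um", "uh", "er", "erm"]);

function countFillers(take: Take): number {
  return take.words.filter((w) =>
    FILLERS.has(w.text.toLowerCase().replace(/[^a-z]/g, ""))
  ).length;
}

// Pure function: no whisper or ffmpeg involved, so it is easy to unit-test.
function selectTake(takes: Take[]): FootageNode {
  const usable = takes.filter((t) => t.words.length > 0);
  if (usable.length === 0) throw new Error("no take contains any transcribed words");

  // Fewest fillers wins; on a tie the later take is preferred (an assumption).
  let best = usable[0];
  let bestFillers = countFillers(best);
  for (const t of usable.slice(1)) {
    const f = countFillers(t);
    if (f <= bestFillers) { best = t; bestFillers = f; }
  }

  return {
    kind: "footage",
    clipAssetId: best.clipAssetId,
    in: best.words[0].start,
    out: best.words[best.words.length - 1].end,
    firstWords: best.words.slice(0, 3).map((w) => w.text).join(" "),
    candidateClipAssetIds: takes.map((t) => t.clipAssetId),
    selectionRationale: `${bestFillers} filler word(s), fewest among ${usable.length} usable take(s)`,
  };
}
```

Because the result is plain data, serializing it with `JSON.stringify` gives the kind of diffable edit decision described above: changing a cut means changing `in`/`out` in a text file.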
This is Phase-0 of RFC-11 (the full north-star pipeline, covering subtitles, HTML/Remotion animated overlays composited onto the footage, motion config, and grade, is specced in `research/2026-06-11-spec-11-footage-edit.md` v0.2 for later phases).
Changes
- `footage` NodeKind (clipAssetId/in/out/firstWords/candidateClipAssetIds/selectionRationale). An EDL is just a content-graph of footage nodes, with no separate format; it reuses `sequence` edges + `topoSort`, so footage and synthesized nodes mix in one timeline.
- `SourceAdapter`/`Transcript`/`TranscriptWord` types + `SourceRegistry` (the ingest-side counterpart to `EngineAdapter`); `footage.ts` with `selectTake`/`buildFootageGraph` (pure, unit-tested) + `cutClip`/`concatClips` (frame-accurate cut + concat-filter, reusing the mixed-engine PTS lesson).
- `whisper-local` SourceAdapter: on-device word-level transcription via whisper.cpp, friendly missing-tool hints, default ingest back-end (no upload, no fee).
- `footage-edit --takes <dir> [--scenes <json>] [--model <ggml>]`: runs the whole loop, writes `final-edit.json` (the EDL).
- `test` script (`node --test test/` was broken on Node 26 → glob, matching the other packages).
Verification
- `pnpm -r build` + `pnpm -r typecheck`: all 9 packages green
- `pnpm --filter @html-video/core test`: 6/6 pass
- `clean: true`; output MP4 full-decodes clean (single v+a stream). Findings (including a real whisper language/model hallucination that was caught, plus the within-clip span-cut gap) are in `research/2026-06-11-spec-11-poc-findings.md`.
Usage
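The command takes the form `footage-edit --takes <dir> [--scenes <json>] [--model <ggml>]`. For scripted runs, a minimal sketch of assembling that argument list is shown here; the helper name and options type are hypothetical, and the launcher binary is deliberately left to the caller because this description does not name it.

```typescript
// Hypothetical helper: only the subcommand and flag names come from the PR text.
interface FootageEditOptions {
  takes: string;    // directory of raw takes (required)
  scenes?: string;  // optional scenes JSON file
  model?: string;   // optional ggml model file for whisper.cpp
}

function footageEditArgs(opts: FootageEditOptions): string[] {
  if (!opts.takes) throw new Error("--takes <dir> is required");
  const args = ["footage-edit", "--takes", opts.takes];
  if (opts.scenes) args.push("--scenes", opts.scenes);
  if (opts.model) args.push("--model", opts.model);
  return args;
}
```

The returned array can be handed to a process spawner together with whichever executable exposes the CLI in a given checkout.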
Out of scope (later RFC-11 phases)
Subtitles, transcript-timed HTML/Remotion overlay compositing onto footage, global motion config, color grade, studio UI, within-clip span cutting.
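For contrast with the deferred span cutting, the whole-clip cut and stitch that Phase-0 does cover can be sketched as ffmpeg argument builders. This is an illustration under assumptions, not the `cutClip`/`concatClips` code in this PR: the function names, codec choices, and the timestamp reset before concatenation (one plausible reading of the "PTS lesson") are all guesses. The ffmpeg options and filters used (`-ss`, `-t`, `-filter_complex`, `setpts`, `asetpts`, `concat`) are standard.

```typescript
// Arguments for one cut. Re-encoding instead of stream-copying lets the cut land
// on an arbitrary frame rather than on the nearest keyframe.
function cutArgs(src: string, inSec: number, outSec: number, dst: string): string[] {
  if (!(outSec > inSec)) throw new Error("out must be later than in");
  return [
    "-y",
    "-i", src,
    "-ss", inSec.toFixed(3),            // output-side seek: decode, then drop frames before inSec
    "-t", (outSec - inSec).toFixed(3),  // duration of the kept span
    "-c:v", "libx264",                  // assumed codecs
    "-c:a", "aac",
    dst,
  ];
}

// Arguments for stitching clips with the concat filter (one video + one audio stream each).
function concatArgs(clips: string[], dst: string): string[] {
  if (clips.length === 0) throw new Error("nothing to concatenate");
  const inputs = clips.flatMap((c) => ["-i", c]);
  // Reset every stream's timestamps to start at zero before joining (assumption).
  const resets = clips.map(
    (_, i) => `[${i}:v]setpts=PTS-STARTPTS[v${i}];[${i}:a]asetpts=PTS-STARTPTS[a${i}]`
  );
  const pads = clips.map((_, i) => `[v${i}][a${i}]`).join("");
  const graph = `${resets.join(";")};${pads}concat=n=${clips.length}:v=1:a=1[v][a]`;
  return ["-y", ...inputs, "-filter_complex", graph, "-map", "[v]", "-map", "[a]", dst];
}
```

Both builders return plain arrays, so they can be unit-tested without ffmpeg installed and then passed to a process spawner with `ffmpeg` as the executable.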
🤖 Generated with Claude Code