feat: gemini retention review + music + scroll test + retention map #28
Merged
Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.

**Foundation** (`packages/core/src/gemini/`)
- env.ts: GEMINI_API_KEY loader, mirrors anthropic/env.ts.
- client.ts: REST client with uploadFile (resumable Files API) + generateStructured (function-tool call). Picked direct fetch over @google/genai SDK because the SDK pulls in gRPC + Vertex auth we don't need.
- Cost-tracking: kind: "gemini" entries in CostOp. gemini-2.5-flash priced at $0.30/$2.50 per M tokens.

**Milestone B — ElevenLabs Music**
- elevenlabs/music.ts: async polled-job pattern (POST returns music_id, GET polls until completed, then download signed audio_url). Geometric backoff 2/4/8s, max 30s per poll, 5min total.
- script/music/manifest.ts: same shape as SFX with scenesCovered (a track spans multiple scenes). resolveMusicSpan computes the master-timeline window from covered scenes. 9 unit tests.
- assembler emits <audio data-track-index="2" data-timeline-group="music"> per entry, including data-music-duck-db so the producer applies a sidechain duck during voiceover.
- 4 routes: music-suggest (Haiku) / music-generate (ElevenLabs polled job) / GET music / DELETE music/:entryId.
- Frontend: 🎵 Music Wizard panel above Director. Vibe textarea → Haiku proposes 1-3 tracks → click Generate per track. Applied tracks list below with Remove buttons.

**Milestone C — Gemini render review**
- POST /storyline/render-review uploads the most recent .mp4 from <project>/renders/ to Gemini Files API, prompts with script meta + per-scene timings, returns:
  - overallRetentionScore (0-100)
  - scrollRiskWindows[]: severity, why, one-sentence fix
  - brandConsistency { score, drift[] }
  - audioMix { voiceClarity, musicLevels, sfxBalance }
  - perScene[] { visualHook, paceMatch, onBrand, note }
- Persisted to .hyperframes/render-reviews/<ts>.json so reload shows the last review without re-running. GET /render-review serves it.
- Frontend: Retention Review panel with overall-score chip, scroll-risk windows with timestamps, 3-column audio/brand summary.

**Milestone E — Per-scene scroll test**
- POST /storyline/scroll-test samples 3 frames per scene via the new adapter.extractVideoFrameToBytes hook, asks Gemini "would they scroll?". Returns verdict + sceneStrengthScore + optional concrete patch.
- New per-card AI action: 📉 Scroll test. Patches drop into the amber suggestion stack, applied via the same pipeline.

**Milestone F — Retention map**
- Horizontal strip at the top of Storyline, one cell per scene, color-coded by retention strength (review's 3-dim avg, fallback to scrollTest score). Click → smooth-scroll to scene. Hidden until at least one signal lands.

Tests: 728 core (+9 music-span), 281 studio. Lint, format, typecheck clean.
Live verify: all four panels render in expanded sidebar mode, zero console errors.

Plan: #25 (Milestones B + C + E + F shipped; D image analysis is the remaining piece, smaller follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.
Total AI-augmented pipeline cost per 2-min video: voiceover ($0.40) + SFX ($0.05) + music ($0.10) + Gemini review + scroll-tests ($0.15) + Haiku polish (~$0.01) ≈ $0.71, below a single render's compute time.

Foundation
- `packages/core/src/gemini/`: env loader (mirrors anthropic/env.ts) + REST client with `uploadFile` (Files API resumable protocol) + `generateStructured<T>` (function-tool call). Direct fetch over the `@google/genai` SDK because the SDK pulls in gRPC + Vertex auth we don't need.
- `kind: "gemini"` added to CostOp; gemini-2.5-flash priced at $0.30/$2.50 per M tokens.
- New adapter hook `extractVideoFrameToBytes(filePath, timeS) => Buffer | null` so the studio host can supply ffmpeg-based frame sampling for the scroll test.

Milestone B — Music
End-to-end: Haiku proposes prompts with `scenesCovered`, ElevenLabs generates via polled job, manifest at `assets/music/music.manifest.json`, assembler emits `<audio data-track-index="2" data-timeline-group="music" data-music-duck-db="-12">` per entry. The producer's mixer reads `data-music-duck-db` to apply a sidechain duck during voiceover.

Frontend: amber 🎵 Music Wizard panel above the Director. Textarea → "Propose ↵" → 1-3 track suggestions with prompt/role/scenesCovered/duration → 🎵 Generate per track → applied tracks list below with Remove.
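The polled job is described in the commit message as using geometric backoff (2/4/8s, max 30s per poll, 5min total). A minimal sketch of that schedule follows. It is not the code in `elevenlabs/music.ts`: `BackoffOptions`, `backoffSchedule`, and `pollUntilDone` are hypothetical names, and reading "max 30s per poll" as a cap on each individual wait is an assumption.

```typescript
// Hypothetical helper names; only the 2s/30s/5min numbers come from the PR.
interface BackoffOptions {
  initialMs: number;     // first wait before polling (2s)
  factor: number;        // geometric growth factor (2 gives 2/4/8s)
  maxDelayMs: number;    // assumed cap on any single wait (30s)
  totalBudgetMs: number; // stop after this much cumulative waiting (5min)
}

const MUSIC_POLL_DEFAULTS: BackoffOptions = {
  initialMs: 2_000,
  factor: 2,
  maxDelayMs: 30_000,
  totalBudgetMs: 300_000,
};

// Returns the waits between polls; their sum never exceeds the total budget.
function backoffSchedule(opts: BackoffOptions = MUSIC_POLL_DEFAULTS): number[] {
  const delays: number[] = [];
  let next = opts.initialMs;
  let elapsed = 0;
  while (elapsed < opts.totalBudgetMs) {
    const wait = Math.min(next, opts.maxDelayMs, opts.totalBudgetMs - elapsed);
    delays.push(wait);
    elapsed += wait;
    next = Math.min(next * opts.factor, opts.maxDelayMs);
  }
  return delays;
}

type JobStatus<T> = { done: true; value: T } | { done: false };

// Waits, then checks the job, repeating until it completes or the budget runs out.
// `check` stands in for the GET that polls the music job; `sleep` is injectable for tests.
async function pollUntilDone<T>(
  check: () => Promise<JobStatus<T>>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  opts: BackoffOptions = MUSIC_POLL_DEFAULTS,
): Promise<T> {
  for (const wait of backoffSchedule(opts)) {
    await sleep(wait);
    const status = await check();
    if (status.done) return status.value;
  }
  throw new Error("music job did not complete within the polling budget");
}
```

Under these assumptions the default schedule ramps 2s, 4s, 8s, 16s and then holds at 30s until the 5-minute budget is spent. Precomputing the schedule keeps the timing logic testable without real timers.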
Milestone C — Gemini render review
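`POST /storyline/render-review`:
- Uploads the most recent `.mp4` from `<project>/renders/` to the Gemini Files API and prompts with script meta + per-scene timings
- Returns overallRetentionScore (0-100), scrollRiskWindows[], brandConsistency, audioMix, and perScene[] scores
- Persists the result to `.hyperframes/render-reviews/<timestamp>.json` so reload shows the last review without re-running

Frontend: 🔍 Run review CTA when no review yet → "Gemini is watching…" while running → result panel with overall-score chip (color-coded), scroll-risk windows with `0:14-0:23` timestamps + per-window why+fix, and a 3-column audio/brand summary.

Milestone E — Per-scene scroll test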
`POST /storyline/scroll-test` per sceneId: samples 3 frames at start/mid/end via the new adapter hook, sends them + narration to Gemini, asks "would a feed viewer scroll past?". Returns verdict + `sceneStrengthScore` (0-100) + optional concrete `patch` (template/props/reasoning) the studio can apply.

Frontend: new 📉 Scroll test per-card AI action. The optional patch drops into the existing amber suggestion stack with the same Apply pipeline as every other Haiku action, so there is no new write surface.
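The start/mid/end sampling could look roughly like the sketch below. `SceneWindow`, `sampleTimes`, and `collectFrames` are hypothetical names, the small inset from the scene boundaries is an assumption (to avoid sampling the exact cut), and the hook is typed with `Uint8Array` here only to keep the snippet free of Node typings (a Node `Buffer` is a `Uint8Array`).

```typescript
// Hypothetical types; only the hook's (filePath, timeS) shape comes from the PR.
interface SceneWindow {
  startS: number;    // scene start on the master timeline, in seconds
  durationS: number; // scene length, in seconds
}

type FrameExtractor = (filePath: string, timeS: number) => Uint8Array | null;

// Three timestamps: near the start, at the midpoint, near the end.
// insetS is an assumed safety margin, clamped so short scenes still work.
function sampleTimes(scene: SceneWindow, insetS = 0.25): [number, number, number] {
  const inset = Math.min(insetS, scene.durationS / 2);
  const end = scene.startS + scene.durationS;
  return [scene.startS + inset, scene.startS + scene.durationS / 2, end - inset];
}

// Calls the host-supplied extractor for each timestamp and drops failed samples.
function collectFrames(
  filePath: string,
  scene: SceneWindow,
  extract: FrameExtractor,
): Uint8Array[] {
  return sampleTimes(scene)
    .map((t) => extract(filePath, t))
    .filter((frame): frame is Uint8Array => frame !== null);
}
```

Because the hook may return `null`, a caller would presumably need to decide what to do when fewer than three frames come back; the sketch simply passes along whatever was extracted.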
Milestone F — Retention map
A small horizontal strip at the top of the Storyline tab. One cell per scene, color-coded by retention strength.
Composes render-review's per-scene scores (avg of visualHook/paceMatch/onBrand) with scroll-test fallback when render-review hasn't run for a scene. Click any cell → smooth-scroll to that scene's card. Hidden until at least one signal lands so it doesn't add noise on a fresh project.
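The composition rule above can be sketched as follows. The type and function names are hypothetical, and treating the three render-review dimensions as being on the same 0-100 scale as `sceneStrengthScore` is an assumption; the PR does not state their scale or the color thresholds, so none are encoded here.

```typescript
// Hypothetical shapes; field names follow the perScene[] entries in the PR.
interface PerSceneReview {
  visualHook: number;
  paceMatch: number;
  onBrand: number;
}

interface RetentionCell {
  sceneId: string;
  score: number | null; // null means no signal yet for this scene
}

// Prefer the render-review average; fall back to the scroll-test score.
function sceneRetentionScore(
  review: PerSceneReview | undefined,
  scrollTestScore: number | undefined,
): number | null {
  if (review) {
    return (review.visualHook + review.paceMatch + review.onBrand) / 3;
  }
  return scrollTestScore ?? null;
}

// Builds one cell per scene and reports whether the strip should be shown.
function buildRetentionMap(
  sceneIds: string[],
  reviews: Map<string, PerSceneReview>,
  scrollTests: Map<string, number>,
): { cells: RetentionCell[]; visible: boolean } {
  const cells = sceneIds.map((sceneId) => ({
    sceneId,
    score: sceneRetentionScore(reviews.get(sceneId), scrollTests.get(sceneId)),
  }));
  // Hidden until at least one scene has a signal.
  const visible = cells.some((cell) => cell.score !== null);
  return { cells, visible };
}
```

Keeping `null` distinct from a score of 0 lets the UI tell "no data" apart from "weak scene", which matters for the hidden-until-first-signal behavior.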
Architecture: same single-track data path
All four milestones funnel into the existing `applyPatch(sceneId, patch) → PUT /script/scenes/:id` pipeline. The only new write surfaces are the audio file writes (SFX/music: manifest-tracked, soft-deletable). Gemini's outputs are advisory; nothing the model says auto-mutates the script.

Test plan
- `assets/music/<id>.mp3` lands and the music lane shows the clip

What's deliberately deferred to a follow-up
🤖 Generated with Claude Code