[plan] sfx + music via elevenlabs · gemini video & image intelligence#25
Draft
cuio added a commit that referenced this pull request on Apr 27, 2026
feat: per-scene sfx via elevenlabs (milestone a of #25)
cuio pushed a commit that referenced this pull request on Apr 29, 2026
Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.

**Foundation** (`packages/core/src/gemini/`)
- env.ts: GEMINI_API_KEY loader, mirrors anthropic/env.ts.
- client.ts: REST client with uploadFile (resumable Files API) + generateStructured (function-tool call). Picked direct fetch over @google/genai SDK because the SDK pulls in gRPC + Vertex auth we don't need.
- Cost-tracking: kind: "gemini" entries in CostOp. gemini-2.5-flash priced at $0.30/$2.50 per M tokens.

**Milestone B: ElevenLabs Music**
- elevenlabs/music.ts: async polled-job pattern (POST returns music_id, GET polls until completed, then download signed audio_url). Geometric backoff 2/4/8s, max 30s per poll, 5min total.
- script/music/manifest.ts: same shape as SFX with scenesCovered (a track spans multiple scenes). resolveMusicSpan computes the master-timeline window from covered scenes. 9 unit tests.
- assembler emits `<audio data-track-index="2" data-timeline-group="music">` per entry, including data-music-duck-db so the producer applies a sidechain duck during voiceover.
- 4 routes: music-suggest (Haiku) / music-generate (ElevenLabs polled job) / GET music / DELETE music/:entryId.
- Frontend: 🎵 Music Wizard panel above Director: vibe textarea → Haiku proposes 1-3 tracks → click Generate per track. Applied tracks list below with Remove buttons.

**Milestone C: Gemini render review**
- POST /storyline/render-review uploads the most recent .mp4 from `<project>/renders/` to Gemini Files API, prompts with script meta + per-scene timings, returns:
  - overallRetentionScore (0-100)
  - scrollRiskWindows[]: severity, why, one-sentence fix
  - brandConsistency { score, drift[] }
  - audioMix { voiceClarity, musicLevels, sfxBalance }
  - perScene[] { visualHook, paceMatch, onBrand, note }
- Persisted to `.hyperframes/render-reviews/<ts>.json` so reload shows the last review without re-running. GET /render-review serves it.
- Frontend: Retention Review panel with overall-score chip, scroll-risk windows with timestamps, 3-column audio/brand summary.

**Milestone E: Per-scene scroll test**
- POST /storyline/scroll-test samples 3 frames per scene via the new adapter.extractVideoFrameToBytes hook, asks Gemini "would they scroll?". Returns verdict + sceneStrengthScore + optional concrete patch.
- New per-card AI action: 📉 Scroll test. Patches drop into the amber suggestion stack, applied via the same pipeline.

**Milestone F: Retention map**
- Horizontal strip at the top of Storyline, one cell per scene, color-coded by retention strength (review's 3-dim avg, fallback to scrollTest score). Click → smooth-scroll to scene. Hidden until at least one signal lands.

Tests: 728 core (+9 music-span), 281 studio. Lint, format, typecheck clean.

Live verify: all four panels render in expanded sidebar mode, zero console errors.

Plan: #25 (Milestones B + C + E + F shipped; D image analysis is the remaining piece, smaller follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
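The polling budget quoted for Milestone B (geometric backoff 2/4/8s, max 30s per poll, 5min total) can be sketched as below. This is a minimal illustration, not the contents of elevenlabs/music.ts: `backoffSchedule`, `pollUntilDone`, and `JobStatus` are hypothetical names, and the assumption that the 5-minute budget counts only waiting time is mine.

```typescript
// Hypothetical sketch of the "geometric backoff, capped per poll, bounded in
// total" pattern. None of these names come from the PR itself.

/** Build the list of waits (in ms) between consecutive status checks. */
function backoffSchedule(
  baseMs = 2_000, // first wait: 2s, then doubling (4s, 8s, ...)
  capMs = 30_000, // no single wait exceeds 30s
  budgetMs = 300_000, // stop once 5 minutes of waiting has been scheduled
): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  let next = baseMs;
  while (elapsed < budgetMs) {
    // Clamp to the per-poll cap and to whatever budget is left.
    const wait = Math.min(next, capMs, budgetMs - elapsed);
    delays.push(wait);
    elapsed += wait;
    next *= 2;
  }
  return delays;
}

type JobStatus<T> = { done: false } | { done: true; value: T };

/**
 * Call `check` until it reports completion, sleeping between attempts
 * according to `schedule`. Throws once the schedule is exhausted.
 */
async function pollUntilDone<T>(
  check: () => Promise<JobStatus<T>>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  schedule: number[] = backoffSchedule(),
): Promise<T> {
  for (const wait of schedule) {
    const status = await check();
    if (status.done) return status.value;
    await sleep(wait);
  }
  // One last check after the final wait before giving up.
  const last = await check();
  if (last.done) return last.value;
  throw new Error("job did not complete within the polling budget");
}
```

With the defaults, the schedule is 2s, 4s, 8s, 16s, then nine waits of 30s, which sums to exactly 300s. Injecting `sleep` keeps the loop testable without real timers.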
This was referenced Apr 29, 2026
TL;DR
Two parallel feature tracks that land the missing retention levers. Both feed the existing Storyline cockpit's `applyPatch` pipeline, with no parallel write surfaces.

Total AI-augmented direction cost per video: under $0.70, below a single render's compute time.
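As a rough illustration of how the Gemini share of that budget could be tallied, the sketch below prices token usage at the gemini-2.5-flash rates quoted in the commit message ($0.30 input / $2.50 output per million tokens). The `GeminiCostOp` shape, the helper names, and the token counts are all hypothetical; the real `CostOp` type is not shown in this PR, and the Haiku and ElevenLabs costs are not modeled here.

```typescript
// Rates from the commit message: USD per million tokens for gemini-2.5-flash.
const GEMINI_FLASH_PRICE = { inputPerM: 0.3, outputPerM: 2.5 };

// Hypothetical entry shape; only `kind: "gemini"` is taken from the PR text.
interface GeminiCostOp {
  kind: "gemini";
  model: string;
  inputTokens: number;
  outputTokens: number;
}

/** Cost in USD of a single Gemini call. */
function geminiCostUsd(op: GeminiCostOp): number {
  return (
    (op.inputTokens / 1_000_000) * GEMINI_FLASH_PRICE.inputPerM +
    (op.outputTokens / 1_000_000) * GEMINI_FLASH_PRICE.outputPerM
  );
}

/** Sum of all Gemini calls made while directing one video. */
function totalCostUsd(ops: GeminiCostOp[]): number {
  return ops.reduce((sum, op) => sum + geminiCostUsd(op), 0);
}

/** True when the tracked calls fit under the given per-video budget. */
function withinBudget(ops: GeminiCostOp[], budgetUsd: number): boolean {
  return totalCostUsd(ops) < budgetUsd;
}

// Illustrative token counts only; these are not measurements from the PR.
const exampleOps: GeminiCostOp[] = [
  { kind: "gemini", model: "gemini-2.5-flash", inputTokens: 200_000, outputTokens: 4_000 },
  { kind: "gemini", model: "gemini-2.5-flash", inputTokens: 50_000, outputTokens: 1_000 },
];

console.log(totalCostUsd(exampleOps).toFixed(4), withinBudget(exampleOps, 0.7));
```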
Why this stack
No overlap. Haiku proposes, ElevenLabs generates, Gemini judges.
How the pieces compose
```
┌─ Director (Storyline / Project) ─┐
│ Haiku, single textarea │
└──────────────┬───────────────────┘
│ proposes patches
┌─ Per-scene Director ──────────────┴──────────────────┐
│ "✦ Direct this scene" (already shipped) │
└──────────────┬────────────────────────────────────────┘
│ all funnel into …
┌──────────────▼─────────────────────────────────────────┐
│ applyPatch(sceneId, patch) → PUT /scenes/:id │
│ (single write surface) │
└──────────────┬─────────────────────────────────────────┘
│
└─ Storyline reload
│
├─ SFX manifest → SFX lane (Milestone A)
├─ Music manifest → Music lane (Milestone B)
└─ Retention map ← Gemini (C+E+F)
```
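The single write surface in the diagram, `applyPatch(sceneId, patch) → PUT /scenes/:id`, might look roughly like the sketch below. Only the function name and the route come from the diagram; the JSON body, the `ScenePatch` and `Suggestion` types, and the injected `FetchLike` parameter are assumptions made for illustration.

```typescript
// Hypothetical types: the PR does not show the real patch or suggestion shapes.
type ScenePatch = Record<string, unknown>;

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

/** The one place that writes scene changes: PUT /scenes/:id. */
async function applyPatch(
  sceneId: string,
  patch: ScenePatch,
  fetchImpl: FetchLike,
  baseUrl = "",
): Promise<void> {
  const res = await fetchImpl(`${baseUrl}/scenes/${encodeURIComponent(sceneId)}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" }, // assumed JSON payload
    body: JSON.stringify(patch),
  });
  if (!res.ok) {
    throw new Error(`PUT /scenes/${sceneId} failed with status ${res.status}`);
  }
}

// Every suggestion source reduces to the same (sceneId, patch) pair.
interface Suggestion {
  source: "director" | "scene-director" | "scroll-test";
  sceneId: string;
  patch: ScenePatch;
}

/** Apply suggestions one at a time through applyPatch; returns how many were written. */
async function applySuggestions(
  suggestions: Suggestion[],
  fetchImpl: FetchLike,
): Promise<number> {
  let applied = 0;
  for (const s of suggestions) {
    await applyPatch(s.sceneId, s.patch, fetchImpl);
    applied += 1;
  }
  return applied;
}
```

Passing the fetch function in as a parameter is a design choice of this sketch: it lets a test substitute a stub and check that every source ends up issuing the same PUT request.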
Suggested phase order
Out of scope this round
Reading order for the next session
🤖 Generated with Claude Code