[plan] sfx + music via elevenlabs · gemini video & image intelligence by cuio · Pull Request #25 · cuio/hyperframes

cuio · 2026-04-27T12:44:26Z

Tracking doc only — DRAFT. This PR exists as a permanent URL for the plan. Do not merge.
The next Claude session checks out this branch and starts at Phase 1.

TL;DR

Two parallel feature tracks, generated audio (milestones A and B) and Gemini intelligence (milestones C through F), land the missing retention levers. Both feed the existing Storyline cockpit's applyPatch pipeline; there are no parallel write surfaces.

| Milestone | What | Models | Cost |
|---|---|---|---|
| A: SFX | Per-scene generated sound effects on the existing SFX lane | Haiku proposes · ElevenLabs Sound Generation | ~$0.05/video |
| B: Music | Background music tracks scoring multi-scene sections | Haiku proposes · ElevenLabs Music v3 | ~$0.10/video |
| C: Render review | Gemini analyses the rendered MP4 → structured retention feedback + scroll-risk windows | Gemini 2.5 Flash | ~$0.05/video |
| D: Image analysis | Auto-detect role / vibe / suggested treatment on upload | Gemini 2.5 Flash | ~$0.001/image |
| E: Scroll test | Per-scene "would they scroll?" prediction with one-change fix proposal | Gemini 2.5 Flash | ~$0.005/scene |
| F: Retention map | Horizontal strip at the top of the Storyline tab, one square per scene, coloured by retention score | (composes C+E) | free |

Total AI-augmented direction cost per video: under $0.70, below the cost of a single render's compute time.
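As a rough check on that ceiling, the per-unit figures from the table can be summed for a given video. This is a minimal sketch: `estimateDirectionCostUsd` is a hypothetical helper, the image and scene counts are made-up inputs, and the table's costs are approximations to begin with.

```typescript
// Approximate per-unit costs in USD, copied from the milestone table.
const PER_VIDEO_USD = { sfx: 0.05, music: 0.1, renderReview: 0.05 };
const PER_IMAGE_USD = 0.001; // D: image analysis
const PER_SCENE_USD = 0.005; // E: scroll test

// Hypothetical helper: sums the fixed per-video costs with the
// per-image and per-scene costs, rounded to a tenth of a cent.
function estimateDirectionCostUsd(imageCount: number, sceneCount: number): number {
  const fixed = PER_VIDEO_USD.sfx + PER_VIDEO_USD.music + PER_VIDEO_USD.renderReview;
  const total = fixed + imageCount * PER_IMAGE_USD + sceneCount * PER_SCENE_USD;
  return Math.round(total * 1000) / 1000;
}

// Example inputs (assumed): 10 uploaded images, 12 scenes.
// 0.20 fixed + 0.01 images + 0.06 scroll tests.
const estimate = estimateDirectionCostUsd(10, 12);
console.log(estimate.toFixed(2)); // "0.27"
```

With these figures the fixed per-video cost is about $0.20, so the $0.70 ceiling leaves room for roughly 100 scroll-tested scenes before it is reached.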

Why this stack

| Model | Best at | Used for |
|---|---|---|
| Haiku 4.5 | Cheap, fast, scene-scoped directorial polish | SFX prompt generation, music prompt generation |
| ElevenLabs | Production audio (voice, SFX, music) | Generating actual audio assets |
| Gemini 2.5 Flash | Long-context video + image understanding | Render review, scroll prediction, image role detection |

No overlap. Haiku proposes, ElevenLabs generates, Gemini judges.

How the pieces compose

```
┌─ Director (Storyline / Project) ─┐
│ Haiku, single textarea │
└──────────────┬───────────────────┘
│ proposes patches
┌─ Per-scene Director ──────────────┴──────────────────┐
│ "✦ Direct this scene" (already shipped) │
└──────────────┬────────────────────────────────────────┘
│ all funnel into …
┌──────────────▼─────────────────────────────────────────┐
│ applyPatch(sceneId, patch) → PUT /scenes/:id │
│ (single write surface) │
└──────────────┬─────────────────────────────────────────┘
│
└─ Storyline reload
│
├─ SFX manifest → SFX lane (Milestone A)
├─ Music manifest → Music lane (Milestone B)
└─ Retention map ← Gemini (C+E+F)
```
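The single write surface in the diagram can be sketched as one function that every proposer (Director, per-scene Director, and later the scroll test) calls. This is an illustration only: the `ScenePatch` shape, `buildPatchRequest`, and the injected `send` callback are assumptions, and the real patch schema lives in the handoff doc.

```typescript
// Hypothetical patch shape; the real schema is defined elsewhere.
type ScenePatch = Record<string, unknown>;

interface PatchRequest {
  method: "PUT";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Pure helper: turns (sceneId, patch) into the PUT /scenes/:id request.
function buildPatchRequest(sceneId: string, patch: ScenePatch, baseUrl = ""): PatchRequest {
  return {
    method: "PUT",
    url: `${baseUrl}/scenes/${encodeURIComponent(sceneId)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(patch),
  };
}

// The one write path. `send` is injected (assumed) so that any HTTP
// client can be plugged in and the function stays easy to test.
async function applyPatch(
  sceneId: string,
  patch: ScenePatch,
  send: (req: PatchRequest) => Promise<{ ok: boolean; status: number }>,
): Promise<void> {
  const req = buildPatchRequest(sceneId, patch);
  const res = await send(req);
  if (!res.ok) {
    throw new Error(`PUT ${req.url} failed with status ${res.status}`);
  }
}

// Example: an SFX suggestion and a music suggestion go through the same call.
const sfxRequest = buildPatchRequest("scene 1", { sfx: "whoosh" });
console.log(sfxRequest.url); // "/scenes/scene%201"
```

Keeping request construction separate from sending means new AI actions only have to produce a patch; they never get their own endpoint.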

Suggested phase order

1. Read `.claude/handoffs/sfx-music-gemini-intelligence.md` end to end. It's the source of truth: endpoints, prompt shapes, tool schemas, cost notes, file paths, gotchas.
2. Milestone A (SFX), backend then frontend: fastest win, low risk, validates the manifest pattern.
3. Milestone C (Gemini render review) before B: the render-review feedback loop unlocks the rest.
4. Milestone B (Music) wired into the music lane.
5. Milestone D (image analysis) as a small standalone PR.
6. Milestone E (per-scene scroll test): the retention killer feature.
7. Milestone F (retention map): ties it all together at the top of the Storyline tab.
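How Milestone F composes C and E can be sketched as a per-scene score with a fallback. Every shape here is an assumption for illustration: the three review dimensions are taken to be numbers from 0 to 100, and the colour thresholds are invented.

```typescript
// Assumed shape: per-scene dimensions from the render review, each 0-100.
interface SceneReview {
  visualHook: number;
  paceMatch: number;
  onBrand: number;
}

// Prefer the average of the three review dimensions; fall back to the
// scroll-test score; undefined when neither signal exists yet.
function retentionScore(review?: SceneReview, scrollTestScore?: number): number | undefined {
  if (review) {
    return (review.visualHook + review.paceMatch + review.onBrand) / 3;
  }
  return scrollTestScore;
}

// Hypothetical thresholds for the cell colour.
function cellColour(score: number | undefined): "empty" | "red" | "amber" | "green" {
  if (score === undefined) return "empty";
  if (score < 40) return "red";
  if (score < 70) return "amber";
  return "green";
}

// The strip only makes sense once at least one scene has a signal.
function showStrip(scores: Array<number | undefined>): boolean {
  return scores.some((s) => s !== undefined);
}

const scores = [
  retentionScore({ visualHook: 90, paceMatch: 60, onBrand: 60 }), // review average: 70
  retentionScore(undefined, 35), // scroll-test fallback: 35
  retentionScore(), // no signal yet
];
console.log(scores.map(cellColour)); // ["green", "red", "empty"]
```

Because the map only reads scores that C and E already produce, it adds no model calls of its own.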

Out of scope this round

- Per-scene voice cloning (different reader per scene = retention killer)
- Auto-generated B-roll (Sora/Runway is a separate stack)
- Render-blocking quality gates (Gemini is advisory, not gating)
- Music-as-score timing-aware generation (phase 2)
- Multi-language SFX (ElevenLabs SFX is English-only currently)

Reading order for the next session

1. This PR description
2. Then the full handoff doc on disk
3. Then start coding at Milestone A

🤖 Generated with Claude Code

feat: per-scene sfx via elevenlabs (milestone a of #25)

Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.

**Foundation** (`packages/core/src/gemini/`)
- env.ts: GEMINI_API_KEY loader, mirrors anthropic/env.ts.
- client.ts: REST client with uploadFile (resumable Files API) + generateStructured (function-tool call). Picked direct fetch over @google/genai SDK because the SDK pulls in gRPC + Vertex auth we don't need.
- Cost-tracking: kind: "gemini" entries in CostOp. gemini-2.5-flash priced at $0.30/$2.50 per M tokens.

**Milestone B: ElevenLabs Music**
- elevenlabs/music.ts: async polled-job pattern (POST returns music_id, GET polls until completed, then download signed audio_url). Geometric backoff 2/4/8s, max 30s per poll, 5min total.
- script/music/manifest.ts: same shape as SFX with scenesCovered (a track spans multiple scenes). resolveMusicSpan computes the master-timeline window from covered scenes. 9 unit tests.
- assembler emits `<audio data-track-index="2" data-timeline-group="music">` per entry, including data-music-duck-db so the producer applies a sidechain duck during voiceover.
- 4 routes: music-suggest (Haiku) / music-generate (ElevenLabs polled job) / GET music / DELETE music/:entryId.
- Frontend: 🎵 Music Wizard panel above Director. Vibe textarea → Haiku proposes 1-3 tracks → click Generate per track. Applied tracks list below with Remove buttons.

**Milestone C: Gemini render review**
- POST /storyline/render-review uploads the most recent .mp4 from `<project>/renders/` to Gemini Files API, prompts with script meta + per-scene timings, returns:
  - overallRetentionScore (0-100)
  - scrollRiskWindows[]: severity, why, one-sentence fix
  - brandConsistency { score, drift[] }
  - audioMix { voiceClarity, musicLevels, sfxBalance }
  - perScene[] { visualHook, paceMatch, onBrand, note }
- Persisted to `.hyperframes/render-reviews/<ts>.json` so reload shows the last review without re-running. GET /render-review serves it.
- Frontend: Retention Review panel with overall-score chip, scroll-risk windows with timestamps, 3-column audio/brand summary.

**Milestone E: Per-scene scroll test**
- POST /storyline/scroll-test samples 3 frames per scene via the new adapter.extractVideoFrameToBytes hook, asks Gemini "would they scroll?". Returns verdict + sceneStrengthScore + optional concrete patch.
- New per-card AI action: 📉 Scroll test. Patches drop into the amber suggestion stack, applied via the same pipeline.

**Milestone F: Retention map**
- Horizontal strip at the top of Storyline, one cell per scene, color-coded by retention strength (review's 3-dim avg, fallback to scrollTest score). Click → smooth-scroll to scene. Hidden until at least one signal lands.

Tests: 728 core (+9 music-span), 281 studio. Lint, format, typecheck clean.

Live verify: all four panels render in expanded sidebar mode, zero console errors.

Plan: #25 (Milestones B + C + E + F shipped; D image analysis is the remaining piece, smaller follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
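The polled-job pattern described for the music endpoint (geometric backoff 2/4/8s, max 30s per poll, 5min total) could look roughly like the sketch below. It is not the code in elevenlabs/music.ts: the function names, the `JobStatus` shape, and the reading of "max 30s per poll" as a cap on the wait between polls are all assumptions.

```typescript
// Builds the list of waits between polls: start at 2s, double each time,
// cap each wait at 30s, stop once the 5-minute budget would be exceeded.
function pollDelaysMs(
  initialMs = 2_000,
  factor = 2,
  maxDelayMs = 30_000,
  totalBudgetMs = 300_000,
): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  let elapsed = 0;
  while (elapsed + delay <= totalBudgetMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * factor, maxDelayMs);
  }
  return delays;
}

// Hypothetical status shape for the polled job.
type JobStatus = { state: "pending" } | { state: "completed"; audioUrl: string };

// Waits, polls, and returns the signed audio URL once the job completes.
// `getStatus` and `sleep` are injected so no real HTTP client is assumed.
async function pollUntilCompleted(
  getStatus: () => Promise<JobStatus>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  for (const delay of pollDelaysMs()) {
    await sleep(delay);
    const status = await getStatus();
    if (status.state === "completed") return status.audioUrl;
  }
  throw new Error("music job did not complete within the polling budget");
}

const delays = pollDelaysMs();
console.log(delays.slice(0, 5)); // [2000, 4000, 8000, 16000, 30000]
```

Under these assumptions the schedule is 2, 4, 8 and 16 seconds followed by nine 30-second waits, which adds up to exactly the 5-minute budget.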

chore(plan): sfx/music + gemini intelligence handoff brief

0d2ed46

cuio mentioned this pull request Apr 27, 2026

feat: per-scene sfx via elevenlabs (milestone a of #25) #26

Merged

7 tasks

cuio added a commit that referenced this pull request Apr 27, 2026

Merge pull request #26 from cuio/feat/storyline-sfx-elevenlabs

27cc52c

feat: per-scene sfx via elevenlabs (milestone a of #25)

This was referenced Apr 29, 2026

feat: gemini retention review + music + scroll test + retention map #28

Merged

feat: gemini image analyzer (auto-detect role/vibe/treatment on upload) #29

Merged

cuio wants to merge 1 commit into main from plan/sfx-music-and-gemini-intelligence
