diff --git a/.claude/handoffs/sfx-music-gemini-intelligence.md b/.claude/handoffs/sfx-music-gemini-intelligence.md
new file mode 100644
index 000000000..dd1a7cbc2
--- /dev/null
+++ b/.claude/handoffs/sfx-music-gemini-intelligence.md
@@ -0,0 +1,390 @@
+# Plan: ElevenLabs SFX/Music + Gemini video & image intelligence
+
+> **For the next Claude session.** This document is a complete brief: read top
+> to bottom, then start at Phase 1. It covers two parallel feature tracks
+> (audio + AI intelligence) that converge on retention engineering. Keep the
+> existing Storyline cockpit's apply-pipeline as the spine; everything new
+> feeds patches into it.
+
+## Why
+
+The cockpit can now write and direct copy, swap themes, and reorder scenes,
+but two big retention levers are still missing:
+
+1. **Audio beyond voiceover.** Most viewers' affect is set by the _bed_
+   (music + SFX), not the narration. We already have Music and SFX lanes on
+   the timeline (PR #20 wired the placeholders), but nothing flows into them.
+2. **Pre-render visual judgement.** We compose carefully but ship blind. The
+   only "is this on-brand?" check is the user's eye on a finished render.
+   Gemini's video understanding lets us close that loop _during_ direction,
+   not after.
+
+Both are LLM-driven by design: the studio asks the user what they want, the
+model proposes, and the user accepts suggestions à la carte. This is the same
+UX language as Director.
+
+## What ships, broken into milestones
+
+### Milestone A — ElevenLabs SFX (1 PR, ~2 days)
+
+Per-scene generated sound effects landing on the existing SFX lane.
+
+- New backend route `POST /storyline/sfx-suggest`: Haiku reads one scene
+  (same window as scene-intent: the focal scene ±2) and proposes 1-3 SFX
+  ideas with text prompts and durations. Returns
+  `{ suggestions: [{ id, prompt, durationS, anchor: "scene-start" | "accent-word" | "scene-end" }] }`.
+- New backend route `POST /storyline/sfx-generate`: takes a suggestion and a
+  destination path, calls ElevenLabs Sound Generation
+  (`POST /v1/sound-generation`), writes the mp3 to `assets/sfx/-.mp3`, and
+  appends an entry to `assets/sfx/sfx.manifest.json`.
+- Assembler (already supports the SFX track placeholder): consume the
+  manifest and emit one `