feat: gemini retention review + music + scroll test + retention map by cuio · Pull Request #28 · cuio/hyperframes

cuio · 2026-04-29T08:59:51Z

Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.

| Milestone | What | Provider | Cost |
|---|---|---|---|
| B | ElevenLabs Music: multi-scene background tracks landing on the existing Music lane | Haiku proposes · ElevenLabs Music v3 polled job | ~$0.10/video |
| C | Gemini render review: analyses the rendered MP4 → structured retention feedback + scroll-risk windows + per-scene scores | Gemini 2.5 Flash | ~$0.05/video |
| E | Per-scene scroll test: samples 3 frames + narration → "would they scroll?" + a one-change fix that drops as an applyable patch | Gemini 2.5 Flash | ~$0.005/scene |
| F | Retention map: horizontal strip at the top of Storyline visualising C's per-scene scores (with E as fallback) | (composes C+E) | free |

Total AI-augmented pipeline cost per 2-min video: voiceover (~$0.40) + SFX (~$0.05) + music (~$0.10) + Gemini review + scroll-tests (~$0.15) + Haiku polish (~$0.01) ≈ $0.71, below the cost of a single render's compute time.
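The total is plain addition of the per-component estimates. A small sketch, with key names chosen here purely for illustration and integer cents used to avoid floating-point drift:

```typescript
// Per-video cost estimates in US cents, copied from the figures quoted above.
// The key names are illustrative, not identifiers from the repo.
const costCents: Record<string, number> = {
  voiceover: 40,
  sfx: 5,
  music: 10,
  geminiReviewAndScrollTests: 15,
  haikuPolish: 1,
};

// Sums integer cents and formats the result as dollars with two decimals.
function totalUsd(costs: Record<string, number>): string {
  let cents = 0;
  for (const key of Object.keys(costs)) {
    cents += costs[key];
  }
  return (cents / 100).toFixed(2);
}

console.log(`$${totalUsd(costCents)}`); // prints "$0.71"
```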

Foundation

packages/core/src/gemini/ — env loader (mirrors anthropic/env.ts) + REST client with uploadFile (Files API resumable protocol) + generateStructured<T> (function-tool call). Direct fetch over the @google/genai SDK because the SDK pulls in gRPC + Vertex auth we don't need.
kind: "gemini" added to CostOp; gemini-2.5-flash priced at $0.30/$2.50 per M tokens.
New optional adapter hook extractVideoFrameToBytes(filePath, timeS) => Buffer | null so the studio host can supply ffmpeg-based frame sampling for the scroll test.
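As a rough illustration of the pricing entry, the sketch below turns token counts into a cost record. Only the two prices and `kind: "gemini"` come from this PR; the `GeminiCostOp` shape and the `geminiCostOp` helper are hypothetical, and the real `CostOp` type in the repo may look different.

```typescript
// Prices quoted in this PR for gemini-2.5-flash, in USD per million tokens.
const GEMINI_FLASH_INPUT_USD_PER_M = 0.3;
const GEMINI_FLASH_OUTPUT_USD_PER_M = 2.5;

// Hypothetical record shape; only `kind: "gemini"` is taken from the PR.
interface GeminiCostOp {
  kind: "gemini";
  model: string;
  inputTokens: number;
  outputTokens: number;
  usd: number;
}

// Converts token counts into a cost entry using the per-million prices above.
function geminiCostOp(inputTokens: number, outputTokens: number): GeminiCostOp {
  const usd =
    (inputTokens / 1_000_000) * GEMINI_FLASH_INPUT_USD_PER_M +
    (outputTokens / 1_000_000) * GEMINI_FLASH_OUTPUT_USD_PER_M;
  return { kind: "gemini", model: "gemini-2.5-flash", inputTokens, outputTokens, usd };
}

// Example: 100k input tokens and 2k output tokens.
const example = geminiCostOp(100_000, 2_000);
console.log(example.usd.toFixed(4));
```

Keeping the prices as named constants means a price change is a one-line edit rather than a search through call sites.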

Milestone B — Music

End-to-end: Haiku proposes prompts with scenesCovered, ElevenLabs generates via polled job, manifest at assets/music/music.manifest.json, assembler emits <audio data-track-index="2" data-timeline-group="music" data-music-duck-db="-12"> per entry. The producer's mixer reads data-music-duck-db to apply a sidechain duck during voiceover.
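A minimal sketch of the assembler step, assuming a manifest entry carries a source path and a duck level. The `MusicEntry` fields and both helper names are hypothetical; only the three `data-*` attributes and their values are taken from the description above.

```typescript
// Hypothetical manifest entry; the real manifest shape may differ.
interface MusicEntry {
  src: string; // path to the generated audio file
  duckDb: number; // sidechain duck level in dB, e.g. -12
}

// Escapes characters that would break a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Emits one <audio> element per manifest entry on the music lane (track index 2).
function emitMusicAudio(entry: MusicEntry): string {
  const attrs: Array<[string, string]> = [
    ["src", entry.src],
    ["data-track-index", "2"],
    ["data-timeline-group", "music"],
    ["data-music-duck-db", String(entry.duckDb)],
  ];
  const rendered = attrs.map(([name, value]) => `${name}="${escapeAttr(value)}"`).join(" ");
  return `<audio ${rendered}></audio>`;
}

console.log(emitMusicAudio({ src: "assets/music/intro.mp3", duckDb: -12 }));
```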

Frontend: amber 🎵 Music Wizard panel above the Director. Textarea → "Propose ↵" → 1-3 track suggestions with prompt/role/scenesCovered/duration → 🎵 Generate per track → applied tracks list below with Remove.
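The commit message for this PR describes the polled job as using geometric backoff (2/4/8s), a 30s cap per poll, and a 5-minute total budget. One way to read that schedule is sketched below; the function names and the interpretation of "max 30s per poll" as a cap on the wait between polls are assumptions, not the code in `elevenlabs/music.ts`.

```typescript
// Produces the waits between polls: 2, 4, 8, ... seconds, capped at maxDelayS,
// stopping once another wait would exceed the total budget.
function pollDelaysS(baseS = 2, factor = 2, maxDelayS = 30, totalBudgetS = 300): number[] {
  const delays: number[] = [];
  let elapsedS = 0;
  let nextS = baseS;
  while (elapsedS + Math.min(nextS, maxDelayS) <= totalBudgetS) {
    const delayS = Math.min(nextS, maxDelayS);
    delays.push(delayS);
    elapsedS += delayS;
    nextS *= factor;
  }
  return delays;
}

// Generic poll loop: `check` returns null while the job is still running.
// `sleep` is injected so the loop can be tested without real timers.
async function pollUntil<T>(
  check: () => Promise<T | null>,
  sleep: (seconds: number) => Promise<void>,
): Promise<T> {
  for (const delayS of pollDelaysS()) {
    await sleep(delayS);
    const result = await check();
    if (result !== null) return result;
  }
  throw new Error("job did not complete within the polling budget");
}
```

With the defaults, the schedule starts 2, 4, 8, 16 and then holds at 30 seconds until the 300-second budget is used up.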

Milestone C — Gemini render review

POST /storyline/render-review:

Picks the most recent .mp4 from <project>/renders/
Uploads to Gemini Files API (resumable, polls until ACTIVE)
Prompts with the full script + per-scene timings

Returns:

{ overallRetentionScore: number,        // 0-100
  scrollRiskWindows: [{ startS, endS, severity, why, fix }],
  brandConsistency: { score, drift[] },
  audioMix: { voiceClarity, musicLevels, sfxBalance },
  perScene: [{ sceneId, visualHook, paceMatch, onBrand, note }] }

Persists to .hyperframes/render-reviews/<timestamp>.json so reload shows the last review without re-running
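For readers who prefer types, the response above can be written out roughly as follows. The field names come from the shape quoted in this PR; the types of `severity`, the `audioMix` values, and the `drift` items are not spelled out there and are assumptions, as is the `isPlausibleReview` helper.

```typescript
interface ScrollRiskWindow {
  startS: number;
  endS: number;
  severity: string; // assumed to be a label; the PR does not list the values
  why: string;
  fix: string;
}

interface SceneReview {
  sceneId: string;
  visualHook: number;
  paceMatch: number;
  onBrand: number;
  note: string;
}

interface RenderReview {
  overallRetentionScore: number; // 0-100
  scrollRiskWindows: ScrollRiskWindow[];
  brandConsistency: { score: number; drift: string[] };
  audioMix: { voiceClarity: unknown; musicLevels: unknown; sfxBalance: unknown };
  perScene: SceneReview[];
}

// Hypothetical sanity check before persisting or displaying a review:
// the overall score must be in range and every window must have positive length.
function isPlausibleReview(review: RenderReview): boolean {
  const scoreOk = review.overallRetentionScore >= 0 && review.overallRetentionScore <= 100;
  const windowsOk = review.scrollRiskWindows.every((w) => w.startS >= 0 && w.endS > w.startS);
  return scoreOk && windowsOk;
}
```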

Frontend: 🔍 Run review CTA when no review yet → "Gemini is watching…" while running → result panel with overall-score chip (color-coded), scroll-risk windows with 0:14-0:23 timestamps + per-window why+fix, and a 3-column audio/brand summary.
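The `0:14-0:23` labels are a straightforward seconds-to-clock conversion. A minimal sketch, with hypothetical helper names and the assumption that fractional seconds are truncated:

```typescript
// Formats whole seconds as m:ss, truncating any fractional part.
function formatClock(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return `${minutes}:${rest < 10 ? "0" : ""}${rest}`;
}

// Formats a scroll-risk window as "start-end".
function formatWindow(startS: number, endS: number): string {
  return `${formatClock(startS)}-${formatClock(endS)}`;
}

console.log(formatWindow(14, 23)); // prints "0:14-0:23"
```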

Milestone E — Per-scene scroll test

POST /storyline/scroll-test per sceneId — samples 3 frames at start/mid/end via the new adapter hook, sends them + narration to Gemini, asks "would a feed viewer scroll past?". Returns verdict + sceneStrengthScore (0-100) + optional concrete patch (template/props/reasoning) the studio can apply.
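A sketch of the start/mid/end sampling, assuming each scene has a start and end time on the rendered video. The small inset before the scene end (so the last sample does not land on the next scene's first frame) and the helper names are assumptions. The hook is typed with `Uint8Array` here to keep the example free of Node typings; a Node `Buffer` is a `Uint8Array` subclass, so it fits the same signature.

```typescript
// Same shape as the adapter hook described in this PR, with Uint8Array standing in for Buffer.
type ExtractFrame = (filePath: string, timeS: number) => Uint8Array | null;

// Returns three sample times: scene start, midpoint, and just before the end.
function sampleTimesS(startS: number, endS: number, endInsetS = 0.1): [number, number, number] {
  if (endS <= startS) {
    throw new RangeError("scene end must be after scene start");
  }
  const lastS = Math.max(startS, endS - endInsetS);
  return [startS, (startS + endS) / 2, lastS];
}

// Extracts the three frames, dropping any the host could not produce.
function sampleFrames(
  extract: ExtractFrame,
  filePath: string,
  startS: number,
  endS: number,
): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (const timeS of sampleTimesS(startS, endS)) {
    const frame = extract(filePath, timeS);
    if (frame !== null) frames.push(frame);
  }
  return frames;
}
```

Because the hook is optional and may return `null`, the caller should be prepared to receive fewer than three frames.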

Frontend: new 📉 Scroll test per-card AI action. The optional patch drops into the existing amber suggestion stack with the same Apply pipeline as every other Haiku action — no new write surface.

Milestone F — Retention map

A small horizontal strip at the top of the Storyline tab. One cell per scene, color-coded:

Green (≥70): hold-power
Amber (40-69): warn
Red (<40): scroll signal

Composes render-review's per-scene scores (avg of visualHook/paceMatch/onBrand) with scroll-test fallback when render-review hasn't run for a scene. Click any cell → smooth-scroll to that scene's card. Hidden until at least one signal lands so it doesn't add noise on a fresh project.
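The composition and colour thresholds can be sketched as below. The thresholds and the three-dimension average come from the text above; the type and function names are hypothetical, and treating averages that fall between 69 and 70 as amber is an assumption.

```typescript
type Tone = "green" | "amber" | "red";

interface SceneDims {
  visualHook: number;
  paceMatch: number;
  onBrand: number;
}

// Prefers the render-review average; falls back to the scroll-test score.
// Returns null when neither signal exists for the scene.
function cellScore(review?: SceneDims, scrollTestScore?: number): number | null {
  if (review) {
    return (review.visualHook + review.paceMatch + review.onBrand) / 3;
  }
  return scrollTestScore !== undefined ? scrollTestScore : null;
}

// Maps a 0-100 score to the cell colour.
function toneForScore(score: number): Tone {
  if (score >= 70) return "green";
  if (score >= 40) return "amber";
  return "red";
}

// The strip stays hidden until at least one scene has a score.
function shouldShowMap(scores: Array<number | null>): boolean {
  return scores.some((s) => s !== null);
}
```

For example, a scene reviewed at 80/70/60 averages to 70 and renders green, while an unreviewed scene with a scroll-test score of 35 renders red.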

Architecture: same single-track data path

All four milestones funnel into the existing applyPatch(sceneId, patch) → PUT /script/scenes/:id pipeline. The only new write surfaces are the audio file writes (SFX/music — manifest-tracked, soft-deletable). Gemini's outputs are advisory; nothing the model says auto-mutates the script.
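To make the single write path concrete, here is a sketch of the request an accepted suggestion would turn into. The `ScenePatch` and `Suggestion` shapes and the `buildApplyPatchRequest` helper are hypothetical; only the `PUT /script/scenes/:id` route and the template/props/reasoning patch fields are taken from this PR. Nothing is sent here, which mirrors the point that model output stays advisory until the user clicks Apply.

```typescript
// Hypothetical patch shape, based on the fields the scroll test is described as returning.
interface ScenePatch {
  template?: string;
  props?: Record<string, unknown>;
  reasoning?: string;
}

// Hypothetical entry in the suggestion stack.
interface Suggestion {
  sceneId: string;
  source: "haiku" | "gemini-scroll-test";
  patch: ScenePatch;
}

interface PutRequest {
  method: "PUT";
  url: string;
  body: string;
}

// Builds (but does not send) the request for an accepted suggestion.
// Both Haiku and Gemini suggestions go through the same route.
function buildApplyPatchRequest(suggestion: Suggestion): PutRequest {
  return {
    method: "PUT",
    url: `/script/scenes/${encodeURIComponent(suggestion.sceneId)}`,
    body: JSON.stringify(suggestion.patch),
  };
}
```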

Test plan

728 core tests pass (was 718; +9 music-span helpers + 1 net pickup from refactor)
281 studio tests pass
Lint, format, typecheck clean across the whole tree
Live verify in expanded Storyline: all four panels render (Retention Review CTA · Music Wizard · Director · scene cards), zero console errors
Manual: type a vibe in Music Wizard, click Propose ↵, then Generate on one — confirm assets/music/<id>.mp3 lands and the music lane shows the clip
Manual: render once via the existing render pipeline, click Run review — confirm Gemini watches, scroll-risk windows surface with timestamps, retention map appears at the top
Manual: click 📉 Scroll test on a scene — confirm verdict + score + (when applicable) an applyable patch in the suggestion stack

What's deliberately deferred to a follow-up

Milestone D (image analysis) — auto-detect role/vibe/treatment on upload via Gemini. Smaller standalone PR; the rest of the plan doesn't depend on it.
Render-blocking quality gates — Gemini stays advisory. Never gates a render.
Multi-language SFX prompts — ElevenLabs SFX is English-only currently.

🤖 Generated with Claude Code

Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.

**Foundation** (`packages/core/src/gemini/`)
- env.ts: GEMINI_API_KEY loader, mirrors anthropic/env.ts.
- client.ts: REST client with uploadFile (resumable Files API) + generateStructured (function-tool call). Picked direct fetch over @google/genai SDK because the SDK pulls in gRPC + Vertex auth we don't need.
- Cost-tracking: kind: "gemini" entries in CostOp. gemini-2.5-flash priced at $0.30/$2.50 per M tokens.

**Milestone B: ElevenLabs Music**
- elevenlabs/music.ts: async polled-job pattern (POST returns music_id, GET polls until completed, then download signed audio_url). Geometric backoff 2/4/8s, max 30s per poll, 5min total.
- script/music/manifest.ts: same shape as SFX with scenesCovered (a track spans multiple scenes). resolveMusicSpan computes the master-timeline window from covered scenes. 9 unit tests.
- assembler emits <audio data-track-index="2" data-timeline-group="music"> per entry, including data-music-duck-db so the producer applies a sidechain duck during voiceover.
- 4 routes: music-suggest (Haiku) / music-generate (ElevenLabs polled job) / GET music / DELETE music/:entryId.
- Frontend: 🎵 Music Wizard panel above Director: vibe textarea → Haiku proposes 1-3 tracks → click Generate per track. Applied tracks list below with Remove buttons.

**Milestone C: Gemini render review**
- POST /storyline/render-review uploads the most recent .mp4 from <project>/renders/ to Gemini Files API, prompts with script meta + per-scene timings, returns:
  - overallRetentionScore (0-100)
  - scrollRiskWindows[]: severity, why, one-sentence fix
  - brandConsistency { score, drift[] }
  - audioMix { voiceClarity, musicLevels, sfxBalance }
  - perScene[] { visualHook, paceMatch, onBrand, note }
- Persisted to .hyperframes/render-reviews/<ts>.json so reload shows the last review without re-running. GET /render-review serves it.
- Frontend: Retention Review panel with overall-score chip, scroll-risk windows with timestamps, 3-column audio/brand summary.

**Milestone E: Per-scene scroll test**
- POST /storyline/scroll-test samples 3 frames per scene via the new adapter.extractVideoFrameToBytes hook, asks Gemini "would they scroll?". Returns verdict + sceneStrengthScore + optional concrete patch.
- New per-card AI action: 📉 Scroll test. Patches drop into the amber suggestion stack, applied via the same pipeline.

**Milestone F: Retention map**
- Horizontal strip at the top of Storyline, one cell per scene, color-coded by retention strength (review's 3-dim avg, fallback to scrollTest score). Click → smooth-scroll to scene. Hidden until at least one signal lands.

Tests: 728 core (+9 music-span), 281 studio. Lint, format, typecheck clean.

Live verify: all four panels render in expanded sidebar mode, zero console errors.

Plan: #25 (Milestones B + C + E + F shipped; D image analysis is the remaining piece, smaller follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

cuio merged commit ffd8c41 into main Apr 30, 2026

cuio merged 1 commit into main from feat/storyline-gemini-music-retention
