feat: gemini retention review + music + scroll test + retention map#28

Merged: cuio merged 1 commit into main from feat/storyline-gemini-music-retention on Apr 30, 2026

@cuio commented Apr 29, 2026

Four milestones from #25 in one drop. Closes the retention feedback loop: write → render → grade → fix.

| Milestone | What | Provider | Cost |
| --- | --- | --- | --- |
| B | ElevenLabs Music: multi-scene background tracks landing on the existing Music lane | Haiku proposes · ElevenLabs Music v3 polled job | ~$0.10/video |
| C | Gemini render review: analyses the rendered MP4 → structured retention feedback + scroll-risk windows + per-scene scores | Gemini 2.5 Flash | ~$0.05/video |
| E | Per-scene scroll test: samples 3 frames + narration → "would they scroll?" + a one-change fix that drops as an applyable patch | Gemini 2.5 Flash | ~$0.005/scene |
| F | Retention map: horizontal strip at the top of Storyline visualising C's per-scene scores (with E as fallback) | (composes C+E) | free |

Total AI-augmented pipeline cost per 2-min video: voiceover ($0.40) + SFX ($0.05) + music ($0.10) + Gemini review and scroll tests ($0.15) + Haiku polish (~$0.01) ≈ $0.71, which is less than the compute cost of a single render.

Foundation

  • packages/core/src/gemini/ — env loader (mirrors anthropic/env.ts) + REST client with uploadFile (Files API resumable protocol) + generateStructured<T> (function-tool call). Direct fetch over the @google/genai SDK because the SDK pulls in gRPC + Vertex auth we don't need.
  • kind: "gemini" added to CostOp; gemini-2.5-flash priced at $0.30/$2.50 per M tokens.
  • New optional adapter hook extractVideoFrameToBytes(filePath, timeS) => Buffer | null so the studio host can supply ffmpeg-based frame sampling for the scroll test.
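
To make the pricing bullet concrete, here is a minimal sketch of how a `kind: "gemini"` cost entry could be computed from token counts. It assumes the two prices are input/output respectively; the `GeminiCostOp` shape and the `geminiCostOp` name are hypothetical, since the real `CostOp` type is not shown in this PR.

```typescript
// Prices from the PR text: gemini-2.5-flash at $0.30 / $2.50 per million tokens.
// Reading them as input / output prices is an assumption.
const GEMINI_25_FLASH_USD_PER_M = { input: 0.3, output: 2.5 };

// Hypothetical shape; the real CostOp type in packages/core is not shown here.
interface GeminiCostOp {
  kind: "gemini";
  model: string;
  inputTokens: number;
  outputTokens: number;
  usd: number;
}

// Converts raw token counts into a cost entry in US dollars.
function geminiCostOp(inputTokens: number, outputTokens: number): GeminiCostOp {
  const usd =
    (inputTokens / 1e6) * GEMINI_25_FLASH_USD_PER_M.input +
    (outputTokens / 1e6) * GEMINI_25_FLASH_USD_PER_M.output;
  return { kind: "gemini", model: "gemini-2.5-flash", inputTokens, outputTokens, usd };
}
```

Under these assumptions, one million input tokens plus one million output tokens would cost about $2.80.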

Milestone B — Music

End-to-end: Haiku proposes prompts with scenesCovered, ElevenLabs generates via polled job, manifest at assets/music/music.manifest.json, assembler emits <audio data-track-index="2" data-timeline-group="music" data-music-duck-db="-12"> per entry. The producer's mixer reads data-music-duck-db to apply a sidechain duck during voiceover.
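
A rough sketch of the manifest-to-markup step follows. The `SceneTiming` and `MusicEntry` shapes, `musicSpan`, and `musicAudioTag` are hypothetical names for illustration (`musicSpan` is only a stand-in for the idea behind `resolveMusicSpan`, whose real signature is not shown here). Only the three `data-*` attributes come from this PR.

```typescript
// Hypothetical shapes: the real manifest and scene types are not shown in this PR.
interface SceneTiming { sceneId: string; startS: number; durationS: number }
interface MusicEntry { id: string; src: string; scenesCovered: string[]; duckDb: number }

// Window on the master timeline spanned by the covered scenes, or null if none match.
function musicSpan(
  entry: MusicEntry,
  scenes: SceneTiming[],
): { startS: number; endS: number } | null {
  const covered = scenes.filter((s) => entry.scenesCovered.indexOf(s.sceneId) !== -1);
  if (covered.length === 0) return null;
  const startS = Math.min(...covered.map((s) => s.startS));
  const endS = Math.max(...covered.map((s) => s.startS + s.durationS));
  return { startS, endS };
}

// Escapes the characters that would break a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Emits one <audio> element per manifest entry with the attributes named in this PR.
function musicAudioTag(entry: MusicEntry): string {
  return (
    `<audio src="${escapeAttr(entry.src)}" data-track-index="2" ` +
    `data-timeline-group="music" data-music-duck-db="${entry.duckDb}"></audio>`
  );
}
```

How the span is attached to the element (for example as start/duration attributes) is left out, because the PR does not say which attributes the assembler uses for that.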

Frontend: amber 🎵 Music Wizard panel above the Director. Textarea → "Propose ↵" → 1-3 track suggestions with prompt/role/scenesCovered/duration → 🎵 Generate per track → applied tracks list below with Remove.

Milestone C — Gemini render review

POST /storyline/render-review:

  1. Picks the most recent .mp4 from <project>/renders/
  2. Uploads to Gemini Files API (resumable, polls until ACTIVE)
  3. Prompts with the full script + per-scene timings
  4. Returns:
    { overallRetentionScore: number,        // 0-100
      scrollRiskWindows: [{ startS, endS, severity, why, fix }],
      brandConsistency: { score, drift[] },
      audioMix: { voiceClarity, musicLevels, sfxBalance },
      perScene: [{ sceneId, visualHook, paceMatch, onBrand, note }] }
  5. Persists to .hyperframes/render-reviews/<timestamp>.json so reload shows the last review without re-running
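
Step 1 above could be sketched as a pure selection over a directory listing. This assumes "most recent" means the highest modification time; the `RenderFile` shape and `pickLatestMp4` are hypothetical names, and reading the listing from disk is left to the caller.

```typescript
// Hypothetical listing entry: file name plus modification time in milliseconds.
interface RenderFile {
  name: string;
  mtimeMs: number;
}

// Returns the newest .mp4 in the listing, or null when there is no render yet.
function pickLatestMp4(files: RenderFile[]): RenderFile | null {
  const mp4s = files.filter((f) => /\.mp4$/i.test(f.name));
  if (mp4s.length === 0) return null;
  return mp4s.reduce((latest, f) => (f.mtimeMs > latest.mtimeMs ? f : latest));
}
```

Returning `null` rather than throwing lets the route answer with a "render first" message when `<project>/renders/` is empty.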

Frontend: 🔍 Run review CTA when no review yet → "Gemini is watching…" while running → result panel with overall-score chip (color-coded), scroll-risk windows with 0:14-0:23 timestamps + per-window why+fix, and a 3-column audio/brand summary.
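
For the timestamp labels on scroll-risk windows, a small formatter along these lines would produce the `0:14-0:23` style. `formatClock` and `formatWindow` are hypothetical helper names, not the studio's actual code.

```typescript
// Formats a second count as m:ss, flooring fractions and clamping negatives to zero.
function formatClock(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Formats a scroll-risk window (startS, endS) as "m:ss-m:ss".
function formatWindow(startS: number, endS: number): string {
  return formatClock(startS) + "-" + formatClock(endS);
}
```

With this sketch, `formatWindow(14, 23)` yields `0:14-0:23`.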

Milestone E — Per-scene scroll test

POST /storyline/scroll-test per sceneId — samples 3 frames at start/mid/end via the new adapter hook, sends them + narration to Gemini, asks "would a feed viewer scroll past?". Returns verdict + sceneStrengthScore (0-100) + optional concrete patch (template/props/reasoning) the studio can apply.
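
The start/mid/end sampling could look roughly like the sketch below. The small inset from the scene edges is an assumption (to avoid landing on a neighbouring scene's frame), and `scrollTestSampleTimes` and `sampleFrames` are hypothetical names. The hook type mirrors `extractVideoFrameToBytes` but uses `Uint8Array`, which Node's `Buffer` extends, and treats the hook as synchronous.

```typescript
// Mirrors the adapter hook from this PR; Uint8Array stands in for Buffer (assumption).
type FrameExtractor = (filePath: string, timeS: number) => Uint8Array | null;

// Three timestamps inside a scene: near the start, the midpoint, near the end.
// insetS is a hypothetical safety margin, capped at half the scene length.
function scrollTestSampleTimes(
  startS: number,
  durationS: number,
  insetS: number = 0.1,
): [number, number, number] {
  const length = Math.max(0, durationS);
  const inset = Math.min(Math.max(0, insetS), length / 2);
  return [startS + inset, startS + length / 2, startS + length - inset];
}

// Collects the frames the host could extract; missing frames are skipped.
function sampleFrames(extract: FrameExtractor, filePath: string, times: number[]): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (const t of times) {
    const bytes = extract(filePath, t);
    if (bytes !== null) frames.push(bytes);
  }
  return frames;
}
```

Skipping `null` frames keeps the scroll test usable when the host has no ffmpeg-backed extractor for some timestamps, though the route may instead choose to fail fast.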

Frontend: new 📉 Scroll test per-card AI action. The optional patch drops into the existing amber suggestion stack with the same Apply pipeline as every other Haiku action — no new write surface.

Milestone F — Retention map

A small horizontal strip at the top of the Storyline tab. One cell per scene, color-coded:

  • Green (≥70): hold-power
  • Amber (40-69): warn
  • Red (<40): scroll signal

Composes render-review's per-scene scores (avg of visualHook/paceMatch/onBrand) with scroll-test fallback when render-review hasn't run for a scene. Click any cell → smooth-scroll to that scene's card. Hidden until at least one signal lands so it doesn't add noise on a fresh project.
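
The scoring and colour rules above can be sketched as follows. The `PerSceneReview` shape, the 0-100 scale for each dimension, and the function names are assumptions for illustration; only the thresholds and the fallback order come from this PR.

```typescript
type CellColor = "green" | "amber" | "red";

// Hypothetical shape based on the perScene fields in the review response;
// a 0-100 scale per dimension is an assumption.
interface PerSceneReview {
  sceneId: string;
  visualHook: number;
  paceMatch: number;
  onBrand: number;
}

// Render-review average first, scroll-test score as fallback, null when no signal exists.
function sceneScore(
  sceneId: string,
  perScene: PerSceneReview[],
  scrollTestScores: Record<string, number>,
): number | null {
  const review = perScene.filter((p) => p.sceneId === sceneId)[0];
  if (review) {
    return (review.visualHook + review.paceMatch + review.onBrand) / 3;
  }
  const fallback = scrollTestScores[sceneId];
  return typeof fallback === "number" ? fallback : null;
}

// Green at 70 and above, amber from 40 up to (but not including) 70, red below 40.
function cellColor(score: number): CellColor {
  if (score >= 70) return "green";
  if (score >= 40) return "amber";
  return "red";
}

// The strip stays hidden until at least one scene has a score.
function retentionMapVisible(scores: Array<number | null>): boolean {
  return scores.some((s) => s !== null);
}
```

Because the average of three dimensions can be fractional, this sketch treats anything below 70 (for example 69.7) as amber rather than rounding it up.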

Architecture: same single-track data path

All four milestones funnel into the existing applyPatch(sceneId, patch) → PUT /script/scenes/:id pipeline. The only new write surfaces are the audio file writes (SFX/music — manifest-tracked, soft-deletable). Gemini's outputs are advisory; nothing the model says auto-mutates the script.

Test plan

  • 728 core tests pass (was 718; +9 music-span helpers + 1 net pickup from refactor)
  • 281 studio tests pass
  • Lint, format, typecheck clean across the whole tree
  • Live verify in expanded Storyline: all four panels render (Retention Review CTA · Music Wizard · Director · scene cards), zero console errors
  • Manual: type a vibe in Music Wizard, click Propose ↵, then Generate on one — confirm assets/music/<id>.mp3 lands and the music lane shows the clip
  • Manual: render once via the existing render pipeline, click Run review — confirm Gemini watches, scroll-risk windows surface with timestamps, retention map appears at the top
  • Manual: click 📉 Scroll test on a scene — confirm verdict + score + (when applicable) an applyable patch in the suggestion stack

What's deliberately deferred to a follow-up

  • Milestone D (image analysis) — auto-detect role/vibe/treatment on upload via Gemini. Smaller standalone PR; the rest of the plan doesn't depend on it.
  • Render-blocking quality gates — Gemini stays advisory. Never gates a render.
  • Multi-language SFX prompts — ElevenLabs SFX is English-only currently.

🤖 Generated with Claude Code

Four milestones from #25 in one drop. Closes the retention feedback
loop: write → render → grade → fix.

**Foundation** (`packages/core/src/gemini/`)

- env.ts: GEMINI_API_KEY loader, mirrors anthropic/env.ts.
- client.ts: REST client with uploadFile (resumable Files API) +
  generateStructured (function-tool call). Picked direct fetch over
  @google/genai SDK because the SDK pulls in gRPC + Vertex auth we
  don't need.
- Cost-tracking: kind: "gemini" entries in CostOp. gemini-2.5-flash
  priced at $0.30/$2.50 per M tokens.

**Milestone B — ElevenLabs Music**

- elevenlabs/music.ts: async polled-job pattern (POST returns music_id,
  GET polls until completed, then download signed audio_url).
  Geometric backoff 2/4/8s, max 30s per poll, 5min total.
- script/music/manifest.ts: same shape as SFX with scenesCovered (a
  track spans multiple scenes). resolveMusicSpan computes the
  master-timeline window from covered scenes. 9 unit tests.
- assembler emits <audio data-track-index="2" data-timeline-group="music">
  per entry, including data-music-duck-db so the producer applies a
  sidechain duck during voiceover.
- 4 routes: music-suggest (Haiku) / music-generate (ElevenLabs polled
  job) / GET music / DELETE music/:entryId.
- Frontend: 🎵 Music Wizard panel above Director — vibe textarea →
  Haiku proposes 1-3 tracks → click Generate per track. Applied
  tracks list below with Remove buttons.
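
The polling schedule described for `elevenlabs/music.ts` can be illustrated with the sketch below. It reads "max 30s per poll" as a cap on the wait between polls and "5min total" as a 300-second budget on cumulative waiting; both readings are assumptions, and `pollDelaysS` is a hypothetical helper rather than the module's actual code.

```typescript
// Builds the list of waits (in seconds) between polls: geometric growth from
// initialS, capped at maxDelayS, stopping before the total exceeds totalBudgetS.
function pollDelaysS(
  initialS: number = 2,
  factor: number = 2,
  maxDelayS: number = 30,
  totalBudgetS: number = 300,
): number[] {
  if (initialS <= 0 || factor < 1 || maxDelayS <= 0) {
    throw new RangeError("initialS and maxDelayS must be positive, factor at least 1");
  }
  const delays: number[] = [];
  let next = Math.min(initialS, maxDelayS);
  let elapsed = 0;
  while (elapsed + next <= totalBudgetS) {
    delays.push(next);
    elapsed += next;
    next = Math.min(next * factor, maxDelayS);
  }
  return delays;
}
```

With the defaults this gives 2, 4, 8, 16, then 30-second waits until the budget is used up, so a caller would sleep for each delay, poll once, and give up after the last entry.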

**Milestone C — Gemini render review**

- POST /storyline/render-review uploads the most recent .mp4 from
  <project>/renders/ to Gemini Files API, prompts with script meta +
  per-scene timings, returns:
    overallRetentionScore (0-100)
    scrollRiskWindows[] — severity, why, one-sentence fix
    brandConsistency { score, drift[] }
    audioMix { voiceClarity, musicLevels, sfxBalance }
    perScene[] { visualHook, paceMatch, onBrand, note }
- Persisted to .hyperframes/render-reviews/<ts>.json so reload shows
  the last review without re-running. GET /render-review serves it.
- Frontend: Retention Review panel with overall-score chip, scroll-risk
  windows with timestamps, 3-column audio/brand summary.

**Milestone E — Per-scene scroll test**

- POST /storyline/scroll-test samples 3 frames per scene via the new
  adapter.extractVideoFrameToBytes hook, asks Gemini "would they
  scroll?". Returns verdict + sceneStrengthScore + optional concrete
  patch.
- New per-card AI action: 📉 Scroll test. Patches drop into the
  amber suggestion stack, applied via the same pipeline.

**Milestone F — Retention map**

- Horizontal strip at the top of Storyline, one cell per scene,
  color-coded by retention strength (review's 3-dim avg, fallback to
  scrollTest score). Click → smooth-scroll to scene. Hidden until at
  least one signal lands.

Tests: 728 core (+9 music-span), 281 studio. Lint, format, typecheck
clean. Live verify: all four panels render in expanded sidebar mode,
zero console errors.

Plan: #25 (Milestones B + C + E + F shipped; D image analysis is the
remaining piece, smaller follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@cuio cuio merged commit ffd8c41 into main Apr 30, 2026