feat(narration): add FishAudio as a selectable TTS provider #54
fancyboi999 wants to merge 2 commits
Adds FishAudio alongside MiniMax as a narration (voiceover) backend, selectable per workspace in Studio → Settings → Audio. Background music stays MiniMax-only (FishAudio has no music generation).
- core: new fishaudio.ts (resolveFishAudioCredentials / generateFishTts / listFishVoices) + shared TtsAudioResult type + 16 unit tests
- config: MediaConfigStore gains fishaudio creds + narrationProvider; the speech model is env-controlled (FISH_AUDIO_MODEL, default s1)
- server: narration routes by provider; new /api/config/fishaudio, /api/config/narration-provider, /api/fishaudio/voices (key stays server-side)
- studio UI: provider toggle + FishAudio key pane (no region); searchable voice picker (reference_id) with sample preview; en + zh strings
Verified end-to-end against the real FishAudio API. MiniMax path unchanged. See research/2026-06-15-spec-10-fishaudio-tts-provider.md (RFC-10).
Hey @fancyboi999! 👋 Nice work on the FishAudio integration — the design doc + comprehensive testing (16 unit tests + real API verification + E2E export validation) gives a clear picture of what's changing. The provider abstraction looks clean, and keeping the server-side key handling is the right security move. I've assigned this to @mrcfps for code review. A few things that look promising from the structure:
Related context: #45 discusses provider selection patterns, and #38 (SenseAudio) has overlapping config/UI seams worth checking for consistency. Once @mrcfps reviews, we'll move this forward. Let me know if anything's unclear!
Closes #53
What
Adds FishAudio alongside MiniMax as a narration (voiceover) backend, selectable per workspace in Studio → Settings → Audio. Background music stays MiniMax-only (FishAudio has no music generation).
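To make the per-workspace selection concrete, here is a minimal sketch of how narration could be routed by provider while music stays pinned to MiniMax. Everything below is illustrative: `WorkspaceAudioConfig`, `pickNarrationProvider`, `pickMusicProvider`, `narrate`, the fields of `TtsAudioResult`, and the fallback to MiniMax when no provider is set are assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch of provider routing; names and shapes are assumptions.
type NarrationProvider = "minimax" | "fishaudio";

// The PR mentions a shared TtsAudioResult type; these fields are guessed.
interface TtsAudioResult {
  audio: Uint8Array;
  mimeType: string;
}

// A backend turns text plus a provider-specific voice identifier into audio.
type TtsBackend = (text: string, voice: string) => Promise<TtsAudioResult>;

interface WorkspaceAudioConfig {
  // Assumption: an unset value falls back to MiniMax.
  narrationProvider?: NarrationProvider;
}

function pickNarrationProvider(cfg: WorkspaceAudioConfig): NarrationProvider {
  return cfg.narrationProvider ?? "minimax";
}

// Background music ignores the narration setting and always uses MiniMax.
function pickMusicProvider(_cfg: WorkspaceAudioConfig): "minimax" {
  return "minimax";
}

// Dispatch a narration request to whichever backend the workspace selected.
async function narrate(
  cfg: WorkspaceAudioConfig,
  backends: Record<NarrationProvider, TtsBackend>,
  text: string,
  voice: string,
): Promise<TtsAudioResult> {
  return backends[pickNarrationProvider(cfg)](text, voice);
}
```

Keeping the music choice in its own function means a future music-capable provider would only need a change in one place.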
Why
We use FishAudio TTS heavily (custom `reference_id` voices), but the studio's narration pipeline was MiniMax-only. See #53 for motivation.
Changes
- core: `fishaudio.ts` (`resolveFishAudioCredentials` / `generateFishTts` / `listFishVoices`), plus a shared `TtsAudioResult` type and 16 unit tests.
- config: `MediaConfigStore` gains FishAudio creds + a `narrationProvider` setting; the speech model is env-controlled (`FISH_AUDIO_MODEL`, default `s1`).
- server: narration routes by provider; new `/api/config/fishaudio`, `/api/config/narration-provider`, `/api/fishaudio/voices` (the key stays server-side; the browser never sees it).
- studio UI: provider toggle + FishAudio key pane (no region); searchable voice picker (`reference_id`) with sample preview; en + zh strings.
Design doc: `research/2026-06-15-spec-10-fishaudio-tts-provider.md` (RFC-10).
Provider differences absorbed
- Model: FishAudio takes it in a `model` header (env-driven).
- Voice: MiniMax `voice_id` vs. FishAudio `reference_id` (searchable).
Screenshots
Verification (real, not mocks)
- `pnpm --filter @html-video/core test` → 16/16.
- `generateFishTts` → real MP3 (ffprobe-valid); `listFishVoices` → account models.
- `ffprobe` shows `h264` video + `aac` audio, and `ffmpeg -f null -` decodes clean.
Notes
- `adapter-hyperframes` `test` script points at a missing `test/` dir (unrelated to this change; present on `main`).
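For reviewers unfamiliar with the differences listed under "Provider differences absorbed", here is a rough sketch of how a FishAudio TTS request might be assembled: the model comes from `FISH_AUDIO_MODEL` (default `s1`) and travels in a `model` header, and the voice goes in the body as `reference_id`. The helper names (`resolveFishModel`, `buildFishTtsRequest`), the `FishTtsRequest` shape, the endpoint URL, and the JSON body layout are assumptions for illustration; the real `fishaudio.ts` may differ.

```typescript
// Hypothetical sketch; not the PR's actual fishaudio.ts.
interface FishTtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// FISH_AUDIO_MODEL and its default "s1" come from the PR description.
// The env map is passed in so the function stays pure and easy to test.
function resolveFishModel(env: Record<string, string | undefined>): string {
  const value = env.FISH_AUDIO_MODEL?.trim();
  return value ? value : "s1";
}

// Build (but do not send) a TTS request. This must only run server-side,
// since the API key ends up in the Authorization header.
function buildFishTtsRequest(
  apiKey: string,
  text: string,
  referenceId: string,
  env: Record<string, string | undefined> = {},
): FishTtsRequest {
  return {
    url: "https://api.fish.audio/v1/tts", // assumed endpoint
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      model: resolveFishModel(env), // model travels as a header, not in the body
    },
    // Assumed body layout: the voice is selected via reference_id.
    body: JSON.stringify({ text, reference_id: referenceId, format: "mp3" }),
  };
}
```

A server route could then pass the resulting `url`, `method`, `headers`, and `body` to `fetch` and return only the audio bytes to the browser, which keeps the key server-side.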