feat(narration): add FishAudio as a selectable TTS provider by fancyboi999 · Pull Request #54 · nexu-io/html-video

fancyboi999 · 2026-06-15T04:33:56Z

Closes #53

What

Adds FishAudio alongside MiniMax as a narration (voiceover) backend, selectable per workspace in Studio → Settings → Audio. Background music stays MiniMax-only (FishAudio has no music generation).

Why

We use FishAudio TTS heavily (custom reference_id voices), but the studio's narration pipeline was MiniMax-only. See #53 for motivation.

Changes

core: new fishaudio.ts — resolveFishAudioCredentials / generateFishTts / listFishVoices — plus a shared TtsAudioResult type and 16 unit tests.
config: MediaConfigStore gains FishAudio creds + a narrationProvider setting; the speech model is env-controlled (FISH_AUDIO_MODEL, default s1).
server: narration routes by provider; new /api/config/fishaudio, /api/config/narration-provider, /api/fishaudio/voices (the key stays server-side — the browser never sees it).
studio UI: provider toggle + FishAudio key pane (single host, no region); a searchable voice picker (reference_id) with sample preview; en + zh strings.
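
The config and routing items above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR. Only `FISH_AUDIO_MODEL`, its default `s1`, and the `narrationProvider` setting come from the description. The config shape, the `FISH_AUDIO_API_KEY` variable, the MiniMax default, and every function name are hypothetical.

```typescript
// Hypothetical sketch: names other than FISH_AUDIO_MODEL and narrationProvider are assumptions.
type NarrationProvider = "minimax" | "fishaudio";

interface MediaConfig {
  narrationProvider?: NarrationProvider; // per-workspace setting
  fishaudioApiKey?: string;              // stored server-side only
}

type Env = Record<string, string | undefined>;

// Assumption: workspaces that never picked a provider keep using MiniMax.
function resolveNarrationProvider(config: MediaConfig): NarrationProvider {
  return config.narrationProvider ?? "minimax";
}

// The speech model is env-controlled and falls back to "s1" when unset or blank.
function resolveFishModel(env: Env): string {
  const model = env.FISH_AUDIO_MODEL?.trim();
  return model ? model : "s1";
}

// Stored key first, then an env fallback (the variable name is an assumption).
function resolveFishKey(config: MediaConfig, env: Env): string | undefined {
  return config.fishaudioApiKey || env.FISH_AUDIO_API_KEY || undefined;
}

type NarrationPlan =
  | { ok: true; provider: "minimax" }
  | { ok: true; provider: "fishaudio"; model: string }
  | { ok: false; message: string };

// Decide which backend handles a narration request. The key itself is
// never part of the returned plan, so it cannot leak to the browser.
function planNarration(config: MediaConfig, env: Env): NarrationPlan {
  const provider = resolveNarrationProvider(config);
  if (provider === "minimax") {
    return { ok: true, provider };
  }
  if (!resolveFishKey(config, env)) {
    return { ok: false, message: "FishAudio key not configured" };
  }
  return { ok: true, provider, model: resolveFishModel(env) };
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the resolution logic easy to unit test.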

Design doc: research/2026-06-15-spec-10-fishaudio-tts-provider.md (RFC-10).

Provider differences absorbed

| | MiniMax | FishAudio |
|---|---|---|
| Model select | body field | `model` header (env-driven) |
| Response | JSON + hex envelope | raw binary stream |
| Voices | 6 fixed `voice_id` | account `reference_id` (searchable) |
| Region | intl / CN split | single global host |
| Music | yes | none (stays MiniMax) |
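
To illustrate the "Response" row, here is a hedged sketch of how both response styles could be normalized into one result type. The field layout of `TtsAudioResult` is an assumption (the description names the type but not its shape). So are the helper names and the `audio/mpeg` MIME type. The MiniMax helper takes the hex string directly because the exact JSON path inside the envelope is not given here.

```typescript
// Hypothetical shape: the PR names TtsAudioResult but does not show its fields.
interface TtsAudioResult {
  audio: Uint8Array;
  mimeType: string;
  provider: "minimax" | "fishaudio";
}

// Decode a hex string (two characters per byte) into raw bytes.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("invalid hex audio payload");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// MiniMax style: audio arrives hex-encoded inside a JSON envelope.
function fromMiniMaxHex(hexAudio: string): TtsAudioResult {
  return { audio: hexToBytes(hexAudio), mimeType: "audio/mpeg", provider: "minimax" };
}

// FishAudio style: the response body already is the audio stream.
function fromFishBinary(body: ArrayBuffer): TtsAudioResult {
  return { audio: new Uint8Array(body), mimeType: "audio/mpeg", provider: "fishaudio" };
}
```

With both paths returning the same type, downstream code that stores or muxes the asset would not need to know which provider produced the audio.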

Screenshots

Verification (real, not mocks)

Unit: pnpm --filter @html-video/core test → 16/16.
Real API: generateFishTts → real MP3 (ffprobe-valid); listFishVoices → account models.
Studio E2E: configure key → switch provider → search + pick a voice → synthesize → asset created; export muxes the FishAudio narration into the MP4 — ffprobe shows h264 video + aac audio, and ffmpeg -f null - decodes clean.
Regression: cli 14/14, runtime 4/4, adapter-remotion 8/8, smoke all green; the MiniMax narration path is unchanged (only a creds variable rename).
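
The export check in the Studio E2E item could be automated along these lines. This is a sketch under assumptions, not part of the PR. It expects the JSON printed by `ffprobe -v error -show_streams -of json <file>`, and the function name `hasH264AndAac` is hypothetical.

```typescript
// Subset of the per-stream fields in ffprobe's JSON output.
interface ProbeStream {
  codec_type?: string;
  codec_name?: string;
}

// Returns true when the probe output contains an h264 video stream
// and an aac audio stream, matching the check described above.
function hasH264AndAac(ffprobeJson: string): boolean {
  const parsed = JSON.parse(ffprobeJson) as { streams?: ProbeStream[] };
  const streams = parsed.streams ?? [];
  const hasVideo = streams.some(
    (s) => s.codec_type === "video" && s.codec_name === "h264",
  );
  const hasAudio = streams.some(
    (s) => s.codec_type === "audio" && s.codec_name === "aac",
  );
  return hasVideo && hasAudio;
}
```

This only inspects stream metadata. A full decode pass such as `ffmpeg -f null -` is still what catches corrupt audio data.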

Notes

A pre-existing adapter-hyperframes test script points at a missing test/ dir (unrelated to this change; present on main).
Actual MiniMax generation was not re-run here (no key on hand); only its routing and the graceful "key not configured" message were verified.


lefarcen · 2026-06-15T04:45:07Z

Hey @fancyboi999! 👋

Nice work on the FishAudio integration — the design doc + comprehensive testing (16 unit tests + real API verification + E2E export validation) gives a clear picture of what's changing. The provider abstraction looks clean, and keeping the server-side key handling is the right security move.

I've assigned this to @mrcfps for code review. A few things that look promising from the structure:

The credential resolution pattern mirrors MiniMax (env fallback + server-side only)
Provider differences are absorbed cleanly (model header vs body, binary stream vs JSON envelope)
The searchable voice picker is a nice UX touch for reference_id accounts

Related context: #45 discusses provider selection patterns, and #38 (SenseAudio) has overlapping config/UI seams worth checking for consistency.

Once @mrcfps reviews, we'll move this forward. Let me know if anything's unclear!

💡 To drive this PR to merge hands-free, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
Take over nexu-io/html-video#54 until it merges — read https://raw.githubusercontent.com/nexu-io/looper/main/skills/pr-takeover/SKILL.md and follow it.

fancyboi999 added 2 commits June 15, 2026 11:47

docs(rfc): RFC-10 FishAudio TTS provider for narration

8699bc2

lefarcen requested a review from mrcfps June 15, 2026 04:36

lefarcen added the labels size/XL (Size XL, 700-1499 LOC), risk/medium (Medium risk), and type/feature (Feature change) on Jun 15, 2026

lefarcen mentioned this pull request Jun 15, 2026

Add FishAudio as a selectable TTS provider for narration #53

Open

lefarcen mentioned this pull request Jun 19, 2026

Voice cloning support #62

Open

fancyboi999 wants to merge 2 commits into nexu-io:main from fancyboi999:feat/fishaudio-tts-provider
