Releases: MCERQUA/OpenVoiceUI
OpenVoiceUI 2026.5.4
Highlights
Critical cost-leak fix. Every song generation since ~Apr 19 was firing 5-30+ paid Suno API calls instead of 1, due to streaming-text re-parse loops in app.js. Audit confirmed every "successful" song was burning 5-8× the credits silently. PR #294 caps the paid API at exactly 1 call per unique prompt per 60s.
Critical UX fix. The voice agent occasionally responded to users with a bare "NO" or "YES" — both spoken via TTS and shown in the transcript. Two-layer guard in routes/conversation.py now catches it before the user ever hears it. PR #296.
New: Office briefing injection. Voice agents now receive Clerk-driven open follow-ups + open matters when a known person logs in. PR #297.
Release-blocker fix. server.py was importing a JamBot-only song_tagger proxy unconditionally, so any fresh image build would have crashed at Flask startup. Made the import optional. PR #295.
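The cap described for PR #294 (1 paid call per unique prompt per 60s) can be pictured as a small time-windowed dedup cache. This is a minimal sketch under assumptions, not the code from the PR: the class name, the prompt normalization (trim + lowercase), and the injectable clock are all hypothetical.

```python
import hashlib
import time


class PromptDedupCache:
    """Allow at most one paid call per unique prompt within a time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._last_call = {}  # prompt hash -> time of the last paid call

    def should_call(self, prompt):
        """Return True only for the first request for this prompt in the window."""
        now = self.clock()
        # Drop expired entries so the cache does not grow without bound.
        self._last_call = {
            key: ts
            for key, ts in self._last_call.items()
            if now - ts < self.window_seconds
        }
        # Assumption: prompts that differ only in case or outer whitespace are the same.
        key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
        if key in self._last_call:
            return False
        self._last_call[key] = now
        return True
```

A caller would check should_call(prompt) before hitting the paid endpoint and reuse the in-flight or cached result when it returns False, so repeated re-parses of the same streamed text cannot trigger extra charges.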
Full changelog
Fixes
- fix(suno): cap paid Suno API at 1 call per unique prompt per 60s window — #294
- fix(conversation): eradicate bare "NO"/"YES" voice-agent leak — #296
- fix(auth): prevent users from reading or burning other users' usage quotas
- fix: suppress status TTS after interim response; guard playTTS against text mode
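For the bare "NO"/"YES" fix (#296), the detection step could look roughly like the following. The notes do not describe what the two layers in routes/conversation.py actually check, so the regex and the function name here are assumptions for illustration only.

```python
import re

# Assumption: a "bare" reply is a response that consists of nothing but
# yes/no plus optional surrounding punctuation or whitespace.
_BARE_REPLY = re.compile(r"^\W*(yes|no)\W*$", re.IGNORECASE)


def is_bare_reply(text):
    """Return True when the whole response is just YES or NO."""
    return bool(_BARE_REPLY.match(text or ""))
```

A guard built on this would run before both TTS and transcript output, so a flagged response is never spoken or displayed.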
Features
- feat(office): persist clerk_user_id + extend CURRENT_USER with office briefing — #297
- feat: add Qwen3-local TTS provider + fix desktop first-run seeding
Chore / infrastructure
- chore: bump version to 2026.5.4 — #298
- chore(ovui): bump app.js cache-buster v=24 → v=25 (forces fresh JS load)
- chore(server): make song_tagger import optional + gitignore the file — #295
- chore: hide qwen3-local Voice Studio card when provider is offline
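The optional-import change in #295 follows a common Python pattern: attempt the import and fall back to None when the module is absent. A minimal sketch, assuming a hypothetical helper named load_optional (server.py may well use a plain try/except around the import instead):

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not available."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # file lands here instead of crashing startup.
        return None


# None on builds that do not ship the JamBot-only file.
song_tagger = load_optional("song_tagger")
```

Callers then check `song_tagger is not None` before registering anything that depends on it.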
Dependencies (security/maintenance)
- ci(deps): bump actions/upload-pages-artifact 3 → 5 — #286
- ci(deps): bump aquasecurity/trivy-action 0.35.0 → 0.36.0 — #288
- deps(deps): bump groq 1.1.2 → 1.2.0 — #287
- deps(deps): update python-pptx ≥0.6.23 → ≥1.0.2 — #289
- deps(deps): update openpyxl ≥3.1.0 → ≥3.1.5 — #290
Install
- NPM: npm install -g [email protected]
- Pinokio: install or click "Update" on existing instance — pulls this release automatically
- Docker: docker compose pull && docker compose up -d (rebuilds from latest source)
- Native Linux: git pull && bash setup-sudo.sh (tested with [email protected])
OpenVoiceUI 2026.4.28-1
What's New
Fixes
- Remove robot voice filler during retry cascades — browser SpeechSynthesis was saying "still working on it" in a robotic voice during empty response retries, which felt broken. Visual status indicators already cover the waiting state; silence is correct.
Full Changelog: v2026.4.28...v2026.4.28-1
OpenVoiceUI 2026.4.28
What's New
Features
- Pi Coding CLI Default — openclaw now ships with the Pi coding agent by default, so the coding-agent skill works out of the box without requiring an Anthropic API key
- bump-openclaw-version.sh — single script to atomically update all three installer paths (Pinokio, Docker, native) so the openclaw version never drifts between installs
Fixes
- Remove hardcoded Anthropic model from setup-config.js — openclaw auto-selects from available API keys provided during setup
- Pin openclaw to 2026.3.24 across all installer paths (was inconsistent across docker-compose.yml, Dockerfile, setup-sudo.sh)
- Remove qwen3-local TTS provider from public repo — JamBot-internal GPU test only
- Mark Deepgram as optional in install.js — WebSpeech is the default STT, Deepgram is opt-in
- Fix diagnose.js to not report missing model as an issue when auto-select is in use
Full Changelog: v2026.4.19...v2026.4.28
OpenVoiceUI 2026.4.19
What's New
Session Recovery + Interrupt Handling (PR #284)
A 13-fix cascade addressing context loss, failed interrupts, repeated session poisoning, and MiniMax-M2.7-highspeed returning chat.final with zero text. Validated live over 6+ hours; zero "Sorry, I couldn't process that" terminal failures observed after deploy.
Gateway layer:
- MiniMax empty-final retry — services/gateways/openclaw.py now retries chat.send once when a turn completes with no text. Catches ~50% of empties invisibly to the user. openclaw's own failover only triggers on timeout/auth/rate-limit, not on empty-final, so this closes the gap.
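The retry-once behavior can be sketched as a thin wrapper around whatever function sends the turn. The wrapper name and the `send` callable are hypothetical; the real gateway code works with streamed events rather than a single return value.

```python
def send_with_empty_retry(send, message, max_retries=1):
    """Call send(message); retry when the reply text is empty or whitespace.

    `send` is any callable returning the final text of a turn (an assumption
    made for this sketch).
    """
    attempts = 0
    while True:
        text = send(message)
        if text and text.strip():
            return text
        if attempts >= max_retries:
            # Give up and let the caller's recovery path take over.
            return ""
        attempts += 1
```

With max_retries=1 the user only notices an empty final if two consecutive attempts come back empty.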
Conversation layer (routes/conversation.py):
- Context-preserving recovery prime — on session_recovery, pulls last 30 turns from conversation_log (both session_id='default' AND session_id IS NULL) and injects as a [RECENT CONTEXT — …] prefix into the fresh recovery session so it resumes the conversation instead of starting fresh.
- Sticky recovery with timestamped keys — recovery-<epoch> keys persist for process lifetime; if recovery itself poisons later, a new recovery-<newepoch> is spun up cleanly.
- Steer-during-inference empty recovery — record_recent_steer/consume_recent_steer track per-session steers for 30s; if LLM empties right after a steer lands, the steered message is auto-refired as a fresh turn so user corrections aren't lost.
- Uncommitted tool-promise auto-continue — detects "I'll build X / let me Y" responses with zero tool use and auto-sends a system follow-up to force tool execution.
- Recovery idle-timeout 10min (was 60s elapsed) with activity-bump on every gateway event so productive multi-tool recovery turns aren't kicked out mid-work.
- Split recovery timestamps — _recovery_entered_at, _recovery_last_activity_at, _recovery_last_exited_at — cooldown now measures against last exit (10s) so recovery can re-fire immediately when main re-poisons.
- Removed time.sleep(1/2) padding from empty-retry and steer-recovery paths. Every second of artificial delay was dead silence for the user.
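The steer-tracking item above amounts to a per-session store with a 30s time-to-live. The sketch below borrows the two function names from the notes but everything else (the wrapping class, the injectable clock, the one-steer-per-session storage) is an assumption about how such a tracker might be built.

```python
import time


class RecentSteers:
    """Remember the latest steer per session for a short time."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._steers = {}  # session key -> (message, recorded_at)

    def record_recent_steer(self, session_key, message):
        """Store the steer, replacing any earlier one for the session."""
        self._steers[session_key] = (message, self.clock())

    def consume_recent_steer(self, session_key):
        """Pop and return the steer if it is still fresh, else None."""
        entry = self._steers.pop(session_key, None)
        if entry is None:
            return None
        message, recorded_at = entry
        if self.clock() - recorded_at > self.ttl_seconds:
            return None
        return message
```

When a turn comes back empty, the conversation layer would call consume_recent_steer for that session and, if it returns a message, send that message as a fresh turn.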
Classifier (routes/message_classifier.py):
- Scope-refinement steer patterns — naw, nah, nuh-uh, "X only", "just X", "not Y", "exclude X", "filter out". Prevents scope corrections from being queued as context.
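A classifier for the phrases listed above could be approximated with one regular expression. The expression below is an illustrative guess at how those phrases might be matched, not the patterns shipped in routes/message_classifier.py.

```python
import re

# Hypothetical patterns covering the phrases named in the release notes.
_SCOPE_STEER = re.compile(
    r"^\s*(?:naw|nah|nuh-uh)\b"       # leading rejection words
    r"|\b(?:exclude|filter out)\b"    # explicit exclusions anywhere
    r"|^\s*(?:just|not)\s+\S+"        # "just X" / "not Y" at the start
    r"|\S+\s+only\s*[.!]?\s*$",       # "X only" at the end
    re.IGNORECASE,
)


def is_scope_steer(text):
    """Return True when the message looks like a scope correction."""
    return bool(_SCOPE_STEER.search(text))
```

Messages that match would be routed as steers into the active turn instead of being queued as background context.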
Client (src/app.js):
- _textDoneReceived race guard — post-text_done messages abort + fresh path instead of orphan-steer into a closed openclaw turn.
- Persistent cascade filler TTS via SpeechSynthesis — "one moment / still working / almost there / hang tight" progressively during >2s cascades. Auto-cancelled on real TTS.
- Silent mic-resume on terminal cascade failure — no more "Sorry, I couldn't process that" polluting the transcript.
- Live thinking indicator — dots stay animated while showing current tool + elapsed seconds; refreshes on every heartbeat between tool_start events.
- Stop double-processing data.actions in text_done — actions already stream live, eliminating duplicate tool-call entries in the action panel.
Music Integrations (PR #283)
- SoundCloud + Bandcamp in-player embeds — playSoundCloud(url) and playBandcamp(url) on MusicModule. New [SOUNDCLOUD:] / [BANDCAMP:] voice action tags.
- External embed switching — cleared cleanly when switching to library playback.
- Spotify stub removed (was non-functional).
CI / Housekeeping
- Dependabot now targets dev instead of main (PRs #281, #282) so dependency PRs land on the integration branch first.
- Version bump to 2026.4.19 across package.json + website/package.json.
Related artifacts (MIKE-AI repo)
- Session-monitor patterns extended with 12 new cascade events + minimax_empty_cluster alert (3+ empty-finals in 5min window).
- Full triage playbook at openclaw-expert/references/session-recovery-and-empty-finals.md so future debugging starts from the answers.
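The minimax_empty_cluster rule (3+ empty-finals in a 5min window) is a sliding-window count. A minimal sketch, assuming a hypothetical EmptyClusterDetector class; the session monitor in the MIKE-AI repo may implement it differently, for example by scanning log lines.

```python
import time
from collections import deque


class EmptyClusterDetector:
    """Flag when enough empty-finals land inside a sliding time window."""

    def __init__(self, threshold=3, window_seconds=300.0, clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = deque()  # timestamps of recent empty-finals

    def record_empty_final(self):
        """Record one event; return True when the alert should fire."""
        now = self.clock()
        self._events.append(now)
        # Evict events that have fallen out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold
```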
Full Changelog
OpenVoiceUI 2026.4.13
What's New
Voice / STT
- Deepgram STT reliability bundle — PTT release mute defer + AudioContext idempotence + interim results cancel accumulation timer + endpointing 300 → 500. Fixes PTT capture, mid-sentence cutoffs, and zero-transcript reconnects.
- WebSpeech PTT release mute defer — parallel fix for the fallback STT provider
- Conversation retry preserves session key on fast-empty responses (no more recovery-key context loss)
- Server-side garbage STT filter — lower threshold + NDJSON format so the UI doesn't lock on "thinking" when filtering very short utterances
- Canvas pages context list cap raised from 1000 → 5000 chars so the agent can see all canvas pages instead of the alphabetically first ~60
TTS
- Resemble TTS reliability — shared httpx connection pool, chunk size 500 → 1500, request_id error logging, retry budget 5 → 8
- Suno tag normalization — sloppy whitespace in [MUSIC_PLAY:...] and [CANVAS:...] tags now gets normalized before display/extract
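Tag normalization of this kind is typically a single regex substitution. The sketch below assumes "sloppy whitespace" means stray spaces around the brackets, the tag name, and the colon; the notes do not spell out the exact cases handled, and normalize_tags is a hypothetical name.

```python
import re

# Matches [MUSIC_PLAY:...] and [CANVAS:...] with optional stray whitespace
# around the tag name, the colon, and the payload.
_TAG = re.compile(r"\[\s*(MUSIC_PLAY|CANVAS)\s*:\s*([^\]]*?)\s*\]")


def normalize_tags(text):
    """Rewrite loosely spaced tags into the canonical [NAME:payload] form."""
    return _TAG.sub(lambda m: f"[{m.group(1)}:{m.group(2)}]", text)
```

Whitespace inside the payload itself (for example between words of a song title) is left untouched.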
Auth & API
- Clerk __session cookie now persists for 30 days (Max-Age=2592000) — no more re-auth on browser restart
- Auth bypass for /api/vault/oauth/callback/ so external OAuth redirects can land
- Canvas page CSP connect-src allows blob: for fetching client-side generated audio/video
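The Max-Age value is simply 30 days expressed in seconds. The snippet below checks the arithmetic and shows how such a header value could be built with the standard library; build_session_cookie is a hypothetical helper, and the real cookie presumably carries further attributes (Secure, HttpOnly, SameSite) that the notes do not list.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # 2592000


def build_session_cookie(token):
    """Return a Set-Cookie header value for a 30-day __session cookie."""
    cookie = SimpleCookie()
    cookie["__session"] = token
    cookie["__session"]["max-age"] = THIRTY_DAYS_SECONDS
    cookie["__session"]["path"] = "/"
    return cookie["__session"].OutputString()
```

Without Max-Age or Expires a cookie is a session cookie and is dropped when the browser closes, which matches the re-auth symptom this change removes.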
Vault & Plugins
- Credential Vault Phase 1.5 — Cycles A/B/C: writes reach running agents, opt-in OAuth pattern, Platform Setup admin page
- Hermes plugin install flow finished — lifecycle hooks, install_config, vault sync via provision service HTTP API
- byterover-memory plugin removed and parked in catalog repo (quarantined draft PR for re-add when stable)
- Twenty CRM and SEO Platform plugins now declare vault credentials
UI / UX
- Action Console verbose tool detail with Hermes gateway shape support — full command/path/file content shown in the panel (transcript stays clean)
- Hermes gateway indicator on initial load and on profile switch — Action Console label now correctly reflects the active gateway
- Mute UX fix — clicking mute mid-call no longer triggers a fake error popup
- Clawdbot → Agent rename in user-facing strings
- Admin Connections "Not enabled by platform" link now navigates to the Platform Setup panel
Docs
- New .claude/CLAUDE.md project context file for contributors with parked-plugin quarantine convention
Plugins (separate repo)
- byterover-memory — removed from catalog and quarantined on parked/byterover-memory branch (draft PR #3 holds it open as a visible reminder until stability work lands)
Full Changelog: v2026.4.10...v2026.4.13
OpenVoiceUI 2026.4.10
What's New
Features
- Plugin Config API — gateway plugins can now be configured before install with API keys and provider selection from the admin Plugins panel
- Plugin Settings Panel — post-install "Settings" button for updating gateway plugin configuration
- Plugin Catalog Stubs — all 8 community plugins now have plugin.json + README.md in the main repo for dashboard discovery
- Custom Faces System — dynamic HTML face pages with an editor, no plugin required
- Page-Icon Meta Tags — canvas pages can declare icons via meta tags, extracted and shown in the page menu
Fixes
- Canvas icon extraction and iframe permissions
- Plugin system — profile isolation, lore deployment, gateway pairing
- Remove fake email from README footer, link to website instead
- Canvas hardening for iframe security
Plugins (separate repo)
- Hermes Agent — overhauled and pinned to nousresearch/hermes-agent:v0.6.0. Fixed emoji tool markers, added agent profile, complete README for GitHub/Pinokio/npm install.
- BHB Animated Characters — synced builder page, added voice samples
- ByteRover Memory — updated description
Full Changelog: v2026.4.7...v2026.4.10
OpenVoiceUI 2026.4.7
What's New
Features
- Remote Plugin Catalog — browse and one-click download community plugins from GitHub
- Conversation Interject — interrupt AI mid-response with new input
- External STT Provider — bring your own speech-to-text transcription API
- Text/Voice Mode Toggle — switch between text chat and voice conversation
- Subagent Visibility — see active sub-agents in the UI
- AI Config Admin Panel — configure AI settings from the admin interface
Fixes
- Plugin system — profile isolation, lore deployment, gateway pairing, restart button
- Canvas auth token bridge variable name fix (Auth → AuthModule)
- Bulk upload throttled to 3 concurrent with auto-retry
- External STT silence delay increased to 1500ms (prevents mid-sentence cutoff)
- Gateway handshake + device pairing now use operator.admin scope
- ByteRover memory made optional (moved to plugin)
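The bulk-upload throttle listed above (3 concurrent, auto-retry) is a bounded-concurrency pattern. The actual change is presumably in client-side JavaScript; the following is a Python sketch of the same idea, with upload_all, upload_one, and the attempt count of 3 all being assumptions.

```python
import asyncio


async def upload_all(items, upload_one, max_concurrent=3, max_attempts=3):
    """Upload every item with at most max_concurrent uploads in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def upload_with_retry(item):
        async with semaphore:
            for attempt in range(1, max_attempts + 1):
                try:
                    return await upload_one(item)
                except Exception:
                    if attempt == max_attempts:
                        raise
                    # Placeholder yield; a real client would back off here.
                    await asyncio.sleep(0)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(upload_with_retry(i) for i in items))
```

The semaphore is held across retries, so a failing upload keeps its slot and cannot push the number of in-flight requests above the limit.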
Dependencies
- requests 2.33.0 → 2.33.1
- actions/setup-python 5 → 6
- actions/setup-node 4 → 6
- actions/deploy-pages 4 → 5
- trivy-action 0.34.0 → 0.35.0
Full Changelog: v2026.4.2...v2026.4.7
OpenVoiceUI 2026.4.2
What's New
External STT Provider — Bring Your Own Transcription API
Users can now point OpenVoiceUI at any external Whisper-compatible STT service via STT_API_URL. Auto-detects OpenAI-compatible (/v1/audio/transcriptions) and generic Whisper ASR (/asr) formats. Selectable from the admin panel with full VAD + PTT support. (#193, #244)
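One plausible way to auto-detect the two formats is to inspect the path of STT_API_URL. The release notes do not say how detection is actually done (it could equally probe the server), so the function and the returned labels below are purely illustrative.

```python
from urllib.parse import urlparse


def detect_stt_format(stt_api_url):
    """Guess the STT API flavor from the URL path (hypothetical heuristic)."""
    path = urlparse(stt_api_url).path.rstrip("/")
    if path.endswith("/v1/audio/transcriptions"):
        return "openai"        # OpenAI-compatible transcription endpoint
    if path.endswith("/asr"):
        return "whisper-asr"   # generic Whisper ASR endpoint
    return "unknown"
```

For example, a URL ending in /v1/audio/transcriptions would be treated as OpenAI-compatible, while one ending in /asr would be treated as a generic Whisper ASR service.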
Admin Panel Improvements
- AI Models & API Keys panel — configure primary/fallback LLM models and manage provider API keys from the admin dashboard
- Subagent visibility — live subagent status exposed in UI
Fixes
- Gateway auth — operator.admin scope added to device pairing and gateway handshake, prevents NOT_PAIRED errors on reconnect
- STT accumulation delay — 1500ms silence threshold before sending transcript, prevents mid-sentence cutoff
- ByteRover memory — made optional, moved to plugin system (no longer a required dependency)
- Text/voice mode toggle — switch between text and voice input modes
- Version display + update detection fix
Dependencies
- cryptography 46.0.5 → 46.0.6
- trivy-action 0.34.0 → 0.35.0
- upload-artifact v4 → v7
OpenVoiceUI 2026.3.31
What's New
- Text/Voice mode toggle — Switch between text and voice input modes with persistent selection
- Subagent visibility — Live subagent status tracking in the UI via OpenClaw gateway events
- AI config admin panel — View and manage AI model configuration from the admin interface
- Canvas screenshot — Capture canvas page screenshots programmatically
- Version display & update detection — Desktop shows current version, detects when updates are available
Documentation
- Docusaurus docs site — 23-page documentation site with full API reference (1,239 lines, 90+ endpoints)
- Desktop canvas refactor plan — Architecture plan for improved page navigation and fuzzy matching
Dependencies
- Bump cryptography 46.0.5 → 46.0.6
- Bump actions/upload-artifact v4 → v7
- Bump aquasecurity/trivy-action 0.34.0 → 0.35.0
OpenVoiceUI 2026.3.29-1
Patch release — includes all 2026.3.29 changes with updated npm publish.
What's New
- GLM-5-turbo upgrade — All LLM references updated from glm-4.7 to glm-5-turbo across providers and install paths
- ByteRover long-term memory — Added ByteRover context engine (brv CLI + clawhub) to openclaw Dockerfile for persistent structured memory across sessions
- Admin panel overhaul — Production-ready admin panel: mobile-responsive layout, all panels functional
- README overhaul — Clean install instructions and updated feature list
Bug Fixes
- Desktop first-run — Seeds ALL pages onto desktop on first load, not just knownPages
- BigHead cleanup — Removed BigHead avatar content from base repo (moved to plugin system)
- PTT mic restore — Toggling Push-to-Talk off no longer leaves mic permanently muted
- Desktop version tag — Added version stamp for automatic desktop update propagation