Releases: MCERQUA/OpenVoiceUI

OpenVoiceUI 2026.5.4

04 May 01:17
d908777

Highlights

Critical cost-leak fix. Every song generation since ~Apr 19 was firing 5-30+ paid Suno API calls instead of 1, due to streaming-text re-parse loops in app.js. Audit confirmed every "successful" song was burning 5-8× the credits silently. PR #294 caps the paid API at exactly 1 call per unique prompt per 60s.
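A cap like this is essentially time-windowed deduplication keyed on the prompt. The Python sketch below assumes an in-memory map, whitespace/case normalization of the prompt, and an injectable clock. `PromptCallGate` and `try_acquire` are hypothetical names, and PR #294's actual code is not reproduced here.

```python
import time
from typing import Callable, Dict


class PromptCallGate:
    """Allow at most one paid call per unique prompt within a time window.

    Hypothetical sketch: class name and normalization rule are assumptions.
    """

    def __init__(self, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock
        self._last_call: Dict[str, float] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and case so re-parsed streaming text that
        # differs only cosmetically maps to the same key.
        return " ".join(prompt.split()).lower()

    def try_acquire(self, prompt: str) -> bool:
        """Return True if the caller may fire the paid API for this prompt."""
        now = self.clock()
        key = self._key(prompt)
        last = self._last_call.get(key)
        if last is not None and now - last < self.window_s:
            return False  # duplicate inside the window: skip the paid call
        self._last_call[key] = now
        return True
```

A caller would check `gate.try_acquire(prompt)` before the paid request and skip the request when it returns False. Entries are never evicted in this sketch; a long-running process would want to prune keys older than the window.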

Critical UX fix. The voice agent occasionally responded to users with a bare "NO" or "YES" — both spoken via TTS and shown in the transcript. Two-layer guard in routes/conversation.py now catches it before the user ever hears it. PR #296.
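The notes do not spell out the two layers. One plausible layer is a check on the final reply text before TTS and transcript display, sketched in Python. The regex, `is_bare_yes_no`, `guard_reply`, and the fallback behavior are all assumptions, not the code in routes/conversation.py.

```python
import re

# Hypothetical check: the whole reply is just "yes" or "no", ignoring case,
# surrounding whitespace and trailing punctuation.
_BARE_YES_NO = re.compile(r"^\s*(yes|no)[\s.!?]*$", re.IGNORECASE)


def is_bare_yes_no(text: str) -> bool:
    """Return True if an agent reply consists only of a bare YES/NO."""
    return _BARE_YES_NO.match(text) is not None


def guard_reply(text: str, fallback: str = "") -> str:
    """Swap a bare YES/NO for a fallback before it reaches TTS or the transcript."""
    return fallback if is_bare_yes_no(text) else text
```

Replies that merely start with "yes" or "no" ("No problem, starting the song now.") pass through untouched; only a reply that is nothing but the bare word is replaced.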

New: Office briefing injection. Voice agents now receive Clerk-driven open follow-ups + open matters when a known person logs in. PR #297.

Release-blocker fix. server.py was importing a JamBot-only song_tagger proxy unconditionally, so any fresh image build would have crashed at Flask startup. Made the import optional. PR #295.
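Making an import optional in Python usually means guarding it and degrading gracefully when the module is absent. A minimal sketch follows; the `optional_import` helper and the `SONG_TAGGER_ENABLED` flag are hypothetical, and PR #295 may be structured differently.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

log = logging.getLogger(__name__)


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module if present; return None instead of crashing startup."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        log.info("optional module %r not available; feature disabled", name)
        return None


# Hypothetical wiring: only register the proxy when the module exists.
song_tagger = optional_import("song_tagger")
SONG_TAGGER_ENABLED = song_tagger is not None
```

Catching ImportError also hides import failures inside the module itself, so logging the skip keeps that case visible.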

Full changelog

Fixes

  • fix(suno): cap paid Suno API at 1 call per unique prompt per 60s window — #294
  • fix(conversation): eradicate bare "NO"/"YES" voice-agent leak — #296
  • fix(auth): prevent users from reading or burning other users' usage quotas
  • fix: suppress status TTS after interim response; guard playTTS against text mode

Features

  • feat(office): persist clerk_user_id + extend CURRENT_USER with office briefing — #297
  • feat: add Qwen3-local TTS provider + fix desktop first-run seeding

Chore / infrastructure

  • chore: bump version to 2026.5.4 — #298
  • chore(ovui): bump app.js cache-buster v=24 → v=25 (forces fresh JS load)
  • chore(server): make song_tagger import optional + gitignore the file — #295
  • chore: hide qwen3-local Voice Studio card when provider is offline

Dependencies (security/maintenance)

  • ci(deps): bump actions/upload-pages-artifact 3 → 5 — #286
  • ci(deps): bump aquasecurity/trivy-action 0.35.0 → 0.36.0 — #288
  • deps(deps): bump groq 1.1.2 → 1.2.0 — #287
  • deps(deps): update python-pptx ≥0.6.23 → ≥1.0.2 — #289
  • deps(deps): update openpyxl ≥3.1.0 → ≥3.1.5 — #290

Install

  • NPM: npm install -g [email protected]
  • Pinokio: install or click "Update" on existing instance — pulls this release automatically
  • Docker: docker compose pull && docker compose up -d (rebuilds from latest source)
  • Native Linux: git pull && bash setup-sudo.sh (tested with [email protected])

OpenVoiceUI 2026.4.28-1

28 Apr 02:22
c5175e1

What's New

Fixes

  • Remove robot voice filler during retry cascades — browser SpeechSynthesis was saying "still working on it" in a robotic voice during empty response retries, which felt broken. Visual status indicators already cover the waiting state; silence is correct.

Full Changelog: v2026.4.28...v2026.4.28-1

OpenVoiceUI 2026.4.28

28 Apr 02:10
1b4dacd

What's New

Features

  • Pi Coding CLI Default — openclaw now ships with the Pi coding agent by default, so the coding-agent skill works out of the box without requiring an Anthropic API key
  • bump-openclaw-version.sh — single script to atomically update all three installer paths (Pinokio, Docker, native) so the openclaw version never drifts between installs

Fixes

  • Remove hardcoded Anthropic model from setup-config.js — openclaw auto-selects from available API keys provided during setup
  • Pin openclaw to 2026.3.24 across all installer paths (was inconsistent across docker-compose.yml, Dockerfile, setup-sudo.sh)
  • Remove qwen3-local TTS provider from public repo — JamBot-internal GPU test only
  • Mark Deepgram as optional in install.js — WebSpeech is the default STT, Deepgram is opt-in
  • Fix diagnose.js to not report missing model as an issue when auto-select is in use
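The aim of bump-openclaw-version.sh is that every installer path pins the same openclaw version. A complementary drift check is easy to sketch in Python. The `openclaw@<version>` / `openclaw==<version>` pin syntax is an assumption about how docker-compose.yml, Dockerfile and setup-sudo.sh reference the version, and the helper names are hypothetical.

```python
import re
from typing import Dict, Set

# Assumed pin syntax: "openclaw@<version>" or "openclaw==<version>".
_PIN = re.compile(r"openclaw[@=]=?(\d{4}\.\d+\.\d+(?:-\d+)?)")


def pinned_versions(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each file name to the set of openclaw versions it pins."""
    return {name: set(_PIN.findall(text)) for name, text in files.items()}


def check_no_drift(files: Dict[str, str]) -> str:
    """Return the single pinned version, or raise if the files disagree."""
    found = pinned_versions(files)
    missing = [name for name, versions in found.items() if not versions]
    if missing:
        raise ValueError(f"no openclaw pin found in: {missing}")
    all_versions = set().union(*found.values())
    if len(all_versions) != 1:
        raise ValueError(f"openclaw version drift: {found}")
    return all_versions.pop()
```

Running a check like this in CI would catch a hand edit that bypasses the bump script.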

Full Changelog: v2026.4.19...v2026.4.28

OpenVoiceUI 2026.4.19

19 Apr 06:44
e9c86ac

What's New

Session Recovery + Interrupt Handling (PR #284)

A 13-fix cascade addressing context loss, failed interrupts, repeated session poisoning, and MiniMax-M2.7-highspeed returning chat.final with zero text. Validated live over 6+ hours; zero "Sorry, I couldn't process that" terminal failures observed after deploy.

Gateway layer:

  • MiniMax empty-final retry: services/gateways/openclaw.py now retries chat.send once when a turn completes with no text. Catches ~50% of empties invisibly to the user. openclaw's own failover only triggers on timeout/auth/rate-limit, not on empty-final, so this closes the gap.
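The retry-once rule can be expressed as a small wrapper. This Python sketch assumes a blocking `send` callable that returns the turn's final text; `send_with_empty_retry` is a hypothetical name, and the gateway code in services/gateways/openclaw.py is not reproduced here.

```python
from typing import Callable


def send_with_empty_retry(send: Callable[[str], str], message: str,
                          max_retries: int = 1) -> str:
    """Call send(); if the completed turn has no text, retry up to max_retries.

    `send` is a hypothetical stand-in for a chat.send round trip that
    returns the final text of the turn.
    """
    text = send(message)
    attempts = 0
    while not text.strip() and attempts < max_retries:
        attempts += 1
        text = send(message)
    return text
```

With `max_retries=1` a second empty result is returned as-is, leaving any further recovery to the conversation layer.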

Conversation layer (routes/conversation.py):

  • Context-preserving recovery prime — on session_recovery, pulls last 30 turns from conversation_log (both session_id='default' AND session_id IS NULL) and injects as a [RECENT CONTEXT — …] prefix into the fresh recovery session so it resumes the conversation instead of starting fresh.
  • Sticky recovery with timestamped keys: recovery-<epoch> keys persist for process lifetime; if recovery itself poisons later, a new recovery-<newepoch> is spun up cleanly.
  • Steer-during-inference empty recovery: record_recent_steer / consume_recent_steer track per-session steers for 30s; if LLM empties right after a steer lands, the steered message is auto-refired as a fresh turn so user corrections aren't lost.
  • Uncommitted tool-promise auto-continue — detects "I'll build X / let me Y" responses with zero tool use and auto-sends a system follow-up to force tool execution.
  • Recovery idle timeout raised to 10 min (was a 60s elapsed-time limit), with an activity bump on every gateway event so productive multi-tool recovery turns aren't kicked out mid-work.
  • Split recovery timestamps: _recovery_entered_at, _recovery_last_activity_at, _recovery_last_exited_at. The 10s cooldown now measures against the last exit, so recovery can re-fire immediately when main re-poisons.
  • Removed time.sleep(1/2) padding from empty-retry and steer-recovery paths. Every second of artificial delay was dead silence for the user.
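The record/consume pattern for recent steers amounts to a per-session slot with a 30-second expiry. A Python sketch with an injectable clock follows; `RecentSteerTracker` is hypothetical, and the real record_recent_steer / consume_recent_steer functions may have different signatures and storage.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RecentSteerTracker:
    """Remember the last steer per session for a short TTL (default 30 s).

    Hypothetical sketch of the record/consume pattern.
    """

    def __init__(self, ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._steers: Dict[str, Tuple[str, float]] = {}

    def record(self, session_id: str, message: str) -> None:
        """Store the latest steer for a session, replacing any older one."""
        self._steers[session_id] = (message, self.clock())

    def consume(self, session_id: str) -> Optional[str]:
        """Pop the steer if it is still fresh; expired steers are discarded."""
        entry = self._steers.pop(session_id, None)
        if entry is None:
            return None
        message, recorded_at = entry
        if self.clock() - recorded_at > self.ttl_s:
            return None
        return message
```

On an empty LLM response, a caller would `consume()` the session's steer and, if one comes back, re-send it as a fresh turn. Consuming pops the entry, so the same steer is never refired twice.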

Classifier (routes/message_classifier.py):

  • Scope-refinement steer patterns: naw, nah, nuh-uh, "X only", "just X", "not Y", "exclude X", "filter out". Prevents scope corrections from being queued as context.
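A keyword classifier of this kind can be sketched with a handful of regular expressions. The patterns below are assumptions built from the examples listed above, not the contents of routes/message_classifier.py, and `is_scope_refinement` is a hypothetical name.

```python
import re

# Hypothetical patterns modeled on the examples in the release note.
_SCOPE_REFINEMENT = [
    re.compile(r"^\s*(naw|nah|nuh-uh)\b", re.IGNORECASE),  # leading rejection
    re.compile(r"\b\w+\s+only\b", re.IGNORECASE),            # "X only"
    re.compile(r"^\s*just\s+\w+", re.IGNORECASE),            # "just X"
    re.compile(r"^\s*not\s+\w+", re.IGNORECASE),             # "not Y"
    re.compile(r"\bexclude\s+\w+", re.IGNORECASE),           # "exclude X"
    re.compile(r"\bfilter\s+out\b", re.IGNORECASE),          # "filter out"
]


def is_scope_refinement(utterance: str) -> bool:
    """True if the utterance looks like a correction that should steer the current turn."""
    return any(p.search(utterance) for p in _SCOPE_REFINEMENT)
```

Anchoring "just" and "not" at the start of the utterance keeps ordinary sentences that merely contain those words from being classified as steers.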

Client (src/app.js):

  • _textDoneReceived race guard — messages arriving after text_done now abort and take a fresh-turn path instead of being orphan-steered into a closed openclaw turn.
  • Persistent cascade filler TTS via SpeechSynthesis — "one moment / still working / almost there / hang tight" progressively during >2s cascades. Auto-cancelled on real TTS.
  • Silent mic-resume on terminal cascade failure — no more "Sorry, I couldn't process that" polluting the transcript.
  • Live thinking indicator — dots stay animated while showing current tool + elapsed seconds; refreshes on every heartbeat between tool_start events.
  • Stop double-processing data.actions in text_done — actions already stream live, eliminating duplicate tool-call entries in the action panel.

Music Integrations (PR #283)

  • SoundCloud + Bandcamp in-player embeds: playSoundCloud(url) and playBandcamp(url) on MusicModule. New [SOUNDCLOUD:] / [BANDCAMP:] voice action tags.
  • External embed switching — cleared cleanly when switching to library playback.
  • Spotify stub removed (was non-functional).

CI / Housekeeping

  • Dependabot now targets dev instead of main (PRs #281, #282) so dependency PRs land on the integration branch first.
  • Version bump to 2026.4.19 across package.json + website/package.json.

Related artifacts (MIKE-AI repo)

  • Session-monitor patterns extended with 12 new cascade events + minimax_empty_cluster alert (3+ empty-finals in 5min window).
  • Full triage playbook at openclaw-expert/references/session-recovery-and-empty-finals.md so future debugging starts from the answers.
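The minimax_empty_cluster rule (3+ empty-finals in a 5-minute window) is a sliding-window count. A Python sketch using a deque of timestamps follows; the class name and the injectable clock are assumptions, not the session monitor's actual implementation.

```python
import time
from collections import deque
from typing import Callable, Deque


class EmptyFinalClusterDetector:
    """Flag a cluster when >= threshold empty-finals occur within window_s.

    Hypothetical sketch of a "3+ in 5 min" alert rule.
    """

    def __init__(self, threshold: int = 3, window_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.clock = clock
        self._events: Deque[float] = deque()

    def record_empty_final(self) -> bool:
        """Record one empty-final; return True if the alert should fire."""
        now = self.clock()
        self._events.append(now)
        # Drop events that have fallen out of the sliding window.
        while self._events and now - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) >= self.threshold
```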

Full Changelog: v2026.4.13...v2026.4.19

OpenVoiceUI 2026.4.13

13 Apr 06:34
7bb694b

What's New

Voice / STT

  • Deepgram STT reliability bundle — PTT release mute defer + AudioContext idempotence + interim results cancel accumulation timer + endpointing 300 → 500 ms. Fixes PTT capture, mid-sentence cutoffs, and zero-transcript reconnects.
  • WebSpeech PTT release mute defer — parallel fix for the fallback STT provider
  • Conversation retry preserves session key on fast-empty responses (no more recovery-key context loss)
  • Server-side garbage STT filter — lower threshold + NDJSON format so the UI doesn't lock on "thinking" when filtering very short utterances
  • Canvas pages context list cap raised from 1000 → 5000 chars so the agent can see all canvas pages instead of the alphabetically first ~60

TTS

  • Resemble TTS reliability — shared httpx connection pool, chunk size 500 → 1500, request_id error logging, retry budget 5 → 8
  • Suno tag normalization — sloppy whitespace in [MUSIC_PLAY:...] and [CANVAS:...] tags now gets normalized before display/extract
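Tag normalization of this kind is typically a regex rewrite. The Python sketch below assumes tags of the form `[NAME: payload]` and takes normalization to mean trimming space around the name, colon and payload and collapsing internal whitespace runs. `normalize_tags` is hypothetical.

```python
import re

# Matches [NAME: payload] tags with arbitrary whitespace around the name,
# the colon and the payload. Tag names come from the release note; the
# normalization rules are assumptions.
_TAG = re.compile(r"\[\s*(MUSIC_PLAY|CANVAS)\s*:\s*(.*?)\s*\]", re.DOTALL)


def normalize_tags(text: str) -> str:
    """Rewrite sloppy tags like '[ MUSIC_PLAY :  song ]' to '[MUSIC_PLAY:song]'."""
    def _fix(m: re.Match) -> str:
        payload = " ".join(m.group(2).split())  # collapse inner whitespace runs
        return f"[{m.group(1)}:{payload}]"
    return _TAG.sub(_fix, text)
```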

Auth & API

  • Clerk __session cookie now persists for 30 days (Max-Age=2592000) — no more re-auth on browser restart
  • Auth bypass for /api/vault/oauth/callback/ so external OAuth redirects can land
  • Canvas page CSP connect-src allows blob: for fetching client-side generated audio/video
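The Max-Age value is 30 days expressed in seconds: 30 × 24 × 60 × 60 = 2,592,000. A sketch using Python's standard `http.cookies` module follows. The helper name and the `Path=/` attribute are assumptions, and the project may set this cookie through its web framework or Clerk instead.

```python
from http.cookies import SimpleCookie

# 30 days in seconds, matching Max-Age=2592000 in the note above.
THIRTY_DAYS_S = 30 * 24 * 60 * 60


def session_cookie_header(token: str) -> str:
    """Build the value of a Set-Cookie header for a 30-day __session cookie.

    Hypothetical helper: only Max-Age comes from the release note;
    Path=/ is an assumption.
    """
    cookie = SimpleCookie()
    cookie["__session"] = token
    morsel = cookie["__session"]
    morsel["max-age"] = THIRTY_DAYS_S
    morsel["path"] = "/"
    return morsel.OutputString()
```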

Vault & Plugins

  • Credential Vault Phase 1.5 — Cycles A/B/C: writes reach running agents, opt-in OAuth pattern, Platform Setup admin page
  • Hermes plugin install flow finished — lifecycle hooks, install_config, vault sync via provision service HTTP API
  • byterover-memory plugin removed and parked in catalog repo (quarantined draft PR for re-add when stable)
  • Twenty CRM and SEO Platform plugins now declare vault credentials

UI / UX

  • Action Console verbose tool detail with Hermes gateway shape support — full command/path/file content shown in the panel (transcript stays clean)
  • Hermes gateway indicator on initial load and on profile switch — Action Console label now correctly reflects the active gateway
  • Mute UX fix — clicking mute mid-call no longer triggers a fake error popup
  • Clawdbot → Agent rename in user-facing strings
  • Admin Connections "Not enabled by platform" link now navigates to the Platform Setup panel

Docs

  • New .claude/CLAUDE.md project context file for contributors with parked-plugin quarantine convention

Plugins (separate repo)

  • byterover-memory — removed from catalog and quarantined on parked/byterover-memory branch (draft PR #3 holds it open as a visible reminder until stability work lands)

Full Changelog: v2026.4.10...v2026.4.13

OpenVoiceUI 2026.4.10

10 Apr 02:58
0746dea

What's New

Features

  • Plugin Config API — gateway plugins can now be configured before install with API keys and provider selection from the admin Plugins panel
  • Plugin Settings Panel — post-install "Settings" button for updating gateway plugin configuration
  • Plugin Catalog Stubs — all 8 community plugins now have plugin.json + README.md in the main repo for dashboard discovery
  • Custom Faces System — dynamic HTML face pages with an editor, no plugin required
  • Page-Icon Meta Tags — canvas pages can declare icons via meta tags, extracted and shown in the page menu
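The meta-tag mechanism can be sketched with Python's standard `html.parser`. The meta name `page-icon` and the use of the `content` attribute to carry the icon are assumptions, since the release note does not specify the exact markup; the parser class and `extract_page_icon` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _PageIconParser(HTMLParser):
    """Collect the content of the first <meta name=...> tag matching meta_name."""

    def __init__(self, meta_name: str) -> None:
        super().__init__()
        self.meta_name = meta_name
        self.icon: Optional[str] = None

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta" or self.icon is not None:
            return  # only the first matching meta tag counts
        attr_map = dict(attrs)
        if attr_map.get("name") == self.meta_name:
            self.icon = attr_map.get("content")


def extract_page_icon(html: str, meta_name: str = "page-icon") -> Optional[str]:
    """Return the icon declared by a canvas page, or None if there is none."""
    parser = _PageIconParser(meta_name)
    parser.feed(html)
    parser.close()
    return parser.icon
```

Self-closing `<meta ... />` tags are handled too, because `HTMLParser` routes them through `handle_starttag` by default.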

Fixes

  • Canvas icon extraction and iframe permissions
  • Plugin system — profile isolation, lore deployment, gateway pairing
  • Remove fake email from README footer, link to website instead
  • Canvas hardening for iframe security

Plugins (separate repo)

  • Hermes Agent — overhauled and pinned to nousresearch/hermes-agent:v0.6.0. Fixed emoji tool markers, added agent profile, complete README for GitHub/Pinokio/npm install.
  • BHB Animated Characters — synced builder page, added voice samples
  • ByteRover Memory — updated description

Full Changelog: v2026.4.7...v2026.4.10

OpenVoiceUI 2026.4.7

07 Apr 06:11
af6904d

What's New

Features

  • Remote Plugin Catalog — browse and one-click download community plugins from GitHub
  • Conversation Interject — interrupt AI mid-response with new input
  • External STT Provider — bring your own speech-to-text transcription API
  • Text/Voice Mode Toggle — switch between text chat and voice conversation
  • Subagent Visibility — see active sub-agents in the UI
  • AI Config Admin Panel — configure AI settings from the admin interface

Fixes

  • Plugin system — profile isolation, lore deployment, gateway pairing, restart button
  • Canvas auth token bridge variable name fix (Auth → AuthModule)
  • Bulk upload throttled to 3 concurrent with auto-retry
  • External STT silence delay increased to 1500ms (prevents mid-sentence cutoff)
  • Gateway handshake + device pairing now use operator.admin scope
  • ByteRover memory made optional (moved to plugin)
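Throttling uploads to 3 concurrent with auto-retry maps naturally onto a semaphore. The Python asyncio sketch below is illustrative only: `bulk_upload`, the attempt count and the linear backoff are assumptions, not the project's uploader.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def bulk_upload(items: List[str],
                      upload: Callable[[str], Awaitable[T]],
                      max_concurrent: int = 3,
                      max_attempts: int = 3,
                      backoff_s: float = 0.5) -> List[T]:
    """Upload items with at most max_concurrent in flight, retrying failures."""
    sem = asyncio.Semaphore(max_concurrent)

    async def one(item: str) -> T:
        async with sem:  # caps the number of in-flight uploads
            for attempt in range(1, max_attempts + 1):
                try:
                    return await upload(item)
                except Exception:
                    if attempt == max_attempts:
                        raise  # give up after the last attempt
                    await asyncio.sleep(backoff_s * attempt)  # linear backoff

    # gather preserves input order in its results.
    return await asyncio.gather(*(one(i) for i in items))
```

A retrying task keeps its semaphore slot during backoff, so the in-flight cap holds even while retries are pending.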

Dependencies

  • requests 2.33.0 → 2.33.1
  • actions/setup-python 5 → 6
  • actions/setup-node 4 → 6
  • actions/deploy-pages 4 → 5
  • trivy-action 0.34.0 → 0.35.0

Full Changelog: v2026.4.2...v2026.4.7

OpenVoiceUI 2026.4.2

02 Apr 06:34
049ae1b

What's New

External STT Provider — Bring Your Own Transcription API

Users can now point OpenVoiceUI at any external Whisper-compatible STT service via STT_API_URL. Auto-detects OpenAI-compatible (/v1/audio/transcriptions) and generic Whisper ASR (/asr) formats. Selectable from the admin panel with full VAD + PTT support. (#193, #244)
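The release note does not describe how the auto-detection works. One simple approach is to branch on the URL path, sketched below in Python. `detect_stt_format` and its return labels are hypothetical, and the real implementation may probe the endpoint instead.

```python
from urllib.parse import urlparse


def detect_stt_format(stt_api_url: str) -> str:
    """Guess the request format for an external STT endpoint from its path.

    Heuristic sketch: '/v1/audio/transcriptions' -> OpenAI-compatible,
    '/asr' -> generic Whisper ASR; anything else is reported as unknown.
    """
    path = urlparse(stt_api_url).path.rstrip("/")
    if path.endswith("/v1/audio/transcriptions"):
        return "openai"
    if path.endswith("/asr"):
        return "whisper-asr"
    return "unknown"
```

Parsing the URL first means query strings such as `?output=json` do not interfere with the path check.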

Admin Panel Improvements

  • AI Models & API Keys panel — configure primary/fallback LLM models and manage provider API keys from the admin dashboard
  • Subagent visibility — live subagent status exposed in UI

Fixes

  • Gateway auth: operator.admin scope added to device pairing and gateway handshake, preventing NOT_PAIRED errors on reconnect
  • STT accumulation delay — 1500ms silence threshold before sending transcript, prevents mid-sentence cutoff
  • ByteRover memory — made optional, moved to plugin system (no longer a required dependency)
  • Text/voice mode toggle — switch between text and voice input modes
  • Version display + update detection fix
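A 1500ms silence threshold is essentially a debounce. Each new piece of speech resets the timer, and the transcript is sent only after a quiet gap (the 2026.4.13 notes add that interim results cancel the timer as well). The deterministic Python sketch below uses hypothetical class and method names and caller-supplied timestamps instead of a real timer. It is not the project's implementation.

```python
from typing import List, Optional


class TranscriptAccumulator:
    """Buffer final STT fragments and release them after a silence gap.

    Hypothetical sketch: timestamps (seconds) are passed in by the caller so
    the logic is deterministic; a real client would drive poll() from a timer.
    """

    def __init__(self, silence_s: float = 1.5) -> None:
        self.silence_s = silence_s
        self._parts: List[str] = []
        self._last_speech_at: Optional[float] = None

    def on_final(self, fragment: str, now: float) -> None:
        """Append a finalized fragment and restart the silence timer."""
        if fragment.strip():
            self._parts.append(fragment.strip())
        self._last_speech_at = now

    def on_interim(self, now: float) -> None:
        """Interim results mean the user is still talking: restart the timer only."""
        self._last_speech_at = now

    def poll(self, now: float) -> Optional[str]:
        """Return the joined utterance once silence_s has passed, else None."""
        if not self._parts or self._last_speech_at is None:
            return None
        if now - self._last_speech_at < self.silence_s:
            return None
        text = " ".join(self._parts)
        self._parts.clear()
        self._last_speech_at = None
        return text
```

Interim results only reset the timer and are never appended, because providers revise them before emitting the final fragment.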

Dependencies

  • cryptography 46.0.5 → 46.0.6
  • trivy-action 0.34.0 → 0.35.0
  • upload-artifact v4 → v7

OpenVoiceUI 2026.3.31

31 Mar 17:26

What's New

  • Text/Voice mode toggle — Switch between text and voice input modes with persistent selection
  • Subagent visibility — Live subagent status tracking in the UI via OpenClaw gateway events
  • AI config admin panel — View and manage AI model configuration from the admin interface
  • Canvas screenshot — Capture canvas page screenshots programmatically
  • Version display & update detection — Desktop shows current version, detects when updates are available

Documentation

  • Docusaurus docs site — 23-page documentation site with full API reference (1,239 lines, 90+ endpoints)
  • Desktop canvas refactor plan — Architecture plan for improved page navigation and fuzzy matching

Dependencies

  • Bump cryptography 46.0.5 → 46.0.6
  • Bump actions/upload-artifact v4 → v7
  • Bump aquasecurity/trivy-action 0.34.0 → 0.35.0

OpenVoiceUI 2026.3.29-1

29 Mar 04:37

Patch release — includes all 2026.3.29 changes with updated npm publish.

What's New

  • GLM-5-turbo upgrade — All LLM references updated from glm-4.7 to glm-5-turbo across providers and install paths
  • ByteRover long-term memory — Added ByteRover context engine (brv CLI + clawhub) to openclaw Dockerfile for persistent structured memory across sessions
  • Admin panel overhaul — Production-ready admin panel: mobile-responsive layout, all panels functional
  • README overhaul — Clean install instructions and updated feature list

Bug Fixes

  • Desktop first-run — Seeds ALL pages onto desktop on first load, not just knownPages
  • BigHead cleanup — Removed BigHead avatar content from base repo (moved to plugin system)
  • PTT mic restore — Toggling Push-to-Talk off no longer leaves mic permanently muted
  • Desktop version tag — Added version stamp for automatic desktop update propagation