Releases · MCERQUA/OpenVoiceUI

04 May 01:17

MCERQUA

v2026.5.4

d908777

OpenVoiceUI 2026.5.4 Latest

Highlights

Critical cost-leak fix. Every song generation since ~Apr 19 was firing 5-30+ paid Suno API calls instead of 1, due to streaming-text re-parse loops in app.js. Audit confirmed every "successful" song was burning 5-8× the credits silently. PR #294 caps the paid API at exactly 1 call per unique prompt per 60s.

Critical UX fix. The voice agent occasionally responded to users with a bare "NO" or "YES" — both spoken via TTS and shown in the transcript. Two-layer guard in routes/conversation.py now catches it before the user ever hears it. PR #296.

New: Office briefing injection. Voice agents now receive Clerk-driven open follow-ups + open matters when a known person logs in. PR #297.

Release-blocker fix. server.py was importing a JamBot-only song_tagger proxy unconditionally, so any fresh image build would have crashed at Flask startup. Made the import optional. PR #295.
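The cost-leak cap described above (one paid call per unique prompt per 60s) can be illustrated with a small server-side gate. This is a minimal sketch, not the code from PR #294: the class name `PromptCallGate`, the SHA-256 keying, and the injectable clock are all assumptions made for illustration.

```python
import hashlib
import time


class PromptCallGate:
    """Hypothetical gate: at most one paid call per unique prompt per window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_call = {}  # prompt digest -> time of the last allowed call

    def allow(self, prompt):
        now = self._clock()
        # Drop expired entries so the table does not grow without bound.
        self._last_call = {
            key: seen_at
            for key, seen_at in self._last_call.items()
            if now - seen_at < self.window_seconds
        }
        key = hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()
        if key in self._last_call:
            return False  # duplicate inside the window: skip the paid call
        self._last_call[key] = now
        return True
```

A caller would check `gate.allow(prompt)` before issuing the paid request, so repeated re-parses of the same streamed text collapse into a single call.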

Full changelog

Fixes

fix(suno): cap paid Suno API at 1 call per unique prompt per 60s window — #294
fix(conversation): eradicate bare "NO"/"YES" voice-agent leak — #296
fix(auth): prevent users from reading or burning other users' usage quotas
fix: suppress status TTS after interim response; guard playTTS against text mode
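For the bare "NO"/"YES" fix, a check of roughly the following shape could sit in front of TTS and the transcript. The release notes do not describe the two layers of the guard in routes/conversation.py, so this is only an assumed sketch of one such layer; `is_bare_yes_no` is a hypothetical name.

```python
import re

# Matches a reply that is nothing but "yes" or "no", ignoring case and
# surrounding punctuation or whitespace.
_BARE_YES_NO = re.compile(r"^\W*(yes|no)\W*$", re.IGNORECASE)


def is_bare_yes_no(reply: str) -> bool:
    """Return True when the whole reply is a lone yes/no token."""
    return bool(_BARE_YES_NO.match(reply))
```

Replies flagged by this check could be dropped or retried before they are spoken; longer replies that merely start with "No" pass through untouched.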

Features

feat(office): persist clerk_user_id + extend CURRENT_USER with office briefing — #297
feat: add Qwen3-local TTS provider + fix desktop first-run seeding

Chore / infrastructure

chore: bump version to 2026.5.4 — #298
chore(ovui): bump app.js cache-buster v=24 → v=25 (forces fresh JS load)
chore(server): make song_tagger import optional + gitignore the file — #295
chore: hide qwen3-local Voice Studio card when provider is offline
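The optional song_tagger import from #295 follows a common pattern: try the import and fall back to `None` when the module is absent. The sketch below shows that pattern in general form; the helper name `load_optional` is an assumption, and the actual change in server.py may simply use an inline try/except.

```python
import importlib


def load_optional(module_name):
    """Import a module if it is installed, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too, which subclasses ImportError.
        return None


# Stays None on builds that do not ship the module, so startup continues.
song_tagger = load_optional("song_tagger")
```

Code that uses the module then checks `if song_tagger is not None:` before registering any routes that depend on it.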

Dependencies (security/maintenance)

ci(deps): bump actions/upload-pages-artifact 3 → 5 — #286
ci(deps): bump aquasecurity/trivy-action 0.35.0 → 0.36.0 — #288
deps(deps): bump groq 1.1.2 → 1.2.0 — #287
deps(deps): update python-pptx ≥0.6.23 → ≥1.0.2 — #289
deps(deps): update openpyxl ≥3.1.0 → ≥3.1.5 — #290

Install

NPM: npm install -g [email protected]
Pinokio: install or click "Update" on existing instance — pulls this release automatically
Docker: docker compose pull && docker compose up -d (rebuilds from latest source)
Native Linux: git pull && bash setup-sudo.sh (tested with [email protected])

28 Apr 02:22

MCERQUA

v2026.4.28-1

c5175e1

OpenVoiceUI 2026.4.28-1

What's New

Fixes

Remove robot voice filler during retry cascades — browser SpeechSynthesis was saying "still working on it" in a robotic voice during empty response retries, which felt broken. Visual status indicators already cover the waiting state; silence is correct.

Full Changelog: v2026.4.28...v2026.4.28-1

28 Apr 02:10

MCERQUA

v2026.4.28

1b4dacd

OpenVoiceUI 2026.4.28

What's New

Features

Pi Coding CLI Default — openclaw now ships with the Pi coding agent by default, so the coding-agent skill works out of the box without requiring an Anthropic API key
bump-openclaw-version.sh — single script to atomically update all three installer paths (Pinokio, Docker, native) so the openclaw version never drifts between installs

Fixes

Remove hardcoded Anthropic model from setup-config.js — openclaw auto-selects from available API keys provided during setup
Pin openclaw to 2026.3.24 across all installer paths (was inconsistent across docker-compose.yml, Dockerfile, setup-sudo.sh)
Remove qwen3-local TTS provider from public repo — JamBot-internal GPU test only
Mark Deepgram as optional in install.js — WebSpeech is the default STT, Deepgram is opt-in
Fix diagnose.js to not report missing model as an issue when auto-select is in use

Full Changelog: v2026.4.19...v2026.4.28

19 Apr 06:44

MCERQUA

v2026.4.19

e9c86ac

OpenVoiceUI 2026.4.19

What's New

Session Recovery + Interrupt Handling (PR #284)

A 13-fix cascade addressing context loss, failed interrupts, repeated session poisoning, and MiniMax-M2.7-highspeed returning chat.final with zero text. Validated live over 6+ hours; zero "Sorry, I couldn't process that" terminal failures observed after deploy.

Gateway layer:

MiniMax empty-final retry — services/gateways/openclaw.py now retries chat.send once when a turn completes with no text. Catches ~50% of empties invisibly to the user. openclaw's own failover only triggers on timeout/auth/rate-limit, not on empty-final, so this closes the gap.
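The retry-once-on-empty behaviour can be sketched as a thin wrapper around whatever function sends the turn. This is an illustration only: `send_with_empty_retry` and its `send` callback are hypothetical, and the real logic in services/gateways/openclaw.py is not shown in these notes.

```python
def send_with_empty_retry(send, message, max_retries=1):
    """Call send(message); retry while the reply is empty, up to max_retries.

    Returns (text, retries_used). `send` is any callable returning the final
    text of a turn, or an empty string / None when the model produced no text.
    """
    text = send(message)
    retries = 0
    while not (text or "").strip() and retries < max_retries:
        text = send(message)
        retries += 1
    return text, retries
```

With `max_retries=1` the user sees at most one hidden resend, which matches the "retries chat.send once" wording above.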

Conversation layer (routes/conversation.py):

Context-preserving recovery prime — on session_recovery, pulls last 30 turns from conversation_log (both session_id='default' AND session_id IS NULL) and injects as a [RECENT CONTEXT — …] prefix into the fresh recovery session so it resumes the conversation instead of starting fresh.
Sticky recovery with timestamped keys — recovery-<epoch> keys persist for process lifetime; if recovery itself poisons later, a new recovery-<newepoch> is spun up cleanly.
Steer-during-inference empty recovery — record_recent_steer / consume_recent_steer track per-session steers for 30s; if the LLM returns an empty response right after a steer lands, the steered message is automatically re-sent as a fresh turn so user corrections aren't lost.
Uncommitted tool-promise auto-continue — detects "I'll build X / let me Y" responses with zero tool use and auto-sends a system follow-up to force tool execution.
Recovery idle-timeout 10min (was 60s elapsed) with activity-bump on every gateway event so productive multi-tool recovery turns aren't kicked out mid-work.
Split recovery timestamps — _recovery_entered_at, _recovery_last_activity_at, _recovery_last_exited_at — cooldown now measures against last exit (10s) so recovery can re-fire immediately when main re-poisons.
Removed time.sleep(1/2) padding from empty-retry and steer-recovery paths. Every second of artificial delay was dead silence for the user.
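The 30-second steer tracking in the list above can be pictured as a small per-session store with a time-to-live. The function names record_recent_steer and consume_recent_steer come from the release notes, but their signatures and the storage shown here are assumptions.

```python
import time

STEER_TTL_SECONDS = 30.0
_recent_steers = {}  # session key -> (steer text, time recorded)


def record_recent_steer(session_key, text, now=None):
    """Remember the latest steer for a session."""
    recorded_at = time.monotonic() if now is None else now
    _recent_steers[session_key] = (text, recorded_at)


def consume_recent_steer(session_key, now=None):
    """Pop the steer for a session; return it only if it is still fresh."""
    entry = _recent_steers.pop(session_key, None)
    if entry is None:
        return None
    text, recorded_at = entry
    current = time.monotonic() if now is None else now
    return text if current - recorded_at <= STEER_TTL_SECONDS else None
```

When a turn comes back empty, the caller would ask `consume_recent_steer(session_key)` and, if it returns text, send that text as a new turn.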

Classifier (routes/message_classifier.py):

Scope-refinement steer patterns — naw, nah, nuh-uh, "X only", "just X", "not Y", "exclude X", "filter out". Prevents scope corrections from being queued as context.
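A rough idea of how such scope-refinement phrases might be matched is shown below. The actual patterns in routes/message_classifier.py are not reproduced in these notes, so the regular expression and the name `is_scope_steer` are assumptions, and a pattern this loose would over-match in production (for example on any sentence containing "not").

```python
import re

# Illustrative patterns for: naw / nah / nuh-uh, "X only", "just X",
# "not Y", "exclude X", "filter out".
_SCOPE_STEER = re.compile(
    r"\b(?:naw|nah|nuh-uh)\b"
    r"|\b\w+\s+only\b"
    r"|\bjust\s+\w+"
    r"|\bnot\s+\w+"
    r"|\bexclude\s+\w+"
    r"|\bfilter\s+out\b",
    re.IGNORECASE,
)


def is_scope_steer(utterance: str) -> bool:
    """Return True when the utterance looks like a scope correction."""
    return bool(_SCOPE_STEER.search(utterance))
```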

Client (src/app.js):

_textDoneReceived race guard — messages that arrive after text_done now take the abort-and-fresh-turn path instead of being steered, orphaned, into an already closed openclaw turn.
Persistent cascade filler TTS via SpeechSynthesis — "one moment / still working / almost there / hang tight" progressively during >2s cascades. Auto-cancelled on real TTS.
Silent mic-resume on terminal cascade failure — no more "Sorry, I couldn't process that" polluting the transcript.
Live thinking indicator — dots stay animated while showing current tool + elapsed seconds; refreshes on every heartbeat between tool_start events.
Stop double-processing data.actions in text_done — actions already stream live, eliminating duplicate tool-call entries in the action panel.

Music Integrations (PR #283)

SoundCloud + Bandcamp in-player embeds — playSoundCloud(url) and playBandcamp(url) on MusicModule. New [SOUNDCLOUD:] / [BANDCAMP:] voice action tags.
External embed switching — cleared cleanly when switching to library playback.
Spotify stub removed (was non-functional).

CI / Housekeeping

Dependabot now targets dev instead of main (PRs #281, #282) so dependency PRs land on the integration branch first.
Version bump to 2026.4.19 across package.json + website/package.json.

Related artifacts (MIKE-AI repo)

Session-monitor patterns extended with 12 new cascade events + minimax_empty_cluster alert (3+ empty-finals in 5min window).
Full triage playbook at openclaw-expert/references/session-recovery-and-empty-finals.md so future debugging starts from the answers.
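The minimax_empty_cluster alert (3+ empty-finals in a 5min window) is a sliding-window count. A minimal sketch of that idea follows; the class name and interface are hypothetical and are not taken from the MIKE-AI session monitor.

```python
from collections import deque


class EmptyFinalClusterDetector:
    """Signal when `threshold` events fall within `window_seconds`."""

    def __init__(self, threshold=3, window_seconds=300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of recent empty-finals

    def record(self, timestamp):
        """Record one empty-final; return True if the alert should fire."""
        self._events.append(timestamp)
        # Discard events that have aged out of the window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold
```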

Full Changelog

v2026.4.13...v2026.4.19

13 Apr 06:34

MCERQUA

v2026.4.13

7bb694b

OpenVoiceUI 2026.4.13

What's New

Voice / STT

Deepgram STT reliability bundle — PTT release mute defer + AudioContext idempotence + interim results cancel accumulation timer + endpointing 300 → 500. Fixes PTT capture, mid-sentence cutoffs, and zero-transcript reconnects.
WebSpeech PTT release mute defer — parallel fix for the fallback STT provider
Conversation retry preserves session key on fast-empty responses (no more recovery-key context loss)
Server-side garbage STT filter — lower threshold + NDJSON format so the UI doesn't lock on "thinking" when filtering very short utterances
Canvas pages context list cap raised from 1000 → 5000 chars so the agent can see all canvas pages instead of the alphabetically first ~60

TTS

Resemble TTS reliability — shared httpx connection pool, chunk size 500 → 1500, request_id error logging, retry budget 5 → 8
Suno tag normalization — sloppy whitespace in [MUSIC_PLAY:...] and [CANVAS:...] tags now gets normalized before display/extract
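Tag normalization of this kind usually amounts to trimming whitespace around the tag name, the colon, and the payload. The sketch below assumes the tags are upper-case and that inner whitespace in the payload should be preserved; `normalize_tags` is a hypothetical helper, not the project's function.

```python
import re

# Tolerates stray spaces inside the brackets, e.g. "[ MUSIC_PLAY :  my song ]".
_TAG = re.compile(r"\[\s*(MUSIC_PLAY|CANVAS)\s*:\s*([^\]]*?)\s*\]")


def normalize_tags(text: str) -> str:
    """Rewrite sloppy tags into the canonical [NAME:payload] form."""
    return _TAG.sub(lambda m: f"[{m.group(1)}:{m.group(2)}]", text)
```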

Auth & API

Clerk __session cookie now persists for 30 days (Max-Age=2592000) — no more re-auth on browser restart
Auth bypass for /api/vault/oauth/callback/ so external OAuth redirects can land
Canvas page CSP connect-src allows blob: for fetching client-side generated audio/video
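The Max-Age value quoted above is simply 30 days expressed in seconds (30 × 24 × 60 × 60 = 2592000). As a small worked example, the standard-library `http.cookies` module can build such a header value; the helper name `build_session_cookie` and the `Path=/` attribute are assumptions, not the project's code.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # 2592000


def build_session_cookie(token: str) -> str:
    """Return a Set-Cookie value for __session that lasts 30 days."""
    cookie = SimpleCookie()
    cookie["__session"] = token
    cookie["__session"]["max-age"] = THIRTY_DAYS_SECONDS
    cookie["__session"]["path"] = "/"
    return cookie["__session"].OutputString()
```

Because the cookie carries an explicit Max-Age, it is no longer a session cookie in the browser sense and survives a browser restart.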

Vault & Plugins

Credential Vault Phase 1.5 — Cycles A/B/C: writes reach running agents, opt-in OAuth pattern, Platform Setup admin page
Hermes plugin install flow finished — lifecycle hooks, install_config, vault sync via provision service HTTP API
byterover-memory plugin removed and parked in catalog repo (quarantined draft PR for re-add when stable)
Twenty CRM and SEO Platform plugins now declare vault credentials

UI / UX

Action Console verbose tool detail with Hermes gateway shape support — full command/path/file content shown in the panel (transcript stays clean)
Hermes gateway indicator on initial load and on profile switch — Action Console label now correctly reflects the active gateway
Mute UX fix — clicking mute mid-call no longer triggers a fake error popup
Clawdbot → Agent rename in user-facing strings
Admin Connections "Not enabled by platform" link now navigates to the Platform Setup panel

Docs

New .claude/CLAUDE.md project context file for contributors with parked-plugin quarantine convention

Plugins (separate repo)

byterover-memory — removed from catalog and quarantined on parked/byterover-memory branch (draft PR #3 holds it open as a visible reminder until stability work lands)

Full Changelog: v2026.4.10...v2026.4.13

10 Apr 02:58

MCERQUA

v2026.4.10

0746dea

OpenVoiceUI 2026.4.10

What's New

Features

Plugin Config API — gateway plugins can now be configured before install with API keys and provider selection from the admin Plugins panel
Plugin Settings Panel — post-install "Settings" button for updating gateway plugin configuration
Plugin Catalog Stubs — all 8 community plugins now have plugin.json + README.md in the main repo for dashboard discovery
Custom Faces System — dynamic HTML face pages with an editor, no plugin required
Page-Icon Meta Tags — canvas pages can declare icons via meta tags, extracted and shown in the page menu

Fixes

Canvas icon extraction and iframe permissions
Plugin system — profile isolation, lore deployment, gateway pairing
Remove fake email from README footer, link to website instead
Canvas hardening for iframe security

Plugins (separate repo)

Hermes Agent — overhauled and pinned to nousresearch/hermes-agent:v0.6.0. Fixed emoji tool markers, added agent profile, complete README for GitHub/Pinokio/npm install.
BHB Animated Characters — synced builder page, added voice samples
ByteRover Memory — updated description

Full Changelog: v2026.4.7...v2026.4.10

07 Apr 06:11

MCERQUA

v2026.4.7

af6904d

OpenVoiceUI 2026.4.7

What's New

Features

Remote Plugin Catalog — browse and one-click download community plugins from GitHub
Conversation Interject — interrupt AI mid-response with new input
External STT Provider — bring your own speech-to-text transcription API
Text/Voice Mode Toggle — switch between text chat and voice conversation
Subagent Visibility — see active sub-agents in the UI
AI Config Admin Panel — configure AI settings from the admin interface

Fixes

Plugin system — profile isolation, lore deployment, gateway pairing, restart button
Canvas auth token bridge variable name fix (Auth → AuthModule)
Bulk upload throttled to 3 concurrent with auto-retry
External STT silence delay increased to 1500ms (prevents mid-sentence cutoff)
Gateway handshake + device pairing now use operator.admin scope
ByteRover memory made optional (moved to plugin)
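The bulk-upload throttle (3 concurrent with auto-retry) is a bounded-concurrency pattern. The project's own implementation is not shown here and may well live in client-side code; the Python sketch below only illustrates the pattern, and `upload_all`, `upload_one`, and the attempt limit of 3 are assumptions.

```python
import asyncio


async def upload_all(items, upload_one, max_concurrent=3, max_attempts=3):
    """Upload items with at most `max_concurrent` in flight, retrying failures."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(item):
        async with semaphore:
            for attempt in range(1, max_attempts + 1):
                try:
                    return await upload_one(item)
                except Exception:
                    if attempt == max_attempts:
                        raise  # give up after the last attempt
                    # Yield before retrying; real code might back off here.
                    await asyncio.sleep(0)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(worker(item) for item in items))
```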

Dependencies

requests 2.33.0 → 2.33.1
actions/setup-python 5 → 6
actions/setup-node 4 → 6
actions/deploy-pages 4 → 5
trivy-action 0.34.0 → 0.35.0

Full Changelog: v2026.4.2...v2026.4.7

02 Apr 06:34

MCERQUA

v2026.4.2

049ae1b

OpenVoiceUI 2026.4.2

What's New

External STT Provider — Bring Your Own Transcription API

Users can now point OpenVoiceUI at any external Whisper-compatible STT service via STT_API_URL. Auto-detects OpenAI-compatible (/v1/audio/transcriptions) and generic Whisper ASR (/asr) formats. Selectable from the admin panel with full VAD + PTT support. (#193, #244)
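The notes do not say how the format auto-detection works. One simple way to tell the two endpoint styles apart is to look at the path of the configured STT_API_URL, as sketched below; `detect_stt_format` and its return labels are hypothetical, and the real detection may probe the service instead.

```python
from urllib.parse import urlparse


def detect_stt_format(stt_api_url: str) -> str:
    """Guess the STT request format from the endpoint path."""
    path = urlparse(stt_api_url).path.rstrip("/")
    if path.endswith("/v1/audio/transcriptions"):
        return "openai"  # OpenAI-compatible transcription endpoint
    if path.endswith("/asr"):
        return "whisper-asr"  # generic Whisper ASR endpoint
    return "unknown"
```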

Admin Panel Improvements

AI Models & API Keys panel — configure primary/fallback LLM models and manage provider API keys from the admin dashboard
Subagent visibility — live subagent status exposed in UI

Fixes

Gateway auth — operator.admin scope added to device pairing and gateway handshake, prevents NOT_PAIRED errors on reconnect
STT accumulation delay — 1500ms silence threshold before sending transcript, prevents mid-sentence cutoff
ByteRover memory — made optional, moved to plugin system (no longer a required dependency)
Text/voice mode toggle — switch between text and voice input modes
Version display + update detection fix
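The 1500ms silence threshold can be pictured as an accumulator that only releases the transcript once no speech has arrived for that long. This is a sketch under assumptions: the class name, the millisecond timestamps passed in by the caller, and the space-joined output are all illustrative.

```python
class TranscriptAccumulator:
    """Collect speech fragments and flush after a period of silence."""

    def __init__(self, silence_ms=1500):
        self.silence_ms = silence_ms
        self._parts = []
        self._last_speech_ms = None

    def add(self, text, now_ms):
        """Store a fragment and restart the silence timer."""
        self._parts.append(text)
        self._last_speech_ms = now_ms

    def poll(self, now_ms):
        """Return the full transcript once silence has lasted long enough."""
        if not self._parts or now_ms - self._last_speech_ms < self.silence_ms:
            return None
        transcript = " ".join(self._parts)
        self._parts = []
        return transcript
```

A short pause between fragments resets the timer, so a sentence spoken with a breath in the middle is sent as one transcript rather than being cut off.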

Dependencies

cryptography 46.0.5 → 46.0.6
trivy-action 0.34.0 → 0.35.0
upload-artifact v4 → v7

31 Mar 17:26

MCERQUA

2026.3.31

f3b0bd7

OpenVoiceUI 2026.3.31

What's New

Text/Voice mode toggle — Switch between text and voice input modes with persistent selection
Subagent visibility — Live subagent status tracking in the UI via OpenClaw gateway events
AI config admin panel — View and manage AI model configuration from the admin interface
Canvas screenshot — Capture canvas page screenshots programmatically
Version display & update detection — Desktop shows current version, detects when updates are available

Documentation

Docusaurus docs site — 23-page documentation site with full API reference (1,239 lines, 90+ endpoints)
Desktop canvas refactor plan — Architecture plan for improved page navigation and fuzzy matching

Dependencies

Bump cryptography 46.0.5 → 46.0.6
Bump actions/upload-artifact v4 → v7
Bump aquasecurity/trivy-action 0.34.0 → 0.35.0

29 Mar 04:37

MCERQUA

2026.3.29-1

c51c496

OpenVoiceUI 2026.3.29-1

Patch release — includes all 2026.3.29 changes with updated npm publish.

What's New

GLM-5-turbo upgrade — All LLM references updated from glm-4.7 to glm-5-turbo across providers and install paths
ByteRover long-term memory — Added ByteRover context engine (brv CLI + clawhub) to openclaw Dockerfile for persistent structured memory across sessions
Admin panel overhaul — Production-ready admin panel: mobile-responsive layout, all panels functional
README overhaul — Clean install instructions and updated feature list

Bug Fixes

Desktop first-run — Seeds ALL pages onto desktop on first load, not just knownPages
BigHead cleanup — Removed BigHead avatar content from base repo (moved to plugin system)
PTT mic restore — Toggling Push-to-Talk off no longer leaves mic permanently muted
Desktop version tag — Added version stamp for automatic desktop update propagation
