Added
- Added new AudioContextWordTTSService. This is a TTS base class for TTS services that handle multiple separate audio requests.
- Added new frames EmulateUserStartedSpeakingFrame and EmulateUserStoppedSpeakingFrame, which can be used to emulate VAD behavior when VAD is not present or is not triggered.
- Added a new audio_in_stream_on_start field to TransportParams.
- Added a new method start_audio_in_streaming in the BaseInputTransport.
  - This method should be used to start receiving the input audio in case the field audio_in_stream_on_start is set to false.
- Added support for the RTVIProcessor to handle buffered audio in base64 format, converting it into InputAudioRawFrame for transport.
- Added support for the RTVIProcessor to trigger start_audio_in_streaming only after the client-ready message.
- Added new MUTE_UNTIL_FIRST_BOT_COMPLETE strategy to STTMuteStrategy. This strategy starts muted and remains muted until the first bot speech completes, ensuring the bot's first response cannot be interrupted. This complements the existing FIRST_SPEECH strategy, which only mutes during the first detected bot speech.
- Added support for Google Cloud Speech-to-Text V2 through GoogleSTTService.
- Added RimeTTSService, a new WordTTSService. Updated the foundational example 07q-interruptible-rime.py to use RimeTTSService.
- Added support for Groq's Whisper API through the new GroqSTTService and OpenAI's Whisper API through the new OpenAISTTService. Introduced a new base class BaseWhisperSTTService to handle common Whisper API functionality.
- Added PerplexityLLMService for Perplexity NIM API integration, with an OpenAI-compatible interface. Also added foundational example 14n-function-calling-perplexity.py.
- Added DailyTransport.update_remote_participants(). This allows you to update remote participants' settings, like their permissions or which of their devices are enabled. Requires that the local participant have participant admin permission.
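The interaction between audio_in_stream_on_start and start_audio_in_streaming described above can be pictured with a small sketch. It does not import Pipecat: SketchTransportParams, SketchInputTransport, and on_audio are hypothetical stand-ins, and the default value of True for the field is an assumption made for illustration only.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SketchTransportParams:
    # Mirrors the new field by name; the default of True is an assumption.
    audio_in_stream_on_start: bool = True


@dataclass
class SketchInputTransport:
    """Hypothetical stand-in for an input transport (not Pipecat code)."""

    params: SketchTransportParams
    streaming: bool = False
    received: list = field(default_factory=list)

    async def start(self) -> None:
        # Begin receiving audio right away only when the field allows it.
        if self.params.audio_in_stream_on_start:
            await self.start_audio_in_streaming()

    async def start_audio_in_streaming(self) -> None:
        self.streaming = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio arriving before streaming has started is dropped.
        if self.streaming:
            self.received.append(chunk)


async def demo() -> list:
    params = SketchTransportParams(audio_in_stream_on_start=False)
    transport = SketchInputTransport(params)
    await transport.start()
    transport.on_audio(b"early")  # ignored: streaming has not started
    # Later, for example once a client-ready message has been handled:
    await transport.start_audio_in_streaming()
    transport.on_audio(b"late")
    return transport.received


result = asyncio.run(demo())
print(result)
```

In this sketch only the chunk delivered after the explicit start call is kept, which is the behavior the field is meant to make possible.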
Changed
- A colon (:) is no longer considered an end of sentence.
- Updated DailyTransport to respect the audio_in_stream_on_start field, ensuring it only starts receiving the audio input if it is enabled.
- Updated FastAPIWebsocketOutputTransport to send TransportMessageFrame and TransportMessageUrgentFrame to the serializer.
- Updated WebsocketServerOutputTransport to send TransportMessageFrame and TransportMessageUrgentFrame to the serializer.
- Enhanced STTMuteConfig to validate strategy combinations, preventing MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH from being used together as they handle first bot speech differently.
- Updated foundational example 07n-interruptible-google.py to use all Google services.
- RimeHttpTTSService now uses the mistv2 model by default.
- Improved error handling in AzureTTSService to properly detect and log synthesis cancellation errors.
- Enhanced WhisperSTTService with full language support and improved model documentation.
- Updated foundational example 14f-function-calling-groq.py to use GroqSTTService for transcription.
- Updated GroqLLMService to use llama-3.3-70b-versatile as the default model.
- RTVIObserver doesn't handle LLMSearchResponseFrame frames anymore. For now, to handle those frames you need to create a GoogleRTVIObserver instead.
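The strategy validation mentioned for STTMuteConfig could look roughly like the following. This is a sketch under assumptions, not the library's implementation: SketchMuteStrategy, SketchMuteConfig, and the OTHER member are hypothetical names, and raising ValueError is an assumed way of reporting the conflict.

```python
from dataclasses import dataclass
from enum import Enum


class SketchMuteStrategy(Enum):
    FIRST_SPEECH = "first_speech"
    MUTE_UNTIL_FIRST_BOT_COMPLETE = "mute_until_first_bot_complete"
    OTHER = "other"  # placeholder for any non-conflicting strategy


# The two strategies that must not be combined, per the entry above.
CONFLICTING = frozenset(
    {
        SketchMuteStrategy.FIRST_SPEECH,
        SketchMuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE,
    }
)


@dataclass(frozen=True)
class SketchMuteConfig:
    """Hypothetical stand-in for a mute configuration (not Pipecat code)."""

    strategies: frozenset

    def __post_init__(self) -> None:
        # Reject the configuration when both conflicting strategies appear.
        if CONFLICTING <= self.strategies:
            raise ValueError(
                "MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH "
                "cannot be used together"
            )


def is_valid(strategies) -> bool:
    try:
        SketchMuteConfig(frozenset(strategies))
        return True
    except ValueError:
        return False


ok = is_valid({SketchMuteStrategy.FIRST_SPEECH, SketchMuteStrategy.OTHER})
bad = is_valid(
    {SketchMuteStrategy.FIRST_SPEECH, SketchMuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE}
)
print(ok, bad)
```

Validating at construction time means an invalid combination fails immediately instead of producing ambiguous muting behavior later.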
Deprecated
- STTMuteFilter constructor's stt_service parameter is now deprecated and will be removed in a future version. The filter now manages mute state internally instead of querying the STT service.
- RTVI.observer() is now deprecated; instantiate an RTVIObserver directly instead.
- All RTVI frame processors (e.g. RTVISpeakingProcessor, RTVIBotLLMProcessor) are now deprecated; instantiate an RTVIObserver instead.
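For readers unfamiliar with how a deprecated constructor parameter typically behaves in Python, here is a minimal sketch using the standard warnings module. SketchMuteFilter and its set_muted method are hypothetical stand-ins; whether the real filter emits a DeprecationWarning in exactly this way is an assumption.

```python
import warnings
from typing import Optional


class SketchMuteFilter:
    """Hypothetical stand-in for a mute filter (not Pipecat code)."""

    def __init__(self, *, stt_service: Optional[object] = None) -> None:
        if stt_service is not None:
            # The argument is still accepted, but callers are warned.
            warnings.warn(
                "stt_service is deprecated and will be removed in a future version",
                DeprecationWarning,
                stacklevel=2,
            )
        # Mute state is tracked internally rather than read from a service.
        self._muted = False

    def set_muted(self, muted: bool) -> None:
        self._muted = muted

    @property
    def muted(self) -> bool:
        return self._muted


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    SketchMuteFilter(stt_service=object())  # triggers the warning
    SketchMuteFilter()  # no warning

categories = [w.category for w in caught]
print(categories)
```

Code that still passes the old argument keeps working in this sketch, and the warning gives callers time to drop it.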
Fixed
- Fixed a FalImageGenService issue that was causing the event loop to be blocked while loading the downloaded image.
- Fixed a CartesiaTTSService issue that would cause audio overlapping in some cases.
- Fixed a websocket-based service issue (e.g. CartesiaTTSService) that was preventing a reconnection after the server disconnected cleanly, which was causing an infinite loop instead.
- Fixed a BaseOutputTransport issue that was causing upstream frames to not be pushed upstream.
- Fixed multiple issues where user transcriptions were not being handled properly. It was possible for short utterances to not trigger VAD, which would cause user transcriptions to be ignored. It was also possible for one or more transcriptions to be generated after VAD, in which case they would also be ignored.
- Fixed an issue that was causing BotStoppedSpeakingFrame to be generated too late. This could then cause STTMuteFilter to be unblocked later than desired.
- Fixed an issue that was causing AudioBufferProcessor to not record synchronized audio.
- Fixed an RTVI issue that was causing bot-tts-text messages to be sent before being processed by the output transport.
- Fixed an issue (#1192) in 11labs where we were trying to reconnect/disconnect the websocket connection even when the connection was already closed.
- Fixed an issue where the has_regular_messages condition was always true in GoogleLLMContext due to Part having function_call and function_response with None values.
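The last entry above comes down to a general Python pitfall: an attribute that always exists but holds None makes presence checks useless. The sketch below illustrates only that pitfall. SketchPart is a hypothetical stand-in for a message part, and the two check functions are illustrative; the actual condition used in GoogleLLMContext is not shown in this changelog and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchPart:
    """Hypothetical stand-in: every field exists and defaults to None."""

    text: Optional[str] = None
    function_call: Optional[dict] = None
    function_response: Optional[dict] = None


def presence_check(part: SketchPart) -> bool:
    # Succeeds for every part, because the attribute is always defined.
    return hasattr(part, "function_call")


def value_check(part: SketchPart) -> bool:
    # Succeeds only when the attribute actually carries data.
    return getattr(part, "function_call", None) is not None


text_part = SketchPart(text="hello")
call_part = SketchPart(function_call={"name": "get_weather"})

presence = [presence_check(text_part), presence_check(call_part)]
value = [value_check(text_part), value_check(call_part)]
print(presence, value)
```

Only the value-based check distinguishes a plain text part from a function-call part, which is why a condition built on attribute presence can end up constant.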
Other
- Added new instant-voice example. This example showcases how to enable instant voice communication as soon as a user connects.
- Added new local-input-select-stt example. This example allows you to play with local audio inputs by selecting them through a nice text interface.