v0.0.42
Added
- SentryMetrics has been added to report frame processor metrics to Sentry. This is now possible because FrameProcessorMetrics can now be passed to FrameProcessor.
- Added Google TTS service and corresponding foundational example 07n-interruptible-google.py.
- Added AWS Polly TTS support and 07m-interruptible-aws.py as an example.
- Added InputParams to Azure TTS service.
- Added LivekitTransport (audio-only for now).
- RTVI 0.2.0 is now supported.
- All FrameProcessors can now register event handlers.

    tts = SomeTTSService(...)

    @tts.event_handler("on_connected")
    async def on_connected(processor):
        ...
- Added AsyncGeneratorProcessor. This processor can be used together with a FrameSerializer as an async generator. It provides a generator() function that returns an AsyncGenerator that yields serialized frames.
- Added EndTaskFrame and CancelTaskFrame. These new frames are meant to be pushed upstream to tell the pipeline task to stop gracefully or immediately, respectively.
- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services, along with corresponding setter functions.
- Added sample_rate as a constructor parameter for TTS services.
- Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames. To get deterministic behavior, frames traveling through the pipeline should always be ordered, except for system frames, which are out-of-band. To achieve that, each frame processor should only output frames from a single task. In this version all frame processors have their own task to push frames. That is, when push_frame() is called, the given frame is put into an internal queue (with the exception of system frames) and a frame processor task pushes it out.
- Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional pts field (presentation timestamp). There's currently just one clock implementation, SystemClock, and the pts field is currently only used for TextFrames (audio and image frames will be next).
- A clock can now be specified to PipelineTask (defaults to SystemClock). This clock will be passed to each frame processor via the StartFrame.
- Added CartesiaHttpTTSService.
- DailyTransport now supports setting the audio bitrate to improve audio quality through the DailyParams.audio_out_bitrate parameter. The new default is 96kbps.
- DailyTransport now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed.
- Interruptions support has been added to TwilioFrameSerializer when using FastAPIWebsocketTransport.
- Added new LmntTTSService text-to-speech service (see https://www.lmnt.com/).
- Added TTSModelUpdateFrame, TTSLanguageUpdateFrame, STTModelUpdateFrame, and STTLanguageUpdateFrame frames to allow you to switch models, languages, and voices in TTS and STT services.
- Added new transcriptions.Language enum.
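
The per-processor push task described above can be pictured with a small asyncio sketch. This is not Pipecat's implementation: `Frame`, `ToyProcessor`, `demo` and the `sink` list are hypothetical stand-ins, and the only point is to show ordered frames leaving through one queue-draining task while system frames bypass the queue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    """Toy stand-in for a pipeline frame (not the Pipecat class)."""
    name: str
    system: bool = False  # system frames are treated as out-of-band


class ToyProcessor:
    """Hypothetical processor that pushes ordered frames from a single task."""

    def __init__(self, sink: list):
        self._sink = sink  # stands in for the next processor downstream
        self._queue: asyncio.Queue = asyncio.Queue()
        self._task = None

    def start(self):
        # A single task is the only place ordered frames leave from.
        self._task = asyncio.create_task(self._push_loop())

    async def push_frame(self, frame: Frame):
        if frame.system:
            # Out-of-band: delivered right away, bypassing the queue.
            self._sink.append(frame.name)
        else:
            await self._queue.put(frame)

    async def _push_loop(self):
        while True:
            frame = await self._queue.get()
            self._sink.append(frame.name)
            self._queue.task_done()

    async def stop(self):
        await self._queue.join()  # let queued frames drain first
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def demo() -> list:
    sink = []
    proc = ToyProcessor(sink)
    proc.start()
    await proc.push_frame(Frame("a"))
    await proc.push_frame(Frame("b"))
    await proc.push_frame(Frame("sys", system=True))
    await proc.stop()
    return sink
```

Calling `asyncio.run(demo())` returns the delivered frame names. The queued frames keep their relative order, while the system frame is not bound by it.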
Changed
- Context frames are now pushed downstream from assistant context aggregators.
- Removed Silero VAD torch dependency.
- Merged the individual update settings frame classes into a single ServiceUpdateSettingsFrame class.
- We now distinguish between input and output audio and image frames. We introduce InputAudioRawFrame, OutputAudioRawFrame, InputImageRawFrame and OutputImageRawFrame (and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline, and they are allowed to be sent by the output transport.
- ParallelTask has been renamed to SyncParallelPipeline. A SyncParallelPipeline is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between a SyncParallelPipeline and a ParallelPipeline is that, given an input frame, the SyncParallelPipeline will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response).
- StartFrame is a system frame again, to make sure it's processed immediately by all processors. EndFrame stays a control frame since it needs to be ordered, allowing the frames in the pipeline to be processed.
- Updated MoondreamService revision to 2024-08-26.
- CartesiaTTSService and ElevenLabsTTSService now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because currently the audio frames don't have presentation timestamps, but they should be played at roughly the same time.
- DailyTransport.on_joined event now returns the full session data instead of just the participant.
- CartesiaTTSService is now a subclass of TTSService.
- DeepgramSTTService is now a subclass of STTService.
- WhisperSTTService is now a subclass of SegmentedSTTService. A SegmentedSTTService is an STTService where the provided audio is given in one large chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continuous stream.
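
The "wait for all the internal pipelines" behavior of SyncParallelPipeline can be illustrated with plain asyncio. This is only a sketch of the idea, not the Pipecat class: each pipeline is reduced to a hypothetical async function, and `sync_parallel`, `slow_upper` and `fast_reverse` are made-up names.

```python
import asyncio
from typing import Awaitable, Callable, List

# Each "pipeline" here is just an async function from input to output
# (a simplification; real pipelines are chains of frame processors).
Pipeline = Callable[[str], Awaitable[str]]


async def sync_parallel(frame: str, pipelines: List[Pipeline]) -> List[str]:
    """Run every pipeline on the same input and wait for all of them."""
    # gather() only returns once every pipeline has produced its result,
    # and the results come back in the order the pipelines were listed.
    return await asyncio.gather(*(p(frame) for p in pipelines))


async def slow_upper(frame: str) -> str:
    await asyncio.sleep(0.02)  # pretend to wait on an HTTP response
    return frame.upper()


async def fast_reverse(frame: str) -> str:
    await asyncio.sleep(0.01)
    return frame[::-1]
```

Running `asyncio.run(sync_parallel("abc", [slow_upper, fast_reverse]))` does not return until the slower pipeline finishes, which is the property that separates the synchronized variant from one that forwards each result as soon as it is ready.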
Fixed
- Fixed OpenAI multiple function calls.
- Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
- Fixed a BaseOutputTransport issue that would stop audio and video rendering tasks (after receiving an EndFrame) before the internal queue was emptied, causing the pipeline to finish prematurely.
- StartFrame should be the first frame every processor receives, to avoid situations where things are not initialized (because initialization happens on StartFrame) and other frames come in, resulting in undesired behavior.
Performance
- obj_id() and obj_count() now use itertools.count, avoiding the need for threading.Lock.
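
A minimal sketch of lock-free ID generation with itertools.count follows. The function names match the entry above, but the bodies, the signatures, and the module-level `_id_counter` and `_class_counters` are assumptions made for illustration. The sketch also relies on CPython, where `next()` on a counter is a single C-level call.

```python
import itertools
from collections import defaultdict

# Single counter shared by every object that asks for an ID.
_id_counter = itertools.count()
# One independent counter per class name, created on first use.
_class_counters = defaultdict(itertools.count)


def obj_id() -> int:
    """Return the next integer from a process-wide increasing sequence."""
    return next(_id_counter)


def obj_count(obj) -> int:
    """Return how many objects of obj's class were counted before it."""
    return next(_class_counters[type(obj).__name__])
```

Each call to `obj_id()` returns the next integer, and `obj_count()` keeps a separate zero-based sequence per class name.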
Other
- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).