
v0.0.42

@aconchillo released this on 02 Oct at 21:08
65eeb0f

Added

  • Added SentryMetrics to report frame processor metrics to Sentry. This is possible because FrameProcessorMetrics can now be passed to FrameProcessor.

  • Added a Google TTS service and the corresponding foundational example 07n-interruptible-google.py.

  • Added AWS Polly TTS support and 07m-interruptible-aws.py as an example.

  • Added InputParams to Azure TTS service.

  • Added LivekitTransport (audio-only for now).

  • RTVI 0.2.0 is now supported.

  • All FrameProcessors can now register event handlers.

```python
tts = SomeTTSService(...)  # placeholder: any TTS service

@tts.event_handler("on_connected")
async def on_connected(processor):
    ...  # handle the event
```

  • Added AsyncGeneratorProcessor. This processor can be used together with a FrameSerializer as an async generator: its generator() function returns an AsyncGenerator that yields serialized frames.

  • Added EndTaskFrame and CancelTaskFrame. These new frames are meant to be pushed upstream to tell the pipeline task to stop gracefully (EndTaskFrame) or immediately (CancelTaskFrame).

  • Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services along with corresponding setter functions.

  • Added sample_rate as a constructor parameter for TTS services.

  • Pipecat has a pipeline-based architecture: a pipeline consists of frame processors linked to each other, and the elements traveling across the pipeline are called frames. For deterministic behavior, frames traveling through the pipeline should always be ordered, except for system frames, which are out-of-band. To achieve that, each frame processor should only output frames from a single task. Starting with this version, every frame processor has its own task for pushing frames: when push_frame() is called, the given frame is put into an internal queue (with the exception of system frames, which bypass the queue) and the frame processor's task pushes it out.

  • Added pipeline clocks. The output transport uses a pipeline clock to know when a frame needs to be presented. For that, all frames now have an optional pts (presentation timestamp) field. There is currently only one clock implementation, SystemClock, and the pts field is currently only used for TextFrames (audio and image frames will be next).

  • A clock can now be passed to PipelineTask (it defaults to SystemClock). PipelineTask passes this clock to each frame processor via the StartFrame.

  • Added CartesiaHttpTTSService.

  • DailyTransport now supports setting the audio bitrate through the DailyParams.audio_out_bitrate parameter to improve audio quality. The new default is 96 kbps.

  • DailyTransport now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed.

  • Added interruption support to TwilioFrameSerializer when using FastAPIWebsocketTransport.

  • Added a new text-to-speech service, LmntTTSService (see https://www.lmnt.com/).

  • Added TTSModelUpdateFrame, TTSLanguageUpdateFrame, STTModelUpdateFrame, and STTLanguageUpdateFrame frames to allow you to switch models, language and voices in TTS and STT services.

  • Added new transcriptions.Language enum.
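
The event-handler registration listed above can be illustrated with a plain-Python sketch that does not use Pipecat. `EventEmittingProcessor` and `call_event_handler()` are hypothetical names; only the decorator shape (`@processor.event_handler("on_connected")` on an `async def` that receives the processor) comes from these notes.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List

Handler = Callable[..., Awaitable[None]]


class EventEmittingProcessor:
    """Hypothetical stand-in for a FrameProcessor that supports event handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str):
        """Decorator: register an async handler for `event_name`."""

        def decorator(handler: Handler) -> Handler:
            self._handlers[event_name].append(handler)
            return handler

        return decorator

    async def call_event_handler(self, event_name: str, *args) -> None:
        """Hypothetical dispatch: await every handler, passing the processor first."""
        for handler in self._handlers[event_name]:
            await handler(self, *args)


tts = EventEmittingProcessor()
events: List[str] = []


@tts.event_handler("on_connected")
async def on_connected(processor):
    events.append(f"connected: {type(processor).__name__}")


asyncio.run(tts.call_event_handler("on_connected"))
```

Returning the handler unchanged from the decorator keeps `on_connected` usable as a normal coroutine function after registration.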
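
The AsyncGeneratorProcessor idea (serialize frames, expose them through an async generator) can be sketched with asyncio alone. Everything here is an assumption for illustration: frames are plain dicts, `JsonSerializer` stands in for a FrameSerializer, and a `None` sentinel ends the stream. This is not Pipecat's implementation.

```python
import asyncio
import json
from typing import AsyncGenerator, List, Optional


class JsonSerializer:
    """Hypothetical stand-in for a FrameSerializer."""

    def serialize(self, frame: dict) -> str:
        return json.dumps(frame)


class SketchAsyncGeneratorProcessor:
    def __init__(self, serializer: JsonSerializer) -> None:
        self._serializer = serializer
        self._queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def process_frame(self, frame: dict) -> None:
        # Serialize each incoming frame and hand it over to generator().
        await self._queue.put(self._serializer.serialize(frame))

    async def end(self) -> None:
        await self._queue.put(None)  # sentinel: stop the generator

    async def generator(self) -> AsyncGenerator[str, None]:
        while True:
            data = await self._queue.get()
            if data is None:
                break
            yield data


async def demo() -> List[str]:
    proc = SketchAsyncGeneratorProcessor(JsonSerializer())
    await proc.process_frame({"type": "text", "text": "hello"})
    await proc.end()
    return [chunk async for chunk in proc.generator()]


chunks = asyncio.run(demo())
```

A consumer, for example a streaming HTTP response, could iterate `generator()` with `async for` and write each serialized frame as it arrives.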
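
The per-processor push task described above (ordered frames go through an internal queue drained by one task, system frames bypass it) can be sketched as follows. `QueuedProcessor`, `DataFrame` and the list used as "downstream" are hypothetical simplifications, not Pipecat classes.

```python
import asyncio
from typing import List, Optional


class Frame:
    """Base frame (hypothetical, simplified)."""


class SystemFrame(Frame):
    """Out-of-band frame: bypasses the queue."""


class DataFrame(Frame):
    def __init__(self, n: int) -> None:
        self.n = n


class QueuedProcessor:
    """Sketch: every non-system frame is pushed downstream by a single task."""

    def __init__(self, downstream: List[Frame]) -> None:
        self._downstream = downstream  # stands in for the next processor
        self._queue: "asyncio.Queue[Frame]" = asyncio.Queue()
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._push_task())

    async def push_frame(self, frame: Frame) -> None:
        if isinstance(frame, SystemFrame):
            self._downstream.append(frame)  # delivered immediately, out of band
        else:
            await self._queue.put(frame)  # ordered: pushed later by the task

    async def _push_task(self) -> None:
        while True:
            frame = await self._queue.get()
            self._downstream.append(frame)
            self._queue.task_done()

    async def stop(self) -> None:
        await self._queue.join()  # wait until every queued frame has been pushed
        assert self._task is not None
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def demo() -> List[Frame]:
    out: List[Frame] = []
    proc = QueuedProcessor(out)
    proc.start()
    for i in range(3):
        await proc.push_frame(DataFrame(i))
    # Delivered ahead of the data frames the push task has not pushed yet.
    await proc.push_frame(SystemFrame())
    await proc.stop()
    return out


out = asyncio.run(demo())
```

Because only `_push_task()` ever forwards queued frames, their relative order is fixed no matter how many coroutines call `push_frame()`.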
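
The pipeline-clock entries above can be illustrated with a small sketch of an output step that holds each text frame until the clock reaches its pts. It assumes pts values are nanoseconds measured from a monotonic clock's start; `SketchSystemClock` and `present_text_frames()` are hypothetical names, and the real clock reaches processors via StartFrame rather than being created directly.

```python
import asyncio
import time
from typing import List, Tuple


class SketchSystemClock:
    """Hypothetical clock: nanoseconds elapsed since start(), from a monotonic source."""

    def __init__(self) -> None:
        self._start_ns = 0

    def start(self) -> None:
        self._start_ns = time.monotonic_ns()

    def get_time(self) -> int:
        return time.monotonic_ns() - self._start_ns


async def present_text_frames(
    clock: SketchSystemClock, frames: List[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Hold each (text, pts) frame until the clock reaches its pts, then 'present' it."""
    presented = []
    for text, pts in frames:
        wait_ns = pts - clock.get_time()
        if wait_ns > 0:
            await asyncio.sleep(wait_ns / 1_000_000_000)
        presented.append((text, clock.get_time()))
    return presented


clock = SketchSystemClock()
clock.start()
# pts in nanoseconds: "Hello" right away, "world" 50 ms later.
presented = asyncio.run(present_text_frames(clock, [("Hello", 0), ("world", 50_000_000)]))
```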

Changed

  • Context frames are now pushed downstream from assistant context aggregators.

  • Removed Silero VAD torch dependency.

  • Consolidated the individual update-settings frame classes into a single ServiceUpdateSettingsFrame class.

  • We now distinguish between input and output audio and image frames, introducing InputAudioRawFrame, OutputAudioRawFrame, InputImageRawFrame and OutputImageRawFrame (and other subclasses of those). Input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames; however, they will not be sent through an output transport. Output frames can also be processed by any frame processor in the pipeline, and they are allowed to be sent by the output transport.

  • ParallelTask has been renamed to SyncParallelPipeline. A SyncParallelPipeline is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between a SyncParallelPipeline and a ParallelPipeline is that, given an input frame, the SyncParallelPipeline will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response).

  • StartFrame is a system frame again, to make sure it's processed immediately by all processors. EndFrame remains a control frame, since it needs to stay ordered so that the frames ahead of it in the pipeline are processed first.

  • Updated MoondreamService revision to 2024-08-26.

  • CartesiaTTSService and ElevenLabsTTSService now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because audio frames don't have presentation timestamps yet, but they should be played at roughly the same time.

  • DailyTransport.on_joined event now returns the full session data instead of just the participant.

  • CartesiaTTSService is now a subclass of TTSService.

  • DeepgramSTTService is now a subclass of STTService.

  • WhisperSTTService is now a subclass of SegmentedSTTService. A SegmentedSTTService is an STTService that receives the audio in one big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of as a continuous stream.
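
The input/output frame split above can be sketched with simplified dataclasses and a filter that an output transport might apply. The `audio`, `sample_rate` and `num_channels` fields and `frames_for_output_transport()` are assumptions for illustration, not Pipecat's definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioRawFrame:
    """Simplified audio frame; the fields here are assumptions for the sketch."""
    audio: bytes
    sample_rate: int
    num_channels: int


class InputAudioRawFrame(AudioRawFrame):
    """Audio that came in through an input transport (e.g. a microphone)."""


class OutputAudioRawFrame(AudioRawFrame):
    """Audio produced inside the pipeline (e.g. by a TTS service)."""


def frames_for_output_transport(frames: List[AudioRawFrame]) -> List[AudioRawFrame]:
    """Hypothetical output-transport filter: input frames are never sent out."""
    return [f for f in frames if isinstance(f, OutputAudioRawFrame)]


mic = InputAudioRawFrame(audio=b"\x00\x00", sample_rate=16000, num_channels=1)
tts = OutputAudioRawFrame(audio=b"\x10\x00", sample_rate=24000, num_channels=1)
sent = frames_for_output_transport([mic, tts])
```

Both subclasses share the same fields, so any processor can still inspect either kind; only the class decides whether a frame may leave through the output transport.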
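
The SyncParallelPipeline behavior described above (wait for every internal pipeline to finish for a given input frame) can be sketched with `asyncio.gather`. The pipelines are hypothetical coroutines whose last step sleeps in place of an HTTP request; `unsynced_parallel()` is a contrasting, non-synchronized variant added only for comparison.

```python
import asyncio
from typing import Awaitable, Callable, List

Pipeline = Callable[[str], Awaitable[str]]


def make_pipeline(name: str, delay: float) -> Pipeline:
    """Hypothetical pipeline whose last step waits for a response, like an HTTP call."""

    async def run(frame: str) -> str:
        await asyncio.sleep(delay)  # stands in for waiting on the HTTP response
        return f"{name}:{frame}"

    return run


async def sync_parallel(frame: str, pipelines: List[Pipeline]) -> List[str]:
    """Synchronized: return only after every pipeline has finished, in pipeline order."""
    return list(await asyncio.gather(*(p(frame) for p in pipelines)))


async def unsynced_parallel(frame: str, pipelines: List[Pipeline]) -> List[str]:
    """For contrast: results are forwarded in completion order, as they arrive."""
    tasks = [asyncio.ensure_future(p(frame)) for p in pipelines]
    return [await t for t in asyncio.as_completed(tasks)]


pipelines = [make_pipeline("slow", 0.1), make_pipeline("fast", 0.01)]
synced = asyncio.run(sync_parallel("hi", pipelines))
unsynced = asyncio.run(unsynced_parallel("hi", pipelines))
```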
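
The SegmentedSTTService idea above (collect the audio for a whole utterance, then transcribe it once) can be sketched without any STT engine. `SketchSegmentedSTT`, its callback names and the fake `transcribe` callable are hypothetical; the sketch simply drops audio received outside an utterance, which may differ from Pipecat's behavior.

```python
from typing import Callable, List


class SketchSegmentedSTT:
    """Hypothetical: buffer audio while the user speaks, then transcribe it in one call."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe  # stands in for e.g. a Whisper model call
        self._buffer = bytearray()
        self._speaking = False
        self.transcripts: List[str] = []

    def on_user_started_speaking(self) -> None:
        self._speaking = True
        self._buffer.clear()

    def on_audio(self, chunk: bytes) -> None:
        if self._speaking:  # audio outside an utterance is ignored in this sketch
            self._buffer.extend(chunk)

    def on_user_stopped_speaking(self) -> None:
        self._speaking = False
        # The whole utterance is handed to the STT engine as one big chunk.
        self.transcripts.append(self._transcribe(bytes(self._buffer)))
        self._buffer.clear()


stt = SketchSegmentedSTT(lambda audio: f"<{len(audio)} bytes transcribed>")
stt.on_audio(b"\x00" * 100)  # before the user speaks: dropped
stt.on_user_started_speaking()
for _ in range(2):
    stt.on_audio(b"\x00" * 320)  # 10 ms of 16 kHz, 16-bit mono audio
stt.on_user_stopped_speaking()
```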

Fixed

  • Fixed OpenAI multiple function calls.

  • Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.

  • Fixed a BaseOutputTransport issue that would stop audio and video rendering tasks (after receiving an EndFrame) before the internal queue was emptied, causing the pipeline to finish prematurely.

  • StartFrame should be the first frame every processor receives; this avoids situations where other frames arrive before things are initialized (initialization happens on StartFrame), resulting in undesired behavior.
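
The BaseOutputTransport fix above is about shutdown ordering. A minimal sketch, assuming audio chunks are plain strings and a hypothetical `render_task()`: the render task exits only when it dequeues the EndFrame itself, so nothing queued ahead of it is lost. This is not BaseOutputTransport's actual code.

```python
import asyncio
from typing import List, Union


class EndFrame:
    """Simplified EndFrame: a control frame queued behind earlier frames."""


async def render_task(queue: "asyncio.Queue[Union[str, EndFrame]]", rendered: List[str]) -> None:
    while True:
        frame = await queue.get()
        if isinstance(frame, EndFrame):
            break  # every frame queued before EndFrame has already been rendered
        rendered.append(frame)


async def demo() -> List[str]:
    queue: "asyncio.Queue[Union[str, EndFrame]]" = asyncio.Queue()
    rendered: List[str] = []
    task = asyncio.create_task(render_task(queue, rendered))
    for chunk in ["audio-1", "audio-2", "audio-3"]:
        await queue.put(chunk)
    await queue.put(EndFrame())
    # Waiting for the task to finish, instead of cancelling it as soon as
    # EndFrame arrives, lets the queue drain first.
    await task
    return rendered


rendered = asyncio.run(demo())
```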

Performance

  • obj_id() and obj_count() now use itertools.count, avoiding the need for a threading.Lock.
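
The itertools.count change above can be illustrated with a lock-free id generator. This sketch mirrors the idea rather than Pipecat's code, and it relies on CPython with the GIL, where `next()` on an `itertools.count` runs in C without releasing the GIL, so concurrent threads never receive the same value.

```python
import itertools
import threading
from typing import List

# Module-level counter; each next() call hands out a new integer.
_id_counter = itertools.count()


def obj_id() -> int:
    """Return a process-unique integer id without an explicit threading.Lock."""
    return next(_id_counter)


def collect_ids(n: int, out: List[int]) -> None:
    for _ in range(n):
        out.append(obj_id())


# Four threads draw ids concurrently, each into its own list.
results: List[List[int]] = [[] for _ in range(4)]
threads = [threading.Thread(target=collect_ids, args=(1000, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_ids = [i for r in results for i in r]
```

With a plain integer and `+= 1`, the read and the write are separate bytecode steps, which is why a lock would otherwise be needed.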

Other