172 changes: 172 additions & 0 deletions README.md
@@ -41,6 +41,178 @@ agents that can see, hear, and understand.
- **MCP support**: Native support for MCP. Integrate tools provided by MCP servers with one line of code.
- **Built-in test framework**: Write tests and use judges to ensure your agent is performing as expected.
- **Open-source**: Fully open-source, allowing you to run the entire stack on your own servers, including [LiveKit server](https://github.com/livekit/livekit), one of the most widely used WebRTC media servers.
- **Backchannel Filtering**: Context-aware interruption handling that distinguishes between passive acknowledgements ("yeah", "ok") and real commands.

## Backchannel Filtering

This implementation includes a **context-aware backchannel filtering** system that prevents the agent from being interrupted by passive acknowledgement words while speaking.

### How to Run the Agent

**Prerequisites:**
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
.\venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -e ./livekit-agents
pip install -r examples/voice_agents/requirements.txt
```

**Environment Variables (create `.env` file):**
```bash
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
OPENAI_API_KEY=your-openai-key
DEEPGRAM_API_KEY=your-deepgram-key
```

**Running the Agent:**
```bash
# Console mode (local testing without LiveKit server)
python examples/voice_agents/basic_agent.py console

# Development mode (connects to LiveKit server with hot reload)
python examples/voice_agents/basic_agent.py dev

# Production mode
python examples/voice_agents/basic_agent.py start
```

**Testing with LiveKit Playground:**
1. Run agent in dev mode: `python examples/voice_agents/basic_agent.py dev`
2. Open [Agents Playground](https://agents-playground.livekit.io/)
3. Connect with your LiveKit credentials
4. Test backchannel filtering by saying "ok" or "yeah" while agent speaks

---

### How the Interruption Logic Works

The system intercepts voice activity and transcript events **before** they trigger an interruption:

```
User speaks → VAD detects voice → STT provides transcript
                        │
         Is the agent currently speaking?
               /                    \
             NO                      YES
              ↓                       ↓
     Process normally       Is transcript a backchannel?
     (respond to user)          /               \
                              YES                NO
                               ↓                  ↓
                      IGNORE completely     INTERRUPT agent
                      (agent continues)     (process command)
```

**Key Implementation Points:**

1. **Transcript-First Approach**: When agent is speaking, interruptions are deferred until STT provides a transcript. This prevents VAD from triggering premature interruptions.

2. **State-Based Filtering**: Backchannel detection ONLY applies when:
- Agent has active speech (`_current_speech is not None`)
- Speech is not already interrupted
- `allow_interruptions` is enabled

3. **Command Word Override**: If transcript contains command words (stop, wait, what, etc.), it is NEVER treated as a backchannel, even if it also contains filler words.

4. **No Audio Modification**: The filter works at the logic layer - VAD and STT are unchanged.

**Code Location:** `livekit/agents/voice/agent_activity.py` → `_interrupt_by_audio_activity()`
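
The decision flow above can be sketched as a small standalone function. This is a simplified illustration for clarity, not the library's actual implementation; the word sets mirror the defaults documented in the tables further down.

```python
# Simplified sketch of the backchannel decision logic described above.
# Illustrative only -- the real implementation lives in
# livekit/agents/voice/backchannel_filter.py.

BACKCHANNEL_WORDS = frozenset({
    "yeah", "yep", "ok", "okay", "right", "alright", "sure",
    "aha", "hmm", "mhm", "uh-huh", "mm-hmm",
    "got it", "gotcha", "i see", "oh", "cool", "nice", "great",
    "uh", "um", "er", "ah",
})
COMMAND_WORDS = frozenset({
    "stop", "wait", "hold", "pause", "halt",
    "what", "why", "how", "when", "where", "who",
    "no", "not", "never",
    "actually", "instead", "however",
})

def should_interrupt(transcript: str, agent_speaking: bool) -> bool:
    """Return True if the user's utterance should interrupt the agent."""
    if not agent_speaking:
        return True  # nothing to protect; process input normally
    words = transcript.lower().strip().split()
    if not words:
        return False  # no transcript yet: defer rather than trust VAD alone
    if any(w.rstrip(".!?,") in COMMAND_WORDS for w in words):
        return True  # command words always win, even mixed with fillers
    # short utterances made up entirely of filler words are backchannels
    phrase = transcript.lower().strip().rstrip(".!?,")
    if phrase in BACKCHANNEL_WORDS or all(
        w.rstrip(".!?,") in BACKCHANNEL_WORDS for w in words
    ):
        return False
    return True

should_interrupt("yeah", agent_speaking=True)       # False: backchannel, ignored
should_interrupt("yeah wait", agent_speaking=True)  # True: contains a command word
should_interrupt("yeah", agent_speaking=False)      # True: agent idle, respond normally
```

Note how the command check runs before the filler check, which is what makes "yeah wait" interrupt even though it starts with a filler word.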

---

### How Ignore Words Are Configured

**Option 1: Use Default Configuration (Recommended)**
```python
session = AgentSession(
backchannel_filtering=True, # Enabled by default
)
```

**Option 2: Custom Ignore Words**
```python
session = AgentSession(
backchannel_filtering=True,
backchannel_words=frozenset({
'yeah', 'yep', 'ok', 'okay', 'hmm', 'mhm',
'uh huh', 'right', 'got it', 'sure'
}),
)
```

**Option 3: Disable Backchannel Filtering**
```python
session = AgentSession(
backchannel_filtering=False, # VAD triggers interrupt on any speech
)
```

**Default Ignored Words:**

| Category | Words |
|----------|-------|
| Affirmations | yeah, yep, ok, okay, right, alright, sure |
| Listening signals | aha, hmm, mhm, uh-huh, mm-hmm |
| Understanding | got it, gotcha, i see, oh, cool, nice, great |
| Hesitations | uh, um, er, ah |

**Command Words (Never Ignored):**

| Category | Words |
|----------|-------|
| Stop commands | stop, wait, hold, pause, halt |
| Questions | what, why, how, when, where, who |
| Negations | no, not, never |
| Redirection | actually, instead, however |

**Using BackchannelFilter Class Directly:**
```python
from livekit.agents.voice.backchannel_filter import BackchannelFilter

bc_filter = BackchannelFilter(
    backchannel_words=frozenset({'yeah', 'ok', 'custom'}),
    command_words=frozenset({'stop', 'wait'}),
    max_words=5,
)

bc_filter.is_backchannel("yeah")  # True
bc_filter.is_backchannel("yeah wait")  # False (contains command)
bc_filter.contains_command_words("stop")  # True
```

---

### Running Unit Tests

```bash
# Run all backchannel filter tests (79 tests)
python -m pytest tests/test_backchannel_filter.py -v

# Run specific scenario tests
python -m pytest tests/test_backchannel_filter.py -v -k "scenario"

# Run with short output
python -m pytest tests/test_backchannel_filter.py --tb=short
```

### Files Modified

| File | Changes |
|------|---------|
| `livekit/agents/voice/backchannel_filter.py` | New module with `BackchannelFilter` class |
| `livekit/agents/voice/agent_activity.py` | Interruption handling with backchannel detection |
| `livekit/agents/voice/agent_session.py` | Configuration options |
| `tests/test_backchannel_filter.py` | 79 unit tests |

## Installation

7 changes: 7 additions & 0 deletions livekit-agents/livekit/agents/ipc/supervised_proc.py
@@ -39,6 +39,13 @@ def _mask_ctrl_c() -> Generator[None, None, None]:
finally:
signal.pthread_sigmask(signal.SIG_UNBLOCK, [signal.SIGINT])
else:
# On Windows, signal.signal() only works in the main thread
# Check if we're in the main thread before attempting to mask
if threading.current_thread() is not threading.main_thread():
# Not in main thread, skip signal masking
yield
return

old = signal.signal(signal.SIGINT, signal.SIG_IGN)
try:
yield
89 changes: 75 additions & 14 deletions livekit-agents/livekit/agents/voice/agent_activity.py
@@ -75,6 +75,7 @@
update_instructions,
)
from .speech_handle import SpeechHandle
from .backchannel_filter import is_backchannel, is_likely_continuation

if TYPE_CHECKING:
from ..llm import mcp
@@ -1166,33 +1167,49 @@ def _on_generation_created(self, ev: llm.GenerationCreatedEvent) -> None:
)
self._schedule_speech(handle, SpeechHandle.SPEECH_PRIORITY_NORMAL)

def _interrupt_by_audio_activity(self) -> None:
def _interrupt_by_audio_activity(self, *, speech_duration: float | None = None) -> None:
opt = self._session.options
use_pause = opt.resume_false_interruption and opt.false_interruption_timeout is not None

if isinstance(self.llm, llm.RealtimeModel) and self.llm.capabilities.turn_detection:
# ignore if realtime model has turn detection enabled
return

# Get current transcript for backchannel and word count checks
transcript = ""
if self._audio_recognition is not None:
transcript = self._audio_recognition.current_transcript

# Check if agent has an active (non-interrupted) speech
has_active_speech = (
self._current_speech is not None
and not self._current_speech.interrupted
and self._current_speech.allow_interruptions
)

# BACKCHANNEL FILTERING: When agent is speaking, require transcript before interrupting
if opt.backchannel_filtering and has_active_speech:
# If no transcript yet, defer - don't interrupt until we know what user said
if not transcript:
return

# Check if it's a backchannel - if so, ignore
if is_backchannel(transcript, custom_words=opt.backchannel_words):
return

if (
self.stt is not None
and opt.min_interruption_words > 0
and self._audio_recognition is not None
and transcript
):
text = self._audio_recognition.current_transcript

# TODO(long): better word splitting for multi-language
if len(split_words(text, split_character=True)) < opt.min_interruption_words:
if len(split_words(transcript, split_character=True)) < opt.min_interruption_words:
return

if self._rt_session is not None:
self._rt_session.start_user_activity()

if (
self._current_speech is not None
and not self._current_speech.interrupted
and self._current_speech.allow_interruptions
):
if has_active_speech:
self._paused_speech = self._current_speech

# reset the false interruption timer
@@ -1209,6 +1226,7 @@ def _interrupt_by_audio_activity(self) -> None:

self._current_speech.interrupt()


# region recognition hooks

def on_start_of_speech(self, ev: vad.VADEvent | None) -> None:
@@ -1241,23 +1259,50 @@ def on_vad_inference_done(self, ev: vad.VADEvent) -> None:
return

if ev.speech_duration >= self._session.options.min_interruption_duration:
self._interrupt_by_audio_activity()
self._interrupt_by_audio_activity(speech_duration=ev.speech_duration)

def _is_backchannel_during_speech(self, transcript: str) -> bool:
# Check if transcript is a backchannel while agent has active speech.
opt = self._session.options
if not opt.backchannel_filtering:
return False

has_active_speech = (
self._current_speech is not None
and not self._current_speech.interrupted
and self._current_speech.allow_interruptions
)

if not has_active_speech:
return False

return is_backchannel(transcript, custom_words=opt.backchannel_words)

def on_interim_transcript(self, ev: stt.SpeechEvent, *, speaking: bool | None) -> None:
if isinstance(self.llm, llm.RealtimeModel) and self.llm.capabilities.user_transcription:
# skip stt transcription if user_transcription is enabled on the realtime model
return

transcript = ev.alternatives[0].text

# Check if this is a backchannel while agent is speaking
is_bc = self._is_backchannel_during_speech(transcript)

# Always display the transcript on screen
self._session._user_input_transcribed(
UserInputTranscribedEvent(
language=ev.alternatives[0].language,
transcript=ev.alternatives[0].text,
transcript=transcript,
is_final=False,
speaker_id=ev.alternatives[0].speaker_id,
),
)

# Skip interruption logic for backchannels
if is_bc:
return

if ev.alternatives[0].text and self._turn_detection not in (
if transcript and self._turn_detection not in (
"manual",
"realtime_llm",
):
Expand All @@ -1276,14 +1321,25 @@ def on_final_transcript(self, ev: stt.SpeechEvent, *, speaking: bool | None = No
# skip stt transcription if user_transcription is enabled on the realtime model
return

transcript = ev.alternatives[0].text

# Check if this is a backchannel while agent is speaking
is_bc = self._is_backchannel_during_speech(transcript)

# Always display the transcript on screen
self._session._user_input_transcribed(
UserInputTranscribedEvent(
language=ev.alternatives[0].language,
transcript=ev.alternatives[0].text,
transcript=transcript,
is_final=True,
speaker_id=ev.alternatives[0].speaker_id,
),
)

# Skip interruption and turn completion for backchannels
if is_bc:
return

# agent speech might not be interrupted if VAD failed and a final transcript is received
# we call _interrupt_by_audio_activity (idempotent) to pause the speech, if possible
# which will also be immediately interrupted
@@ -1365,6 +1421,11 @@ def on_end_of_turn(self, info: _EndOfTurnInfo) -> bool:
# TODO(theomonnom): should we "forward" this new turn to the next agent/activity?
return True

# BACKCHANNEL FILTERING: If agent is speaking and transcript is a backchannel, ignore
if self._is_backchannel_during_speech(info.new_transcript):
self._cancel_preemptive_generation()
return False

if (
self.stt is not None
and self._turn_detection != "manual"
8 changes: 8 additions & 0 deletions livekit-agents/livekit/agents/voice/agent_session.py
@@ -89,6 +89,9 @@ class AgentSessionOptions:
preemptive_generation: bool
tts_text_transforms: Sequence[TextTransforms] | None
ivr_detection: bool
# Backchannel filtering options
backchannel_filtering: bool
backchannel_words: frozenset[str] | None


Userdata_T = TypeVar("Userdata_T")
Expand Down Expand Up @@ -159,6 +162,8 @@ def __init__(
tts_text_transforms: NotGivenOr[Sequence[TextTransforms] | None] = NOT_GIVEN,
preemptive_generation: bool = False,
ivr_detection: bool = False,
backchannel_filtering: bool = True,
backchannel_words: frozenset[str] | None = None,
conn_options: NotGivenOr[SessionConnectOptions] = NOT_GIVEN,
loop: asyncio.AbstractEventLoop | None = None,
# deprecated
@@ -288,6 +293,8 @@ def __init__(
use_tts_aligned_transcript=use_tts_aligned_transcript
if is_given(use_tts_aligned_transcript)
else None,
backchannel_filtering=backchannel_filtering,
backchannel_words=backchannel_words,
)
self._conn_options = conn_options or SessionConnectOptions()
self._started = False
@@ -490,6 +497,7 @@ async def start(
return None

self._started_at = time.time()
logger.info("Agent session started at %s", self._started_at)

# configure observability first
job_ctx: JobContext | None = None