diff --git a/README.md b/README.md index 2a09aac241..9fba1b845b 100644 --- a/README.md +++ b/README.md @@ -41,6 +41,178 @@ agents that can see, hear, and understand. - **MCP support**: Native support for MCP. Integrate tools provided by MCP servers with one loc. - **Builtin test framework**: Write tests and use judges to ensure your agent is performing as expected. - **Open-source**: Fully open-source, allowing you to run the entire stack on your own servers, including [LiveKit server](https://github.com/livekit/livekit), one of the most widely used WebRTC media servers. +- **Backchannel Filtering**: Context-aware interruption handling that distinguishes between passive acknowledgements ("yeah", "ok") and real commands. + +## Backchannel Filtering + +This implementation includes a **context-aware backchannel filtering** system that prevents the agent from being interrupted by passive acknowledgement words while speaking. + +### How to Run the Agent + +**Prerequisites:** +```bash +# Create virtual environment +python -m venv venv + +# Activate virtual environment +# Windows: +.\venv\Scripts\activate +# Linux/Mac: +source venv/bin/activate + +# Install dependencies +pip install -e ./livekit-agents +pip install -r examples/voice_agents/requirements.txt +``` + +**Environment Variables (create `.env` file):** +```bash +LIVEKIT_URL=wss://your-livekit-server.com +LIVEKIT_API_KEY=your-api-key +LIVEKIT_API_SECRET=your-api-secret +OPENAI_API_KEY=your-openai-key +DEEPGRAM_API_KEY=your-deepgram-key +``` + +**Running the Agent:** +```bash +# Console mode (local testing without LiveKit server) +python examples/voice_agents/basic_agent.py console + +# Development mode (connects to LiveKit server with hot reload) +python examples/voice_agents/basic_agent.py dev + +# Production mode +python examples/voice_agents/basic_agent.py start +``` + +**Testing with LiveKit Playground:** +1. Run agent in dev mode: `python examples/voice_agents/basic_agent.py dev` +2. 
Open [Agents Playground](https://agents-playground.livekit.io/) +3. Connect with your LiveKit credentials +4. Test backchannel filtering by saying "ok" or "yeah" while agent speaks + +--- + +### How the Interruption Logic Works + +The system intercepts voice activity and transcript events **before** they trigger an interruption: + +``` +User speaks → VAD detects voice → STT provides transcript + ↓ + Agent currently speaking? + / \ + NO YES + ↓ ↓ + Process normally Is transcript a backchannel? + (respond to user) / \ + YES NO + ↓ ↓ + IGNORE completely INTERRUPT agent + (agent continues) (process command) +``` + +**Key Implementation Points:** + +1. **Transcript-First Approach**: When agent is speaking, interruptions are deferred until STT provides a transcript. This prevents VAD from triggering premature interruptions. + +2. **State-Based Filtering**: Backchannel detection ONLY applies when: + - Agent has active speech (`_current_speech is not None`) + - Speech is not already interrupted + - `allow_interruptions` is enabled + +3. **Command Word Override**: If transcript contains command words (stop, wait, what, etc.), it is NEVER treated as a backchannel, even if it also contains filler words. + +4. **No Audio Modification**: The filter works at the logic layer - VAD and STT are unchanged. 
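
The branching above can be condensed into a small, self-contained sketch. This is illustrative only: the function name `should_interrupt` and the trimmed word sets below are assumptions for the example, not the real API (the actual logic lives in `_interrupt_by_audio_activity()` and uses the full configurable word lists):

```python
# Illustrative sketch of the decision flow above. NOT the real implementation;
# the name should_interrupt and these reduced word sets exist only for this example.

BACKCHANNELS = {"yeah", "yep", "ok", "okay", "hmm", "mhm", "right", "sure", "uh", "um"}
COMMANDS = {"stop", "wait", "what", "why", "how", "no", "actually"}


def should_interrupt(transcript: str, agent_speaking: bool) -> bool:
    """Return True if the user's speech should interrupt the agent."""
    if not agent_speaking:
        return True  # nothing to protect: process the turn normally
    if not transcript:
        return False  # transcript-first: defer until STT tells us what was said
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    if any(w in COMMANDS for w in words):
        return True  # command words always win, even mixed with fillers
    if all(w in BACKCHANNELS for w in words):
        return False  # pure backchannel: ignore it, the agent keeps talking
    return True  # anything else is treated as a real utterance
```

Note how the empty-transcript branch encodes the transcript-first approach: while the agent speaks, VAD alone never triggers an interruption; the decision waits for STT.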
+ +**Code Location:** `livekit/agents/voice/agent_activity.py` → `_interrupt_by_audio_activity()` + +--- + +### How Ignore Words Are Configured + +**Option 1: Use Default Configuration (Recommended)** +```python +session = AgentSession( + backchannel_filtering=True, # Enabled by default +) +``` + +**Option 2: Custom Ignore Words** +```python +session = AgentSession( + backchannel_filtering=True, + backchannel_words=frozenset({ + 'yeah', 'yep', 'ok', 'okay', 'hmm', 'mhm', + 'uh huh', 'right', 'got it', 'sure' + }), +) +``` + +**Option 3: Disable Backchannel Filtering** +```python +session = AgentSession( + backchannel_filtering=False, # VAD triggers interrupt on any speech +) +``` + +**Default Ignored Words:** + +| Category | Words | +|----------|-------| +| Affirmations | yeah, yep, ok, okay, right, alright, sure | +| Listening signals | aha, hmm, mhm, uh-huh, mm-hmm | +| Understanding | got it, gotcha, i see, oh, cool, nice, great | +| Hesitations | uh, um, er, ah | + +**Command Words (Never Ignored):** + +| Category | Words | +|----------|-------| +| Stop commands | stop, wait, hold, pause, halt | +| Questions | what, why, how, when, where, who | +| Negations | no, not, never | +| Redirection | actually, instead, however | + +**Using BackchannelFilter Class Directly:** +```python +from livekit.agents.voice.backchannel_filter import BackchannelFilter + +filter = BackchannelFilter( + backchannel_words=frozenset({'yeah', 'ok', 'custom'}), + command_words=frozenset({'stop', 'wait'}), + max_words=5 +) + +filter.is_backchannel("yeah") # True +filter.is_backchannel("yeah wait") # False (contains command) +filter.contains_command_words("stop") # True +``` + +--- + +### Running Unit Tests + +```bash +# Run all backchannel filter tests (79 tests) +python -m pytest tests/test_backchannel_filter.py -v + +# Run specific scenario tests +python -m pytest tests/test_backchannel_filter.py -v -k "scenario" + +# Run with short output +python -m pytest 
tests/test_backchannel_filter.py --tb=short +``` + +### Files Modified + +| File | Changes | +|------|---------| +| `livekit/agents/voice/backchannel_filter.py` | New module with `BackchannelFilter` class | +| `livekit/agents/voice/agent_activity.py` | Interruption handling with backchannel detection | +| `livekit/agents/voice/agent_session.py` | Configuration options | +| `tests/test_backchannel_filter.py` | 79 unit tests | ## Installation diff --git a/livekit-agents/livekit/agents/ipc/supervised_proc.py b/livekit-agents/livekit/agents/ipc/supervised_proc.py index b1d192fbaa..19b5d77bb9 100644 --- a/livekit-agents/livekit/agents/ipc/supervised_proc.py +++ b/livekit-agents/livekit/agents/ipc/supervised_proc.py @@ -39,6 +39,13 @@ def _mask_ctrl_c() -> Generator[None, None, None]: finally: signal.pthread_sigmask(signal.SIG_UNBLOCK, [signal.SIGINT]) else: + # On Windows, signal.signal() only works in the main thread + # Check if we're in the main thread before attempting to mask + if threading.current_thread() is not threading.main_thread(): + # Not in main thread, skip signal masking + yield + return + old = signal.signal(signal.SIGINT, signal.SIG_IGN) try: yield diff --git a/livekit-agents/livekit/agents/voice/agent_activity.py b/livekit-agents/livekit/agents/voice/agent_activity.py index 0c3f7c743d..90f56bfe2d 100644 --- a/livekit-agents/livekit/agents/voice/agent_activity.py +++ b/livekit-agents/livekit/agents/voice/agent_activity.py @@ -75,6 +75,7 @@ update_instructions, ) from .speech_handle import SpeechHandle +from .backchannel_filter import is_backchannel, is_likely_continuation if TYPE_CHECKING: from ..llm import mcp @@ -1166,7 +1167,7 @@ def _on_generation_created(self, ev: llm.GenerationCreatedEvent) -> None: ) self._schedule_speech(handle, SpeechHandle.SPEECH_PRIORITY_NORMAL) - def _interrupt_by_audio_activity(self) -> None: + def _interrupt_by_audio_activity(self, *, speech_duration: float | None = None) -> None: opt = self._session.options use_pause = 
opt.resume_false_interruption and opt.false_interruption_timeout is not None @@ -1174,25 +1175,41 @@ def _interrupt_by_audio_activity(self) -> None: # ignore if realtime model has turn detection enabled return + # Get current transcript for backchannel and word count checks + transcript = "" + if self._audio_recognition is not None: + transcript = self._audio_recognition.current_transcript + + # Check if agent has an active (non-interrupted) speech + has_active_speech = ( + self._current_speech is not None + and not self._current_speech.interrupted + and self._current_speech.allow_interruptions + ) + + # BACKCHANNEL FILTERING: When agent is speaking, require transcript before interrupting + if opt.backchannel_filtering and has_active_speech: + # If no transcript yet, defer - don't interrupt until we know what user said + if not transcript: + return + + # Check if it's a backchannel - if so, ignore + if is_backchannel(transcript, custom_words=opt.backchannel_words): + return + if ( self.stt is not None and opt.min_interruption_words > 0 - and self._audio_recognition is not None + and transcript ): - text = self._audio_recognition.current_transcript - # TODO(long): better word splitting for multi-language - if len(split_words(text, split_character=True)) < opt.min_interruption_words: + if len(split_words(transcript, split_character=True)) < opt.min_interruption_words: return if self._rt_session is not None: self._rt_session.start_user_activity() - if ( - self._current_speech is not None - and not self._current_speech.interrupted - and self._current_speech.allow_interruptions - ): + if has_active_speech: self._paused_speech = self._current_speech # reset the false interruption timer @@ -1209,6 +1226,7 @@ def _interrupt_by_audio_activity(self) -> None: self._current_speech.interrupt() + # region recognition hooks def on_start_of_speech(self, ev: vad.VADEvent | None) -> None: @@ -1241,23 +1259,50 @@ def on_vad_inference_done(self, ev: vad.VADEvent) -> None: return if 
ev.speech_duration >= self._session.options.min_interruption_duration: - self._interrupt_by_audio_activity() + self._interrupt_by_audio_activity(speech_duration=ev.speech_duration) + + def _is_backchannel_during_speech(self, transcript: str) -> bool: + # Check if transcript is a backchannel while agent has active speech. + opt = self._session.options + if not opt.backchannel_filtering: + return False + + has_active_speech = ( + self._current_speech is not None + and not self._current_speech.interrupted + and self._current_speech.allow_interruptions + ) + + if not has_active_speech: + return False + + return is_backchannel(transcript, custom_words=opt.backchannel_words) def on_interim_transcript(self, ev: stt.SpeechEvent, *, speaking: bool | None) -> None: if isinstance(self.llm, llm.RealtimeModel) and self.llm.capabilities.user_transcription: # skip stt transcription if user_transcription is enabled on the realtime model return + transcript = ev.alternatives[0].text + + # Check if this is a backchannel while agent is speaking + is_bc = self._is_backchannel_during_speech(transcript) + + # Always display the transcript on screen self._session._user_input_transcribed( UserInputTranscribedEvent( language=ev.alternatives[0].language, - transcript=ev.alternatives[0].text, + transcript=transcript, is_final=False, speaker_id=ev.alternatives[0].speaker_id, ), ) + + # Skip interruption logic for backchannels + if is_bc: + return - if ev.alternatives[0].text and self._turn_detection not in ( + if transcript and self._turn_detection not in ( "manual", "realtime_llm", ): @@ -1276,14 +1321,25 @@ def on_final_transcript(self, ev: stt.SpeechEvent, *, speaking: bool | None = No # skip stt transcription if user_transcription is enabled on the realtime model return + transcript = ev.alternatives[0].text + + # Check if this is a backchannel while agent is speaking + is_bc = self._is_backchannel_during_speech(transcript) + + # Always display the transcript on screen 
self._session._user_input_transcribed( UserInputTranscribedEvent( language=ev.alternatives[0].language, - transcript=ev.alternatives[0].text, + transcript=transcript, is_final=True, speaker_id=ev.alternatives[0].speaker_id, ), ) + + # Skip interruption and turn completion for backchannels + if is_bc: + return + # agent speech might not be interrupted if VAD failed and a final transcript is received # we call _interrupt_by_audio_activity (idempotent) to pause the speech, if possible # which will also be immediately interrupted @@ -1365,6 +1421,11 @@ def on_end_of_turn(self, info: _EndOfTurnInfo) -> bool: # TODO(theomonnom): should we "forward" this new turn to the next agent/activity? return True + # BACKCHANNEL FILTERING: If agent is speaking and transcript is a backchannel, ignore + if self._is_backchannel_during_speech(info.new_transcript): + self._cancel_preemptive_generation() + return False + if ( self.stt is not None and self._turn_detection != "manual" diff --git a/livekit-agents/livekit/agents/voice/agent_session.py b/livekit-agents/livekit/agents/voice/agent_session.py index 628718a6b2..7896930f11 100644 --- a/livekit-agents/livekit/agents/voice/agent_session.py +++ b/livekit-agents/livekit/agents/voice/agent_session.py @@ -89,6 +89,9 @@ class AgentSessionOptions: preemptive_generation: bool tts_text_transforms: Sequence[TextTransforms] | None ivr_detection: bool + # Backchannel filtering options + backchannel_filtering: bool + backchannel_words: frozenset[str] | None Userdata_T = TypeVar("Userdata_T") @@ -159,6 +162,8 @@ def __init__( tts_text_transforms: NotGivenOr[Sequence[TextTransforms] | None] = NOT_GIVEN, preemptive_generation: bool = False, ivr_detection: bool = False, + backchannel_filtering: bool = True, + backchannel_words: frozenset[str] | None = None, conn_options: NotGivenOr[SessionConnectOptions] = NOT_GIVEN, loop: asyncio.AbstractEventLoop | None = None, # deprecated @@ -288,6 +293,8 @@ def __init__( 
use_tts_aligned_transcript=use_tts_aligned_transcript if is_given(use_tts_aligned_transcript) else None, + backchannel_filtering=backchannel_filtering, + backchannel_words=backchannel_words, ) self._conn_options = conn_options or SessionConnectOptions() self._started = False @@ -490,6 +497,7 @@ async def start( return None self._started_at = time.time() + logger.info("Agent session started at %s", self._started_at) # configure observability first job_ctx: JobContext | None = None diff --git a/livekit-agents/livekit/agents/voice/backchannel_filter.py b/livekit-agents/livekit/agents/voice/backchannel_filter.py new file mode 100644 index 0000000000..c04afb8da7 --- /dev/null +++ b/livekit-agents/livekit/agents/voice/backchannel_filter.py @@ -0,0 +1,192 @@ +""" +Backchannel filtering module for context-aware interruption handling. + +This module provides functionality to detect passive acknowledgement words +like "yeah", "ok", "hmm" that users say while listening to indicate engagement, +without intending to interrupt the speaker. + +When the agent is speaking, these filler words should be ignored to allow +seamless continuation of speech without pause or stutter. 
+""" + +from __future__ import annotations + +import re +from typing import FrozenSet + + +# Default words commonly used for backchanneling / passive acknowledgement +DEFAULT_BACKCHANNEL_WORDS: FrozenSet[str] = frozenset({ + "yeah", "yea", "ya", "yep", "yup", "yes", "yess", + "ok", "okay", "okie", "k", "kay", + "right", "alright", "aight", "sure", "surely", + "aha", "ah ha", "a-ha", + "hmm", "hm", "hmm-hmm", "hmmm", "mm", "mmm", "mhm", "mm-hmm", "mmhmm", + "uh huh", "uh-huh", "uhuh", + "got it", "gotcha", "gotchu", "i see", "see", "oh i see", + "oh", "ohh", "oh ok", "oh okay", "cool", "nice", "great", "good", + "uh", "um", "er", "ah", "eh" +}) + +# Words that indicate real intent/commands, not backchanneling +COMMAND_WORDS: FrozenSet[str] = frozenset({ + "stop", "wait", "hold", "pause", "halt", "quiet", "silence", "enough", + "what", "why", "how", "when", "where", "who", "which", + "no", "not", "never", + "can", "could", "would", "should", "will", "do", "does", "did", + "however", "actually", "instead", "rather", + "also", "plus", "another", + "hey", "listen", "excuse", "sorry", + "repeat", "again", "pardon", "huh" +}) + + +class BackchannelFilter: + # A class to detect and filter backchannel (passive acknowledgement) speech. + + def __init__( + self, + backchannel_words: FrozenSet[str] | None = None, + command_words: FrozenSet[str] | None = None, + max_words: int = 5 + ): + # Initialize the BackchannelFilter. + + self._backchannel_words = backchannel_words or DEFAULT_BACKCHANNEL_WORDS + self._command_words = command_words or COMMAND_WORDS + self._max_words = max_words + self._pattern: re.Pattern | None = None + + @property + def backchannel_words(self) -> FrozenSet[str]: + return self._backchannel_words + + @property + def command_words(self) -> FrozenSet[str]: + return self._command_words + + @property + def max_words(self) -> int: + return self._max_words + + def _get_pattern(self) -> re.Pattern: + # Get or create the regex pattern for backchannel matching. 
+ if self._pattern is None: + sorted_words = sorted(self._backchannel_words, key=len, reverse=True) + pattern = "|".join(re.escape(word) for word in sorted_words) + self._pattern = re.compile(f"^({pattern})$", re.IGNORECASE) + return self._pattern + + @staticmethod + def normalize(transcript: str) -> str: + # Normalize transcript for comparison. + text = transcript.lower().strip() + text = re.sub(r"[.,!?;:'\"-]", "", text) + text = re.sub(r"\s+", " ", text) + return text.strip() + + def is_backchannel(self, transcript: str) -> bool: + # Determine if a transcript is a passive acknowledgement (backchannel). + # True if the transcript is a passive acknowledgement, False otherwise + if not transcript: + return False + + normalized = self.normalize(transcript) + if not normalized: + return False + + words = normalized.split() + word_count = len(words) + + # Too many words = likely a real statement + if word_count > self._max_words: + return False + + # Contains command words = likely a real interruption + if self.contains_command_words(normalized): + return False + + # Check if entire transcript matches a known backchannel phrase + if normalized in self._backchannel_words: + return True + + # Single word check + if word_count == 1 and words[0] in self._backchannel_words: + return True + + # All words are fillers (handles "ok ok ok", "yeah yeah yeah") + if all(word in self._backchannel_words for word in words): + return True + + # Pattern match for variations + if self._get_pattern().match(normalized): + return True + + return False + + def contains_command_words(self, transcript: str) -> bool: + # Check if transcript contains words indicating command intent. + if not transcript: + return False + + words = set(transcript.lower().split()) + return bool(words & self._command_words) + + def is_likely_continuation(self, transcript: str) -> bool: + # Check if transcript looks like the user is continuing their thought. 
+ normalized = self.normalize(transcript) + words = normalized.split() + + if len(words) <= 2: + return False + + first_word = words[0] if words else "" + if first_word in self._backchannel_words and len(words) > 2: + return True + + return False + + +# Module-level functions for backward compatibility +_default_filter: BackchannelFilter | None = None + + +def _get_default_filter() -> BackchannelFilter: + # Get or create the default filter instance. + global _default_filter + if _default_filter is None: + _default_filter = BackchannelFilter() + return _default_filter + + +def normalize_transcript(transcript: str) -> str: + # Normalize transcript for comparison. + return BackchannelFilter.normalize(transcript) + + +def is_backchannel( + transcript: str, + *, + max_words: int = 5, + custom_words: FrozenSet[str] | None = None, +) -> bool: + # Determine if a transcript is a passive acknowledgement (backchannel). + if custom_words is not None or max_words != 5: + # Create a custom filter instance + filter_instance = BackchannelFilter( + backchannel_words=custom_words, + max_words=max_words + ) + return filter_instance.is_backchannel(transcript) + + return _get_default_filter().is_backchannel(transcript) + + +def contains_command_words(transcript: str) -> bool: + # Check if transcript contains words indicating command intent. + return _get_default_filter().contains_command_words(transcript) + + +def is_likely_continuation(transcript: str) -> bool: + # Check if transcript looks like the user is continuing their thought. + return _get_default_filter().is_likely_continuation(transcript) \ No newline at end of file diff --git a/tests/test_backchannel_filter.py b/tests/test_backchannel_filter.py new file mode 100644 index 0000000000..b63b722392 --- /dev/null +++ b/tests/test_backchannel_filter.py @@ -0,0 +1,229 @@ +""" +Unit test's for backchannel filtering check. + +The tests verify that the backchannel logic is correct or not: +1. 
Identifies single-word backchannels (e.g. "yeah", "ok", "hmm")
+2. Identifies repeated backchannels (e.g. "ok ok ok", "yeah yeah")
+3. Detects command words that should NOT be ignored (e.g. "stop", "wait", "hold on")
+4. Handles mixed inputs (e.g. "yeah wait a second")
+5. Handles edge cases and normalization (punctuation, casing, whitespace)
+"""
+
+import pytest
+from livekit.agents.voice.backchannel_filter import (
+    is_backchannel,
+    contains_command_words,
+    is_likely_continuation,
+    normalize_transcript,
+    DEFAULT_BACKCHANNEL_WORDS,
+    COMMAND_WORDS,
+)
+
+
+class TestNormalizeTranscript:
+    # Tests for transcript normalization.
+
+    def test_lowercase_conversion(self):
+        assert normalize_transcript("YEAH") == "yeah"
+        assert normalize_transcript("OK") == "ok"
+
+    def test_strip_whitespace(self):
+        assert normalize_transcript(" yeah ") == "yeah"
+        assert normalize_transcript("\tyeah\n") == "yeah"
+
+    def test_remove_punctuation(self):
+        assert normalize_transcript("yeah!") == "yeah"
+        assert normalize_transcript("ok.") == "ok"
+        assert normalize_transcript("hmm?") == "hmm"
+        assert normalize_transcript("yeah, ok") == "yeah ok"
+
+    def test_collapse_multiple_spaces(self):
+        assert normalize_transcript("yeah    ok") == "yeah ok"
+
+
+class TestIsBackchannel:
+    # Tests for backchannel detection.
+
+    # Test Scenario 1: Single-word backchannels
+    @pytest.mark.parametrize("word", [
+        "yeah", "yea", "ya", "yep", "yup", "yes",
+        "ok", "okay", "okie", "k", "kay",
+        "right", "alright",
+        "aha", "hmm", "hm", "mhm", "mmhmm",
+        "uhuh",
+        "got it", "gotcha",
+        "i see", "oh", "cool", "nice", "great",
+        "uh", "um", "er", "ah"
+    ])
+    def test_single_word_backchannels(self, word):
+        # Single backchannel words should be detected.
+ assert is_backchannel(word) is True, f"'{word}' should be a backchannel" + + # Test Scenario 2: Repeated backchannels + @pytest.mark.parametrize("phrase", [ + "ok ok ok", + "yeah yeah", + "mmhmm mmhmm", + "uh uh uh" + ]) + def test_repeated_backchannels(self, phrase): + # Repeated backchannel words should be detected. + assert is_backchannel(phrase) is True, f"'{phrase}' should be a backchannel" + + # Test Scenario 3: Command words should NOT be backchannels + @pytest.mark.parametrize("command", [ + "stop", + "wait", + "hold on", + "pause", + "what", + "why", + "how", + "repeat", + ]) + def test_command_words_not_backchannels(self, command): + # Command words should NOT be detected as backchannels. + assert is_backchannel(command) is False, f"'{command}' should NOT be a backchannel" + + # Test Scenario 4: Mixed inputs with command words + @pytest.mark.parametrize("mixed", [ + "yeah wait", + "yeah wait a second", + "ok but stop", + "hmm actually", + "yeah okay but wait", + "uh huh what did you say" + ]) + def test_mixed_input_with_commands(self, mixed): + # Mixed inputs containing command words should NOT be backchannels. + assert is_backchannel(mixed) is False, f"'{mixed}' should NOT be a backchannel" + + # Test edge cases + def test_empty_string(self): + assert is_backchannel("") is False + + def test_whitespace_only(self): + assert is_backchannel(" ") is False + + def test_long_sentence_not_backchannel(self): + # Long sentences should not be backchannels regardless of content. + assert is_backchannel("yeah I think that is a really good idea") is False + + def test_case_insensitive(self): + assert is_backchannel("YEAH") is True + assert is_backchannel("Ok") is True + assert is_backchannel("HMM") is True + + def test_with_punctuation(self): + assert is_backchannel("yeah!") is True + assert is_backchannel("ok.") is True + assert is_backchannel("hmm?") is True + + def test_custom_words(self): + # Custom word list can extend detection to custom words. 
+ custom = frozenset({"custom", "words"}) + # Custom words ARE detected + assert is_backchannel("custom", custom_words=custom) is True + assert is_backchannel("words", custom_words=custom) is True + + +class TestContainsCommandWords: + # Tests for command word detection. + + @pytest.mark.parametrize("transcript,expected", [ + ("stop", True), + ("wait a second", True), + ("what did you say", True), + ("yeah", False), + ("ok ok ok", False), + ("please stop talking", True), + ("actually I think", True), + ]) + def test_command_detection(self, transcript, expected): + assert contains_command_words(transcript) is expected + + def test_empty_string(self): + assert contains_command_words("") is False + + +class TestIsLikelyContinuation: + # Tests for continuation detection. + + @pytest.mark.parametrize("transcript,expected", [ + ("yeah so I was wondering", True), + ("ok but what if", True), + ("hmm actually I think", True), + ("yeah", False), # Too short + ("ok ok", False), # Too short + ("hello there", False), # Doesn't start with backchannel + ]) + def test_continuation_detection(self, transcript, expected): + assert is_likely_continuation(transcript) is expected + + +class TestDefaultWordLists: + # Tests for default word lists. + + def test_default_backchannel_words_not_empty(self): + assert len(DEFAULT_BACKCHANNEL_WORDS) > 0 + + def test_command_words_not_empty(self): + assert len(COMMAND_WORDS) > 0 + + def test_no_overlap_between_lists(self): + # Backchannel words and command words should not overlap. + overlap = DEFAULT_BACKCHANNEL_WORDS & COMMAND_WORDS + # Note: "no" may appear in both as it can be acknowledgement or command + # We allow specific exceptions + allowed_overlap = {"no", "nope", "nah"} + unexpected_overlap = overlap - allowed_overlap + assert len(unexpected_overlap) == 0, f"Unexpected overlap: {unexpected_overlap}" + + +class TestScenarios: + # End-to-end tests matching the challenge test scenarios. 
+ + def test_scenario_1_long_explanation(self): + """ + Scenario 1: The Long Explanation + Context: Agent is reading a long paragraph about history. + User Action: User says "Okay... yeah... uhuh" while Agent is talking. + Result: These should all be detected as backchannels. + """ + assert is_backchannel("okay") is True + assert is_backchannel("yeah") is True + assert is_backchannel("uhuh") is True + assert is_backchannel("mhm") is True + + def test_scenario_2_passive_affirmation(self): + """ + Scenario 2: The Passive Affirmation + Context: Agent asks "Are you ready?" and goes silent. + User Action: User says "Yeah." + Result: "Yeah" is still detected as backchannel by this function. + Note: The state-based logic in agent_activity.py determines whether + to ignore it (agent speaking) or respond to it (agent silent). + """ + assert is_backchannel("yeah") is True + + def test_scenario_3_correction(self): + """ + Scenario 3: The Correction + Context: Agent is counting "One, two, three..." + User Action: User says "No stop." + Result: Should NOT be a backchannel due to "stop" command word. + """ + assert is_backchannel("no stop") is False + + def test_scenario_4_mixed_input(self): + """ + Scenario 4: The Mixed Input + Context: Agent is speaking. + User Action: User says "Yeah okay but wait." + Result: Should NOT be a backchannel due to "wait" command word. + """ + assert is_backchannel("yeah okay but wait") is False + + +if __name__ == "__main__": + pytest.main([__file__, "-v"])