87 changes: 87 additions & 0 deletions README.md
@@ -373,3 +373,90 @@ The Agents framework is under active development in a rapidly evolving field. We
</tbody>
</table>
<!--END_REPO_NAV-->




---

## Context-Aware Interruption Handling (Assignment Solution)

This section documents the solution implemented for the **LiveKit Intelligent Interruption Handling** assignment.

### Overview

The goal of this implementation is to improve conversational flow by distinguishing between:

- **Passive acknowledgements** such as “yeah”, “ok”, “hmm”
- **Active interruptions** such as “stop”, “wait”, “no”

based on whether the agent is **currently speaking or silent**, without modifying the low-level Voice Activity Detection (VAD) system.

---

### How to Run the Agent

1. Set environment variables:

   ```bash
   setx OPENAI_API_KEY "<your_openai_key>"
   setx DEEPGRAM_API_KEY "<your_deepgram_key>"
   ```

   (On macOS/Linux, use `export` instead of `setx`.)

2. Run the agent:

   ```bash
   python examples/voice_agents/interrupt_agent.py dev
   ```

3. Connect using the LiveKit Agents Playground or any LiveKit client.

### Configurable Ignore List

Passive acknowledgement words are defined in a configurable set:

```python
IGNORE_WORDS = {
    "okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"
}
```

These words are treated as soft inputs while the agent is speaking.
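
An utterance counts as pure filler only when every token falls inside the ignore set. A minimal, standalone sketch of that check (the helper name `is_filler` is illustrative, not part of the implementation):

```python
# Passive backchannel words, matching the configurable set above.
IGNORE_WORDS = {"okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"}

def is_filler(text: str) -> bool:
    # Pure filler: non-empty input whose every token is an ignore word.
    tokens = set(text.lower().split())
    return bool(tokens) and tokens.issubset(IGNORE_WORDS)

print(is_filler("yeah ok"))     # True  — both tokens are fillers
print(is_filler("yeah go on"))  # False — contains real content words
```

The subset test is what keeps "yeah, what about pricing?" from being discarded: one non-filler token is enough to make the input real speech.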

### State-Based Filtering

The agent tracks whether it is currently speaking using audio lifecycle hooks:

```python
async def on_audio_start(self):
    self.speaking = True

async def on_audio_end(self):
    self.speaking = False
```
| Agent State | User Input | Behavior |
| ----------- | ------------- | ---------------------- |
| Speaking | “yeah / ok” | Ignored |
| Silent | “yeah / ok” | Responded to |
| Speaking | “stop / wait” | Immediate interruption |
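
The table above reduces to a small dispatch on (agent state, classified input). A sketch using action labels in place of the real LiveKit session calls (labels like `"resume"` are illustrative only):

```python
def dispatch(speaking: bool, decision: str) -> str:
    """Map (agent state, classified input) to an action label."""
    if speaking:
        if decision == "IGNORE":
            return "resume"               # false interruption: let audio continue
        if decision == "INTERRUPT":
            return "hard_stop"            # stop audio immediately
        return "interrupt_and_reply"      # real speech while agent talks
    return "reply"                        # agent is silent: always respond

print(dispatch(True, "IGNORE"))     # resume
print(dispatch(True, "INTERRUPT"))  # hard_stop
print(dispatch(False, "IGNORE"))    # reply
```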


### Semantic Interruption Detection

User input is classified semantically:

- If an interruption keyword is present, the agent stops immediately.
- Mixed inputs such as “yeah okay but wait” correctly trigger an interruption.
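
The “interrupt wins” rule falls out of checking set intersection before the subset test, mirroring the `classify` helper in `interrupt_agent.py`:

```python
import re

IGNORE_WORDS = {"okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"}
INTERRUPT_WORDS = {"stop", "wait", "no", "pause", "hold"}

def classify(text: str) -> str:
    # Strip punctuation, lowercase, and tokenize.
    tokens = set(re.sub(r"[^\w\s]", "", text.lower().strip()).split())
    if tokens & INTERRUPT_WORDS:          # any explicit command wins
        return "INTERRUPT"
    if tokens and tokens.issubset(IGNORE_WORDS):
        return "IGNORE"                   # pure filler only
    return "VALID"

print(classify("yeah okay but wait"))  # INTERRUPT — "wait" overrides the fillers
print(classify("yeah ok"))             # IGNORE
```

Because the intersection check runs first, a single command word overrides any number of surrounding fillers.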

### Handling False Interruptions (No VAD Modification)

Voice Activity Detection (VAD) operates faster than speech-to-text and may briefly interrupt agent audio before transcription is available.

Instead of modifying VAD (which is explicitly disallowed), this implementation:

- Allows VAD to trigger normally
- Identifies passive acknowledgements after transcription
- Treats them as false interruptions
- Automatically resumes the agent’s speech
- Suppresses unintended responses

This behavior is enabled using LiveKit’s built-in recovery mechanism:

```python
resume_false_interruption=True
false_interruption_timeout=1.5
```

156 changes: 156 additions & 0 deletions examples/voice_agents/interrupt_agent.py
@@ -0,0 +1,156 @@
import os
import re
import logging
from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    JobProcess,
    cli,
    room_io,
)
from livekit.plugins import silero

# --------------------------------------------------
# Setup
# --------------------------------------------------

load_dotenv()
logger = logging.getLogger("interrupt-agent")

# Passive backchannel words (configurable)
IGNORE_WORDS = {
    "okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"
}

# Explicit interruption commands
INTERRUPT_WORDS = {
    "stop", "wait", "no", "pause", "hold"
}


def classify(text: str) -> str:
    """
    Classify user input into:
      - IGNORE: passive acknowledgement
      - INTERRUPT: explicit command
      - VALID: normal speech
    """
    text = re.sub(r"[^\w\s]", "", text.lower().strip())
    tokens = set(text.split())

    # Semantic interruption always wins
    if tokens & INTERRUPT_WORDS:
        return "INTERRUPT"

    # Pure filler only
    if tokens and tokens.issubset(IGNORE_WORDS):
        return "IGNORE"

    return "VALID"


# --------------------------------------------------
# Agent
# --------------------------------------------------

class InterruptAwareAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "Your name is Kelly. You are a friendly voice assistant. "
                "Speak clearly and naturally. English only."
            )
        )
        self.speaking = False

    async def on_audio_start(self):
        self.speaking = True

    async def on_audio_end(self):
        self.speaking = False

    async def on_enter(self):
        self.speaking = True
        self.session.generate_reply()

    async def on_user_message(self, message: str):
        decision = classify(message)

        logger.info(
            f"user_message='{message}' decision={decision} speaking={self.speaking}"
        )

        # --------------------------------------------------
        # AGENT IS SPEAKING
        # --------------------------------------------------
        if self.speaking:

            # Passive acknowledgement → FALSE INTERRUPTION
            if decision == "IGNORE":
                # Let LiveKit resume audio automatically
                self.session.cancel_generation()
                return

            # Explicit command → HARD INTERRUPT
            if decision == "INTERRUPT":
                await self.session.stop_audio()
                self.session.cancel_generation()
                self.speaking = False
                return

            # Real speech → interrupt and respond
            await self.session.stop_audio()
            self.speaking = False
            await self.session.generate_reply()
            return

        # --------------------------------------------------
        # AGENT IS SILENT
        # --------------------------------------------------
        await self.session.generate_reply()


# --------------------------------------------------
# Server
# --------------------------------------------------

server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-2",
        vad=ctx.proc.userdata["vad"],

        # 🔑 REQUIRED FOR FALSE-INTERRUPTION HANDLING
        turn_detection=None,
        preemptive_generation=False,
        resume_false_interruption=True,
        false_interruption_timeout=1.0,
    )

    await session.start(
        agent=InterruptAwareAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions()
        ),
    )


if __name__ == "__main__":
    cli.run_app(server)