1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.11
76 changes: 76 additions & 0 deletions README.md
@@ -372,4 +372,80 @@ The Agents framework is under active development in a rapidly evolving field. We
<tr><td>Resources</td><td><a href="https://docs.livekit.io">Docs</a> · <a href="https://github.com/livekit-examples">Example apps</a> · <a href="https://livekit.io/cloud">Cloud</a> · <a href="https://docs.livekit.io/home/self-hosting/deployment">Self-hosting</a> · <a href="https://github.com/livekit/livekit-cli">CLI</a></td></tr>
</tbody>
</table>

## Context-Aware Interruption Handling (Assignment Solution)

This section documents the solution implemented for the **LiveKit Intelligent Interruption Handling** assignment.

---

## Overview

The goal of this implementation is to improve conversational flow by distinguishing between:

- **Passive acknowledgements** such as “yeah”, “ok”, “hmm”
- **Active interruptions** such as “stop”, “wait”, “no”

based on whether the agent is **currently speaking or silent**, without modifying the low-level Voice Activity Detection (VAD) system.
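A minimal sketch of the keyword-based classifier (the word lists match `examples/voice_agents/myagent.py`; the normalization regex here keeps hyphens so that “uh-huh” can match the ignore list):

```python
import re

# Passive backchannel words and explicit commands, as in the example agent
IGNORE_WORDS = {"okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"}
INTERRUPT_WORDS = {"stop", "wait", "no", "pause", "hold"}

def classify(text: str) -> str:
    """Return IGNORE, INTERRUPT, or VALID for a transcribed utterance."""
    # Strip punctuation but keep hyphens and whitespace before tokenizing
    tokens = set(re.sub(r"[^\w\s-]", "", text.lower().strip()).split())
    if tokens & INTERRUPT_WORDS:              # an explicit command always wins
        return "INTERRUPT"
    if tokens and tokens.issubset(IGNORE_WORDS):  # pure filler only
        return "IGNORE"
    return "VALID"
```

Mixed utterances such as “yeah okay but wait” classify as `INTERRUPT` because the command check runs before the filler check.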

---

## How to Run the Agent

### 1. Set environment variables

```bash
setx OPENAI_API_KEY "<your_openai_key>"
setx DEEPGRAM_API_KEY "<your_deepgram_key>"
```

### 2. Start the agent

```bash
python examples/voice_agents/interrupt_agent.py dev
```

## Configuration

The set of passive acknowledgement words is configurable:

```python
IGNORE_WORDS = {
    "okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"
}
```

The agent tracks whether it is currently speaking via audio lifecycle hooks:

```python
async def on_audio_start(self):
    self.speaking = True

async def on_audio_end(self):
    self.speaking = False
```

## Supported Scenarios

✔ Agent ignores “yeah / ok” while speaking
✔ Agent responds to “yeah / ok” when silent
✔ Agent stops immediately on “stop”
✔ Mixed inputs like “yeah okay but wait” interrupt correctly
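These scenarios reduce to a small decision table. A sketch, assuming a classification result (`IGNORE` / `INTERRUPT` / `VALID`) and a boolean speaking flag as in the example agent (the action names here are illustrative):

```python
def decide(decision: str, speaking: bool) -> str:
    """Map (classification, agent speaking state) to an action."""
    if speaking:
        if decision == "IGNORE":
            return "resume"          # false interruption: let audio continue
        if decision == "INTERRUPT":
            return "stop"            # hard interrupt: halt playback now
        return "stop_and_reply"      # real speech: interrupt, then answer
    return "reply"                   # agent was silent: treat as a normal turn
```

Note that “yeah / ok” maps to `resume` only while the agent is speaking; when the agent is silent, the same words are answered like any other turn.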

---

## Semantic Interruption Detection

User input is classified semantically:

- If an interruption keyword is present, the agent stops immediately
- Mixed inputs such as **“yeah okay but wait”** correctly trigger an interruption

---

## Handling False Interruptions (No VAD Modification)

Voice Activity Detection (VAD) operates faster than speech-to-text and may briefly interrupt agent audio before transcription is available.

Instead of modifying VAD (which is explicitly disallowed), this implementation:

- Allows VAD to trigger normally
- Identifies passive acknowledgements after transcription
- Treats them as false interruptions
- Automatically resumes the agent’s speech
- Suppresses unintended responses
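The steps above can be modeled as a toy recovery loop (this is an illustration of the flow, not LiveKit’s internal implementation; `handle_vad_trigger` and the constants are hypothetical names):

```python
import asyncio

# Mirrors the false_interruption_timeout session option (illustrative value)
FALSE_INTERRUPTION_TIMEOUT = 1.0
IGNORE_WORDS = {"okay", "ok", "yeah", "hmm", "uh", "uhh", "right", "aha"}

async def handle_vad_trigger(transcript: "asyncio.Future[str]") -> str:
    """VAD has already paused playback; wait for the transcript,
    then decide whether to resume or stay interrupted."""
    try:
        text = await asyncio.wait_for(transcript, FALSE_INTERRUPTION_TIMEOUT)
    except asyncio.TimeoutError:
        # Nothing transcribable arrived in time: treat as a false interruption
        return "resume"
    if set(text.lower().split()) <= IGNORE_WORDS:
        return "resume"      # passive acknowledgement: resume agent speech
    return "interrupt"       # real speech or a command: stay interrupted
```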

This behavior is enabled using LiveKit’s built-in recovery mechanism:

```python
# AgentSession options (see examples/voice_agents/myagent.py)
resume_false_interruption=True,
false_interruption_timeout=1.0,
```

<!--END_REPO_NAV-->
3 changes: 0 additions & 3 deletions examples/.env.example

This file was deleted.

156 changes: 156 additions & 0 deletions examples/voice_agents/myagent.py
@@ -0,0 +1,156 @@
import os
import re
import logging
from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    JobProcess,
    cli,
    room_io,
)
from livekit.plugins import silero

# --------------------------------------------------
# Setup
# --------------------------------------------------

load_dotenv()
logger = logging.getLogger("interrupt-agent")

# Passive backchannel words (configurable)
IGNORE_WORDS = {
    "okay", "ok", "yeah", "hmm", "uh", "uhh", "uh-huh", "right", "aha"
}

# Explicit interruption commands
INTERRUPT_WORDS = {
    "stop", "wait", "no", "pause", "hold"
}


def classify(text: str) -> str:
    """
    Classify user input into:
    - IGNORE: passive acknowledgement
    - INTERRUPT: explicit command
    - VALID: normal speech
    """
    # Keep hyphens so "uh-huh" survives normalization and can match IGNORE_WORDS
    text = re.sub(r"[^\w\s-]", "", text.lower().strip())
    tokens = set(text.split())

    # Semantic interruption always wins
    if tokens & INTERRUPT_WORDS:
        return "INTERRUPT"

    # Pure filler only
    if tokens and tokens.issubset(IGNORE_WORDS):
        return "IGNORE"

    return "VALID"


# --------------------------------------------------
# Agent
# --------------------------------------------------

class InterruptAwareAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "Your name is Kelly. You are a friendly voice assistant. "
                "Speak clearly and naturally. English only."
            )
        )
        self.speaking = False

    async def on_audio_start(self):
        self.speaking = True

    async def on_audio_end(self):
        self.speaking = False

    async def on_enter(self):
        self.speaking = True
        self.session.generate_reply()

    async def on_user_message(self, message: str):
        decision = classify(message)

        logger.info(
            f"user_message='{message}' decision={decision} speaking={self.speaking}"
        )

        # --------------------------------------------------
        # AGENT IS SPEAKING
        # --------------------------------------------------
        if self.speaking:

            # Passive acknowledgement → FALSE INTERRUPTION
            if decision == "IGNORE":
                # Let LiveKit resume audio automatically
                self.session.cancel_generation()
                return

            # Explicit command → HARD INTERRUPT
            if decision == "INTERRUPT":
                await self.session.stop_audio()
                self.session.cancel_generation()
                self.speaking = False
                return

            # Real speech → interrupt and respond
            await self.session.stop_audio()
            self.speaking = False
            await self.session.generate_reply()
            return

        # --------------------------------------------------
        # AGENT IS SILENT
        # --------------------------------------------------
        await self.session.generate_reply()


# --------------------------------------------------
# Server
# --------------------------------------------------

server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-2",
        vad=ctx.proc.userdata["vad"],

        # 🔑 REQUIRED FOR FALSE-INTERRUPTION HANDLING
        turn_detection=None,
        preemptive_generation=False,
        resume_false_interruption=True,
        false_interruption_timeout=1.0,
    )

    await session.start(
        agent=InterruptAwareAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions()
        ),
    )


if __name__ == "__main__":
    cli.run_app(server)
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -10,7 +10,7 @@ livekit-plugins-aws = { workspace = true }
livekit-plugins-azure = { workspace = true }
livekit-plugins-baseten = { workspace = true }
livekit-plugins-bey = { workspace = true }
livekit-plugins-bithuman = { workspace = true }

livekit-plugins-cartesia = { workspace = true }
livekit-plugins-clova = { workspace = true }
livekit-plugins-deepgram = { workspace = true }
@@ -53,6 +53,7 @@ members = ["livekit-plugins/*", "livekit-agents"]
exclude = [
"livekit-plugins/livekit-plugins-browser",
"livekit-plugins/livekit-blingfire",
"livekit-plugins/livekit-plugins-bithuman",
]

[dependency-groups]