Fix/anam heygen #1911

plutoless · 2025-12-23T03:04:52Z

No description provided.

Avatar extensions (anam, generic, heygen): - Replace 500ms debounce timer with tts_audio_end event handling - Add send_voice_end() method triggered by tts_audio_end (reason=1) - Remove dead code from heygen (unused queues, is_speaking timeout) - Fix graph routing: tts_audio_end now routes to avatar (was only thymia) Deepgram WebSocket ASR: - Add silence sender for EOT detection when mic muted - Add auto-reconnect with exponential backoff - Add keep-alive silence to prevent WebSocket timeout Playground: - Add URL param support: ?graph=name pre-selects graph on load

- Send voice_end BEFORE interrupt command (per Anam requirements) - Add _voice_end_sent_for_request tracking to prevent duplicate sends - Applied to: anam_avatar, generic_video extensions

Resolved conflicts: - anam_avatar_python/extension.py: kept request ID deduplication - generic_video_python/extension.py: kept request ID deduplication - property.json: regenerated from rebuild_property.py

claude · 2025-12-28T12:59:28Z

Pull Request Review: Fix/anam heygen

Summary

This PR fixes avatar video streaming issues by preventing duplicate voice_end messages and ensuring proper message ordering during interrupts. It also optimizes wellness analysis timing parameters.

Code Quality & Best Practices ✅

Strengths:

Clean implementation with appropriate state tracking using _voice_end_sent_for_request
Good code reuse - identical pattern applied to both anam_avatar_python and generic_video_python extensions
Proper logging added to track voice_end deduplication and interrupt ordering
Comments clearly explain the business logic (e.g., "per Anam requirements", "reason=1 means TTS generation complete")

Suggestions:

State Management (anam_avatar_python/extension.py:74-76, generic_video_python/extension.py:74-76):
- Using None to track request_id might cause issues if a request actually has None as its ID
- Consider using a more explicit approach:
```
self._voice_end_sent_for_request = set()  # Track all request_ids that voice_end was sent for
```
Then check with if request_id not in self._voice_end_sent_for_request:
State Reset - The _voice_end_sent_for_request variable is never reset. Consider adding cleanup:
- Reset in on_stop() method
- Consider bounded tracking (e.g., using collections.deque with maxlen) to prevent memory growth in long-running sessions

Potential Bugs & Issues ⚠️

Race Condition Risk (anam_avatar_python/extension.py:232-235, generic_video_python/extension.py:186-189):
- During interrupt handling, send_voice_end() is called without checking/updating _voice_end_sent_for_request
- This could lead to duplicate voice_end messages if an interrupt occurs right after TTS completion
- Recommendation: Track and skip in interrupt handler too:
```
if self._voice_end_sent_for_request != request_id:  # or use set approach
    await self.recorder.send_voice_end()
    self._voice_end_sent_for_request = request_id
    self.ten_env.log_info("Sent voice_end before interrupt")
```
request_id Availability in Interrupt (both extensions):
- The interrupt handler always sends voice_end, but doesn't have access to request_id for tracking
- This means the state variable cannot be properly maintained during interrupts
- Recommendation: Store the current request_id as instance variable when processing TTS data, so interrupt handler can access it
Missing Import - Not visible in diff, but ensure numpy import (line 9 in anam_avatar_python) is actually used in the file

Configuration Changes 🔧

property.json changes (ai_agents/agents/examples/voice-assistant-advanced/tenapp/property.json):

Speech duration reduction (20s → 10s):
- Line 1299: Prompt changed from "20+ seconds" to "10+ seconds"
- Line 1304: Processing time message changed from "15 seconds" to "10 seconds"
- Line 1341: min_speech_duration changed from 20.0 to 10.0
- Line 1308: Prompt note changed from "20 seconds total speech" to "10 seconds"
Impact: This improves user experience with faster analysis, but ensure the wellness analysis algorithm still works accurately with only 10 seconds of speech data.
New routing (Line 1544-1546):
- Added avatar extension as destination for thymia_analyzer
- Question: Is this routing addition related to the voice_end fix? If so, please clarify the connection in the PR description.

Performance Considerations ⚡

Minimal Overhead: The deduplication check is O(1) with current implementation - good
Memory Consideration: As mentioned above, tracking request IDs indefinitely could grow memory. Consider cleanup strategy.
Logging Performance: Multiple log statements per request - acceptable for debugging, but consider using log_debug for production if this becomes noisy

Security Concerns 🔒

No significant security concerns identified. The changes are internal state management and don't expose new attack surfaces.

Test Coverage 📋

Missing Test Coverage:

No test files found for either extension
Critical test scenarios needed:
1. Duplicate voice_end prevention when TTS completes normally
2. Interrupt handler sending voice_end before interrupt
3. Race condition: TTS completion immediately followed by interrupt
4. State cleanup on extension stop/restart
5. Multiple sequential requests with different request_ids

Recommendation: Add integration tests to verify:

async def test_no_duplicate_voice_end():
    # Send tts_audio_end with reason=1 twice for same request_id
    # Verify send_voice_end() called only once
    
async def test_interrupt_sends_voice_end_first():
    # Trigger interrupt during active TTS
    # Verify voice_end sent before interrupt command
    
async def test_state_cleanup_on_stop():
    # Verify _voice_end_sent_for_request reset on on_stop()

Recommendations Summary

Must Fix:

Address race condition in interrupt handler by tracking request_id properly
Implement state cleanup to prevent memory growth

Should Fix:

Use set-based tracking instead of single value for request_id
Add test coverage for the new deduplication logic

Nice to Have:

Clarify the relationship between voice_end fixes and the avatar routing change
Document the 20s→10s speech duration change rationale

Overall Assessment

The core fix is sound and addresses a real issue with duplicate messages. The implementation pattern is consistent across both extensions. However, the race condition risk and lack of test coverage are concerns that should be addressed before merging.

Recommendation: Request changes to address the race condition and consider adding basic test coverage.

Ubuntu added 2 commits December 15, 2025 17:40

fix: anam and generic_video voice_end handling

9cc1b77

- Send voice_end BEFORE interrupt command (per Anam requirements) - Add _voice_end_sent_for_request tracking to prevent duplicate sends - Applied to: anam_avatar, generic_video extensions

plutoless requested a review from halajohn as a code owner December 23, 2025 03:04

chore: merge origin/main into fix/anam-heygen

2da7fd9

Resolved conflicts: - anam_avatar_python/extension.py: kept request ID deduplication - generic_video_python/extension.py: kept request ID deduplication - property.json: regenerated from rebuild_property.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix/anam heygen #1911

Fix/anam heygen #1911

Uh oh!

plutoless commented Dec 23, 2025

Uh oh!

claude bot commented Dec 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix/anam heygen #1911

Are you sure you want to change the base?

Fix/anam heygen #1911

Uh oh!

Conversation

plutoless commented Dec 23, 2025

Uh oh!

claude bot commented Dec 28, 2025

Pull Request Review: Fix/anam heygen

Summary

Code Quality & Best Practices ✅

Potential Bugs & Issues ⚠️

Configuration Changes 🔧

Performance Considerations ⚡

Security Concerns 🔒

Test Coverage 📋

Recommendations Summary

Overall Assessment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants