Skip to content

Conversation

@plutoless
Copy link
Contributor

No description provided.

Ubuntu added 2 commits December 15, 2025 17:40
Avatar extensions (anam, generic, heygen):
- Replace 500ms debounce timer with tts_audio_end event handling
- Add send_voice_end() method triggered by tts_audio_end (reason=1)
- Remove dead code from heygen (unused queues, is_speaking timeout)
- Fix graph routing: tts_audio_end now routes to avatar (was only thymia)

Deepgram WebSocket ASR:
- Add silence sender for EOT detection when mic muted
- Add auto-reconnect with exponential backoff
- Add keep-alive silence to prevent WebSocket timeout

Playground:
- Add URL param support: ?graph=name pre-selects graph on load
- Send voice_end BEFORE interrupt command (per Anam requirements)
- Add _voice_end_sent_for_request tracking to prevent duplicate sends
- Applied to: anam_avatar, generic_video extensions
@plutoless plutoless requested a review from halajohn as a code owner December 23, 2025 03:04
Resolved conflicts:
- anam_avatar_python/extension.py: kept request ID deduplication
- generic_video_python/extension.py: kept request ID deduplication
- property.json: regenerated from rebuild_property.py
@claude
Copy link

claude bot commented Dec 28, 2025

Pull Request Review: Fix/anam heygen

Summary

This PR fixes avatar video streaming issues by preventing duplicate voice_end messages and ensuring proper message ordering during interrupts. It also optimizes wellness analysis timing parameters.


Code Quality & Best Practices ✅

Strengths:

  • Clean implementation with appropriate state tracking using _voice_end_sent_for_request
  • Good code reuse - identical pattern applied to both anam_avatar_python and generic_video_python extensions
  • Proper logging added to track voice_end deduplication and interrupt ordering
  • Comments clearly explain the business logic (e.g., "per Anam requirements", "reason=1 means TTS generation complete")

Suggestions:

  1. State Management (anam_avatar_python/extension.py:74-76, generic_video_python/extension.py:74-76):

    • Using None to track request_id might cause issues if a request actually has None as its ID
    • Consider using a more explicit approach:
    self._voice_end_sent_for_request = set()  # Track all request_ids that voice_end was sent for

    Then check with if request_id not in self._voice_end_sent_for_request:

  2. State Reset - The _voice_end_sent_for_request variable is never reset. Consider adding cleanup:

    • Reset in on_stop() method
    • Consider bounded tracking (e.g., using collections.deque with maxlen) to prevent memory growth in long-running sessions

Potential Bugs & Issues ⚠️

  1. Race Condition Risk (anam_avatar_python/extension.py:232-235, generic_video_python/extension.py:186-189):

    • During interrupt handling, send_voice_end() is called without checking/updating _voice_end_sent_for_request
    • This could lead to duplicate voice_end messages if an interrupt occurs right after TTS completion
    • Recommendation: Track and skip in interrupt handler too:
    if self._voice_end_sent_for_request != request_id:  # or use set approach
        await self.recorder.send_voice_end()
        self._voice_end_sent_for_request = request_id
        self.ten_env.log_info("Sent voice_end before interrupt")
  2. request_id Availability in Interrupt (both extensions):

    • The interrupt handler always sends voice_end, but doesn't have access to request_id for tracking
    • This means the state variable cannot be properly maintained during interrupts
    • Recommendation: Store the current request_id as instance variable when processing TTS data, so interrupt handler can access it
  3. Missing Import - Not visible in diff, but ensure numpy import (line 9 in anam_avatar_python) is actually used in the file


Configuration Changes 🔧

property.json changes (ai_agents/agents/examples/voice-assistant-advanced/tenapp/property.json):

  1. Speech duration reduction (20s → 10s):

    • Line 1299: Prompt changed from "20+ seconds" to "10+ seconds"
    • Line 1304: Processing time message changed from "15 seconds" to "10 seconds"
    • Line 1341: min_speech_duration changed from 20.0 to 10.0
    • Line 1308: Prompt note changed from "20 seconds total speech" to "10 seconds"

    Impact: This improves user experience with faster analysis, but ensure the wellness analysis algorithm still works accurately with only 10 seconds of speech data.

  2. New routing (Line 1544-1546):

    • Added avatar extension as destination for thymia_analyzer
    • Question: Is this routing addition related to the voice_end fix? If so, please clarify the connection in the PR description.

Performance Considerations ⚡

  1. Minimal Overhead: The deduplication check is O(1) with current implementation - good
  2. Memory Consideration: As mentioned above, tracking request IDs indefinitely could grow memory. Consider cleanup strategy.
  3. Logging Performance: Multiple log statements per request - acceptable for debugging, but consider using log_debug for production if this becomes noisy

Security Concerns 🔒

No significant security concerns identified. The changes are internal state management and don't expose new attack surfaces.


Test Coverage 📋

Missing Test Coverage:

  • No test files found for either extension
  • Critical test scenarios needed:
    1. Duplicate voice_end prevention when TTS completes normally
    2. Interrupt handler sending voice_end before interrupt
    3. Race condition: TTS completion immediately followed by interrupt
    4. State cleanup on extension stop/restart
    5. Multiple sequential requests with different request_ids

Recommendation: Add integration tests to verify:

async def test_no_duplicate_voice_end():
    # Send tts_audio_end with reason=1 twice for same request_id
    # Verify send_voice_end() called only once
    
async def test_interrupt_sends_voice_end_first():
    # Trigger interrupt during active TTS
    # Verify voice_end sent before interrupt command
    
async def test_state_cleanup_on_stop():
    # Verify _voice_end_sent_for_request reset on on_stop()

Recommendations Summary

Must Fix:

  1. Address race condition in interrupt handler by tracking request_id properly
  2. Implement state cleanup to prevent memory growth

Should Fix:

  1. Use set-based tracking instead of single value for request_id
  2. Add test coverage for the new deduplication logic

Nice to Have:

  1. Clarify the relationship between voice_end fixes and the avatar routing change
  2. Document the 20s→10s speech duration change rationale

Overall Assessment

The core fix is sound and addresses a real issue with duplicate messages. The implementation pattern is consistent across both extensions. However, the race condition risk and lack of test coverage are concerns that should be addressed before merging.

Recommendation: Request changes to address the race condition and consider adding basic test coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants