TEN-framework · plutoless · Dec 15, 2025 · Dec 22, 2025 · Dec 28, 2025 · Jan 1, 2026
@@ -1296,7 +1296,7 @@
                 "api_key": "${env:OPENAI_API_KEY}",
                 "model": "gpt-5.1",
                 "max_tokens": 1000,
-                "prompt": "You are a mental wellness research assistant conducting a demonstration. Guide the conversation efficiently:\n\nWORD LIMITS:\n- Steps 1-4 (data gathering): MAX 15 WORDS per response\n- Step 6 (announcing results): MAX 15 WORDS of added context\n- Step 7 (therapeutic conversation): MAX 30 WORDS per response\n\n1. When user provides their name, sex, and year of birth, IMMEDIATELY respond warmly asking about their day. If they don't provide all three pieces, ask for what's missing before proceeding. (MAX 15 WORDS)\n\n2. Ask: 'Tell me about your interests and hobbies.' (wait for response - aim for 20+ seconds total speech) (MAX 15 WORDS)\n\n3. Continue with follow-up questions about their experiences, feelings, or daily life. Keep them talking naturally. (MAX 15 WORDS)\n\n4. Before announcing results, MUST call check_phase_progress(name, year_of_birth, sex) to verify enough speech has been collected AND register user info. Based on the result:\n   - If phase_complete=false: Ask another question to gather more speech (MAX 15 WORDS)\n   - If phase_complete=true: Say 'Perfect. I'm processing your responses now, this should take around 15 seconds.' (MAX 15 WORDS)\n   - NEVER say 'processing your responses' without first confirming phase_complete=true\n\n5. You will receive a [SYSTEM ALERT] message when wellness metrics are ready.\n   - CRITICAL: Only respond to [SYSTEM ALERT] messages that are actually sent to you\n   - NEVER generate or say '[SYSTEM ALERT]' yourself - these come from the system only\n\n6. When you receive '[SYSTEM ALERT] Wellness metrics ready':\n   - Call get_wellness_metrics\n   - Announce the 5 wellness metrics (stress, distress, burnout, fatigue, low_self_esteem) as PERCENTAGES 0-100\n   - Use plain numbered lists only (NO markdown **, *, _ formatting)\n   - Keep any added context to MAX 15 WORDS\n   - After announcing results, silently call confirm_announcement with phase='hellos'\n\n7. THERAPEUTIC CONVERSATION - After results announced:\n   - First, let user know: \"Feel free to chat as long as you like, or say goodbye whenever you're ready to end.\"\n   - Focus on building resilience and reducing stress/anxiety with these evidence-based strategies:\n     a) Reframe setbacks as information - get curious about what you can learn, focus on what's in your control\n     b) Build tolerance for discomfort - small challenges (difficult conversations, sitting with boredom) build evidence you can handle hard things\n     c) Invest in your foundation - sleep, movement, and genuine connection with trusted people\n   - Frame as research-based insights, not clinical diagnosis\n   - MAX 30 WORDS per response\n   - Use warm, empathetic tone\n   - Ask open questions to explore their experiences\n\n8. When user indicates they want to end, thank them warmly for participating\n\nNote: We need 20 seconds total speech for wellness analysis (no reading phase required).",
+                "prompt": "You are a mental wellness research assistant conducting a demonstration. Guide the conversation efficiently:\n\nWORD LIMITS:\n- Steps 1-4 (data gathering): MAX 15 WORDS per response\n- Step 6 (announcing results): MAX 15 WORDS of added context\n- Step 7 (therapeutic conversation): MAX 30 WORDS per response\n\n1. When user provides their name, sex, and year of birth, IMMEDIATELY respond warmly asking about their day. If they don't provide all three pieces, ask for what's missing before proceeding. (MAX 15 WORDS)\n\n2. Ask: 'Tell me about your interests and hobbies.' (wait for response - aim for 10+ seconds total speech) (MAX 15 WORDS)\n\n3. Continue with follow-up questions about their experiences, feelings, or daily life. Keep them talking naturally. (MAX 15 WORDS)\n\n4. Before announcing results, MUST call check_phase_progress(name, year_of_birth, sex) to verify enough speech has been collected AND register user info. Based on the result:\n   - If phase_complete=false: Ask another question to gather more speech (MAX 15 WORDS)\n   - If phase_complete=true: Say 'Perfect. I'm processing your responses now, this should take around 10 seconds.' (MAX 15 WORDS)\n   - NEVER say 'processing your responses' without first confirming phase_complete=true\n\n5. You will receive a [SYSTEM ALERT] message when wellness metrics are ready.\n   - CRITICAL: Only respond to [SYSTEM ALERT] messages that are actually sent to you\n   - NEVER generate or say '[SYSTEM ALERT]' yourself - these come from the system only\n\n6. When you receive '[SYSTEM ALERT] Wellness metrics ready':\n   - Call get_wellness_metrics\n   - Announce the 5 wellness metrics (stress, distress, burnout, fatigue, low_self_esteem) as PERCENTAGES 0-100\n   - Use plain numbered lists only (NO markdown **, *, _ formatting)\n   - Keep any added context to MAX 15 WORDS\n   - After announcing results, silently call confirm_announcement with phase='hellos'\n\n7. THERAPEUTIC CONVERSATION - After results announced:\n   - First, let user know: \"Feel free to chat as long as you like, or say goodbye whenever you're ready to end.\"\n   - Focus on building resilience and reducing stress/anxiety with these evidence-based strategies:\n     a) Reframe setbacks as information - get curious about what you can learn, focus on what's in your control\n     b) Build tolerance for discomfort - small challenges (difficult conversations, sitting with boredom) build evidence you can handle hard things\n     c) Invest in your foundation - sleep, movement, and genuine connection with trusted people\n   - Frame as research-based insights, not clinical diagnosis\n   - MAX 30 WORDS per response\n   - Use warm, empathetic tone\n   - Ask open questions to explore their experiences\n\n8. When user indicates they want to end, thank them warmly for participating\n\nNote: We need 10 seconds total speech for wellness analysis (no reading phase required).",
                 "proxy_url": "${env:OPENAI_PROXY_URL|}",
                 "greeting": "Hi there! I would like to chat with you briefly to predict your mood and energy levels including stress, fatigue, and burnout. Nothing will be recorded and this is purely a demonstration. Please begin by telling me your name, sex and year of birth.",
                 "max_memory_length": 10,
@@ -1338,7 +1338,7 @@
               "property": {
                 "api_key": "${env:THYMIA_API_KEY}",
                 "analysis_mode": "hellos_only",
-                "min_speech_duration": 20.0
+                "min_speech_duration": 10.0
               }
             },
             {
@@ -1541,6 +1541,9 @@
                   "dest": [
                     {
                       "extension": "thymia_analyzer"
+                    },
+                    {
+                      "extension": "avatar"
                     }
                   ]
                 }

@@ -71,6 +71,9 @@ def __init__(self, name: str):
         self._audio_task = None
         self._config_valid = False  # Track configuration validation status
         self._connection_task = None
+        self._voice_end_sent_for_request = (
+            None  # Track request_id that voice_end was sent for
+        )
 
     async def on_init(self, ten_env: AsyncTenEnv) -> None:
         ten_env.log_debug("on_init")
@@ -225,8 +228,13 @@ async def _handle_interrupt(self) -> None:
         self.ten_env.log_info("Handling interrupt")
         await self._clear_audio_queue()
 
-        # Send interrupt command
         if self.recorder and self.recorder.ws_connected():
+            # Send voice_end BEFORE interrupt (per Anam requirements)
+            # Always send voice_end on interrupt since user interrupted mid-speech
+            await self.recorder.send_voice_end()
+            self.ten_env.log_info("Sent voice_end before interrupt")
+
+            # Then send interrupt command
             success = await self.recorder.interrupt()
             if success:
                 self.ten_env.log_info(
@@ -315,11 +323,18 @@ async def on_data(self, ten_env: AsyncTenEnv, data: Data) -> None:
 
                 # reason=1 means TTS generation complete (all audio sent to avatar)
                 if reason == 1:
-                    ten_env.log_info(
-                        "[ANAM_TTS_END] TTS complete - sending voice_end to Anam"
-                    )
                     if self.recorder and self.recorder.ws_connected():
-                        await self.recorder.send_voice_end()
+                        # Only send voice_end if we haven't already for this request
+                        if self._voice_end_sent_for_request != request_id:
+                            ten_env.log_info(
+                                f"[ANAM_TTS_END] TTS complete - sending voice_end for {request_id}"
+                            )
+                            await self.recorder.send_voice_end()
+                            self._voice_end_sent_for_request = request_id
+                        else:
+                            ten_env.log_info(
+                                f"[ANAM_TTS_END] voice_end already sent for {request_id}, skipping"
+                            )
                     else:
                         ten_env.log_warn(
                             "[ANAM_TTS_END] Recorder not ready, cannot send voice_end"

@@ -71,6 +71,9 @@ def __init__(self, name: str):
         self._audio_task = None
         self._config_valid = False  # Track configuration validation status
         self._connection_task = None
+        self._voice_end_sent_for_request = (
+            None  # Track request_id that voice_end was sent for
+        )
 
     async def on_init(self, ten_env: AsyncTenEnv) -> None:
         ten_env.log_debug("on_init")
@@ -179,8 +182,13 @@ async def _handle_interrupt(self) -> None:
         self.ten_env.log_info("Handling interrupt")
         await self._clear_audio_queue()
 
-        # Send interrupt command
         if self.recorder and self.recorder.ws_connected():
+            # Send voice_end BEFORE interrupt (required by video avatar services)
+            # Always send voice_end on interrupt since user interrupted mid-speech
+            await self.recorder.send_voice_end()
+            self.ten_env.log_info("Sent voice_end before interrupt")
+
+            # Then send interrupt command
             success = await self.recorder.interrupt()
             if success:
                 self.ten_env.log_info(
@@ -269,11 +277,18 @@ async def on_data(self, ten_env: AsyncTenEnv, data: Data) -> None:
 
                 # reason=1 means TTS generation complete (all audio sent to avatar)
                 if reason == 1:
-                    ten_env.log_info(
-                        "[GENERIC_TTS_END] TTS complete - sending voice_end"
-                    )
                     if self.recorder and self.recorder.ws_connected():
-                        await self.recorder.send_voice_end()
+                        # Only send voice_end if we haven't already for this request
+                        if self._voice_end_sent_for_request != request_id:
+                            ten_env.log_info(
+                                f"[GENERIC_TTS_END] TTS complete - sending voice_end for {request_id}"
+                            )
+                            await self.recorder.send_voice_end()
+                            self._voice_end_sent_for_request = request_id
+                        else:
+                            ten_env.log_info(
+                                f"[GENERIC_TTS_END] voice_end already sent for {request_id}, skipping"
+                            )
                     else:
                         ten_env.log_warn(
                             "[GENERIC_TTS_END] Recorder not ready, cannot send voice_end"
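
Both avatar extensions in this diff follow the same two rules: send `voice_end` at most once per `request_id` when TTS completes, and always send `voice_end` before `interrupt`. The following is a minimal, standalone sketch of that behavior for review purposes, not the extension code itself. `FakeRecorder` and `VoiceEndGate` are hypothetical names introduced here; only `ws_connected()`, `send_voice_end()`, `interrupt()`, and `_voice_end_sent_for_request` mirror names that appear in the hunks.

```python
import asyncio
from typing import List, Optional


class FakeRecorder:
    """Hypothetical stand-in for the avatar recorder; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def ws_connected(self) -> bool:
        return True

    async def send_voice_end(self) -> None:
        self.calls.append("voice_end")

    async def interrupt(self) -> bool:
        self.calls.append("interrupt")
        return True


class VoiceEndGate:
    """Hypothetical helper isolating the two rules introduced by the diff."""

    def __init__(self, recorder: FakeRecorder) -> None:
        self.recorder = recorder
        # request_id that voice_end was last sent for (None = never sent)
        self._voice_end_sent_for_request: Optional[str] = None

    async def on_tts_end(self, request_id: str, reason: int) -> bool:
        """Return True if voice_end was sent for this TTS-complete event."""
        # reason == 1 means TTS generation is complete for this request.
        if reason != 1 or not self.recorder.ws_connected():
            return False
        if self._voice_end_sent_for_request == request_id:
            return False  # already sent for this request, skip
        await self.recorder.send_voice_end()
        self._voice_end_sent_for_request = request_id
        return True

    async def handle_interrupt(self) -> bool:
        """Send voice_end first, then interrupt, as the diff does."""
        if not self.recorder.ws_connected():
            return False
        await self.recorder.send_voice_end()
        return await self.recorder.interrupt()


async def demo() -> List[str]:
    recorder = FakeRecorder()
    gate = VoiceEndGate(recorder)
    await gate.on_tts_end("req-1", reason=1)  # sends voice_end
    await gate.on_tts_end("req-1", reason=1)  # duplicate, skipped
    await gate.handle_interrupt()             # voice_end, then interrupt
    return recorder.calls


calls = asyncio.run(demo())
print(calls)
```

As in the diff, the interrupt path in this sketch does not record a `request_id`, so a TTS-complete event that arrives after an interrupt for the same request would still send a second `voice_end`. Whether that is acceptable depends on the avatar service and may be worth confirming in review.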