Fix/animations #308

Merged
XargonWan merged 45 commits into develop from fix/animations
Apr 22, 2026

@XargonWan
Owner

This pull request introduces several improvements and new features to the animation handling and WebUI integration, with a focus on supporting touch interactions, enhancing animation state fidelity, and improving the user experience during VRM model loading. The changes span backend logic, API endpoints, frontend handling, and test coverage.

Animation State and Touch Interaction Enhancements:

  • Added support for a new TOUCH animation state, including context ID and priority management, and a dedicated handler for touch interactions from the WebUI. This enables authoritative touch-triggered animations with proper fallback and metadata handling. [1] [2] [3] [4]
  • Extended animation state payloads (REST and WebSocket) to include new metadata fields: play_section, frame_range, phase_authoritative, and animation_state, improving frontend fidelity and recovery after reloads. [1] [2] [3]
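
As a rough illustration of the extended payload, the sketch below builds one animation state message. Only the field names play_section, frame_range, phase_authoritative and animation_state come from the description above. The class name, the `animation` field, the value types and the sample values are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class AnimationStatePayload:
    # Hypothetical container: only the four metadata field names below
    # are taken from the PR description; types are guesses.
    animation: str                                  # assumed field
    animation_state: str                            # e.g. "TOUCH"
    play_section: Optional[str] = None
    frame_range: Optional[Tuple[int, int]] = None
    phase_authoritative: bool = False


def to_message(payload: AnimationStatePayload) -> dict:
    """Turn the payload into a plain dict that json.dumps can encode."""
    message = asdict(payload)
    if message["frame_range"] is not None:
        # Emit a list so REST and WebSocket consumers see the same shape.
        message["frame_range"] = list(message["frame_range"])
    return message


touch = AnimationStatePayload(
    animation="head_pat",          # hypothetical animation name
    animation_state="TOUCH",
    play_section="loop",
    frame_range=(12, 48),
    phase_authoritative=True,
)
```

A frontend that caches the result of `to_message(touch)` would have enough information to resume the same section and frame range after a reload, which is the recovery case the description mentions.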

WebUI and Frontend Improvements:

  • Implemented a VRM loading overlay with a spinner and label, and corresponding CSS for a smoother user experience while the avatar and animations are loading. [1] [2]
  • Updated frontend JavaScript (chat-window.mjs, webui-bootstrap.js) to handle new animation state fields, cache richer animation state for recovery, and support the extended startAction signature. [1] [2] [3]
  • Updated script references to the latest vrm-viewer.mjs version in HTML templates. [1] [2] [3]

Backend Robustness and Testing:

  • Ensured face/animation state is cleared on LLM fallback to prevent stale expressions after upstream outages.
  • Added and improved tests for animation fallback logic, state broadcasting, and message formats, increasing reliability and correctness of animation state management. [1] [2] [3] [4] [5] [6]
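
The fallback behaviour in the first bullet can be pictured roughly as follows. This is a minimal sketch under assumptions: `AvatarState`, `respond` and the fallback text are hypothetical names, and the real code may clear state in a different place.

```python
class AvatarState:
    """Hypothetical holder for the current face expression and animation."""

    def __init__(self):
        self.expression = None
        self.animation = None

    def clear(self):
        # Drop both fields so the frontend does not keep a stale expression.
        self.expression = None
        self.animation = None


def respond(generate, state, fallback_text="(fallback)"):
    """Call the LLM; on failure clear avatar state and return a fallback."""
    try:
        return generate()
    except Exception:
        state.clear()
        return fallback_text
```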

These changes collectively make the animation system more robust, interactive, and user-friendly, especially when handling touch events and recovering from errors.

References:
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21]

XargonWan and others added 30 commits February 19, 2026 08:57
- Update all docs/gemini/*.md to latest API surface (3.1 Flash Live,
  liveapi capabilities, live session management, live tools, deprecations)
- Add fetch_gemini_docs.py helper script
- Misc improvements across core/ (action_parser, config, message_queue,
  plugin_instance, prompt_engine) and engines/external_engines/gemini_api
- Update grillo outreach plugin and telegram_bot interface
- Refresh test_live_session_manager and uv.lock

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

# Conflicts:
#	uv.lock
Introduces a full engine-agnostic tool call pipeline for Live sessions,
replacing the Gemini-locked _build_gemini_tool_declarations approach.

New modules:
- core/live_tool_registry.py: ToolManifest + LiveToolRegistry — reads
  get_action_plugin_instructions() into model-agnostic manifests
- core/live_tool_executor.py: LiveToolExecutor — dispatches TOOL_CALL
  LiveEvents to run_action with per-tool timeout and bot resolution
- core/live_tool_adapters/gemini.py: GeminiToolAdapter — manifests →
  genai.types.Tool; honours async_ok + engine_supports_nonblocking for
  Gemini 2.5 NON_BLOCKING mode
- core/live_tool_adapters/openai_realtime.py: stub adapter for future
  OpenAI Realtime engine (standard JSON Schema format)
- docs/gemini/live-tool-calls-plan.md: architecture plan and migration
  phases (5-step, no breaking changes)

Updated:
- plugins/live_base.py: add TOOL_CALL LiveEventType, ToolCallPayload
  dataclass, tool_call field on LiveEvent, send_tool_response() on
  LiveEngineBase
- plugins/live_engines/gemini.py: replace stub with full implementation
  (_pump_events, send_audio, send_text, send_tool_response,
  receive_events, open/close_session)
- core/live_session_manager.py: add history_config
  (initial_history_in_client_content=True) to enable send_client_content
  on Gemini 3.1; add inject_initial_context() for multimodal history
  seeding; extend send_multimodal_context() with multiple-attachment
  support; add set_tool_executor() + send_tool_response(); wire
  LiveToolExecutor into _receive_loop (legacy callback kept as fallback)
- interface/discord_interface.py: swap _build_gemini_tool_declarations
  for LiveToolRegistry + GeminiToolAdapter; register LiveToolExecutor
  via manager.set_tool_executor

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Implement Google's recommended approach for long-running Gemini Live sessions:
- SessionResumptionConfig passed in every LiveConnectConfig; server sends
  SessionResumptionUpdate messages with a handle the manager stores in
  LiveSessionState.resumption_handle
- GoAway messages set _go_away_triggered; _receive_loop triggers a proactive
  reconnect at the next turn boundary instead of waiting for the socket to die
- _reconnect_inner has two paths: fast-path uses the stored handle so the new
  WebSocket resumes the previous context window without a cold restart; fallback
  rebuilds system instruction + tools from scratch if resumption exhausts retries
- ContextWindowCompressionConfig with SlidingWindow keeps audio-only sessions
  alive beyond the raw 15-min token limit (officially unlimited with compression)
- start_session accepts resumption_handle kwarg; None = fresh session, str = resume
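
The two reconnect paths described above can be sketched as below. This is not the code in _reconnect_inner: `reconnect`, `open_session`, the retry count and the use of ConnectionError are assumptions chosen to keep the example self-contained.

```python
from typing import Callable, Optional, Tuple


def reconnect(
    open_session: Callable[[Optional[str]], object],
    resumption_handle: Optional[str],
    max_resume_attempts: int = 3,   # assumed retry budget
) -> Tuple[object, str]:
    """Try to resume with the stored handle, else fall back to a cold start."""
    if resumption_handle is not None:
        for _ in range(max_resume_attempts):
            try:
                # Fast path: the server restores the previous context window.
                return open_session(resumption_handle), "resumed"
            except ConnectionError:
                continue
    # Fallback: None means a fresh session, so the caller has to rebuild
    # the system instruction and tools from scratch.
    return open_session(None), "cold"
```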

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…rovements

- ai_diary.py: Fix SQL format string bug in diary consolidation

- vox_plugin.py: Fix TTS fallback suppression when auto-injected

- message_queue.py: Fix 'bound to different event loop' via global _queue/_lock fixes

- live_session_manager.py: Add cold-restart history injection, turn_was_interrupted

- discord_interface.py: Add bot-left VSU guard, image attachment injection

- config.py: Add LIVE_AUDIO_MIN_RMS config var
…loud

- Skip kick text on session resumption: Gemini restores context server-side
  so injecting text via send_realtime_input would trigger an unsolicited model
  turn before any audio, causing the model to speak system-level data aloud
- Strip numeric intensities from emotion NL in live system instruction:
  "devotion (5.0 - moderate)" → "moderate devotion" so there are no numbers
  for the model to accidentally read out
- Filter bot self-messages and internal agents (grillo, ai_diary) from live
  context updates in chat_history_cache to prevent feedback loops
- Cap diary blob length in history_engine and ai_diary to prevent 50k+ char
  merged daily entries from flooding the prompt context
- Elevate persona + verbose instructions to Gemini system instruction
- Suppress false-positive CryptoError log noise from discord-ext-voice-recv
  RTCP PSFB packets (known upstream limitation, non-fatal)
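
The intensity rewrite in the second bullet could look roughly like this. The sketch assumes the emotion text always has the shape `name (number - label)` shown in the example; the function name and regular expression are illustrative, not the project's code.

```python
import re

# Matches "<name> (<number> - <label>)", e.g. "devotion (5.0 - moderate)".
_INTENSITY = re.compile(r"(\w+)\s*\(\s*\d+(?:\.\d+)?\s*-\s*([^)]+?)\s*\)")


def strip_intensity(text: str) -> str:
    """Rewrite 'name (5.0 - label)' as 'label name', dropping the number."""
    return _INTENSITY.sub(lambda m: f"{m.group(2)} {m.group(1)}", text)
```

Text that does not match the pattern is returned unchanged, so the helper is safe to apply to a whole instruction string.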

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…x_api logging

- Remove spurious PluginBase from GeminiAPIPlugin inheritance (MRO/registry pollution)
- Guard _get_gemini_model() against None return from get_current_model()
- Filter thought parts (thought=true) from response before returning to message chain
- Add log_cortex_request/log_cortex_response to ExternalCortexEngine.generate_response
  so cortex_api.log is populated regardless of which cortex engine is active
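
A minimal sketch of the thought-part filter, assuming the response parts are plain dicts with an optional boolean `thought` key and a `text` key. The helper name is hypothetical.

```python
def visible_text(parts):
    """Join the text of all parts that are not flagged as model thoughts."""
    return "".join(
        part.get("text", "")
        for part in parts
        if not part.get("thought")
    )
```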

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Add mcp_servers/synth_logs.py: stdio MCP server with four tools
  (list_log_files, search_logs, tail_log, get_recent_errors) that span
  log rotations transparently — logs rotate at 2000 lines in DEBUG mode
  so the active file alone is often nearly empty
- Wire synth-logs and gitnexus into .mcp.json (Claude Code/Antigravity),
  .vscode/mcp.json (Copilot), and document Cline entry for manual setup
- Unignore .vscode/mcp.json so MCP config is shared on clone
- Extend AGENTS.md with: first-time setup instructions, DB table quick
  reference (§13), config registry key reference (§14), and known issues
  registry (§12) with agent instruction to document new issues in-place
- Extend CLAUDE.md with: MCP tools reference, debugging SOP, token trap
  warnings, and known-issues documentation rule
- Add AI-assisted development section to README for contributors

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…/dislikes from config

- Prepend static_persona to instructions_verbose so all LLM engines see it

- Load SYNTH_LIKES/SYNTH_DISLIKES from config_registry in load_persona() instead of hardcoding []

- Skip empty Likes/Dislikes/Interests lines in get_static_injection()

- Add responseMimeType: application/json to Gemini REST generationConfig

- Add IDENTITY INTEGRITY and PRONOUN CONSISTENCY rules to chat instructions

- Fix history_engine verbosity slicing to apply per-group after dedup
When correction_context.successful_actions is empty the corrector was
building a 'PARTIAL SUCCESS - 0 actions succeeded' message that told
the LLM 'do NOT repeat successful ones' - a self-contradictory
instruction that caused the LLM to return empty string all 4 attempts,
exhausting retries and falling back to the dizzy emoji fallback.

Now when successful_actions is empty the corrector produces a plain
'CORRECTION NEEDED' prompt asking the LLM to resend the full response,
listing only the invalid actions and stating that emotion names must
not be used as action types.

Also added 'Every action object MUST have a type field' to
strict_requirements to address the root trigger: bare dicts without
a type key appearing in the actions array.

Partial-success path (>=1 successful action) is unchanged.
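
The branching described in this commit message can be sketched as follows. Only the quoted phrases come from the message above; the function name, signature and the rest of the prompt wording are assumptions.

```python
def build_correction_prompt(successful_actions, invalid_actions):
    """Pick the prompt variant based on whether anything succeeded."""
    invalid = ", ".join(invalid_actions)
    if not successful_actions:
        # Nothing succeeded: ask for the full response again and avoid
        # any mention of "successful" actions.
        return (
            "CORRECTION NEEDED: resend the full response. "
            f"Invalid actions: {invalid}. "
            "Emotion names must not be used as action types."
        )
    done = ", ".join(successful_actions)
    return (
        f"PARTIAL SUCCESS - {len(successful_actions)} actions succeeded "
        f"({done}). Do NOT repeat successful ones. "
        f"Invalid actions: {invalid}."
    )
```
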
CLAUDE.md: bump gitnexus index counts to 7489 symbols / 24879
relationships (auto-updated by npx gitnexus analyze).

.claude/skills/gitnexus/: add six skill reference files generated by
gitnexus (exploring, debugging, impact-analysis, refactoring, guide,
cli) for use by AI agents navigating the codebase.
…rrector

- Add 'method' and 'command' to type-normalization in message_chain and transport_layer
- Add flat-field-to-payload gathering fallback via _ACTION_SYSTEM_KEYS frozenset
- Register meta.autonomous and introspection keys (thoughts, reasoning, etc.) as response metadata
- Strip {meta.autonomous: true} and similar meta tags from outbound text in emotion_manager
- Smart message-key handling in corrector_orchestrator: map bare 'message'/'text' keys to interface-specific message actions
- Strip emotion/meta tags from action payload text before dispatch in action_parser
…est rewrite (Phases 1-6)

Multimodal:
- Extract and forward multimodal attachments (images, audio, video, docs) in cortex_bridge
- Add safety settings (all 4 harm categories OFF) to gemini_adapter
- Preserve __prompt_request dataclass through sanitize_for_json in plugin_instance

Bidirectional API logging:
- Add sanitize_for_log() to cortex_api_logger for safe payload redaction
- Add _engine_label to BaseProtocolAdapter, set by all bridges
- Instrument all adapter methods (chat, stream, TTS, STT, vision) with REQUEST/RESPONSE logging
- Remove duplicate bridge-level logging from cortex_bridge

PromptRequest prompt rewrite (Phases 1-6):
- New core/prompt_request.py: PromptRequest, Turn, RuntimeContext, Attachment dataclasses
- New core/prompt_renderers.py: OpenAI, Anthropic, Gemini, Text renderers
- Attach PromptRequest to build_json_prompt output under __prompt_request key
- Add _history_to_turns, _build_context_summary, _assemble_prompt_request helpers
- Add build_delivery_request for auto_response delivery mode
- Extend LiveToolRegistry with filtering and build_manifests_from_actions
- Migrate openapi, openrouter, anthropic, gemini_api engines to PromptRequest native paths
- Add REWRITE-TASK.md documenting the full 8-phase plan

Tests:
- 13 multimodal extraction tests (test_cortex_bridge_multimodal.py)
- 32 prompt renderer tests (test_prompt_renderers.py)
- PromptRequest attachment and mode tests (test_prompt_engine.py)
- _history_to_turns role parsing tests
- Update gitnexus stats in AGENTS.md and CLAUDE.md (7701 symbols, 25509 relationships)
- Add tools/synth_log_mcp.py: FastMCP-based log query server for local development
…y LLM responses

- Pre-normalize [{'actions': [...]}], [{'tool_calls': [...]}], and [{'text': '...'}] single-element list wrappers by collapsing to inner dict
- Rename top-level 'tool_calls' key to 'actions' (Gemini synonym)
- Add text-only response handler for {'text': '...', 'meta': {...}} without actions array — infers message action type from context interface_path
- Propagate top-level meta (autonomous flag) to synthesized action
- Prevents correction loop timeouts that produced fallback emoji responses
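
The first two normalization steps above can be sketched like this. It is an illustration under assumptions: the function name is hypothetical, and the text-only handler and meta propagation are left out.

```python
_WRAPPER_KEYS = ("actions", "tool_calls", "text")


def normalize_response(raw):
    """Collapse single-element list wrappers and rename tool_calls."""
    if isinstance(raw, list) and len(raw) == 1 and isinstance(raw[0], dict):
        if any(key in raw[0] for key in _WRAPPER_KEYS):
            raw = raw[0]
    if isinstance(raw, dict) and "tool_calls" in raw and "actions" not in raw:
        raw = dict(raw)                       # copy, leave the input intact
        raw["actions"] = raw.pop("tool_calls")
    return raw
```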
…points

- Add _format_mm_part() helper that checks endpoint protocol to emit correct wire format
- Gemini endpoints get native inline_data format
- OpenAI/Anthropic/Custom endpoints get image_url data-URI format
- Fixes images being sent as unrecognized inline_data to OpenRouter and other OpenAI-compat providers
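
A sketch of the protocol switch that _format_mm_part() is described as performing. The helper below is not the project's implementation: its name, signature and the `"gemini"` protocol label are assumptions, and only the two wire formats named above are covered.

```python
import base64


def format_mm_part(data: bytes, mime_type: str, protocol: str) -> dict:
    """Encode one attachment in the wire format the endpoint expects."""
    encoded = base64.b64encode(data).decode("ascii")
    if protocol == "gemini":                  # assumed protocol label
        # Gemini-style inline part.
        return {"inline_data": {"mime_type": mime_type, "data": encoded}}
    # OpenAI-compatible providers take a data URI inside image_url.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```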
…document support

- Add supports_json_mode field to OpenRouterModel with API detection
- Enforce response_format: {type: json_object} when model supports it
- Add safety_settings (4 BLOCK_NONE categories) to both legacy and PromptRequest paths
- Inspect finish_reason for content_filter and length truncation
- Add document MIME type support (PDF, text, CSV, HTML, markdown)
- Expand vision MIME types with bmp, tiff, avif, heic, heif
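
The finish_reason inspection might be pictured as below. The values `content_filter` and `length` are the ones named in the list above; the function name and the returned labels are hypothetical.

```python
def classify_finish_reason(finish_reason):
    """Map a provider finish_reason to a coarse outcome label."""
    if finish_reason == "content_filter":
        return "blocked"        # response withheld by the provider's filter
    if finish_reason == "length":
        return "truncated"      # output hit the token limit
    return "ok"
```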
…tex payloads

Grillo beats and diary merge beats now set skip_history=True in context_memory
so HistoryEngine returns an empty context rather than loading irrelevant chat
history into autonomous-prompt requests. Also adds allowed_action_types to
diary merge beats to constrain LLM output. log_cortex_request now calls
sanitize_for_log automatically so callers no longer need to pre-sanitize payloads.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Registers the affine-mcp stdio server in .mcp.json pointing to the
self-hosted AFFiNE instance at https://board.zwiz.town. Documents
one-time credential setup, usage guidelines, and tool reference in
AGENTS.md (§8a) and CLAUDE.md.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@XargonWan XargonWan marked this pull request as draft April 22, 2026 00:20
@XargonWan XargonWan marked this pull request as ready for review April 22, 2026 00:56
@XargonWan XargonWan merged commit c72601f into develop Apr 22, 2026
12 of 18 checks passed
@XargonWan XargonWan deleted the fix/animations branch April 22, 2026 00:56