…Heart into Scarlet-Raine-develop
- Update all docs/gemini/*.md to latest API surface (3.1 Flash Live, liveapi capabilities, live session management, live tools, deprecations)
- Add fetch_gemini_docs.py helper script
- Misc improvements across core/ (action_parser, config, message_queue, plugin_instance, prompt_engine) and engines/external_engines/gemini_api
- Update grillo outreach plugin and telegram_bot interface
- Refresh test_live_session_manager and uv.lock

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

# Conflicts:
# uv.lock
Introduces a full engine-agnostic tool call pipeline for Live sessions, replacing the Gemini-locked _build_gemini_tool_declarations approach.

New modules:
- core/live_tool_registry.py: ToolManifest + LiveToolRegistry — reads get_action_plugin_instructions() into model-agnostic manifests
- core/live_tool_executor.py: LiveToolExecutor — dispatches TOOL_CALL LiveEvents to run_action with per-tool timeout and bot resolution
- core/live_tool_adapters/gemini.py: GeminiToolAdapter — manifests → genai.types.Tool; honours async_ok + engine_supports_nonblocking for Gemini 2.5 NON_BLOCKING mode
- core/live_tool_adapters/openai_realtime.py: stub adapter for a future OpenAI Realtime engine (standard JSON Schema format)
- docs/gemini/live-tool-calls-plan.md: architecture plan and migration phases (5 steps, no breaking changes)

Updated:
- plugins/live_base.py: add TOOL_CALL LiveEventType, ToolCallPayload dataclass, tool_call field on LiveEvent, send_tool_response() on LiveEngineBase
- plugins/live_engines/gemini.py: replace stub with full implementation (_pump_events, send_audio, send_text, send_tool_response, receive_events, open/close_session)
- core/live_session_manager.py: add history_config (initial_history_in_client_content=True) to enable send_client_content on Gemini 3.1; add inject_initial_context() for multimodal history seeding; extend send_multimodal_context() with multiple-attachment support; add set_tool_executor() + send_tool_response(); wire LiveToolExecutor into _receive_loop (legacy callback kept as fallback)
- interface/discord_interface.py: swap _build_gemini_tool_declarations for LiveToolRegistry + GeminiToolAdapter; register LiveToolExecutor via manager.set_tool_executor

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
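The manifest-to-adapter split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the ToolManifest field names and the `behavior` key are assumptions based on the commit text (the real GeminiToolAdapter emits genai.types.Tool objects, simplified here to plain dicts).

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolManifest:
    """Model-agnostic description of one action plugin's tool surface."""
    name: str
    description: str
    parameters: dict[str, Any] = field(default_factory=dict)  # JSON Schema
    async_ok: bool = False  # tool is safe to run without blocking the model turn

def to_gemini_declaration(manifest: ToolManifest,
                          engine_supports_nonblocking: bool) -> dict:
    """Convert a manifest into a Gemini-style function declaration dict.
    NON_BLOCKING behaviour is requested only when both the tool and the
    engine allow it, mirroring the async_ok + engine_supports_nonblocking
    gating described in the commit."""
    decl = {
        "name": manifest.name,
        "description": manifest.description,
        "parameters": manifest.parameters,
    }
    if manifest.async_ok and engine_supports_nonblocking:
        decl["behavior"] = "NON_BLOCKING"
    return decl
```

The point of the split is that a second adapter (e.g. the OpenAI Realtime stub) can consume the same manifests and emit its own wire format without the registry knowing about either engine.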
Implement Google's recommended approach for long-running Gemini Live sessions:

- SessionResumptionConfig passed in every LiveConnectConfig; the server sends SessionResumptionUpdate messages with a handle the manager stores in LiveSessionState.resumption_handle
- GoAway messages set _go_away_triggered; _receive_loop triggers a proactive reconnect at the next turn boundary instead of waiting for the socket to die
- _reconnect_inner has two paths: the fast path uses the stored handle so the new WebSocket resumes the previous context window without a cold restart; the fallback rebuilds system instruction + tools from scratch if resumption exhausts retries
- ContextWindowCompressionConfig with SlidingWindow keeps audio-only sessions alive beyond the raw 15-min token limit (officially unlimited with compression)
- start_session accepts a resumption_handle kwarg; None = fresh session, str = resume

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
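The handle-tracking and proactive-reconnect logic above can be sketched engine-independently. This is an illustrative state machine only: the message dict shape is invented for the example (the real Live API delivers typed SessionResumptionUpdate and GoAway objects over the WebSocket).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LiveSessionState:
    resumption_handle: Optional[str] = None
    go_away_triggered: bool = False

def on_server_message(state: LiveSessionState, msg: dict) -> None:
    """Track resumption handles and GoAway notices from the live server.
    (msg shape is illustrative, not the real wire format.)"""
    update = msg.get("session_resumption_update")
    if update and update.get("resumable") and update.get("new_handle"):
        # Store the newest handle; a later reconnect passes it back so the
        # new socket resumes the same context window.
        state.resumption_handle = update["new_handle"]
    if "go_away" in msg:
        # Don't tear down mid-turn: flag it and reconnect at the boundary.
        state.go_away_triggered = True

def should_reconnect_at_turn_boundary(state: LiveSessionState) -> bool:
    """Proactive reconnect decision, checked once per completed turn."""
    return state.go_away_triggered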
…rovements

- ai_diary.py: Fix SQL format string bug in diary consolidation
- vox_plugin.py: Fix TTS fallback suppression when auto-injected
- message_queue.py: Fix 'bound to different event loop' via global _queue/_lock fixes
- live_session_manager.py: Add cold-restart history injection, turn_was_interrupted
- discord_interface.py: Add bot-left VSU guard, image attachment injection
- config.py: Add LIVE_AUDIO_MIN_RMS config var
…loud

- Skip kick text on session resumption: Gemini restores context server-side, so injecting text via send_realtime_input would trigger an unsolicited model turn before any audio, causing the model to speak system-level data aloud
- Strip numeric intensities from emotion NL in the live system instruction: "devotion (5.0 - moderate)" → "moderate devotion", so there are no numbers for the model to accidentally read out
- Filter bot self-messages and internal agents (grillo, ai_diary) from live context updates in chat_history_cache to prevent feedback loops
- Cap diary blob length in history_engine and ai_diary to prevent 50k+ char merged daily entries from flooding the prompt context
- Elevate persona + verbose instructions to the Gemini system instruction
- Suppress false-positive CryptoError log noise from discord-ext-voice-recv RTCP PSFB packets (known upstream limitation, non-fatal)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
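The intensity-stripping transform described above ("devotion (5.0 - moderate)" → "moderate devotion") is a small rewrite that a single regex can cover. A minimal sketch, assuming the input always follows the `name (number - qualifier)` shape shown in the commit; the real implementation may handle more variants:

```python
import re

# Match "emotion (5.0 - qualifier)" and rewrite it as "qualifier emotion",
# dropping the numeric intensity so the live model has no number to read aloud.
_INTENSITY = re.compile(r"(\w+)\s*\(\s*\d+(?:\.\d+)?\s*-\s*(\w+)\s*\)")

def strip_numeric_intensities(text: str) -> str:
    return _INTENSITY.sub(lambda m: f"{m.group(2)} {m.group(1)}", text)
```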
…x_api logging

- Remove spurious PluginBase from GeminiAPIPlugin inheritance (MRO/registry pollution)
- Guard _get_gemini_model() against a None return from get_current_model()
- Filter thought parts (thought=true) from the response before returning to the message chain
- Add log_cortex_request/log_cortex_response to ExternalCortexEngine.generate_response so cortex_api.log is populated regardless of which cortex engine is active

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Add mcp_servers/synth_logs.py: stdio MCP server with four tools (list_log_files, search_logs, tail_log, get_recent_errors) that span log rotations transparently — logs rotate at 2000 lines in DEBUG mode, so the active file alone is often nearly empty
- Wire synth-logs and gitnexus into .mcp.json (Claude Code/Antigravity), .vscode/mcp.json (Copilot), and document the Cline entry for manual setup
- Unignore .vscode/mcp.json so MCP config is shared on clone
- Extend AGENTS.md with: first-time setup instructions, DB table quick reference (§13), config registry key reference (§14), and a known issues registry (§12) with an agent instruction to document new issues in-place
- Extend CLAUDE.md with: MCP tools reference, debugging SOP, token trap warnings, and the known-issues documentation rule
- Add an AI-assisted development section to the README for contributors

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
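The rotation-spanning behaviour is the interesting part of the log server: with 2000-line rotation the active file is often nearly empty, so every query has to read the numbered backups too. A minimal sketch of that idea, assuming the common `app.log.N` suffix scheme (oldest = highest N); the real server's file layout may differ:

```python
from pathlib import Path

def iter_log_lines(log_dir: str, base_name: str):
    """Yield lines across rotated log files, oldest first.
    Numbered backups (app.log.2, app.log.1) sort before the active
    file (app.log), highest number first."""
    paths = sorted(
        Path(log_dir).glob(base_name + "*"),
        # (0, -N) for numbered backups, (1, 0) for the active file
        key=lambda p: (0, -int(p.suffix[1:])) if p.suffix[1:].isdigit() else (1, 0),
    )
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            yield from fh

def search_logs(log_dir: str, base_name: str, needle: str) -> list[str]:
    """Substring search that transparently spans all rotations."""
    return [ln.rstrip("\n") for ln in iter_log_lines(log_dir, base_name) if needle in ln]
```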
…/dislikes from config

- Prepend static_persona to instructions_verbose so all LLM engines see it
- Load SYNTH_LIKES/SYNTH_DISLIKES from config_registry in load_persona() instead of hardcoding []
- Skip empty Likes/Dislikes/Interests lines in get_static_injection()
- Add responseMimeType: application/json to the Gemini REST generationConfig
- Add IDENTITY INTEGRITY and PRONOUN CONSISTENCY rules to chat instructions
- Fix history_engine verbosity slicing to apply per-group after dedup
When correction_context.successful_actions was empty, the corrector built a 'PARTIAL SUCCESS - 0 actions succeeded' message that told the LLM 'do NOT repeat successful ones' - a self-contradictory instruction that caused the LLM to return an empty string on all 4 attempts, exhausting retries and falling back to the dizzy-emoji fallback.

Now, when successful_actions is empty, the corrector produces a plain 'CORRECTION NEEDED' prompt asking the LLM to resend the full response, listing only the invalid actions and stating that emotion names must not be used as action types.

Also added 'Every action object MUST have a type field' to strict_requirements to address the root trigger: bare dicts without a type key appearing in the actions array. The partial-success path (>=1 successful action) is unchanged.
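The branch described above reduces to a simple guard on the success list. A hedged sketch, not the project's actual corrector code; the prompt wording and the `type` field access are paraphrased from the commit text:

```python
def build_correction_prompt(successful_actions: list[dict],
                            invalid_actions: list[dict]) -> str:
    """Choose the correction framing based on whether anything succeeded.
    An empty success list must NOT produce a 'partial success' prompt:
    telling the model not to repeat zero successful actions is
    self-contradictory and leads it to return empty responses."""
    invalid = ", ".join(a.get("type", "<missing type>") for a in invalid_actions)
    if not successful_actions:
        return (
            "CORRECTION NEEDED: resend the full response. "
            f"Invalid actions: {invalid}. "
            "Emotion names must not be used as action types."
        )
    return (
        f"PARTIAL SUCCESS - {len(successful_actions)} actions succeeded; "
        f"do NOT repeat them. Fix only: {invalid}."
    )
```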
CLAUDE.md: bump gitnexus index counts to 7489 symbols / 24879 relationships (auto-updated by px gitnexus analyze).

.claude/skills/gitnexus/: add six skill reference files generated by gitnexus (exploring, debugging, impact-analysis, refactoring, guide, cli) for use by AI agents navigating the codebase.
…rrector
- Add 'method' and 'command' to type-normalization in message_chain and transport_layer
- Add flat-field-to-payload gathering fallback via _ACTION_SYSTEM_KEYS frozenset
- Register meta.autonomous and introspection keys (thoughts, reasoning, etc.) as response metadata
- Strip {meta.autonomous: true} and similar meta tags from outbound text in emotion_manager
- Smart message-key handling in corrector_orchestrator: map bare 'message'/'text' keys to interface-specific message actions
- Strip emotion/meta tags from action payload text before dispatch in action_parser
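Two of the normalization steps above can be illustrated together: the flat-field-to-payload gathering via a system-key frozenset, and the mapping of bare 'message'/'text' keys to an interface-specific action. This is a sketch under assumptions: the exact contents of _ACTION_SYSTEM_KEYS and the `<interface>_message` type naming are guesses for illustration, not the real corrector_orchestrator code:

```python
# Keys that are action metadata rather than payload content (assumed set).
_ACTION_SYSTEM_KEYS = frozenset({"type", "meta", "thoughts", "reasoning"})

def gather_flat_fields(action: dict) -> dict:
    """Fallback: move non-system top-level keys into a payload dict, so
    flat LLM output like {'type': 'say', 'text': 'hi'} still parses."""
    if "payload" in action:
        return action
    payload = {k: v for k, v in action.items() if k not in _ACTION_SYSTEM_KEYS}
    kept = {k: v for k, v in action.items() if k in _ACTION_SYSTEM_KEYS}
    return {**kept, "payload": payload}

def normalize_bare_message(action: dict, interface_path: str) -> dict:
    """Map a bare {'message': ...} or {'text': ...} dict (no type field)
    to an interface-specific message action, inferred from the current
    interface path (hypothetical type naming)."""
    if "type" in action:
        return action
    text = action.get("message") or action.get("text")
    if text is None:
        return action
    interface = interface_path.split(".")[0]  # e.g. "discord.channel.123" -> "discord"
    return {"type": f"{interface}_message", "payload": {"text": text}}
```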
…est rewrite (Phases 1-6)

Multimodal:
- Extract and forward multimodal attachments (images, audio, video, docs) in cortex_bridge
- Add safety settings (all 4 harm categories OFF) to gemini_adapter
- Preserve the __prompt_request dataclass through sanitize_for_json in plugin_instance

Bidirectional API logging:
- Add sanitize_for_log() to cortex_api_logger for safe payload redaction
- Add _engine_label to BaseProtocolAdapter, set by all bridges
- Instrument all adapter methods (chat, stream, TTS, STT, vision) with REQUEST/RESPONSE logging
- Remove duplicate bridge-level logging from cortex_bridge

PromptRequest prompt rewrite (Phases 1-6):
- New core/prompt_request.py: PromptRequest, Turn, RuntimeContext, Attachment dataclasses
- New core/prompt_renderers.py: OpenAI, Anthropic, Gemini, Text renderers
- Attach PromptRequest to build_json_prompt output under the __prompt_request key
- Add _history_to_turns, _build_context_summary, _assemble_prompt_request helpers
- Add build_delivery_request for auto_response delivery mode
- Extend LiveToolRegistry with filtering and build_manifests_from_actions
- Migrate openapi, openrouter, anthropic, gemini_api engines to PromptRequest-native paths
- Add REWRITE-TASK.md documenting the full 8-phase plan

Tests:
- 13 multimodal extraction tests (test_cortex_bridge_multimodal.py)
- 32 prompt renderer tests (test_prompt_renderers.py)
- PromptRequest attachment and mode tests (test_prompt_engine.py)
- _history_to_turns role parsing tests
- Update gitnexus stats in AGENTS.md and CLAUDE.md (7701 symbols, 25509 relationships)
- Add tools/synth_log_mcp.py: FastMCP-based log query server for local development
…y LLM responses
- Pre-normalize [{'actions': [...]}], [{'tool_calls': [...]}], and [{'text': '...'}] single-element list wrappers by collapsing to inner dict
- Rename top-level 'tool_calls' key to 'actions' (Gemini synonym)
- Add text-only response handler for {'text': '...', 'meta': {...}} without actions array — infers message action type from context interface_path
- Propagate top-level meta (autonomous flag) to synthesized action
- Prevents correction loop timeouts that produced fallback emoji responses
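The wrapper-collapsing and key-renaming steps above can be sketched in a few lines. A hedged illustration, not the project's actual parser; it covers only the shapes named in the bullets:

```python
def normalize_response(parsed):
    """Pre-normalize common LLM response shapes to one canonical dict:
    collapse single-element list wrappers like [{'actions': [...]}],
    then rename the Gemini-style 'tool_calls' key to 'actions'."""
    if isinstance(parsed, list) and len(parsed) == 1 and isinstance(parsed[0], dict):
        parsed = parsed[0]
    if isinstance(parsed, dict) and "tool_calls" in parsed and "actions" not in parsed:
        parsed["actions"] = parsed.pop("tool_calls")
    return parsed
```

Catching these variants up front is what prevents the correction loop from churning through retries (and eventually timing out into the emoji fallback) on responses that were semantically fine all along.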
…points

- Add a _format_mm_part() helper that checks the endpoint protocol to emit the correct wire format
- Gemini endpoints get the native inline_data format
- OpenAI/Anthropic/Custom endpoints get the image_url data-URI format
- Fixes images being sent as unrecognized inline_data to OpenRouter and other OpenAI-compat providers
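The protocol branch can be sketched as below. The two output shapes follow the public Gemini and OpenAI chat APIs but are simplified for illustration; the real helper's signature and protocol labels are assumptions:

```python
import base64

def format_mm_part(protocol: str, mime_type: str, data: bytes) -> dict:
    """Emit the multimodal part format each endpoint family understands:
    Gemini gets native inline_data; OpenAI-compatible and Anthropic-style
    endpoints get an image_url part with a base64 data URI."""
    b64 = base64.b64encode(data).decode("ascii")
    if protocol == "gemini":
        return {"inline_data": {"mime_type": mime_type, "data": b64}}
    # OpenAI-compat providers (incl. OpenRouter) reject raw inline_data,
    # so everything else falls through to the data-URI form.
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{b64}"}}
```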
…document support
- Add supports_json_mode field to OpenRouterModel with API detection
- Enforce response_format: {type: json_object} when model supports it
- Add safety_settings (4 BLOCK_NONE categories) to both legacy and PromptRequest paths
- Inspect finish_reason for content_filter and length truncation
- Add document MIME type support (PDF, text, CSV, HTML, markdown)
- Expand vision MIME types with bmp, tiff, avif, heic, heif
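Two of the bullets above, conditional JSON mode and finish_reason inspection, can be illustrated together. A sketch under assumptions: the payload mutation style and warning strings are invented for the example, though `response_format: {type: json_object}` and the `content_filter`/`length` finish reasons match the OpenAI-compatible API surface:

```python
from typing import Optional

def apply_json_mode(payload: dict, supports_json_mode: bool) -> dict:
    """Enforce JSON output only when the model advertises support;
    sending response_format to a model without it can fail the request."""
    if supports_json_mode:
        payload["response_format"] = {"type": "json_object"}
    return payload

def check_finish_reason(choice: dict) -> Optional[str]:
    """Return a warning for filtered or truncated completions, else None."""
    reason = choice.get("finish_reason")
    if reason == "content_filter":
        return "response blocked by provider content filter"
    if reason == "length":
        return "response truncated at max_tokens"
    return None
```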
…tex payloads

Grillo beats and diary merge beats now set skip_history=True in context_memory, so HistoryEngine returns an empty context rather than loading irrelevant chat history into autonomous-prompt requests. Also adds allowed_action_types to diary merge beats to constrain LLM output.

log_cortex_request now calls sanitize_for_log automatically, so callers no longer need to pre-sanitize payloads.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Registers the affine-mcp stdio server in .mcp.json pointing to the self-hosted AFFiNE instance at https://board.zwiz.town. Documents one-time credential setup, usage guidelines, and tool reference in AGENTS.md (§8a) and CLAUDE.md. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…changed but it was good
…s response validation
…ic_heart into fix/animations
This pull request introduces several improvements and new features to the animation handling and WebUI integration, with a focus on supporting touch interactions, enhancing animation state fidelity, and improving the user experience during VRM model loading. The changes span backend logic, API endpoints, frontend handling, and test coverage.
Animation State and Touch Interaction Enhancements:
Introduces a dedicated TOUCH animation state, including context ID and priority management, and a dedicated handler for touch interactions from the WebUI. This enables authoritative touch-triggered animations with proper fallback and metadata handling. New fields (play_section, frame_range, phase_authoritative, and animation_state) improve frontend fidelity and recovery after reloads.

WebUI and Frontend Improvements:
Updated frontend scripts (chat-window.mjs, webui-bootstrap.js) to handle the new animation state fields, cache richer animation state for recovery, and support the extended startAction signature. Bumped the vrm-viewer.mjs version in HTML templates.

Backend Robustness and Testing:
These changes collectively make the animation system more robust, interactive, and user-friendly, especially when handling touch events and recovering from errors.