Progress: core runtime refactor checkpoints#192

Closed
shuxueshuxue wants to merge 240 commits into main from pr188-agent-optimize

Conversation

@shuxueshuxue
Collaborator

Summary

  • add a minimal QueryLoop.aget_state/aupdate_state bridge for backend/web callers after the reopened ql-06 regression
  • cover both live caller shapes: resumed-thread __start__ appends and RemoveMessage-based repair updates
  • add backend-facing regression tests for _repair_incomplete_tool_calls() and get_thread_history() so the caller contract stays locked
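
A minimal sketch of the two caller shapes named above, assuming an in-memory store. `QueryLoopBridge`, `Message`, `RemoveMessage`, and `StateSnapshot` here are hypothetical stand-ins written for illustration; the real QueryLoop signatures are not shown in this PR text.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    id: str
    content: str


@dataclass
class RemoveMessage:
    """Stand-in removal marker: drop the stored message with this id."""
    id: str


@dataclass
class StateSnapshot:
    values: dict


@dataclass
class QueryLoopBridge:
    """Hypothetical in-memory bridge, not the actual QueryLoop."""
    threads: Dict[str, List[Message]] = field(default_factory=dict)

    async def aget_state(self, thread_id: str) -> StateSnapshot:
        # Return a copy so callers cannot mutate stored history by accident.
        return StateSnapshot(values={"messages": list(self.threads.get(thread_id, []))})

    async def aupdate_state(
        self, thread_id: str, values: dict, as_node: Optional[str] = None
    ) -> None:
        # as_node is accepted for caller compatibility (e.g. "__start__")
        # but is not interpreted in this sketch.
        messages = self.threads.setdefault(thread_id, [])
        for item in values.get("messages", []):
            if isinstance(item, RemoveMessage):
                # Repair shape: remove the message with the matching id.
                messages[:] = [m for m in messages if m.id != item.id]
            else:
                # Resume shape: append to the existing thread history.
                messages.append(item)
```

A regression test in this style would append on a resumed thread, issue a removal, and then assert on what `aget_state` returns.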

Test Plan

  • uv run pytest tests/unit/test_loop.py tests/test_query_loop_backend_bridge.py -q
  • hostile re-review on backend :8010 reported the original caller-surface blocker no longer reproduces

nmhjklnm and others added 30 commits April 1, 2026 22:48
- Add docs/architecture/ with 11 deep-dive docs covering CC patterns:
  query loop, tool execution, state/agents, security/permissions,
  API/prompt infra, PowerShell, plugins, settings/platform,
  compaction pipeline (4-layer, SM-Compact, Legacy Compact details)
- Add cc-patterns.md master blueprint with LangChain mapping,
  implementation priority roadmap (Phase 1-5), and PARTIAL gap registry
- Refactor core agent modules: chat_tool_service, delivery, service,
  agent runtime, registry, filesystem/search/wechat tool services
- Add core/runtime/prompts.py
- Phase 1: slim system prompt — move tool usage guidance to descriptions,
  keep only sub-agent type routing in system prompt
- Phase 2: rewrite all tool descriptions to convey non-intuitive boundary
  conditions (Read/Write/Edit/Glob/Grep/Bash/Agent/WebSearch/WebFetch/
  TaskOutput/TaskStop/TaskCreate/tool_search/load_skill)
- Phase 3: add pages param to Read schema; add line_numbers param to Grep
  schema and handler; add subagent_type enum to Agent schema
- Phase 4: mark WebSearch/WebFetch/tool_search/load_skill/TaskGet/TaskList/
  wechat_contacts as is_concurrency_safe + is_read_only
- Phase 5: sub-agent tool filtering — AGENT_DISALLOWED/EXPLORE_ALLOWED/
  PLAN_ALLOWED/BASH_ALLOWED constants; LeonAgent accepts extra_blocked_tools
  and allowed_tools; _run_agent applies per-type filters
- Phase 6: add LSP placeholder to tool_catalog (deferred, default=False)
- Extras: search_hint for Agent/TaskOutput/TaskStop/chat tools/wechat_send;
  TaskOutput marked is_read_only; Edit description adds .ipynb workaround;
  fix prompt caching to place cache_control on system_message content block;
  add forkContext parent message inheritance with _filter_fork_messages;
  expose set_current_messages ContextVar for sub-agent context passing
- Add --max-columns 500 to suppress minified/base64 output
- Add missing VCS excludes: .svn, .hg, .bzr, .jj, .sl
- Default head_limit 250 (matches CC's undocumented cap)
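
The three Grep defaults above could be wired together roughly as follows. This is a sketch that assumes the Grep tool shells out to ripgrep; `build_rg_args` and `apply_head_limit` are hypothetical helper names, and including `.git` in the exclude list is an assumption (the commit only lists the directories that were missing).

```python
from typing import List, Sequence

DEFAULT_HEAD_LIMIT = 250
# .git is assumed to be excluded already; the rest come from the commit message.
VCS_DIRS = (".git", ".svn", ".hg", ".bzr", ".jj", ".sl")


def build_rg_args(pattern: str, path: str = ".", line_numbers: bool = False) -> List[str]:
    """Build a ripgrep command line with the defaults described above."""
    # --max-columns makes ripgrep omit lines longer than the limit,
    # which hides minified or base64 blobs.
    args = ["rg", "--max-columns", "500"]
    for vcs_dir in VCS_DIRS:
        # A leading "!" negates the glob, i.e. excludes the directory.
        args += ["--glob", f"!{vcs_dir}"]
    if line_numbers:
        args.append("--line-number")
    # "--" keeps patterns that start with "-" from being parsed as flags.
    args += ["--", pattern, path]
    return args


def apply_head_limit(lines: Sequence[str], head_limit: int = DEFAULT_HEAD_LIMIT) -> List[str]:
    """Truncate output to the first head_limit lines."""
    return list(lines[:head_limit])
```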
Registers a DEFERRED LSP tool providing code intelligence:
goToDefinition, findReferences, hover, documentSymbol, workspaceSymbol.

- _LSPSession: holds multilspy LanguageServer alive in a background asyncio
  task using start_server() context manager + Event-based lifecycle control
- LSPService: lazy per-language session pool, auto-detects language from
  file extension, converts absolute paths to workspace-relative
- Integrated into LeonAgent._init_services() with CleanupRegistry at priority 1
- Optional dep: pip install multilspy (or leonai[lsp])
- Supported: python, typescript, javascript, go, rust, java, ruby, kotlin, csharp
- Language servers auto-downloaded on first use per multilspy design
- multilspy moved from optional to core dependencies (avoid restart cost)
- Add 10 MB file size limit (matches CC LSP spec)
- Add gitignore filtering on returned locations via git check-ignore,
  batched in groups of 50 (matches CC batch size)
- Remove multilspy availability check from handler (always available now)
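
The batched gitignore filter can be sketched like this. `batched` and `filter_ignored` are hypothetical names; the sketch assumes paths are passed exactly as they should be compared and contain no characters that git would quote in its output.

```python
import subprocess
from typing import Iterator, List, Sequence

BATCH_SIZE = 50  # batch size stated in the commit message


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def filter_ignored(paths: Sequence[str], cwd: str) -> List[str]:
    """Return the paths that git does NOT consider ignored."""
    kept: List[str] = []
    for batch in batched(paths):
        proc = subprocess.run(
            ["git", "check-ignore", *batch],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=False,
        )
        # Exit status 0: at least one path is ignored; 1: none are ignored;
        # anything else (e.g. 128) is a fatal error such as "not a repository".
        if proc.returncode not in (0, 1):
            raise RuntimeError(proc.stderr.strip())
        ignored = set(proc.stdout.splitlines())
        kept.extend(p for p in batch if p not in ignored)
    return kept
```

Batching keeps each command line bounded when a references query returns hundreds of locations.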
Adds 4 missing LSP operations via multilspy internal API:
- goToImplementation (textDocument/implementation)
- prepareCallHierarchy (textDocument/prepareCallHierarchy)
- incomingCalls (callHierarchy/incomingCalls)
- outgoingCalls (callHierarchy/outgoingCalls)

Total supported operations: 9 (matches CC LSP tool surface).
incomingCalls/outgoingCalls take the 'item' output from prepareCallHierarchy.
Language auto-detected from item.uri for call hierarchy ops.
- _fmt_symbol: handle both SymbolInformation (workspaceSymbol, has location.uri)
  and DocumentSymbol (documentSymbol, has top-level range/selectionRange)
- request_definition/references/hover/document_symbols: catch AssertionError
  from multilspy when server returns None (maps to empty result / no hover)
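
For reference, the two response shapes differ as described in the LSP specification: a SymbolInformation carries `location.uri` and `location.range`, while a DocumentSymbol carries top-level `range` and `selectionRange` and no URI. A minimal formatter that tolerates both might look like the sketch below; `fmt_symbol` and its output format are illustrative, not the project's actual `_fmt_symbol`.

```python
from typing import Optional


def fmt_symbol(symbol: dict, default_uri: str = "") -> str:
    """Format either a SymbolInformation or a DocumentSymbol as 'name uri:line:col'."""
    name = symbol.get("name", "?")
    location: Optional[dict] = symbol.get("location")
    if location is not None:
        # SymbolInformation (workspaceSymbol): position lives under location.
        uri = location.get("uri", default_uri)
        rng = location.get("range")
    else:
        # DocumentSymbol (documentSymbol): no URI, so the caller supplies it.
        uri = default_uri
        rng = symbol.get("selectionRange") or symbol.get("range")
    if rng is None:
        return f"{name} {uri}".strip()
    start = rng["start"]
    # LSP positions are zero-based; print them one-based for readability.
    return f"{name} {uri}:{start['line'] + 1}:{start['character'] + 1}"
```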
…langserver

Python's Jedi server doesn't support goToImplementation or call hierarchy.
Add _PyrightSession — a minimal asyncio LSP client over stdio — that talks to
pyright-langserver (bundled with `pip install pyright`, already a core dep).

Changes:
- _PyrightSession: JSON-RPC/Content-Length stdio client, initialize handshake,
  textDocument/didOpen, callHierarchy/{incomingCalls,outgoingCalls},
  textDocument/{implementation,prepareCallHierarchy}
- Acks server-to-client requests (window/workDoneProgress/create etc.)
- Keeps files open for session lifetime (required for call hierarchy)
- LSPService routes Python advanced ops to pyright, other languages to multilspy
- Fix _fmt_symbol: handle both SymbolInformation (workspaceSymbol) and
  DocumentSymbol (documentSymbol) response formats
- Fix AssertionError from multilspy null responses → empty result
- pyproject.toml: add core.tools.lsp to packages list (was missing — would
  cause lsp tool to be absent after pip install leonai)
- pyproject.toml: add pyright>=1.1.0 as core dep (required by _PyrightSession)
- lsp/service.py: remove unused _wait_for_idle, _active_progress, _idle_event,
  _progress_started from _PyrightSession (pyright doesn't send $/progress)
- plan-tool-alignment.md: replace Phase 6 placeholder with actual implementation
  summary (9 operations, dual-backend architecture, deps)
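
The JSON-RPC/Content-Length framing that a stdio LSP client such as _PyrightSession has to speak is defined by the LSP base protocol: an ASCII header block terminated by a blank line, then a UTF-8 JSON body of exactly the announced byte length. A minimal sketch of that framing follows; `encode_message` and `decode_message` are hypothetical helper names, not taken from the PR.

```python
import json
from typing import Optional, Tuple


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with a Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Content-Length counts bytes of the encoded body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(buffer: bytes) -> Tuple[Optional[dict], bytes]:
    """Return (payload, remaining bytes), or (None, buffer) if incomplete."""
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None, buffer  # header block not fully received yet
    length = None
    for line in head.split(b"\r\n"):
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, buffer  # body not fully received yet
    return json.loads(rest[:length].decode("utf-8")), rest[length:]
```

Returning the unconsumed remainder lets a read loop keep appending bytes from stdout and call `decode_message` until it yields `None`.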
Language servers (multilspy + pyright) now live in a module-level
_LSPSessionPool instead of per-LSPService instances. Sessions are keyed
by (language, workspace_root), start lazily on first use, and survive
agent restarts. Cleanup moved from CleanupRegistry to the backend
lifespan finally block via `await lsp_pool.close_all()`.

- Add _LSPSessionPool with asyncio.Task-based dedup for concurrent starts
- Simplify LSPService to delegate all session management to lsp_pool
- Remove _cleanup_lsp_service from LeonAgent and CleanupRegistry
- Add lsp_pool.close_all() to backend/web/core/lifespan.py shutdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
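
The Task-based dedup described in this commit can be illustrated with a small sketch. `SessionPool` and its `factory` parameter are hypothetical; the real _LSPSessionPool is not shown in this text. The idea is that the first caller for a `(language, workspace_root)` key creates the start task and every concurrent caller awaits that same task.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

Key = Tuple[str, str]


class SessionPool:
    """Hypothetical pool: one lazily started session per (language, root)."""

    def __init__(self, factory: Callable[[str, str], Awaitable[Any]]) -> None:
        self._factory = factory
        self._tasks: Dict[Key, "asyncio.Task[Any]"] = {}

    async def get(self, language: str, workspace_root: str) -> Any:
        key = (language, workspace_root)
        task = self._tasks.get(key)
        if task is None:
            # No await between the lookup and the store, so concurrent
            # callers on the same event loop cannot both create a task.
            task = asyncio.create_task(self._factory(language, workspace_root))
            self._tasks[key] = task
        try:
            # shield: cancelling one waiter must not cancel the shared start.
            return await asyncio.shield(task)
        except Exception:
            # Start failed: forget the task so a later call can retry.
            if self._tasks.get(key) is task:
                del self._tasks[key]
            raise

    async def close_all(self) -> None:
        tasks, self._tasks = self._tasks, {}
        for task in tasks.values():
            try:
                session = await task
            except Exception:
                continue  # never started, nothing to close
            await session.close()
```

In this sketch `close_all()` would be awaited once from the application shutdown path, matching the lifespan-based cleanup the commit describes.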
@shuxueshuxue
Collaborator Author

Latest closure update on current HEAD 90415ffa.

New slices landed after the previous checkpoint wave:

  • 847f1ae5 Remove debug backdoors and fix path schemas
    • deleted backend /api/debug/log
    • removed frontend console.log interception + window.__debugEntries
    • removed unconditional command-service debug print(...)
    • simplified use-thread-stream.ts manager lifetime wiring
    • fixed the real Windows CI regression from the new filesystem absolute-path schema by accepting both POSIX and Windows drive-absolute paths
  • 90415ffa Fix ask-user question prompt identity
    • AskUserQuestion answer state is now keyed by prompt position instead of rendered text, so duplicate question text no longer collides
    • frontend title/message no longer present AskUserQuestion as a generic permission gate
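
A check that accepts both POSIX and Windows drive-absolute paths, as in the 847f1ae5 schema fix, can be sketched as below. The function name and regex are illustrative assumptions; the actual schema validator is not shown here.

```python
import re

# A drive letter, a colon, then either separator: C:\ or C:/
_WINDOWS_DRIVE_ABSOLUTE = re.compile(r"^[A-Za-z]:[\\/]")


def is_absolute_path(value: str) -> bool:
    """True for POSIX-absolute ('/x') or Windows drive-absolute ('C:\\x') paths."""
    return value.startswith("/") or bool(_WINDOWS_DRIVE_ABSOLUTE.match(value))
```

A string check like this behaves the same regardless of the host OS, unlike `os.path.isabs`, whose result depends on the platform the validator runs on.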

Current verification on the integrated branch:

  • local Python workflow-equivalent pack
    • uv run pytest tests/ --ignore=tests/test_e2e_providers.py --ignore=tests/test_sandbox_e2e.py --ignore=tests/test_daytona_e2e.py --ignore=tests/test_e2e_backend_api.py --ignore=tests/test_e2e_summary_persistence.py --ignore=tests/test_p3_e2e.py --maxfail=5 --timeout=60 -q
    • 1014 passed, 44 skipped
  • targeted frontend proof
    • vitest focused packs green
    • touched-file eslint green
    • cd frontend/app && npm run build green
  • targeted backend proof
    • uv run pytest -q tests/Unit/core/test_agent_service.py -k ask_user_question tests/Integration/test_threads_router.py -k ask_user_question
    • 4 passed
  • touched filesystem-service pyright
    • 0 errors
  • fresh manual brutal probes still green:
    • local thread m_50tMO7PmFp7f-56 -> runtime=idle, /tasks=[completed], /tmp/leon-nu56/local-agent/done.txt = NU56_LOCAL_AGENT_1775429554
    • Daytona thread m_x6b9LVBMNj1l-70 -> runtime=idle, /tasks=[completed], final assistant token NU56_DAYTONA_AGENT_1775429554

Fresh GitHub / staging proof on this exact head:

  • CI run 24012576961 -> success
  • Deploy Staging run 24012576126 -> success
  • live staging containers are now on image tag 90415ffa64addbd8a639ab3f7d50c8ec342318ac
  • black-box staging check after deploy is green:
    • POST /api/auth/login -> 200
    • GET /api/threads/m_x6b9LVBMNj1l-21/runtime -> 200

At this point the remaining merge blocker is not CI/runtime correctness anymore; it is the required approving review gate.

@shuxueshuxue requested a review from nmhjklnm April 5, 2026 23:15
@shuxueshuxue
Collaborator Author

@nmhjklnm latest head 90415ffa is green on CI + staging and the remaining blocker is the required approval gate. Could you take a final pass when convenient?

@shuxueshuxue
Collaborator Author

Latest closure delta on cb9262b8:

  • fixed AskUserQuestion stale pending lifecycle without keeping debug instrumentation
  • fixed Windows pricing cache/bundled models seam by making cache + bundled JSON reads/writes explicit UTF-8
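
The Windows failure mode behind the second fix is that `open()` and `Path.read_text()` fall back to the locale encoding (often cp1252 on Windows) when none is given. A minimal sketch of explicit UTF-8 JSON helpers follows; the function names are hypothetical and the file name in `demo` is made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def write_json_utf8(path: Path, data: Any) -> None:
    """Write JSON with an explicit encoding instead of the locale default."""
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def read_json_utf8(path: Path) -> Any:
    """Read JSON with the same explicit encoding used for writing."""
    return json.loads(path.read_text(encoding="utf-8"))


def demo() -> Any:
    """Round-trip a non-ASCII value through a temporary cache file."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "pricing_cache.json"  # hypothetical file name
        write_json_utf8(cache, {"model": "示例", "price": 1.5})
        return read_json_utf8(cache)
```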

Fresh proof:

  • local real Playwright CLI on thread m_50tMO7PmFp7f-64: rendered the AskUserQuestion card, clicked Alpha, submitted, backend /permissions cleared to [], backend detail ended with PLAYWRIGHT_ASK_OKAlpha, and the same page later rendered that terminal text too
  • focused local tests green
  • CI run 24017380550 green, including Unit Tests (windows-latest)
  • Deploy Staging run 24017379784 green

Checkpoint memory updated under nu-45 + nu-57.

@shuxueshuxue
Collaborator Author

Latest live staging caller proof on current head cb9262b8:

  • real Playwright CLI on staging thread m_50tMO7PmFp7f-65
  • rendered the real AskUserQuestion card 回答问题 ("Answer the question")
  • clicked Alpha, submitted
  • staging backend /api/threads/m_50tMO7PmFp7f-65/permissions then returned requests=[]
  • staging thread detail later ended with exact assistant text STAGING_PLAYWRIGHT_ASK_OK Alpha

So the AskUserQuestion caller proof is now green on both local and live staging, not only on local dev.

@shuxueshuxue
Collaborator Author

superseded by #206

2 participants