Skip to content

Test speedwagon tool#48

Merged
ljhh-0611 merged 11 commits intorefactoring-applied-backed-v2from
add-speedwagon-tool
May 6, 2026
Merged

Test speedwagon tool#48
ljhh-0611 merged 11 commits intorefactoring-applied-backed-v2from
add-speedwagon-tool

Conversation

@ljhh-0611
Copy link
Copy Markdown
Collaborator

@ljhh-0611 ljhh-0611 commented Apr 28, 2026

Summary

  • Add POST /sessions/{id}/messages endpoint that streams agent responses and returns collected messages as JSON (with 120s timeout)
  • Integrate Speedwagon as a subagent — main agent (GPT-5.4-mini) delegates RAG queries to speedwagon subagent via tool call, with instruction forcing document corpus search before answering factual questions
  • Migrate SharedStore from Mutex to RwLock for concurrent read access in tool operations (search/find/read use store.read().await)
  • Replace Arc<Mutex<AppState>> with Arc<AppState> using DashMap for interior mutability — eliminates unnecessary outer lock
  • Add lib.rs to expose modules for integration test access
  • Add ApiError/ApiResult type aliases and AppError::internal()/not_found() helpers

Changed Files

Area Files What
backend-v2 router.rs Message handler, Speedwagon store/toolset singletons (OnceLock), subagent session creation
backend-v2 state.rs HashMapDashMap<Uuid, Arc<Mutex<Agent>>>, sync API
backend-v2 main.rs Remove outer Mutex on AppState
backend-v2 error.rs ApiError, ApiResult types + helper constructors
backend-v2 model/message.rs SendMessageRequest, SendMessageResponse DTOs
backend-v2 lib.rs New — public module re-exports
backend-v2 Cargo.toml Add dashmap, tokio/time, dev-deps for E2E
speedwagon store/mod.rs SharedStore = Arc<RwLock<Store>> type alias
speedwagon tool/*.rs .lock().await.read().await
speedwagon main.rs Use SharedStore with RwLock
root .env.example, .gitignore, Cargo.toml Env template, ignore .speedwagon/, ailoy rev bump

Design History

1. SharedStore: Arc<Store>Arc<RwLock<Store>>

Why not just Arc<Store>?

build_toolset(store) creates tool closures that capture a clone of the Arc. The same Arc is also held by router/test code that calls ingest()/purge(). Here's where it breaks:

// Tool closure captures Arc<Store>, calls &self methods — fine
let store = store.clone();
move |args| { store.search(&query, ...) }  // store: Arc<Store> → deref to &Store ✓

// But ingest/purge need &mut self — impossible from Arc<Store>
store.ingest(bytes, file_type)  // needs &mut Store, Arc only gives &Store ✗

Arc<T> can only provide &T (shared reference). There's no way to get &mut T from it — that's by design, since multiple clones of the same Arc could exist. So any code path that needs mutation requires interior mutability.

Why not Arc<Mutex<Store>>?

Mutex works, but every access — including concurrent tool reads — takes an exclusive lock. If agent A is searching while agent B tries to search, B blocks until A finishes. Tools are read-only (search, find, read all take &self), so serializing them is unnecessary.

Decision: Arc<RwLock<Store>> — the standard reader-writer solution.

Tool closures:  store.read().await   → multiple tools can search simultaneously
ingest/purge:   store.write().await  → exclusive access, blocks until all readers finish

Type alias pub type SharedStore = Arc<RwLock<Store>> is exported from the speedwagon crate so backend-v2, CLI, and tests all share the same type.

2. AppState: Remove Outer Mutex → DashMap Interior Mutability

Problem: Arc<Mutex<AppState>> locks the entire state for every handler call. Two concurrent requests to different sessions serialize on the same Mutex.

Decision: Replace HashMap with DashMap (sharded concurrent map). insert_agent and get_agent become &self methods — no .await, no outer lock. Main becomes Arc::new(AppState::new()).

Before: Arc<Mutex<AppState { HashMap<Uuid, Agent> }>>
After:  Arc<AppState { DashMap<Uuid, Arc<Mutex<Agent>>> }>

3. Why Arc<Mutex<Agent>> Inside DashMap

Three layers, each handling a different concurrency level:

Layer What It Does
DashMap Concurrent session lookup/insert across requests
Arc Cloneable handle — decouples agent lifetime from DashMap entry
Mutex<Agent> Per-agent exclusivity — Agent::run(&mut self) requires exclusive access while the stream is alive

Two messages to different sessions run fully in parallel. Two messages to the same session serialize at the Agent Mutex — correct behavior since agent maintains conversation state.

4. Why lib.rs Is Needed

Rust integration tests (tests/*.rs) are external to the crate. Without lib.rs, modules defined in main.rs are invisible to tests. Adding lib.rs with pub mod router; pub mod state; ... creates a library crate that both the binary and tests can import as agent_k_backend::*.

5. STORE / TOOLSET Singletons (OnceLock)

Problem: Creating a new Store per session means each agent gets a separate store — documents ingested in session A are invisible to session B.

Decision: OnceLock<SharedStore> and OnceLock<ToolSet> — process-wide singletons, initialized lazily on first call. The ToolSet captures the Store via closures, so they must reference the same instance.

⚠️ Development-only design. Production would need:

Current Limitation Production Requirement
Store path hardcoded to CARGO_MANIFEST_DIR Configurable via env/config
Single Store for entire process Per-knowledge-base stores (multi-tenant)
Tests share the same singleton Test isolation (separate Store instances)

6. Agent Creation: Direct vs Subagent Pattern

Two patterns exist in create_session — subagent is active, direct is commented out for easy switching.

Direct: Agent::try_with_tools(SpeedwagonSpec.into_spec(), provider, toolset) — agent owns search/find/read tools directly. SpeedwagonSpec's built-in SYSTEM_PROMPT instructs tool usage.

Subagent: AgentBuilder::new(model).subagent(card, sw_agent).build() — main agent has a single "speedwagon" tool that delegates to a separate agent. Allows main/sub to use different models.

E2E Test Behavior Comparison

Direct Subagent
Tools on agent search, find, read, calculate speedwagon (delegates internally)
Tool invocation reliability High — SYSTEM_PROMPT directly instructs Depends on instruction + card quality
Initial E2E result ✅ Pass (Glorkville in response) ❌ Fail — main agent skipped tool call
Failure cause N/A Vague card description + no instruction
After fix N/A Instruction forcing tool use + descriptive card → improved
Model flexibility Locked to SpeedwagonSpec default Main and sub can use different models
Complexity Low (one line) High (AgentBuilder + AgentCard + instruction)
Debuggability Tool calls visible directly Subagent internals are opaque

Subagent Failure → Fix Timeline

  1. Initial: Card description was vague ("Speedwagon agent") → main agent didn't recognize the tool's purpose
  2. Added instruction: "You MUST use the speedwagon tool..." → still failed
  3. Improved card description: "Search the knowledge base for answers. This tool has access to uploaded documents..." → improved
  4. Both instruction AND card description must be strong for reliable subagent invocation

Both patterns are intentionally kept in the code (active vs commented) because they have different trade-offs.

7. Why send_message API Was Added

create_session only creates a session with an agent. Without send_message, there's no way to talk to it. Current design is synchronous JSON (not SSE):

Client → POST /sessions/{id}/messages { content: "..." }
                    ↓
         agent.run(query) — consume entire stream
         collect all messages + extract final assistant text
                    ↓
Client ← 200 { messages: [...], final_content: "..." }
Decision Reason
120s timeout Subagent round-trip (main → sub → main) can be slow
final_content field Convenience — client gets last assistant text without parsing messages array
No SSE Goal at this stage is E2E validation. SSE already implemented in backend v1, will be ported later
Return full messages array Includes tool_call/tool_result intermediate messages — useful for debugging and UI

E2E Test

test_ingest_message_purge_cycle (#[ignore], requires OPENAI_API_KEY):

  1. Ingest a document with unique fact ("The capital of Freedonia is Glorkville")
  2. Create session → send message asking about the fact
  3. Assert response contains "Glorkville" (verifies subagent RAG pipeline)
  4. Purge document → send same message → assert non-empty response
cargo test -p agent-k-backend test_ingest_message_purge_cycle -- --ignored

🤖 Generated with Claude Code

ljhh-0611 and others added 5 commits April 29, 2026 00:12
Integrate base branch features (SQLite repository, per-session sandbox,
SSE streaming, message persistence, session CRUD) with speedwagon RAG
subagent, DashMap concurrency, and ApiError/ApiResult error patterns.

Key integration points:
- build_agent() combines builtin tools (bash/python/web_search),
  per-session sandbox, and speedwagon subagent
- DashMap-based AppState with injected SharedStore/ToolSet
- resolve_agent() lazy-creates agents with full history restoration
- All tests updated for new AppState::new(repo, store, toolset) signature

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@ljhh-0611 ljhh-0611 requested review from jhlee525 and khj809 and removed request for khj809 April 29, 2026 06:24
@ljhh-0611 ljhh-0611 self-assigned this Apr 29, 2026
Comment thread backend-v2/Cargo.toml Outdated
Comment thread backend-v2/src/router.rs Outdated
Comment thread backend-v2/src/router.rs Outdated
@ljhh-0611 ljhh-0611 marked this pull request as ready for review April 30, 2026 04:08
@ljhh-0611 ljhh-0611 marked this pull request as draft April 30, 2026 04:31
@khj809
Copy link
Copy Markdown
Contributor

khj809 commented May 4, 2026

@jhlee525 Please review my latest commit, which applies the changes in brekkylab/ailoy#391. I'm not sure if this follows your expected practices exactly.

@jhlee525
Copy link
Copy Markdown

jhlee525 commented May 6, 2026

I think it align with current design! Thank you for changes.

- Remove unused `into_runtime` and `into_runtime_with_provider` from
  SpeedwagonSpec (never called anywhere)
- Migrate e2e_test.rs to use shared `common` module helpers instead of
  local `json_request` and `extract_assistant_text` duplicates
- Fix `.sandbox()` → `.runenv()` in commented-out alternative build_agent

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@ljhh-0611 ljhh-0611 marked this pull request as ready for review May 6, 2026 07:35
@ljhh-0611 ljhh-0611 merged commit fa99b3c into refactoring-applied-backed-v2 May 6, 2026
@ljhh-0611 ljhh-0611 deleted the add-speedwagon-tool branch May 6, 2026 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants