- Enhanced ComplexityClassifier prompt with 7 detailed example categories
- Added multi-metric query detection (P/E + revenue + margins)
- Expanded heuristic fallback from 2 to 7 financial patterns
- Redesigned TaskPlanner prompt (34 → 162 lines) with decomposition focus
- Added 4 detailed GOOD/BAD task planning examples
- Included clear principles: when to decompose, benefits, task rules

Multi-metric queries now properly decompose into 3-4 focused subtasks instead of bundled single tasks, improving reliability and clarity.
- Enhanced ComplexityClassifier with comprehensive LLM prompt
- Added 7 example categories (multi-metric, multi-company, calculations, etc.)
- Improved decision criteria with clear SIMPLE vs COMPLEX guidelines
- Expanded heuristic fallback from 2 to 7 patterns
- Added multi-metric detection
- Added calculations, temporal analysis, aggregations, and analysis keywords
- Redesigned TaskPlanner prompt (34 lines → 162 lines)
- Added 4 detailed examples with GOOD/BAD comparisons
- Emphasized decomposition benefits (reliability, clarity, parallelism)
- Included structured decision criteria for when to decompose

Result: Multi-metric queries like 'AAPL valuation and profitability' now properly decompose into 3-4 focused tasks instead of 1 generic task. Verified working with scratchpad logs showing proper task breakdown.
Pull request overview
Adds a structured “task planning + execution” pathway to Dexter so complex, multi-metric financial queries can be decomposed into multiple focused subtasks, streamed to the UI, and (optionally) persisted to disk.
Changes:
- Introduces a task system: complexity classification, LLM-based task planning, dependency-aware execution, and on-disk persistence.
- Extends agent streaming events with task-plan/task-status events and updates the runner UI hook to display them.
- Switches the UI runner to call agent.runWithTasks(...) instead of agent.run(...).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| src/hooks/useAgentRunner.ts | Handles new task-related streamed events and uses runWithTasks() as the entry point. |
| src/agent/types.ts | Adds task_plan_created / task_status_changed events to the AgentEvent union. |
| src/agent/task/types.ts | Defines core task/task-plan types (Task, ToolCall, TaskPlan, validation result). |
| src/agent/task/planner.ts | Implements LLM-driven task-plan creation plus DAG validation/cycle detection. |
| src/agent/task/persistence.ts | Adds filesystem persistence for task plans under .dexter/tasks. |
| src/agent/task/index.ts | Re-exports task subsystem modules. |
| src/agent/task/executor.ts | Executes tasks respecting dependencies and streams task status changes. |
| src/agent/task/classifier.ts | Adds LLM-first complexity classification with heuristic fallback. |
| src/agent/agent.ts | Adds runWithTasks() to classify, plan, execute, and persist task plans. |
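
The planner row above mentions DAG validation and cycle detection. The following is a minimal sketch of what such a check could look like, not the code in planner.ts. `TaskNode`, `GraphValidation`, and `validateTaskGraph` are hypothetical names, and the task shape is reduced to the two fields the check needs.

```typescript
// Hypothetical, reduced task shape: only the fields needed for graph checks.
interface TaskNode {
  id: string;
  dependencies: string[];
}

// Hypothetical result shape for this sketch.
interface GraphValidation {
  valid: boolean;
  errors: string[];
}

function validateTaskGraph(tasks: TaskNode[]): GraphValidation {
  const errors: string[] = [];
  const byId = new Map<string, TaskNode>();

  // Reject duplicate ids so every dependency resolves to exactly one task.
  for (const task of tasks) {
    if (byId.has(task.id)) errors.push(`Duplicate task id "${task.id}"`);
    byId.set(task.id, task);
  }

  // Reject dependencies that point at tasks that are not in the plan.
  for (const task of tasks) {
    for (const dep of task.dependencies) {
      if (!byId.has(dep)) {
        errors.push(`Task "${task.id}" depends on unknown task "${dep}"`);
      }
    }
  }

  // Depth-first search: reaching a task that is still "visiting" means a cycle.
  const state = new Map<string, 'visiting' | 'done'>();
  const visit = (id: string, path: string[]): boolean => {
    const current = state.get(id);
    if (current === 'done') return false;
    if (current === 'visiting') {
      errors.push(`Dependency cycle: ${[...path, id].join(' -> ')}`);
      return true;
    }
    state.set(id, 'visiting');
    for (const dep of byId.get(id)?.dependencies ?? []) {
      if (byId.has(dep) && visit(dep, [...path, id])) return true;
    }
    state.set(id, 'done');
    return false;
  };

  // Stop at the first cycle found; one report is enough to reject the plan.
  for (const task of tasks) {
    if (visit(task.id, [])) break;
  }

  return { valid: errors.length === 0, errors };
}
```

Running a check like this before execution means a malformed LLM plan is rejected up front rather than stalling the executor on a dependency that can never be satisfied.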

    /**
     * Delete a task plan
     */
    async delete(planId: string): Promise<void> {

delete() calls fs.readdir(this.baseDir) without ensuring the directory exists, which will throw ENOENT when there are no saved plans yet. Call ensureDir() first or treat ENOENT as a no-op.

    - async delete(planId: string): Promise<void> {
    + async delete(planId: string): Promise<void> {
    +   await this.ensureDir();


    // Build task plan with proper status
    const tasks: Task[] = parsed.tasks.map((t: Partial<Task>) => ({
      id: t.id!,
      description: t.description!,
      status: 'pending' as const,
      toolCalls: t.toolCalls || [],
      dependencies: t.dependencies || [],
    }));

The LLM response is treated as trusted: t.id! / t.description! can be undefined at runtime and then propagate into the plan, causing downstream failures. Add explicit validation for the required fields and for the shapes of toolCalls and dependencies, and either fall back to run() or throw a clear error, instead of relying on non-null assertions.

    - // Build task plan with proper status
    - const tasks: Task[] = parsed.tasks.map((t: Partial<Task>) => ({
    -   id: t.id!,
    -   description: t.description!,
    -   status: 'pending' as const,
    -   toolCalls: t.toolCalls || [],
    -   dependencies: t.dependencies || [],
    - }));
    + // Build task plan with proper status and validation
    + const tasks: Task[] = parsed.tasks.map((t: Partial<Task>, index: number) => {
    +   if (!t || typeof t !== 'object') {
    +     throw new Error(`LLM task at index ${index} is not a valid object`);
    +   }
    +   if (typeof t.id !== 'string' || t.id.trim() === '') {
    +     throw new Error(`LLM task at index ${index} is missing a valid "id"`);
    +   }
    +   if (typeof t.description !== 'string' || t.description.trim() === '') {
    +     throw new Error(`LLM task at index ${index} is missing a valid "description"`);
    +   }
    +   let toolCalls: Task['toolCalls'];
    +   if (t.toolCalls === undefined) {
    +     toolCalls = [];
    +   } else if (Array.isArray(t.toolCalls)) {
    +     toolCalls = t.toolCalls as Task['toolCalls'];
    +   } else {
    +     throw new Error(`LLM task "${t.id}" has invalid "toolCalls": expected an array`);
    +   }
    +   let dependencies: string[];
    +   if (t.dependencies === undefined) {
    +     dependencies = [];
    +   } else if (Array.isArray(t.dependencies)) {
    +     dependencies = t.dependencies.filter(
    +       (d): d is string => typeof d === 'string' && d.trim() !== ''
    +     );
    +   } else {
    +     throw new Error(`LLM task "${t.id}" has invalid "dependencies": expected an array of strings`);
    +   }
    +   return {
    +     id: t.id,
    +     description: t.description,
    +     status: 'pending' as const,
    +     toolCalls,
    +     dependencies,
    +   };
    + });


    }

    // Final save
    await store.save(plan);

runWithTasks() completes after saving the plan (end of method) without emitting a terminal done event for the overall run. The UI (useAgentRunner) expects exactly one done to finalize the history item and surface the final answer. Consider emitting a single final done after task execution (e.g., from the synthesis task result or an explicit aggregation step).

    - await store.save(plan);
    + await store.save(plan);
    + // Emit a single terminal done event for the overall run
    + yield {
    +   type: 'done',
    + } as AgentEvent;


    // Forward all agent events
    yield event as AgentEvent;

    if (event.type === 'done') {
      results.push(event.answer);

This loop forwards every inner Agent.run() event (including done). The outer UI treats the first done as “the whole query is complete”, so a subtask finishing will prematurely finalize the run and later task events/answers will be dropped. Filter/suppress inner terminal events and instead emit a single final done for the overall plan once all tasks finish.

    - // Forward all agent events
    - yield event as AgentEvent;
    - if (event.type === 'done') {
    -   results.push(event.answer);
    + if (event.type === 'done') {
    +   // Capture the final answer for this task but do not forward the inner
    +   // terminal event, so the outer UI only sees a single final "done"
    +   // for the overall plan.
    +   results.push(event.answer);
    + } else {
    +   // Forward non-terminal agent events
    +   yield event as AgentEvent;

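
The two comments above (suppress inner done events, emit one terminal done) fit together as a single pattern. The sketch below illustrates it under simplifying assumptions: it uses synchronous generators for brevity where the real methods presumably stream asynchronously, and `SketchEvent`, `runSubtask`, and `runPlan` are hypothetical stand-ins rather than Dexter's actual types or methods.

```typescript
// Hypothetical, simplified event shapes; the real AgentEvent union lives in
// src/agent/types.ts and has more members.
type SketchEvent =
  | { type: 'tool_start'; tool: string }
  | { type: 'done'; answer: string };

// Stand-in for the inner Agent.run() stream of a single subtask.
function* runSubtask(name: string): Generator<SketchEvent> {
  yield { type: 'tool_start', tool: `fetch_${name}` };
  yield { type: 'done', answer: `${name} result` };
}

// Outer stream: forwards progress events, swallows each inner 'done',
// and emits exactly one terminal 'done' after all subtasks have finished.
function* runPlan(taskNames: string[]): Generator<SketchEvent> {
  const results: string[] = [];
  for (const name of taskNames) {
    for (const event of runSubtask(name)) {
      if (event.type === 'done') {
        results.push(event.answer); // capture the answer, do not forward
      } else {
        yield event; // forward non-terminal events unchanged
      }
    }
  }
  // Joining the answers is an assumption; a dedicated synthesis task could
  // supply the final answer instead.
  yield { type: 'done', answer: results.join('\n') };
}
```

With this shape the consumer can keep its existing rule that the first done event finalizes the run, because only one is ever emitted.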

    console.log('[Task System] Complexity classification:', {
      query,
      isComplex: complexity.isComplex,
      reason: complexity.reason,
      estimatedSteps: complexity.estimatedSteps,

These console.log statements introduce noisy stdout logging and bypass the app’s shared logger/DebugPanel pipeline used elsewhere. Consider routing through utils/logger (and/or gating behind a debug flag) so logs are consistent and can be surfaced in the UI when needed.
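
One way to gate this output is sketched below. It does not use the API of utils/logger, which is not shown in this PR; `createDebugLogger`, `LogSink`, and `DebugLogger` are hypothetical names, and the sink stands in for whatever the shared logger or DebugPanel pipeline accepts.

```typescript
// Hypothetical sink signature: anything that can receive a log line.
type LogSink = (message: string, data?: unknown) => void;

interface DebugLogger {
  debug(message: string, data?: unknown): void;
}

// Returns a logger that forwards to the sink only when debugging is enabled,
// so call sites never write to stdout directly.
function createDebugLogger(enabled: boolean, sink: LogSink): DebugLogger {
  return {
    debug(message: string, data?: unknown): void {
      if (enabled) sink(message, data);
    },
  };
}
```

A call site would then read `logger.debug('[Task System] Complexity classification', { query, ... })`, and the output disappears entirely when the flag is off.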

    // Emit task plan created event
    yield {
      type: 'task_plan_created',
      plan,
    };

runWithTasks() yields a task_plan_created event (lines 110-114), and TaskExecutor.execute() also yields a task_plan_created event at the start of execution. This results in duplicate plan-created events in the UI/history. Emit this event in exactly one place (either keep it in runWithTasks() and remove it from the executor, or vice-versa).

    - // Emit task plan created event
    - yield {
    -   type: 'task_plan_created',
    -   plan,
    - };


    import type { AgentEvent } from '../types.js';

    /**
     * Task execution events
     */
    export interface TaskStatusChangedEvent {
      type: 'task_status_changed';
      taskId: string;
      status: 'pending' | 'running' | 'complete' | 'failed';
      result?: string;
      error?: string;
    }

TaskStatusChangedEvent is re-defined here even though the same event type exists in src/agent/types.ts. Duplicating the event contract in multiple places is likely to drift; prefer importing/re-exporting the shared event types instead of redefining them.

    - import type { AgentEvent } from '../types.js';
    - /**
    -  * Task execution events
    -  */
    - export interface TaskStatusChangedEvent {
    -   type: 'task_status_changed';
    -   taskId: string;
    -   status: 'pending' | 'running' | 'complete' | 'failed';
    -   result?: string;
    -   error?: string;
    - }
    + import type { AgentEvent, TaskStatusChangedEvent } from '../types.js';
    + /**
    +  * Task execution events
    +  */

Co-authored-by: Copilot <[email protected]>
PROBLEM
Multi-metric queries like "AAPL P/E ratio, net income, and profit margins" were creating single-task plans instead of being decomposed into focused subtasks.
Impact:
- All-or-nothing execution (one failure = total failure)
- No clear task progression visible to users
- Generic bundled tool calls
SOLUTION
Enhanced the ComplexityClassifier and TaskPlanner prompts to properly identify and decompose multi-metric queries.
Classifier improvements:
- Added 7 example categories (was 2), including multi-metric same-company queries
- Expanded heuristic fallback from 2 to 7 patterns
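
For readers unfamiliar with the fallback, here is a rough sketch of how a keyword-based heuristic of this kind could work when the LLM classifier is unavailable. The keyword lists, pattern names, and `classifyHeuristically` are assumptions for illustration; they are not the seven patterns in classifier.ts, and the result omits the estimatedSteps field seen in the log snippet.

```typescript
// Hypothetical result shape for this sketch.
interface HeuristicResult {
  isComplex: boolean;
  reason: string;
}

// Assumed keyword list for illustration only, not the list in classifier.ts.
const METRIC_KEYWORDS: string[] = ['p/e', 'revenue', 'net income', 'margin', 'cash flow'];

interface HeuristicPattern {
  name: string;
  matches: (query: string) => boolean;
}

// Each pattern receives the lower-cased query.
const HEURISTIC_PATTERNS: HeuristicPattern[] = [
  {
    // Two or more distinct metrics in one query suggests decomposition.
    name: 'multi-metric',
    matches: (q: string) => METRIC_KEYWORDS.filter((k) => q.includes(k)).length >= 2,
  },
  { name: 'multi-company', matches: (q: string) => /\b(vs|versus|compare)\b/.test(q) },
  { name: 'calculation', matches: (q: string) => /\b(calculate|cagr|growth rate)\b/.test(q) },
  { name: 'temporal', matches: (q: string) => /\b(trend|year over year|over the (last|past))\b/.test(q) },
  { name: 'aggregation', matches: (q: string) => /\b(average|total|sum)\b/.test(q) },
];

function classifyHeuristically(query: string): HeuristicResult {
  const normalized = query.toLowerCase();
  const hit = HEURISTIC_PATTERNS.find((p) => p.matches(normalized));
  return hit
    ? { isComplex: true, reason: `matched heuristic pattern: ${hit.name}` }
    : { isComplex: false, reason: 'no heuristic pattern matched' };
}
```

Under these assumed lists, the query from the problem statement ("AAPL P/E ratio, net income, and profit margins") matches the multi-metric pattern, while a single-metric question such as "What is AAPL's revenue?" stays simple.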
Planner improvements:
- Redesigned prompt (34 to 162 lines) with decomposition principles
- Added 4 detailed examples with GOOD vs BAD comparisons
- Emphasized benefits: reliability, clarity, parallelism
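
To make the parallelism benefit concrete, the sketch below groups a decomposed plan into waves of tasks whose dependencies are already satisfied. The plan contents, `PlannedTask`, and `toExecutionWaves` are hypothetical; the actual executor may schedule tasks differently.

```typescript
// Hypothetical, reduced task shape for this sketch.
interface PlannedTask {
  id: string;
  description: string;
  dependencies: string[];
}

// Illustrative plan for a multi-metric query: three independent data
// collection tasks followed by one synthesis task that depends on all three.
const examplePlan: PlannedTask[] = [
  { id: 'pe', description: 'Fetch AAPL P/E ratio', dependencies: [] },
  { id: 'income', description: 'Fetch AAPL net income', dependencies: [] },
  { id: 'margins', description: 'Fetch AAPL profit margins', dependencies: [] },
  { id: 'synthesis', description: 'Combine results', dependencies: ['pe', 'income', 'margins'] },
];

// Groups task ids into waves; every task in a wave has all of its
// dependencies completed by earlier waves, so a wave could run concurrently.
function toExecutionWaves(tasks: PlannedTask[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let pending = tasks;
  while (pending.length > 0) {
    const ready = pending.filter((t) => t.dependencies.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error('Unresolvable dependencies (cycle or unknown task id)');
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    pending = pending.filter((t) => !done.has(t.id));
  }
  return waves;
}
```

For `examplePlan` this yields two waves: the three data collection tasks first, then the synthesis task. A single bundled task offers no such split, which is the reliability and parallelism argument made above.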
RESULTS
- Before: 1 generic task with bundled tool call
- After: 4 focused tasks (3 data collection + 1 synthesis)

| Metric | Before | After |
|---|---|---|
| Tasks created | 1 | 4 |
| Tool calls | 1 bundled | 3 focused |
| Synthesis | None | Dedicated task |
| Failure resilience | Low | High |
| Tool call specificity | Generic | Focused |
Verified: Scratchpad logs confirm proper task breakdown with "Context from previous tasks" showing separate data fetching before synthesis.
FILES CHANGED
- src/agent/task/classifier.ts - Enhanced prompt and heuristics
- src/agent/task/planner.ts - Comprehensive prompt redesign
7 files total, 531 insertions(+), 1 deletion(-)
TESTING
- ✓ Multi-metric query decomposition
- ✓ Synthesis task aggregation
- ✓ Heuristic fallback patterns
- ✓ Scratchpad log verification
Note: Hardcoded financial keywords in heuristics are intentional for Dexter's finance-specific domain.
Relates to #128