Feature/improve task decomposition by himax12 · Pull Request #131 · virattt/dexter

himax12 · 2026-02-11T15:20:52Z

PROBLEM

Multi-metric queries like "AAPL P/E ratio, net income, and profit margins" were creating single-task plans instead of being decomposed into focused subtasks.

Impact:

- All-or-nothing execution (one failure = total failure)
- No clear task progression visible to users
- Generic bundled tool calls

SOLUTION

Enhanced the ComplexityClassifier and TaskPlanner prompts to properly identify and decompose multi-metric queries.

Classifier improvements:

- Added 7 example categories (was 2), including multi-metric same-company queries
- Expanded heuristic fallback from 2 to 7 patterns

Planner improvements:

- Redesigned prompt (34 to 162 lines) with decomposition principles
- Added 4 detailed examples with GOOD vs BAD comparisons
- Emphasized benefits: reliability, clarity, parallelism
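
A heuristic fallback of the kind described above might look roughly like the sketch below. The keyword patterns, the `looksComplex` name, and the threshold of two distinct metrics are illustrative assumptions, not the seven patterns actually added to `classifier.ts`.

```typescript
// Hypothetical finance metric patterns; the PR's real patterns are not shown here.
const METRIC_PATTERNS: RegExp[] = [
  /\bp\/e\b/i,
  /\bnet income\b/i,
  /\b(profit )?margins?\b/i,
  /\brevenue\b/i,
  /\bfree cash flow\b/i,
];

interface HeuristicResult {
  isComplex: boolean;
  reason: string;
}

// Assumed to run only when the LLM-based classification is unavailable.
function looksComplex(query: string): HeuristicResult {
  // Count how many distinct metric patterns appear in the query.
  const hits = METRIC_PATTERNS.filter((pattern) => pattern.test(query)).length;
  if (hits >= 2) {
    return { isComplex: true, reason: `multi-metric query (${hits} metrics)` };
  }
  if (/\b(compare|versus|vs\.?)\b/i.test(query)) {
    return { isComplex: true, reason: "comparison query" };
  }
  return { isComplex: false, reason: "single metric or simple lookup" };
}
```

Keeping the patterns in one array makes it easy to see which finance-specific keywords the fallback depends on.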
RESULTS

Before: 1 generic task with a bundled tool call
After: 4 focused tasks (3 data collection + 1 synthesis)

Metric	Before	After
Tasks created	1	4
Tool calls	1 bundled	3 focused
Synthesis	None	Dedicated task
Failure resilience	Low	High
Tool call specificity	Generic	Focused

Verified: Scratchpad logs confirm proper task breakdown with "Context from previous tasks" showing separate data fetching before synthesis.
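
To make the before/after concrete, here is a sketch of what a decomposed plan for the example query could look like. The `Task` fields follow those visible in this PR's planner snippet (`id`, `description`, `status`, `toolCalls`, `dependencies`); the `ToolCall` shape, the tool names, and the task wording are assumptions for illustration only.

```typescript
type TaskStatus = "pending" | "running" | "complete" | "failed";

// Assumed shape; the real ToolCall type is defined in src/agent/task/types.ts.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface Task {
  id: string;
  description: string;
  status: TaskStatus;
  toolCalls: ToolCall[];
  dependencies: string[];
}

// Hypothetical plan for "AAPL P/E ratio, net income, and profit margins".
const plan: Task[] = [
  { id: "t1", description: "Fetch AAPL P/E ratio", status: "pending",
    toolCalls: [{ tool: "get_valuation_metrics", args: { ticker: "AAPL" } }], dependencies: [] },
  { id: "t2", description: "Fetch AAPL net income", status: "pending",
    toolCalls: [{ tool: "get_income_statement", args: { ticker: "AAPL" } }], dependencies: [] },
  { id: "t3", description: "Fetch AAPL profit margins", status: "pending",
    toolCalls: [{ tool: "get_profitability_metrics", args: { ticker: "AAPL" } }], dependencies: [] },
  { id: "t4", description: "Synthesize the three results into one answer", status: "pending",
    toolCalls: [], dependencies: ["t1", "t2", "t3"] },
];

// Data-collection tasks have no dependencies, so they could run in parallel.
const parallelizable = plan
  .filter((task) => task.dependencies.length === 0)
  .map((task) => task.id);
```

With this shape, a failure in one data-collection task leaves the other two results intact, which is the resilience gain the table describes.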

FILES CHANGED

- src/agent/task/classifier.ts - Enhanced prompt and heuristics
- src/agent/task/planner.ts - Comprehensive prompt redesign
- 7 files total, 531 insertions(+), 1 deletion(-)

TESTING

✓ Multi-metric query decomposition
✓ Synthesis task aggregation
✓ Heuristic fallback patterns
✓ Scratchpad log verification

Note: Hardcoded financial keywords in heuristics are intentional for Dexter's finance-specific domain.

Relates to #128

- Enhanced ComplexityClassifier prompt with 7 detailed example categories
- Added multi-metric query detection (P/E + revenue + margins)
- Expanded heuristic fallback from 2 to 7 financial patterns
- Redesigned TaskPlanner prompt (34 → 162 lines) with decomposition focus
- Added 4 detailed GOOD/BAD task planning examples
- Included clear principles: when to decompose, benefits, task rules

Multi-metric queries now properly decompose into 3-4 focused subtasks instead of bundled single tasks, improving reliability and clarity.

- Enhanced ComplexityClassifier with comprehensive LLM prompt
- Added 7 example categories (multi-metric, multi-company, calculations, etc.)
- Improved decision criteria with clear SIMPLE vs COMPLEX guidelines
- Expanded heuristic fallback from 2 to 7 patterns
- Added multi-metric detection
- Added calculations, temporal analysis, aggregations, and analysis keywords
- Redesigned TaskPlanner prompt (34 lines → 162 lines)
- Added 4 detailed examples with GOOD/BAD comparisons
- Emphasized decomposition benefits (reliability, clarity, parallelism)
- Included structured decision criteria for when to decompose

Result: Multi-metric queries like 'AAPL valuation and profitability' now properly decompose into 3-4 focused tasks instead of 1 generic task. Verified working with scratchpad logs showing proper task breakdown.

Copilot

Pull request overview

Adds a structured “task planning + execution” pathway to Dexter so complex, multi-metric financial queries can be decomposed into multiple focused subtasks, streamed to the UI, and (optionally) persisted to disk.

Changes:

Introduces a task system: complexity classification, LLM-based task planning, dependency-aware execution, and on-disk persistence.
Extends agent streaming events with task-plan/task-status events and updates the runner UI hook to display them.
Switches the UI runner to call agent.runWithTasks(...) instead of agent.run(...).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 14 comments.

File	Description
src/hooks/useAgentRunner.ts	Handles new task-related streamed events and uses `runWithTasks()` as the entry point.
src/agent/types.ts	Adds `task_plan_created` / `task_status_changed` events to the AgentEvent union.
src/agent/task/types.ts	Defines core task/task-plan types (Task, ToolCall, TaskPlan, validation result).
src/agent/task/planner.ts	Implements LLM-driven task-plan creation plus DAG validation/cycle detection.
src/agent/task/persistence.ts	Adds filesystem persistence for task plans under `.dexter/tasks`.
src/agent/task/index.ts	Re-exports task subsystem modules.
src/agent/task/executor.ts	Executes tasks respecting dependencies and streams task status changes.
src/agent/task/classifier.ts	Adds LLM-first complexity classification with heuristic fallback.
src/agent/agent.ts	Adds `runWithTasks()` to classify, plan, execute, and persist task plans.
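
The DAG validation and dependency-aware execution mentioned in the table can both be derived from a topological sort. The following is a minimal sketch using Kahn's algorithm; it is not the code in `planner.ts` or `executor.ts`, and the `PlanTask` and `topologicalOrder` names are hypothetical.

```typescript
interface PlanTask {
  id: string;
  dependencies: string[];
}

// Returns task ids in an order that respects dependencies.
// Throws if a dependency is unknown or the graph contains a cycle.
function topologicalOrder(tasks: PlanTask[]): string[] {
  const ids = new Set(tasks.map((task) => task.id));
  const remaining = new Map<string, number>(); // unmet dependency count per task
  const dependents = new Map<string, string[]>(); // dependency id -> tasks waiting on it

  for (const task of tasks) {
    remaining.set(task.id, task.dependencies.length);
    for (const dep of task.dependencies) {
      if (!ids.has(dep)) {
        throw new Error(`Task "${task.id}" depends on unknown task "${dep}"`);
      }
      dependents.set(dep, [...(dependents.get(dep) ?? []), task.id]);
    }
  }

  // Start with tasks that have nothing to wait for.
  const ready = tasks.filter((task) => task.dependencies.length === 0).map((task) => task.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (remaining.get(next) ?? 0) - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  // Any task never reached is part of (or blocked by) a cycle.
  if (order.length !== tasks.length) {
    throw new Error("Task plan contains a dependency cycle");
  }
  return order;
}
```

The same pass gives an executor a valid run order, so validation and scheduling do not need separate graph traversals.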

Copilot · 2026-02-11T15:28:33Z

src/agent/task/persistence.ts

+  /**
+   * Delete a task plan
+   */
+  async delete(planId: string): Promise<void> {


delete() calls fs.readdir(this.baseDir) without ensuring the directory exists, which will throw ENOENT when there are no saved plans yet. Call ensureDir() first or treat ENOENT as a no-op.

Suggested change

-  async delete(planId: string): Promise<void> {
+  async delete(planId: string): Promise<void> {
+    await this.ensureDir();
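
The alternative mentioned in the comment, treating ENOENT as a no-op, could be sketched as follows. This is a standalone illustration rather than the PR's store class: the `deletePlanFiles` name and the assumption that plan files are named with the plan id as a prefix are both hypothetical.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical helper: removes files in baseDir whose names start with planId
// (an assumed naming scheme) and returns how many were deleted.
async function deletePlanFiles(baseDir: string, planId: string): Promise<number> {
  let entries: string[];
  try {
    entries = await fs.readdir(baseDir);
  } catch (err) {
    // A missing directory just means no plan has been saved yet.
    if ((err as { code?: string }).code === "ENOENT") return 0;
    throw err;
  }
  const matches = entries.filter((name) => name.startsWith(planId));
  await Promise.all(matches.map((name) => fs.unlink(path.join(baseDir, name))));
  return matches.length;
}
```

Compared with calling `ensureDir()` first, this variant avoids creating the directory as a side effect of a delete.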

Copilot · 2026-02-11T15:28:34Z

src/agent/task/planner.ts

+    // Build task plan with proper status
+    const tasks: Task[] = parsed.tasks.map((t: Partial<Task>) => ({
+      id: t.id!,
+      description: t.description!,
+      status: 'pending' as const,
+      toolCalls: t.toolCalls || [],
+      dependencies: t.dependencies || [],
+    }));
+


The LLM response is treated as trusted: t.id! / t.description! can be undefined at runtime and then propagate into the plan, causing downstream failures. Add explicit validation for required fields and toolCalls/dependencies shapes (and fallback to run() or throw a clear error) instead of relying on non-null assertions.

Suggested change

-    // Build task plan with proper status
-    const tasks: Task[] = parsed.tasks.map((t: Partial<Task>) => ({
-      id: t.id!,
-      description: t.description!,
-      status: 'pending' as const,
-      toolCalls: t.toolCalls || [],
-      dependencies: t.dependencies || [],
-    }));
+    // Build task plan with proper status and validation
+    const tasks: Task[] = parsed.tasks.map((t: Partial<Task>, index: number) => {
+      if (!t || typeof t !== 'object') {
+        throw new Error(`LLM task at index ${index} is not a valid object`);
+      }
+      if (typeof t.id !== 'string' || t.id.trim() === '') {
+        throw new Error(`LLM task at index ${index} is missing a valid "id"`);
+      }
+      if (typeof t.description !== 'string' || t.description.trim() === '') {
+        throw new Error(`LLM task at index ${index} is missing a valid "description"`);
+      }
+      let toolCalls: Task['toolCalls'];
+      if (t.toolCalls === undefined) {
+        toolCalls = [];
+      } else if (Array.isArray(t.toolCalls)) {
+        toolCalls = t.toolCalls as Task['toolCalls'];
+      } else {
+        throw new Error(`LLM task "${t.id}" has invalid "toolCalls": expected an array`);
+      }
+      let dependencies: string[];
+      if (t.dependencies === undefined) {
+        dependencies = [];
+      } else if (Array.isArray(t.dependencies)) {
+        dependencies = t.dependencies.filter(
+          (d): d is string => typeof d === 'string' && d.trim() !== ''
+        );
+      } else {
+        throw new Error(`LLM task "${t.id}" has invalid "dependencies": expected an array of strings`);
+      }
+      return {
+        id: t.id,
+        description: t.description,
+        status: 'pending' as const,
+        toolCalls,
+        dependencies,
+      };
+    });

Copilot · 2026-02-11T15:28:34Z

src/agent/agent.ts

+    }
+
+    // Final save
+    await store.save(plan);


runWithTasks() completes after saving the plan (end of method) without emitting a terminal done event for the overall run. The UI (useAgentRunner) expects exactly one done to finalize the history item and surface the final answer. Consider emitting a single final done after task execution (e.g., from the synthesis task result or an explicit aggregation step).

Suggested change

-    await store.save(plan);
+    await store.save(plan);
+    // Emit a single terminal done event for the overall run
+    yield {
+      type: 'done',
+    } as AgentEvent;

Copilot · 2026-02-11T15:28:34Z

src/agent/task/executor.ts

+        // Forward all agent events
+        yield event as AgentEvent;
+
+        if (event.type === 'done') {
+          results.push(event.answer);


This loop forwards every inner Agent.run() event (including done). The outer UI treats the first done as “the whole query is complete”, so a subtask finishing will prematurely finalize the run and later task events/answers will be dropped. Filter/suppress inner terminal events and instead emit a single final done for the overall plan once all tasks finish.

Suggested change

-        // Forward all agent events
-        yield event as AgentEvent;
-
-        if (event.type === 'done') {
-          results.push(event.answer);
+        if (event.type === 'done') {
+          // Capture the final answer for this task but do not forward the inner
+          // terminal event, so the outer UI only sees a single final "done"
+          // for the overall plan.
+          results.push(event.answer);
+        } else {
+          // Forward non-terminal agent events
+          yield event as AgentEvent;
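
Taken end to end, the pattern suggested here (swallow each subtask's terminal event, then emit one `done` for the whole plan) can be sketched with async generators. The event union below is a simplified assumption: only `done` with an `answer` field appears in this PR, while the `thinking` event, `runSubtask`, and `runPlan` are stand-ins invented for the example.

```typescript
// Simplified, assumed event shapes; the real AgentEvent union is in src/agent/types.ts.
type AgentEvent =
  | { type: "thinking"; message: string }
  | { type: "done"; answer: string };

// Stand-in for an inner Agent.run(): emits progress, then one terminal event.
async function* runSubtask(description: string): AsyncGenerator<AgentEvent> {
  yield { type: "thinking", message: `working on ${description}` };
  yield { type: "done", answer: `result for ${description}` };
}

// Runs subtasks in sequence and emits exactly one terminal event overall.
async function* runPlan(descriptions: string[]): AsyncGenerator<AgentEvent> {
  const results: string[] = [];
  for (const description of descriptions) {
    for await (const event of runSubtask(description)) {
      if (event.type === "done") {
        results.push(event.answer); // capture the answer, do not forward
      } else {
        yield event; // forward non-terminal events unchanged
      }
    }
  }
  // Single terminal event for the whole plan, carrying the collected answers.
  yield { type: "done", answer: results.join("\n") };
}
```

A consumer that finalizes on the first `done` then sees it only after every subtask has finished.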

Copilot · 2026-02-11T15:28:36Z

src/agent/agent.ts

+    console.log('[Task System] Complexity classification:', {
+      query,
+      isComplex: complexity.isComplex,
+      reason: complexity.reason,
+      estimatedSteps: complexity.estimatedSteps,


These console.log statements introduce noisy stdout logging and bypass the app’s shared logger/DebugPanel pipeline used elsewhere. Consider routing through utils/logger (and/or gating behind a debug flag) so logs are consistent and can be surfaced in the UI when needed.
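
One way to gate such logging behind a debug flag is sketched below. Dexter's actual `utils/logger` API is not shown in this PR, so `createDebugLogger` and the `LogSink` type are hypothetical names used only to illustrate the idea.

```typescript
type LogSink = (message: string, data?: unknown) => void;

// Hypothetical helper: wraps a sink so it only receives messages when enabled.
function createDebugLogger(enabled: boolean, sink: LogSink): LogSink {
  return (message, data) => {
    if (enabled) sink(message, data);
  };
}

// Example wiring with an in-memory sink instead of stdout.
const captured: string[] = [];
const quietLog = createDebugLogger(false, (message) => {
  captured.push(message);
});
const verboseLog = createDebugLogger(true, (message) => {
  captured.push(message);
});

quietLog("[Task System] Complexity classification", { isComplex: true }); // dropped
verboseLog("[Task System] Complexity classification", { isComplex: true }); // recorded
```

Injecting the sink keeps the call sites unchanged whether messages go to a shared logger, a UI panel, or nowhere.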

Copilot · 2026-02-11T15:28:36Z

src/agent/agent.ts

+    // Emit task plan created event
+    yield {
+      type: 'task_plan_created',
+      plan,
+    };
+


runWithTasks() yields a task_plan_created event (lines 110-114), and TaskExecutor.execute() also yields a task_plan_created event at the start of execution. This results in duplicate plan-created events in the UI/history. Emit this event in exactly one place (either keep it in runWithTasks() and remove it from the executor, or vice-versa).

Suggested change

-    // Emit task plan created event
-    yield {
-      type: 'task_plan_created',
-      plan,
-    };

Copilot · 2026-02-11T15:28:37Z

src/agent/task/executor.ts

+import type { AgentEvent } from '../types.js';
+
+/**
+ * Task execution events
+ */
+export interface TaskStatusChangedEvent {
+  type: 'task_status_changed';
+  taskId: string;
+  status: 'pending' | 'running' | 'complete' | 'failed';
+  result?: string;
+  error?: string;
+}


TaskStatusChangedEvent is re-defined here even though the same event type exists in src/agent/types.ts. Duplicating the event contract in multiple places is likely to drift; prefer importing/re-exporting the shared event types instead of redefining them.

Suggested change

-import type { AgentEvent } from '../types.js';
-
-/**
- * Task execution events
- */
-export interface TaskStatusChangedEvent {
-  type: 'task_status_changed';
-  taskId: string;
-  status: 'pending' | 'running' | 'complete' | 'failed';
-  result?: string;
-  error?: string;
-}
+import type { AgentEvent, TaskStatusChangedEvent } from '../types.js';
+
+/**
+ * Task execution events
+ */

himax12 added 2 commits February 11, 2026 20:11

Copilot AI review requested due to automatic review settings February 11, 2026 15:20

Copilot started reviewing on behalf of himax12 February 11, 2026 15:21

Copilot AI reviewed Feb 11, 2026

Update src/agent/task/planner.ts

4fc5be6

Co-authored-by: Copilot <[email protected]>

himax12 wants to merge 3 commits into virattt:main from himax12:feature/improve-task-decomposition


