Skip to content

Feature/improve task decomposition#131

Open
himax12 wants to merge 3 commits intovirattt:mainfrom
himax12:feature/improve-task-decomposition
Open

Feature/improve task decomposition#131
himax12 wants to merge 3 commits intovirattt:mainfrom
himax12:feature/improve-task-decomposition

Conversation

@himax12
Copy link

@himax12 himax12 commented Feb 11, 2026

PROBLEM

Multi-metric queries like "AAPL P/E ratio, net income, and profit margins" were creating single-task plans instead of being decomposed into focused subtasks.

Impact:

All-or-nothing execution (one failure = total failure)
No clear task progression visible to users
Generic bundled tool calls
SOLUTION

Enhanced the ComplexityClassifier and TaskPlanner prompts to properly identify and decompose multi-metric queries.

Classifier improvements:

Added 7 example categories (was 2), including multi-metric same-company queries
Expanded heuristic fallback from 2 to 7 patterns
Planner improvements:

Redesigned prompt (34 to 162 lines) with decomposition principles
Added 4 detailed examples with GOOD vs BAD comparisons
Emphasized benefits: reliability, clarity, parallelism
RESULTS

Before: 1 generic task with bundled tool call After: 4 focused tasks (3 data collection + 1 synthesis)

Tasks created: 1 → 4 Tool calls: 1 bundled → 3 focused Synthesis: None → Dedicated task Failure resilience: Low → High Tool call specificity: Generic → Focused

Verified: Scratchpad logs confirm proper task breakdown with "Context from previous tasks" showing separate data fetching before synthesis.

FILES CHANGED

src/agent/task/classifier.ts - Enhanced prompt and heuristics
src/agent/task/planner.ts - Comprehensive prompt redesign
7 files total, 531 insertions(+), 1 deletion(-)
TESTING

✓ Multi-metric query decomposition ✓ Synthesis task aggregation ✓ Heuristic fallback patterns ✓ Scratchpad log verification

Note: Hardcoded financial keywords in heuristics are intentional for Dexter's finance-specific domain.

Relates to #128

Comment
Ctrl+Alt+M

- Enhanced ComplexityClassifier prompt with 7 detailed example categories
- Added multi-metric query detection (P/E + revenue + margins)
- Expanded heuristic fallback from 2 to 7 financial patterns
- Redesigned TaskPlanner prompt (34  162 lines) with decomposition focus
- Added 4 detailed GOOD/BAD task planning examples
- Included clear principles: when to decompose, benefits, task rules

Multi-metric queries now properly decompose into 3-4 focused subtasks instead
of bundled single tasks, improving reliability and clarity.
- Enhanced ComplexityClassifier with comprehensive LLM prompt
  - Added 7 example categories (multi-metric, multi-company, calculations, etc.)
  - Improved decision criteria with clear SIMPLE vs COMPLEX guidelines

- Expanded heuristic fallback from 2 to 7 patterns
  - Added multi-metric detection
  - Added calculations, temporal analysis, aggregations, and analysis keywords

- Redesigned TaskPlanner prompt (34 lines  162 lines)
  - Added 4 detailed examples with GOOD/BAD comparisons
  - Emphasized decomposition benefits (reliability, clarity, parallelism)
  - Included structured decision criteria for when to decompose

Result: Multi-metric queries like 'AAPL valuation and profitability' now
properly decompose into 3-4 focused tasks instead of 1 generic task.

Verified working with scratchpad logs showing proper task breakdown.
Copilot AI review requested due to automatic review settings February 11, 2026 15:20
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a structured “task planning + execution” pathway to Dexter so complex, multi-metric financial queries can be decomposed into multiple focused subtasks, streamed to the UI, and (optionally) persisted to disk.

Changes:

  • Introduces a task system: complexity classification, LLM-based task planning, dependency-aware execution, and on-disk persistence.
  • Extends agent streaming events with task-plan/task-status events and updates the runner UI hook to display them.
  • Switches the UI runner to call agent.runWithTasks(...) instead of agent.run(...).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
src/hooks/useAgentRunner.ts Handles new task-related streamed events and uses runWithTasks() as the entry point.
src/agent/types.ts Adds task_plan_created / task_status_changed events to the AgentEvent union.
src/agent/task/types.ts Defines core task/task-plan types (Task, ToolCall, TaskPlan, validation result).
src/agent/task/planner.ts Implements LLM-driven task-plan creation plus DAG validation/cycle detection.
src/agent/task/persistence.ts Adds filesystem persistence for task plans under .dexter/tasks.
src/agent/task/index.ts Re-exports task subsystem modules.
src/agent/task/executor.ts Executes tasks respecting dependencies and streams task status changes.
src/agent/task/classifier.ts Adds LLM-first complexity classification with heuristic fallback.
src/agent/agent.ts Adds runWithTasks() to classify, plan, execute, and persist task plans.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

/**
* Delete a task plan
*/
async delete(planId: string): Promise<void> {
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete() calls fs.readdir(this.baseDir) without ensuring the directory exists, which will throw ENOENT when there are no saved plans yet. Call ensureDir() first or treat ENOENT as a no-op.

Suggested change
async delete(planId: string): Promise<void> {
async delete(planId: string): Promise<void> {
await this.ensureDir();

Copilot uses AI. Check for mistakes.
Comment on lines +238 to +246
// Build task plan with proper status
const tasks: Task[] = parsed.tasks.map((t: Partial<Task>) => ({
id: t.id!,
description: t.description!,
status: 'pending' as const,
toolCalls: t.toolCalls || [],
dependencies: t.dependencies || [],
}));

Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM response is treated as trusted: t.id! / t.description! can be undefined at runtime and then propagate into the plan, causing downstream failures. Add explicit validation for required fields and toolCalls/dependencies shapes (and fallback to run() or throw a clear error) instead of relying on non-null assertions.

Suggested change
// Build task plan with proper status
const tasks: Task[] = parsed.tasks.map((t: Partial<Task>) => ({
id: t.id!,
description: t.description!,
status: 'pending' as const,
toolCalls: t.toolCalls || [],
dependencies: t.dependencies || [],
}));
// Build task plan with proper status and validation
const tasks: Task[] = parsed.tasks.map((t: Partial<Task>, index: number) => {
if (!t || typeof t !== 'object') {
throw new Error(`LLM task at index ${index} is not a valid object`);
}
if (typeof t.id !== 'string' || t.id.trim() === '') {
throw new Error(`LLM task at index ${index} is missing a valid "id"`);
}
if (typeof t.description !== 'string' || t.description.trim() === '') {
throw new Error(`LLM task at index ${index} is missing a valid "description"`);
}
let toolCalls: Task['toolCalls'];
if (t.toolCalls === undefined) {
toolCalls = [];
} else if (Array.isArray(t.toolCalls)) {
toolCalls = t.toolCalls as Task['toolCalls'];
} else {
throw new Error(`LLM task "${t.id}" has invalid "toolCalls": expected an array`);
}
let dependencies: string[];
if (t.dependencies === undefined) {
dependencies = [];
} else if (Array.isArray(t.dependencies)) {
dependencies = t.dependencies.filter(
(d): d is string => typeof d === 'string' && d.trim() !== ''
);
} else {
throw new Error(`LLM task "${t.id}" has invalid "dependencies": expected an array of strings`);
}
return {
id: t.id,
description: t.description,
status: 'pending' as const,
toolCalls,
dependencies,
};
});

Copilot uses AI. Check for mistakes.
}

// Final save
await store.save(plan);
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runWithTasks() completes after saving the plan (end of method) without emitting a terminal done event for the overall run. The UI (useAgentRunner) expects exactly one done to finalize the history item and surface the final answer. Consider emitting a single final done after task execution (e.g., from the synthesis task result or an explicit aggregation step).

Suggested change
await store.save(plan);
await store.save(plan);
// Emit a single terminal done event for the overall run
yield {
type: 'done',
} as AgentEvent;

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +131
// Forward all agent events
yield event as AgentEvent;

if (event.type === 'done') {
results.push(event.answer);
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop forwards every inner Agent.run() event (including done). The outer UI treats the first done as “the whole query is complete”, so a subtask finishing will prematurely finalize the run and later task events/answers will be dropped. Filter/suppress inner terminal events and instead emit a single final done for the overall plan once all tasks finish.

Suggested change
// Forward all agent events
yield event as AgentEvent;
if (event.type === 'done') {
results.push(event.answer);
if (event.type === 'done') {
// Capture the final answer for this task but do not forward the inner
// terminal event, so the outer UI only sees a single final "done"
// for the overall plan.
results.push(event.answer);
} else {
// Forward non-terminal agent events
yield event as AgentEvent;

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +79
console.log('[Task System] Complexity classification:', {
query,
isComplex: complexity.isComplex,
reason: complexity.reason,
estimatedSteps: complexity.estimatedSteps,
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These console.log statements introduce noisy stdout logging and bypass the app’s shared logger/DebugPanel pipeline used elsewhere. Consider routing through utils/logger (and/or gating behind a debug flag) so logs are consistent and can be surfaced in the UI when needed.

Copilot uses AI. Check for mistakes.
Comment on lines +110 to +115
// Emit task plan created event
yield {
type: 'task_plan_created',
plan,
};

Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runWithTasks() yields a task_plan_created event (lines 110-114), and TaskExecutor.execute() also yields a task_plan_created event at the start of execution. This results in duplicate plan-created events in the UI/history. Emit this event in exactly one place (either keep it in runWithTasks() and remove it from the executor, or vice-versa).

Suggested change
// Emit task plan created event
yield {
type: 'task_plan_created',
plan,
};

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +14
import type { AgentEvent } from '../types.js';

/**
* Task execution events
*/
export interface TaskStatusChangedEvent {
type: 'task_status_changed';
taskId: string;
status: 'pending' | 'running' | 'complete' | 'failed';
result?: string;
error?: string;
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskStatusChangedEvent is re-defined here even though the same event type exists in src/agent/types.ts. Duplicating the event contract in multiple places is likely to drift; prefer importing/re-exporting the shared event types instead of redefining them.

Suggested change
import type { AgentEvent } from '../types.js';
/**
* Task execution events
*/
export interface TaskStatusChangedEvent {
type: 'task_status_changed';
taskId: string;
status: 'pending' | 'running' | 'complete' | 'failed';
result?: string;
error?: string;
}
import type { AgentEvent, TaskStatusChangedEvent } from '../types.js';
/**
* Task execution events
*/

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments