
Feat/reliability performance overhaul #33

Merged: cohen-liel merged 6 commits into main from feat/reliability-performance-overhaul on Apr 11, 2026

cohen-liel (Owner) commented Apr 11, 2026

Description

Major reliability, performance, and extensibility update. Overhauls the DAG executor with async I/O, smarter
retry/self-healing, and a code review system. Simplifies the agent pipeline by removing the architect subprocess. Adds
a plugin system for custom agent roles and an interactive DAG visualization page. Includes memory management
improvements, frontend fixes, and comprehensive new test coverage.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Refactoring (no functional changes)

Related Issues

None

Component(s) Affected

  • Dashboard UI
  • DAG Engine
  • Agent System
  • API / Backend

Changes Made

DAG Executor & Orchestrator:

  • Async I/O overhaul, code review system, and fixes to retry and stuck-graph detection
  • PM reads codebase directly, architect subprocess removed
  • Simplified failure categories and improved task semantics
  • Tightened agent limits and added memory management constants
  • Prevent unbounded memory growth
  • Disabled experimental features by default
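
One way to picture the stuck-graph detection mentioned above: a graph is stuck when work remains but nothing is running and nothing can start. The sketch below is only an illustration under that assumption, not the executor's actual code; `Task`, the status strings, `ready_tasks`, and `is_stuck` are hypothetical names that do not come from this PR.

```python
from dataclasses import dataclass, field

# Hypothetical status values; the executor's real states are not shown in this PR.
PENDING, RUNNING, DONE, FAILED = "pending", "running", "done", "failed"


@dataclass
class Task:
    task_id: str
    status: str = PENDING
    depends_on: list[str] = field(default_factory=list)


def ready_tasks(tasks: dict[str, Task]) -> list[str]:
    """Return IDs of pending tasks whose dependencies have all completed."""
    return [
        t.task_id
        for t in tasks.values()
        if t.status == PENDING
        and all(tasks[d].status == DONE for d in t.depends_on)
    ]


def is_stuck(tasks: dict[str, Task]) -> bool:
    """A graph is stuck when work remains but nothing is running or ready."""
    has_pending = any(t.status == PENDING for t in tasks.values())
    has_running = any(t.status == RUNNING for t in tasks.values())
    return has_pending and not has_running and not ready_tasks(tasks)


# "b" waits on "a", which failed, so no further progress is possible.
graph = {
    "a": Task("a", status=FAILED),
    "b": Task("b", depends_on=["a"]),
}
```

In this example `is_stuck(graph)` is true until `a` is retried successfully; a retry policy would typically hook in at exactly that point.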

Plugin System (new):

  • PluginBase ABC, PluginRegistry with hot-reload and security hardening
  • Full CRUD API: create, delete, enable/disable plugins from the UI
  • PM agent discovers plugins, DAG executor resolves plugin system prompts
  • Sample documentation_writer plugin
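
To make the plugin pieces above concrete, here is a minimal sketch of how a `PluginBase` ABC and a `PluginRegistry` could fit together. The attribute names (`role_name`, `system_prompt`, `file_scope`, `is_writer`) come from the commit message in this PR; everything else is an assumption for illustration, including the method names, the role-name pattern, and the prompt text of the sample plugin.

```python
import re
from abc import ABC, abstractmethod
from typing import Optional

# Assumed naming rule (lowercase snake_case); the PR does not show the real rule.
_ROLE_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{1,63}$")


class PluginBase(ABC):
    """Illustrative base class; attribute types are assumed."""

    role_name: str = ""
    file_scope: tuple[str, ...] = ()
    is_writer: bool = False

    @property
    @abstractmethod
    def system_prompt(self) -> str:
        """Prompt text handed to the agent that plays this role."""


class PluginRegistry:
    """Hypothetical registry: register, enable/disable, resolve prompts."""

    def __init__(self) -> None:
        self._plugins: dict[str, PluginBase] = {}
        self._disabled: set[str] = set()

    def register(self, plugin: PluginBase) -> None:
        if not _ROLE_NAME_RE.match(plugin.role_name):
            raise ValueError(f"invalid role name: {plugin.role_name!r}")
        self._plugins[plugin.role_name] = plugin

    def set_enabled(self, role_name: str, enabled: bool) -> None:
        if role_name not in self._plugins:
            raise KeyError(role_name)
        if enabled:
            self._disabled.discard(role_name)
        else:
            self._disabled.add(role_name)

    def resolve_prompt(self, role_name: str) -> Optional[str]:
        """Return the system prompt of an enabled plugin, else None."""
        if role_name in self._disabled:
            return None
        plugin = self._plugins.get(role_name)
        return plugin.system_prompt if plugin else None


class DocumentationWriter(PluginBase):
    """Stand-in for the sample plugin; the prompt text is invented."""

    role_name = "documentation_writer"
    file_scope = ("docs/",)
    is_writer = True

    @property
    def system_prompt(self) -> str:
        return "You write and update project documentation."
```

Validating the role name at registration time keeps malformed names out of every later lookup, which is one plausible reading of the "security hardening" item above.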

DAG Visualization (new):

  • React Flow interactive page with 7 status styles, BFS layout, live polling
  • Replay mode with timeline slider for execution history
  • DAG API endpoints reading from checkpoint DB
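
The BFS layout mentioned above can be sketched as a breadth-first pass in dependency order (Kahn's algorithm) that places each node one layer below its deepest dependency. The real layout code lives in the frontend and is not shown in this PR, so it may differ; the version below is written in Python for illustration, and `bfs_layout`, `x_gap`, and `y_gap` are hypothetical names. It assumes the graph is acyclic.

```python
from collections import defaultdict, deque


def bfs_layout(nodes, edges, x_gap=220, y_gap=120):
    """Map node id -> (x, y), layering nodes by depth from the roots.

    `edges` holds (source, target) pairs, meaning target depends on source.
    Assumes an acyclic graph in which every edge endpoint appears in `nodes`.
    """
    children = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Roots (no dependencies) form layer 0 and seed the queue.
    layer = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(layer)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # A child sits one layer below its deepest parent.
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Spread the nodes of each layer horizontally, keeping input order.
    rows = defaultdict(list)
    for n in nodes:
        rows[layer[n]].append(n)
    return {
        n: (col * x_gap, depth * y_gap)
        for depth, row in rows.items()
        for col, n in enumerate(row)
    }
```

For a diamond-shaped graph (`a` feeds `b` and `c`, which both feed `d`), this puts `a` on the first row, `b` and `c` side by side on the second, and `d` on the third.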

Dashboard & Frontend:

  • Health metrics, project stats, and system endpoints
  • Fix stale state on project navigation
  • Fix unicode crash in dashboard

Bug Fixes:

  • Lazily initialize _escalation_counts to prevent an AttributeError
  • Updated agent_runtime isolated_query signature
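
The lazy-initialization fix above follows a common pattern: create the attribute on first access instead of relying on `__init__` having set it. The sketch below shows the general idea only; the class name `Orchestrator` and both method names are hypothetical, and the actual fix in this PR may look different.

```python
class Orchestrator:  # hypothetical class name
    def _get_escalation_counts(self) -> dict:
        # Create the dict on first access, so code paths that bypass
        # __init__ (or run before it finishes) do not raise AttributeError.
        counts = getattr(self, "_escalation_counts", None)
        if counts is None:
            counts = {}
            self._escalation_counts = counts
        return counts

    def record_escalation(self, task_id: str) -> int:
        """Increment and return the escalation count for a task."""
        counts = self._get_escalation_counts()
        counts[task_id] = counts.get(task_id, 0) + 1
        return counts[task_id]
```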

Testing

  • I have tested these changes locally
  • Python syntax check passes (python3 -m py_compile)
  • TypeScript compilation passes (npx tsc --noEmit)
  • Existing tests still pass (python3 -m pytest tests/)
  • New tests: 70 plugin, 28 DAG API, config validation, health metrics, project stats
  • 420+ total tests passing

Checklist

  • My code follows the project's code style
  • I have added comments where necessary
  • I have updated documentation if needed
  • My changes don't introduce new warnings

cohen-liel force-pushed the feat/reliability-performance-overhaul branch 2 times, most recently from ee9ed66 to 912ab52 on April 11, 2026 19:33
Liel Cohen added 6 commits April 11, 2026 22:38
- DAG executor: async I/O, code review system, smarter retry and
  stuck graph detection, improved batch planning
- Contracts: simplified failure categories, role type changed to str
  with runtime validation for plugin extensibility
- PM agent: reads codebase directly, architect subprocess removed
- Config: tightened agent limits, memory management constants,
  experimental features disabled by default
- Memory: prevent unbounded growth, improved snapshot management
- Bug fixes: lazy init escalation tracking, agent runtime SDK signature
- New stats router with project-level analytics
- System endpoint improvements for health monitoring
- Project stats tracking across sessions
- Tests: config validation, health metrics, project stats

Users can create, enable/disable, and delete custom agent roles from
the UI or by dropping Python files in the plugins/ directory.

- PluginBase ABC with role_name, system_prompt, file_scope, is_writer
- PluginRegistry with discovery, hot-reload, enable/disable lifecycle
- Security: symlink traversal protection, role name validation,
  prompt injection sanitization in PM context
- API: POST create, DELETE remove, GET list, POST enable/disable
- Frontend: PluginsPage with create form, table, and delete
- DAG executor and debate engine resolve plugin system prompts
- Sample plugin: documentation_writer
- 70 tests covering registry, API CRUD, contracts validation
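
Symlink traversal protection for a drop-in plugins/ directory usually comes down to resolving each candidate path and checking that it still lives under the plugins root. The following is a minimal sketch of that check, not the code from this commit; `is_safe_plugin_path` is a hypothetical helper name.

```python
from pathlib import Path


def is_safe_plugin_path(candidate: Path, plugins_dir: Path) -> bool:
    """True if candidate is a .py file that stays inside plugins_dir.

    resolve() follows symlinks and collapses "..", so a link that points
    outside the plugins directory is rejected.
    """
    root = plugins_dir.resolve()
    resolved = candidate.resolve()
    return resolved.suffix == ".py" and root in resolved.parents
```

Checking the resolved path rather than the path as written is the important part: a file named `plugins/link.py` can look harmless while pointing anywhere on disk.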
Interactive task graph page using React Flow with live updates and
execution replay.

- Backend: GET /dag and /dag/history endpoints from checkpoint DB
- DagNode component with 7 status styles and animations
- BFS layout algorithm for hierarchical positioning
- Live polling (3s) during active execution
- Replay mode with timeline slider to step through history
- Links from ConductorBar and DesktopLayout
- 28 tests covering API endpoints and edge cases
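
Replay with a timeline slider can be thought of as folding the first N history events into a per-task status map. The sketch below assumes a hypothetical event shape (`task_id` and `status` keys in chronological order); the actual response format of the /dag/history endpoint is not shown in this PR.

```python
def state_at(history, step):
    """Return task_id -> status after applying the first `step` events.

    `history` is assumed to be a chronological list of dicts with
    "task_id" and "status" keys (a hypothetical shape).
    """
    statuses = {}
    for event in history[: max(0, step)]:
        # Later events overwrite earlier ones for the same task.
        statuses[event["task_id"]] = event["status"]
    return statuses
```

Moving the slider to position `step` then amounts to calling `state_at(history, step)` and restyling each node with the returned status.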
cohen-liel force-pushed the feat/reliability-performance-overhaul branch from aaa8a17 to f3e3b8f on April 11, 2026 19:39
cohen-liel merged commit b4a3f10 into main on Apr 11, 2026
3 checks passed
cohen-liel deleted the feat/reliability-performance-overhaul branch on April 11, 2026 19:59