A fault-tolerant multi-agent orchestration framework built with LangGraph, featuring autonomous recovery mechanisms, hierarchical supervision, and real-time monitoring capabilities.
ROSAN is a production-ready framework for building resilient multi-agent systems. It provides:
- Autonomous Recovery: Automatically detect and recover from agent failures
- Hierarchical Supervision: Supervisor-worker architecture with intelligent task distribution
- Real-time Monitoring: Live dashboard showing system health, agent status, and workflow progress
- Fault Tolerance: Byzantine fault-tolerant consensus mechanisms
- Scalable Orchestration: Dynamic agent scaling and workflow management
ROSAN Architecture:
- Frontend Dashboard (Port 5173): React UI, Real-time, Monitoring
- Backend API (Port 3001): LangGraph Orchestration Engine
- Data Layer: PostgreSQL (Port 5432), Redis (Port 6379)
- Agent Layer (connected to the frontend, backend, and data layer):
  - Supervisor Agents: Task Mgmt, Monitoring
  - Inspector Agents: Validation, Auditing
  - Recovery Subgraphs: Failure Detection, Auto-Recovery, Rollback
- Worker Agents (Worker 1, Worker 2, ... Worker N): Execute Tasks, Report Status, Challenge Invalid
System Requirements:
- Node.js: >= 18.0.0
- npm: >= 8.0.0
- Docker: >= 20.10.0
- Docker Compose: >= 2.0.0
- Operating System: Linux, macOS, or Windows with WSL2
Required Services:
- PostgreSQL 14+ (automatically provisioned)
- Redis 7+ (automatically provisioned)
- LangGraph API key (required for orchestration)
# Clone the repository
git clone <repository-url>
cd ROSAN
# Set up everything automatically
chmod +x scripts/setup-docker.sh
./scripts/setup-docker.sh
# Start the application
npm start
What this does:
- Creates PostgreSQL and Redis containers
- Configures environment variables
- Builds Docker images
- Starts all services
- Initializes database schema
- Validates setup
If you prefer manual setup or need custom configuration:
# 1. Install dependencies
npm install
# 2. Set up environment
cp .env.example .env
# Edit .env with your configuration
# 3. Start database services
docker-compose -f docker-compose.dev.yml up -d postgres redis
# 4. Initialize database
npm run db:migrate
# 5. Start application
npm run dev:backend
npm run dev:frontend
Create a .env file with the following configuration:
# Server Configuration
NODE_ENV=development
PORT=3001
HOST=localhost
# Database Configuration
DATABASE_URL=postgresql://rosan_user:your_secure_password@localhost:5432/rosan_db
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_NAME=rosan_db
DATABASE_USER=rosan_user
DATABASE_PASSWORD=your_secure_password
DATABASE_SSL=false
# Redis Configuration
REDIS_URL=redis://localhost:6379
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your_redis_password
REDIS_DB=0
# LangGraph Configuration (REQUIRED)
LANGGRAPH_API_KEY=your_langgraph_api_key
LANGGRAPH_PROJECT_ID=rosan_dev
LANGGRAPH_ENVIRONMENT=development
# JWT Configuration
JWT_SECRET=your_super_secret_jwt_key_change_in_production
JWT_EXPIRES_IN=24h
JWT_REFRESH_EXPIRES_IN=7d
# Agent Configuration
AGENT_TIMEOUT=30000
AGENT_MAX_RETRIES=3
AGENT_RETRY_DELAY=1000
MAX_CONCURRENT_AGENTS=10
# WebSocket Configuration
WS_PORT=3002
WS_PATH=/socket.io
WS_CORS_ORIGIN=http://localhost:3000
# Logging Configuration
LOG_LEVEL=info
LOG_FILE_PATH=./logs/rosan.log
LOG_MAX_SIZE=20m
LOG_MAX_FILES=14d
# Security Configuration
CORS_ORIGIN=http://localhost:3000
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
HELMET_ENABLED=true
Required for ROSAN functionality:
- Visit LangGraph
- Create an account and obtain an API key
- Set LANGGRAPH_API_KEY in your environment
- Test connection:
curl -H "Authorization: Bearer YOUR_KEY" https://api.langgraph.com
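The environment variables above can be checked at startup before any agent runs. The following is a minimal sketch, not ROSAN's actual loader: `loadSettings`, `intSetting`, and the `AgentSettings` shape are hypothetical names, and the fallback values are simply the ones shown in the sample .env.

```typescript
// Hypothetical helper (not part of ROSAN): validates a subset of the .env values.
type Env = Record<string, string | undefined>;

interface AgentSettings {
  langgraphApiKey: string;
  port: number;
  agentTimeoutMs: number;
  agentMaxRetries: number;
  agentRetryDelayMs: number;
  maxConcurrentAgents: number;
}

// Parse a non-negative integer setting, falling back to the sample .env value.
function intSetting(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || Math.floor(value) !== value || value < 0) {
    throw new Error(name + ' must be a non-negative integer, got "' + raw + '"');
  }
  return value;
}

export function loadSettings(env: Env): AgentSettings {
  const key = env.LANGGRAPH_API_KEY;
  // The sample .env marks this key as REQUIRED, so fail fast when it is absent
  // or still set to the placeholder text.
  if (!key || key === "your_langgraph_api_key") {
    throw new Error("LANGGRAPH_API_KEY is missing or still set to the placeholder");
  }
  return {
    langgraphApiKey: key,
    port: intSetting(env, "PORT", 3001),
    agentTimeoutMs: intSetting(env, "AGENT_TIMEOUT", 30000),
    agentMaxRetries: intSetting(env, "AGENT_MAX_RETRIES", 3),
    agentRetryDelayMs: intSetting(env, "AGENT_RETRY_DELAY", 1000),
    maxConcurrentAgents: intSetting(env, "MAX_CONCURRENT_AGENTS", 10),
  };
}
```

In a Node.js process this would typically be called once as `loadSettings(process.env)`; taking the environment as a parameter keeps the function easy to test.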
Development Mode:
# Start all services
npm start
# Or start components individually
npm run dev:backend # Backend API on port 3001
npm run dev:frontend # Frontend dashboard on port 5173
npm run dashboard # Alternative dashboard on port 3000
Production Mode:
# Build for production
npm run build
# Start production server
npm run start:prod
| Service | URL | Description |
|---|---|---|
| Backend API | http://localhost:3001 | REST API and WebSocket server |
| Frontend Dashboard | http://localhost:5173 | Main monitoring dashboard |
| Alternative Dashboard | http://localhost:3000 | Alternative dashboard interface |
| Health Check | http://localhost:3001/health | System health status |
| API Status | http://localhost:3001/api/v1/status | Detailed system status |
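The two backend endpoints in the table can also be probed from code, for example in a smoke test after startup. This is a sketch under assumptions: `probeBackend`, `FetchLike`, and `ProbeResult` are illustrative names that do not come from ROSAN, and the fetch function is passed in so the example does not depend on a particular runtime.

```typescript
// Minimal shape of the fetch-like function this sketch needs (an assumption for testability).
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

interface ProbeResult {
  path: string;
  healthy: boolean;
  status: number | null; // null when the request itself failed (e.g. connection refused)
}

// Paths taken from the service table.
const PROBE_PATHS = ["/health", "/api/v1/status"];

export async function probeBackend(
  baseUrl: string,
  fetchFn: FetchLike
): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const path of PROBE_PATHS) {
    try {
      const res = await fetchFn(baseUrl + path);
      // Treat any 2xx response as healthy; the response body is not inspected here.
      results.push({ path, healthy: res.ok, status: res.status });
    } catch {
      results.push({ path, healthy: false, status: null });
    }
  }
  return results;
}
```

On Node.js 18 or later, the built-in `fetch` can be passed directly, as in `probeBackend("http://localhost:3001", fetch)`.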
# Check system health
curl http://localhost:3001/health
# Get detailed system status
curl http://localhost:3001/api/v1/status
# Create a workflow (when agents are running)
curl -X POST http://localhost:3001/api/v1/workflows/create \
  -H "Content-Type: application/json" \
  -d '{"name":"my-workflow","description":"Test workflow"}'
The ROSAN dashboard provides:
- System Overview: Real-time system health and metrics
- Agent Network: Visual representation of agent topology
- Workflow Monitoring: Active workflows and their progress
- Recovery Operations: Failure detection and recovery status
- Performance Metrics: Resource usage and response times
- Alert Center: System alerts and notifications
# Run all tests
npm test
# Run specific test suites
npm run test:unit # Unit tests
npm run test:integration # Integration tests
npm run test:e2e # End-to-end tests
npm run test:coverage # Tests with coverage report
ROSAN includes comprehensive test coverage for:
- Backend API: All endpoints and business logic
- Database Layer: Schema validation and queries
- Agent Framework: Supervisor-worker interactions
- Recovery Systems: Failure detection and recovery
- WebSocket Communication: Real-time data flow
- Security: Authentication and authorization
# Validate entire setup
npm run validate
# Check service health
npm run health-check
# Test database connectivity
npm run test:db
# Test Redis connectivity
npm run test:redis
ROSAN/
├── src/
│   ├── agents/                 # Agent framework
│   │   ├── supervisor/         # Supervisor agents
│   │   ├── workers/            # Worker agents
│   │   ├── inspector/          # Inspector agents
│   │   └── communication/      # Agent communication
│   ├── dashboard/              # Frontend React dashboard
│   │   ├── components/         # React components
│   │   ├── pages/              # Dashboard pages
│   │   ├── hooks/              # Custom React hooks
│   │   └── store/              # Redux state management
│   ├── database/               # Database models and schemas
│   ├── langgraph/              # LangGraph orchestration
│   ├── security/               # Security and monitoring
│   ├── server/                 # Express.js API server
│   └── utils/                  # Utility functions
├── tests/                      # Test files
├── scripts/                    # Setup and utility scripts
├── docs/                       # Documentation
├── config/                     # Configuration files
└── docker-compose.dev.yml      # Docker configuration
// src/agents/your-agent/YourAgent.ts
import { ResilientAgent } from '../base/resilient-agent';
// Assumed location of the shared types; adjust to wherever your project defines them.
import type { AgentConfig, Task, TaskResult } from '../base/types';

export class YourAgent extends ResilientAgent {
  constructor(config: AgentConfig) {
    super(config);
  }

  async execute(task: Task): Promise<TaskResult> {
    try {
      // Implement your agent logic here
      const result = await this.processTask(task);

      // Inspector validation
      const isValid = await this.inspector.validate(result);
      if (!isValid) {
        throw new Error('Task validation failed');
      }
      return result;
    } catch (error) {
      // Automatic recovery: hand the failed task to the supervisor
      return this.supervisor.handleFailure(task, error);
    }
  }
}
// src/langgraph/workflows/your-workflow.ts
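import { StateGraph, START, END } from '@langchain/langgraph';
// YourStateSchema, validateInput, processData, handleFailure, and shouldRecover
// are placeholders that you define elsewhere in your project.

export const createYourWorkflow = () => {
  const workflow = new StateGraph(YourStateSchema)
    .addNode('validate', validateInput)
    .addNode('process', processData)
    .addNode('recover', handleFailure)
    .addEdge(START, 'validate')
    .addEdge('validate', 'process')
    // shouldRecover returns 'recover' on failure and 'end' otherwise
    .addConditionalEdges('process', shouldRecover, {
      recover: 'recover',
      end: END
    })
    .addEdge('recover', END);

  return workflow.compile();
};
# Start all services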
docker-compose -f docker-compose.dev.yml up -d
# View logs
docker-compose -f docker-compose.dev.yml logs -f
# Stop services
docker-compose -f docker-compose.dev.yml down
# Build production images
docker build -f Dockerfile.backend -t rosan-backend .
docker build -f Dockerfile.frontend -t rosan-frontend .
# Run with external databases
docker run -d \
  --name rosan-backend \
  -p 3001:3001 \
  --env-file .env \
  rosan-backend
| Service | Internal Port | External Port | Description |
|---|---|---|---|
| Backend API | 3001 | 3001 | Main API server |
| Frontend | 5173 | 5173 | Development dashboard |
| PostgreSQL | 5432 | 5432 | Primary database |
| Redis | 6379 | 6379 | Cache and session store |
| Prometheus | 9090 | 9090 | Metrics collection |
| Grafana | 3000 | 3002 | Monitoring dashboard |
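Host ports from the table and from the sample .env can be checked for overlaps before starting the stack. The sample .env sets WS_PORT=3002, which is the same number as Grafana's external port in the table; whether that is a real conflict depends on whether both are bound on the same host. The sketch below only illustrates the check: `findPortCollisions`, `PortBinding`, and `PortCollision` are hypothetical names, not part of ROSAN.

```typescript
interface PortBinding {
  service: string;
  hostPort: number;
}

interface PortCollision {
  hostPort: number;
  services: string[];
}

// Group services by host port and report every port claimed more than once.
export function findPortCollisions(bindings: PortBinding[]): PortCollision[] {
  const byPort: Record<number, string[]> = {};
  for (const binding of bindings) {
    const services = byPort[binding.hostPort] || [];
    services.push(binding.service);
    byPort[binding.hostPort] = services;
  }
  const collisions: PortCollision[] = [];
  for (const key of Object.keys(byPort)) {
    const hostPort = Number(key);
    if (byPort[hostPort].length > 1) {
      collisions.push({ hostPort, services: byPort[hostPort] });
    }
  }
  return collisions;
}

// External ports from the table, plus WS_PORT from the sample .env.
const documentedBindings: PortBinding[] = [
  { service: "Backend API", hostPort: 3001 },
  { service: "Frontend", hostPort: 5173 },
  { service: "PostgreSQL", hostPort: 5432 },
  { service: "Redis", hostPort: 6379 },
  { service: "Prometheus", hostPort: 9090 },
  { service: "Grafana", hostPort: 3002 },
  { service: "WebSocket (WS_PORT)", hostPort: 3002 },
];

const documentedCollisions = findPortCollisions(documentedBindings);
// documentedCollisions is
// [{ hostPort: 3002, services: ["Grafana", "WebSocket (WS_PORT)"] }]
```

If both services are needed on one machine, changing either WS_PORT or the Grafana port mapping would resolve the overlap.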
1. Port Conflicts
# Check what's using ports
netstat -tulpn | grep :3001
lsof -i :5432
# Kill conflicting processes
sudo kill -9 <PID>
2. Database Connection Issues
# Check PostgreSQL container
docker ps | grep postgres
docker logs rosan-postgres
# Test connection
PGPASSWORD=your_password psql -h localhost -p 5432 -U rosan_user -d rosan_db
3. Redis Connection Issues
# Check Redis container
docker ps | grep redis
docker logs rosan-redis
# Test connection
redis-cli -h localhost -p 6379 -a your_password ping
4. Frontend Build Errors
# Clear node modules and reinstall
rm -rf node_modules package-lock.json
npm install
# Check TypeScript compilation
npm run type-check
5. API Not Responding
# Check backend logs
docker logs rosan-backend
# Verify environment variables
cat .env | grep -E "(PORT|DATABASE|REDIS)"
# Test health endpoint
curl -v http://localhost:3001/health
Enable detailed logging:
# Set debug environment
export DEBUG=rosan:*
export LOG_LEVEL=debug
# Start with verbose logging
npm run dev:backend -- --verbose
# Monitor resource usage
docker stats
# Check database performance
# (requires the pg_stat_statements extension; column names as of PostgreSQL 13+)
PGPASSWORD=your_password psql -h localhost -p 5432 -U rosan_user -d rosan_db -c "
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 10;"
# Profile Node.js application
npm run profile
| Metric | Value | Description |
|---|---|---|
| API Response Time | < 50ms | Average API endpoint response |
| WebSocket Latency | < 10ms | Real-time communication latency |
| Database Query Time | < 100ms | Average PostgreSQL query |
| Cache Hit Rate | > 95% | Redis cache performance |
| Memory Usage | < 512MB | Typical application memory |
| CPU Usage | < 10% | Normal operation CPU usage |
- Horizontal Scaling: Supports multiple backend instances
- Database Scaling: PostgreSQL read replicas supported
- Cache Scaling: Redis clustering supported
- Agent Scaling: Dynamic agent creation and termination
- Single Region: Designed for single-region deployment
- Database: PostgreSQL connection pooling required for high load
- Memory: Agent state stored in memory (persistence available)
- API Rate Limits: Configurable rate limiting for API endpoints
- JWT Tokens: Secure token-based authentication
- API Keys: LangGraph API key authentication
- Database Security: Password-based authentication with SSL
- CORS: Configurable cross-origin resource sharing
- Rate Limiting: Configurable request rate limits
- HTTPS: SSL/TLS encryption in production
- Firewall: Configurable network access rules
- Encryption: Data encryption in transit and at rest
- Audit Logging: Comprehensive audit trail
- Backup: Automated database backups
- Compliance: GDPR and SOC 2 compliance features
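The rate-limiting settings in the sample .env (RATE_LIMIT_WINDOW_MS=900000, RATE_LIMIT_MAX_REQUESTS=100) describe a window length and a request cap. The README does not say which algorithm ROSAN uses, so the following fixed-window limiter is only one plausible reading of those two values; `FixedWindowRateLimiter` and its method names are illustrative.

```typescript
interface WindowState {
  windowStart: number; // timestamp (ms) when the current window opened
  count: number;       // requests seen in the current window
}

// Illustrative fixed-window limiter; not ROSAN's actual implementation.
export class FixedWindowRateLimiter {
  private readonly windowMs: number;
  private readonly maxRequests: number;
  private windows: Record<string, WindowState> = {};

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true when the request is allowed. `now` is passed in (ms) so the
  // behaviour is deterministic and easy to test.
  allow(clientId: string, now: number): boolean {
    const state = this.windows[clientId];
    if (!state || now - state.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.windows[clientId] = { windowStart: now, count: 1 };
      return true;
    }
    if (state.count >= this.maxRequests) {
      return false;
    }
    state.count += 1;
    return true;
  }
}
```

With the sample values, `new FixedWindowRateLimiter(900000, 100)` admits up to 100 requests per client in each 15-minute window. A fixed window is simple but allows bursts at window boundaries; a sliding window or token bucket trades a little complexity for smoother limits.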
GET /health
Returns system health status.
GET /api/v1/status
Returns detailed system status including agents and workflows.
POST /api/v1/workflows/create
Content-Type: application/json
{
  "name": "workflow-name",
  "description": "Workflow description",
  "config": {}
}
socket.on('agent:status', (data) => {
  console.log('Agent status:', data);
});
socket.on('workflow:progress', (data) => {
  console.log('Workflow progress:', data);
});
socket.on('system:alert', (alert) => {
  console.log('System alert:', alert);
});
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and add tests
- Run the test suite: npm test
- Ensure all tests pass: npm run test:coverage
- Submit a pull request
- TypeScript: Strict TypeScript with type definitions
- ESLint: Follow configured linting rules
- Prettier: Use Prettier for code formatting
- Tests: Maintain >80% test coverage
Include the following information:
- ROSAN Version: npm run version
- Node.js Version: node --version
- Operating System: uname -a
- Error Message: Full error stack trace
- Steps to Reproduce: Detailed reproduction steps
- Expected Behavior: What should happen
- Actual Behavior: What actually happens
This project is licensed under the MIT License - see the LICENSE file for details.
- API Documentation: /docs/api/
- Agent Development: /docs/agents/
- Deployment Guide: /docs/deployment/
- Troubleshooting: /docs/troubleshooting/
- GitHub Issues: Report bugs and request features
- Discussions: Community discussions and Q&A
- Wiki: Community-maintained documentation
For enterprise support and custom development, contact the ROSAN team.
ROSAN: Building the future of resilient multi-agent systems through autonomous orchestration and self-healing capabilities.