Skip to content

Verification Training Integration

rUv edited this page Aug 13, 2025 · 1 revision

Verification-Training Integration

Overview

The Verification-Training Integration creates a continuous learning system that improves agent performance over time by learning from verification results. This is a real machine learning implementation that tracks, learns, and predicts based on actual verification outcomes.

How It Works

Learning Algorithm

The system uses an Exponential Moving Average (EMA) with a learning rate of 0.1:

newReliability = oldReliability * 0.9 + newScore * 0.1

This means:

  • Recent results have more weight than old results
  • System adapts quickly to changes in agent performance
  • Reliability scores smooth out temporary fluctuations

Quick Start

# Check current training status
./claude-flow verify-train status

# Feed verification data to training system
./claude-flow verify-train feed

# Predict verification outcome for a task
./claude-flow verify-train predict default coder

# Get agent recommendation
./claude-flow verify-train recommend

# Get improvement recommendations
./claude-flow verify-train recommendations

Real Example: Learning in Action

Initial State

./claude-flow verify-train status
# Output:
# 🤖 Agent Reliability:
#    coder: 62.5%

After Feeding Data

./claude-flow verify-train feed
# Output:
# 📊 Learning update for coder: reliability 0.625
# 📉 Agent coder is declining (0.000)

After Successful Verifications

# Run simulation with successful verifications
node test-verification.js
# Output:
# 📈 Agent coder is improving (+0.289)
# 📊 Learning update for coder: reliability 0.815

Updated Predictions

./claude-flow verify-train predict default coder
# Before: Recommendation: use_different_agent
# After:  Recommendation: add_additional_checks

Data Storage Structure

.claude-flow/
├── training/
│   └── verification-data.jsonl         # Training data (append-only)
├── models/
│   ├── verification-model.json         # Main learning model
│   ├── agent-coder.json               # Coder agent model
│   ├── agent-reviewer.json            # Reviewer agent model
│   └── agent-tester.json              # Tester agent model
└── metrics/
    └── agent-performance.json         # Performance metrics

Model Structure

Main Model (verification-model.json)

{
  "version": "1.0.0",
  "agentReliability": {
    "coder": 0.815,
    "reviewer": 0.566,
    "tester": 0.750
  },
  "patterns": {
    "coder_success": {
      "count": 10,
      "avgScore": 0.92,
      "checks": {
        "compile": { "success": 8, "failure": 2 },
        "test": { "success": 6, "failure": 4 },
        "lint": { "success": 9, "failure": 1 }
      }
    },
    "coder_failure": {
      "count": 30,
      "avgScore": 0.625,
      "checks": {
        "compile": { "success": 10, "failure": 20 },
        "test": { "success": 5, "failure": 25 },
        "lint": { "success": 15, "failure": 15 }
      }
    }
  }
}

Agent Model (agent-coder.json)

{
  "agentType": "coder",
  "totalTasks": 72,
  "successfulTasks": 12,
  "averageScore": 0.815,
  "scoreHistory": [/* last 100 scores */],
  "trend": {
    "direction": "improving",
    "change": 0.289,
    "recentAverage": 0.914,
    "previousAverage": 0.625
  },
  "checkPerformance": {
    "compile": { "total": 20, "passed": 15, "avgScore": 0.75 },
    "test": { "total": 20, "passed": 10, "avgScore": 0.50 },
    "lint": { "total": 20, "passed": 18, "avgScore": 0.90 }
  }
}

Training Features

1. Agent Reliability Tracking

  • Tracks success rate for each agent type
  • Updates reliability score with each verification
  • Maintains historical performance data

2. Pattern Recognition

  • Identifies which checks fail most often
  • Tracks success/failure patterns per agent
  • Learns which combinations lead to failures

3. Trend Detection

  • Compares recent performance to historical
  • Identifies improving or declining agents
  • Shows performance delta

4. Predictive Scoring

./claude-flow verify-train predict default coder

Output includes:

  • Predicted Score: Expected verification score (0.0-1.0)
  • Confidence: How confident the prediction is (based on data points)
  • Recommendation: Action to take based on prediction
  • Historical Success Rate: Past performance percentage
  • Data Points: Number of verifications used for prediction

5. Agent Recommendations

./claude-flow verify-train recommend

Suggests the best agent based on:

  • Historical reliability scores
  • Recent performance trends
  • Task type requirements

Recommendations System

The system provides actionable recommendations:

Agent Retraining

"Retrain coder agent - reliability below 70%"

Triggered when agent reliability falls below threshold.

Focus Areas

"Focus training on test, compile for coder"

Identifies specific checks that fail most often.

Data Collection

"Run more verification cycles to improve training accuracy"

Suggests when more data is needed for accurate predictions.

Integration with Verification

Automatic Learning

Every verification automatically feeds the training system:

// After each verification:
1. Score is calculated
2. Result fed to training
3. Agent model updated
4. Reliability recalculated
5. Trends detected
6. Predictions updated

Manual Training Feed

# Feed all existing verification data
./claude-flow verify-train feed

# This processes all records in .swarm/verification-memory.json

Performance Metrics

What's Tracked

  • Total Verifications: Number of verification runs
  • Pass Rate: Percentage of successful verifications
  • Average Score: Mean verification score
  • Agent Reliability: Per-agent success rate
  • Performance by Hour: Time-based patterns
  • Check-Specific Performance: Success rate per check type

Viewing Metrics

# Full status report
./claude-flow verify-train status

# Agent-specific metrics
cat .claude-flow/models/agent-coder.json | jq

# Performance trends
cat .claude-flow/metrics/agent-performance.json | jq

Use Cases

1. Task Assignment

Before assigning a task, check predicted success:

./claude-flow verify-train predict api-development coder
# If low confidence, consider different agent

2. Agent Selection

Get the best agent for a task:

./claude-flow verify-train recommend
# Use recommended agent with highest reliability

3. Performance Monitoring

Track agent improvement over time:

./claude-flow verify-train status
# Review trends and reliability scores

4. Quality Improvement

Identify areas needing attention:

./claude-flow verify-train recommendations
# Follow suggestions to improve system

Best Practices

1. Regular Data Feeding

# Daily cron job
0 0 * * * cd /project && ./claude-flow verify-train feed

2. Monitor Trends

# Check for declining agents weekly
./claude-flow verify-train status | grep "declining"

3. Act on Recommendations

# Review and implement suggestions
./claude-flow verify-train recommendations

# Example actions:
# - Retrain underperforming agents
# - Focus on failing check types
# - Collect more verification data

4. Use Predictions for Planning

# Before critical tasks
PREDICTION=$(./claude-flow verify-train predict critical coder)
echo $PREDICTION | jq '.confidence'
# If confidence < 0.7, add extra verification steps

Troubleshooting

No Improvement After Training

# Check if new data is being added
ls -la .claude-flow/training/verification-data.jsonl
# File should be growing

# Verify learning rate
grep learningRate src/cli/simple-commands/verification-training-integration.js
# Should be 0.1 for balanced learning

Predictions Not Accurate

# Check data points
./claude-flow verify-train status
# Need at least 50+ data points for good predictions

# Review pattern distribution
cat .claude-flow/models/verification-model.json | jq '.patterns'
# Should have both success and failure patterns

Agent Reliability Not Updating

# Manually trigger update
./claude-flow verify-train feed

# Check model file
cat .claude-flow/models/verification-model.json | jq '.agentReliability'
# Values should change after feeding data

Advanced Configuration

Adjust Learning Rate

Edit src/cli/simple-commands/verification-training-integration.js:

this.learningRate = 0.1; // Default: 0.1
// Higher = faster learning but more volatile
// Lower = slower learning but more stable

Custom Prediction Thresholds

// In predictVerificationOutcome()
if (predictedScore < 0.5) {
  recommendation = 'use_different_agent';
} else if (predictedScore < 0.75) {
  recommendation = 'add_additional_checks';
}

Historical Data Limit

// Keep last N scores for trend analysis
if (agentModel.scoreHistory.length > 100) {
  agentModel.scoreHistory = agentModel.scoreHistory.slice(-100);
}

Future Enhancements

Planned Features

  1. Neural Network Integration: Deep learning for complex patterns
  2. Multi-Factor Predictions: Consider time, complexity, dependencies
  3. A/B Testing: Compare different learning algorithms
  4. Collaborative Filtering: Learn from similar projects
  5. Real-time Dashboards: Web UI for monitoring

Research Areas

  • Reinforcement learning for agent selection
  • Transfer learning between projects
  • Ensemble methods for predictions
  • Anomaly detection for unusual failures

Related Documentation


The Verification-Training Integration creates a self-improving system that learns from every verification, continuously enhancing agent reliability and task success rates through real machine learning.

Clone this wiki locally