Truth Verification System

rUv edited this page Aug 13, 2025 · 2 revisions

Overview

The Truth Verification System is a framework that provides verification and truth scoring for multi-agent operations in Claude-Flow. It includes real verification checks, training integration for continuous improvement, and practical tools for quality assurance.

Important Note: This system is currently a functional prototype that demonstrates verification concepts. While it performs real checks (compile, test, lint), full integration with all agent operations is still in development.

Quick Start

Initialize Verification System

# Initialize with specific mode (local version)
./claude-flow verify init strict      # Production mode (0.95 threshold)
./claude-flow verify init moderate    # Default mode (0.85 threshold)
./claude-flow verify init development # Development mode (0.75 threshold)

# Once published to npm, this will be available as:
npx claude-flow@alpha verify init strict

Current Implementation Status

✅ What's Working

  1. Real Verification Checks

    • Runs the actual npm run typecheck, npm test, and npm run lint commands
    • Returns real scores based on actual command output
    • Stores verification history in .swarm/verification-memory.json
  2. Truth Scoring & Reporting

    ./claude-flow truth              # Basic truth scores
    ./claude-flow truth --report     # Detailed breakdown
    ./claude-flow truth --analyze    # Failure pattern analysis
    ./claude-flow truth --json       # Machine-readable output
    ./claude-flow truth --export report.json  # Export to file
  3. Training Integration (NEW!)

    ./claude-flow verify-train status     # Show training status
    ./claude-flow verify-train feed       # Feed verification data to training
    ./claude-flow verify-train predict    # Predict verification outcomes
    ./claude-flow verify-train recommend  # Get agent recommendations
  4. Verification Hooks

    # Manual verification hooks
    node src/cli/simple-commands/verification-hooks.js pre task-123 coder
    node src/cli/simple-commands/verification-hooks.js post task-123 coder
    node src/cli/simple-commands/verification-hooks.js status

⚠️ What's In Progress

  1. Automatic Integration: Verification isn't automatically called by swarm/agent commands yet
  2. Limited Agent Types: Only 4 agent types have specific verification logic
  3. Consensus Features: Multi-agent consensus is simulated, not implemented

Verification-Training Integration

How Learning Works

The system uses lightweight statistical learning to improve over time:

  1. Exponential Moving Average: Updates agent reliability scores with learning rate of 0.1
  2. Pattern Recognition: Tracks which checks (compile, test, lint) succeed/fail most often
  3. Trend Detection: Identifies if agents are improving or declining
  4. Predictive Scoring: Predicts future verification outcomes based on history
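The exponential-moving-average update in step 1 can be sketched as follows. This is a minimal illustration with a learning rate of 0.1 as documented; the function name and call pattern are assumptions, not the actual claude-flow source:

```javascript
// Sketch of an exponential-moving-average reliability update (learning rate 0.1).
// Function name and call pattern are illustrative, not the actual source.
const LEARNING_RATE = 0.1;

function updateReliability(current, observedScore) {
  // Move the running estimate 10% of the way toward the latest observation.
  return current + LEARNING_RATE * (observedScore - current);
}

// Starting at 62.5% reliability, ten perfect verifications in a row:
let reliability = 0.625;
for (let i = 0; i < 10; i++) {
  reliability = updateReliability(reliability, 1.0);
}
console.log(reliability.toFixed(3)); // → 0.869
```

Because each update only moves the estimate a fraction of the way toward the newest score, a single outlier verification cannot swing an agent's reliability dramatically.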

Example: System Learning in Action

# Initial state: coder agent at 62.5% reliability
./claude-flow verify-train status

# Feed verification data to training
./claude-flow verify-train feed

# After 10 successful verifications:
# - Coder reliability: 62.5% → 81.5% 
# - System shows: "📈 Agent coder is improving (+0.289)"
# - Prediction changes: "use_different_agent" → "add_additional_checks"
# - Confidence increases: 60% → 70%

Training Data Storage

.claude-flow/
├── training/
│   └── verification-data.jsonl    # Training data in JSONL format
├── models/
│   ├── verification-model.json    # Main learning model
│   └── agent-coder.json          # Agent-specific models
└── metrics/
    └── agent-performance.json     # Performance metrics

Actual Verification Process

What Happens During Verification

// For Coder Agents:
1. compile:   Runs 'npm run typecheck' → Score based on errors
2. test:      Runs 'npm test'          → Score based on pass/fail
3. lint:      Runs 'npm run lint'      → Score based on warnings/errors
4. typecheck: Runs 'npm run typecheck' → Score based on errors

// Scores:
- No errors:      1.0
- Warnings only:  0.8
- Errors:         0.5
- Command fails:  0.3
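The scoring tiers above can be sketched as a small rubric function. The result shape ({ exitCode, errors, warnings }) is an assumption for illustration, not the actual claude-flow internals:

```javascript
// Illustrative scoring rubric mirroring the tiers above.
// The result shape ({ exitCode, errors, warnings }) is an assumption.
function scoreCheck({ exitCode, errors = 0, warnings = 0 }) {
  if (exitCode !== 0 && errors === 0) return 0.3; // command itself failed to run
  if (errors > 0) return 0.5;                     // command ran, reported errors
  if (warnings > 0) return 0.8;                   // clean except for warnings
  return 1.0;                                     // fully clean run
}
```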

Real Rollback Mechanism

When verification fails and rollback is enabled:

# Set environment variable
export VERIFICATION_ROLLBACK=true

# If verification fails, system runs:
git reset --hard HEAD

# You'll see:
"🔄 Attempting rollback..."
"✅ Rollback completed"

CLI Commands

Verification Commands

# Initialize verification system
./claude-flow verify init <mode>

# Run verification on a task
./claude-flow verify verify task-123 --agent coder

# Check verification status
./claude-flow verify status

Truth Scoring Commands

# Basic truth report
./claude-flow truth

# Detailed analysis options
./claude-flow truth --report           # Detailed breakdown
./claude-flow truth --analyze          # Failure patterns
./claude-flow truth --agent coder      # Agent-specific
./claude-flow truth --detailed         # With history
./claude-flow truth --json            # JSON output only
./claude-flow truth --export file.json # Export to file

Training Integration Commands

# Check training status
./claude-flow verify-train status

# Feed existing verifications to training
./claude-flow verify-train feed

# Predict if a task will pass
./claude-flow verify-train predict default coder
# Output: Predicted Score: 0.613, Confidence: 70%

# Get agent recommendation
./claude-flow verify-train recommend
# Output: Recommended: coder, Reliability: 81.5%

# Get improvement recommendations
./claude-flow verify-train recommendations

Environment Variables

# Set verification mode
export VERIFICATION_MODE=strict        # strict/moderate/development

# Enable automatic rollback
export VERIFICATION_ROLLBACK=true

# Custom threshold
export VERIFICATION_THRESHOLD=0.95
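A sketch of how these variables might combine; the precedence shown (an explicit threshold overriding the mode default) is an assumption for illustration:

```javascript
// Sketch: resolving the effective threshold from environment variables.
// Precedence (explicit threshold beats mode default) is an assumption.
const MODE_THRESHOLDS = { strict: 0.95, moderate: 0.85, development: 0.75 };

function resolveThreshold(env = process.env) {
  if (env.VERIFICATION_THRESHOLD !== undefined) {
    return parseFloat(env.VERIFICATION_THRESHOLD);
  }
  return MODE_THRESHOLDS[env.VERIFICATION_MODE] ?? 0.85; // moderate default
}
```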

Integration Examples

Manual Verification in Scripts

#!/bin/bash
# Run pre-task verification
node src/cli/simple-commands/verification-hooks.js pre $TASK_ID coder

# Execute task
npm run build

# Run post-task verification
node src/cli/simple-commands/verification-hooks.js post $TASK_ID coder

# Feed results to training
node src/cli/simple-commands/verification-hooks.js train $TASK_ID coder

CI/CD Integration

name: Verification Pipeline
on: [push]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Run Verification
        run: |
          ./claude-flow verify init strict
          ./claude-flow verify verify ${{ github.sha }} --agent coder
      
      - name: Check Truth Score
        run: |
          SCORE=$(./claude-flow truth --json | jq .averageScore)
          echo "Truth score: $SCORE"
          if (( $(echo "$SCORE < 0.85" | bc -l) )); then
            exit 1
          fi

How Training Improves the System

Learning Cycle

  1. Verification Runs → Generates scores (pass/fail)
  2. Training Ingests → Updates agent reliability scores
  3. Pattern Detection → Identifies common failure points
  4. Prediction Improves → Better task outcome predictions
  5. Recommendations Update → Suggests best agents for tasks

Agent Performance Tracking

// Example: .claude-flow/models/agent-coder.json
{
  "agentType": "coder",
  "totalTasks": 72,
  "successfulTasks": 12,
  "averageScore": 0.815,
  "trend": {
    "direction": "improving",
    "change": 0.289,
    "recentAverage": 0.914,
    "previousAverage": 0.625
  },
  "checkPerformance": {
    "compile": { "total": 20, "passed": 15, "avgScore": 0.75 },
    "test": { "total": 20, "passed": 10, "avgScore": 0.50 },
    "lint": { "total": 20, "passed": 18, "avgScore": 0.90 }
  }
}
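The trend block can be derived by comparing two windows of scores; the arithmetic matches the example above (0.914 − 0.625 = 0.289). The window size and function name are assumptions for illustration:

```javascript
// Sketch: deriving a trend from recent vs. previous score windows.
// Window size of 10 and the function name are assumptions.
function computeTrend(scores, windowSize = 10) {
  const avg = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
  const recentAverage = avg(scores.slice(-windowSize));
  const previousAverage = avg(scores.slice(-2 * windowSize, -windowSize));
  const change = recentAverage - previousAverage;
  return {
    direction: change > 0 ? 'improving' : change < 0 ? 'declining' : 'stable',
    change,
    recentAverage,
    previousAverage,
  };
}
```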

Current Limitations

Integration Gaps

  1. Not Auto-Integrated: You need to manually run verification commands
  2. Basic Checks Only: Runs npm scripts, not sophisticated analysis
  3. Limited Agent Types: Only coder, reviewer, tester, architect have specific logic
  4. No Real Consensus: Multi-agent consensus is simulated

What's Simulated vs Real

| Feature | Status | Details |
|---------|--------|---------|
| Compile Check | ✅ Real | Runs actual npm run typecheck |
| Test Execution | ✅ Real | Runs actual npm test |
| Lint Check | ✅ Real | Runs actual npm run lint |
| Git Rollback | ✅ Real | Runs actual git reset --hard |
| Training System | ✅ Real | Real learning with persistence |
| Agent Consensus | ❌ Simulated | Returns hardcoded values |
| Byzantine Tolerance | ❌ Simulated | Not implemented |
| Cryptographic Signing | ❌ Simulated | Not implemented |

Best Practices

1. Regular Training Updates

# Daily: Feed new verification data to training
./claude-flow verify-train feed

# Weekly: Check training recommendations
./claude-flow verify-train recommendations

# Monthly: Review agent performance trends
./claude-flow verify-train status

2. Monitor Agent Reliability

# Check which agents are performing well
./claude-flow verify-train status | grep "Agent Reliability"

# Get recommendations for underperforming agents
./claude-flow verify-train recommendations

3. Use Predictions for Task Planning

# Before assigning a task, check predicted success
./claude-flow verify-train predict default coder

# If prediction is low, consider:
# - Using a different agent
# - Adding additional checks
# - Reviewing recent failures

Troubleshooting

Low Truth Scores

# Analyze what's failing
./claude-flow truth --analyze

# Check specific agent
./claude-flow truth --agent coder --detailed

# Review training recommendations
./claude-flow verify-train recommendations

Verification Not Running

# Check if npm scripts exist
npm run typecheck  # Should be defined in package.json
npm run test       # Should be defined in package.json
npm run lint       # Should be defined in package.json

# Run manual verification
node src/cli/simple-commands/verification-hooks.js status

Training Not Improving

# Check if data is being fed
./claude-flow verify-train status
# Look for "Training Data Points" - should be increasing

# Manually trigger learning
./claude-flow verify-train feed

# Check agent trends
cat .claude-flow/models/agent-coder.json | jq '.trend'

Future Development

Planned Improvements

  1. Auto-Integration: Automatic verification for all agent operations
  2. Deep Code Analysis: AST-based verification, not just npm scripts
  3. Real Consensus: Actual multi-agent voting system
  4. Smart Rollback: Selective rollback of only failed changes
  5. Dashboard UI: Web interface for monitoring verification metrics

How to Contribute

The verification system is ready for enhancement. Key areas:

  1. Integration Points: Add verification hooks to swarm/agent commands
  2. Check Types: Add more verification checks beyond compile/test/lint
  3. Agent Types: Add verification logic for more agent types
  4. Training Models: Improve prediction algorithms


The Truth Verification System provides a foundation for quality assurance in AI-generated code. While not fully integrated, it offers real verification checks, learning capabilities, and practical tools for improving agent reliability over time.