Summary
Add comprehensive metrics and observability capabilities to eggai-clutch, enabling monitoring of pipeline performance, step latency, error rates, and resource utilization through pluggable exporters.
Motivation
Production deployments require visibility into:
- Pipeline latency - End-to-end and per-step timing
- Error rates - Failure patterns across agents
- Throughput - Requests per second capacity
- Resource usage - Queue depths, memory, concurrent executions
The existing on_step hook provides basic visibility but lacks structured metrics.
Proposed API
Metrics Configuration
from eggai_clutch import Clutch, Strategy, MetricsConfig

clutch = Clutch(
    "customer_support",
    strategy=Strategy.SEQUENTIAL,
    metrics=MetricsConfig(
        enabled=True,
        exporter="prometheus",
        prometheus_port=9090,
        labels={"service": "support", "env": "prod"},
    ),
)
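To make the shape of the configuration concrete, here is a minimal sketch of how such a config type could be modelled. The field names are taken from the example; the defaults, the set of accepted exporter names, and the validation rules are assumptions for illustration, not part of the proposal.

```python
from dataclasses import dataclass, field

# Hypothetical set of exporter names, based on the exporters this issue lists.
KNOWN_EXPORTERS = {"callback", "prometheus", "statsd", "opentelemetry"}


@dataclass(frozen=True)
class MetricsConfig:
    """Stand-in for the proposed config type; all defaults are assumed."""

    enabled: bool = False
    exporter: str = "callback"
    prometheus_port: int = 9090
    labels: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail early on typos instead of silently exporting nothing.
        if self.exporter not in KNOWN_EXPORTERS:
            raise ValueError(f"unknown exporter: {self.exporter!r}")
        if not 0 < self.prometheus_port < 65536:
            raise ValueError(f"invalid port: {self.prometheus_port}")


config = MetricsConfig(
    enabled=True,
    exporter="prometheus",
    labels={"service": "support", "env": "prod"},
)
```

Validating in `__post_init__` is one possible design choice: a misconfigured exporter would surface when the pipeline is constructed rather than at the first recorded metric.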
Built-in Metrics
| Metric | Type | Description |
| --- | --- | --- |
| clutch_step_duration_seconds | Histogram | Per-step execution time |
| clutch_execution_duration_seconds | Histogram | Total pipeline time |
| clutch_executions_total | Counter | Execution count |
| clutch_step_errors_total | Counter | Error count by type |
| clutch_active_executions | Gauge | Currently running pipelines |
Hook Integration
async def metric_callback(event: MetricEvent):
    print(f"{event.name}: {event.value} {event.labels}")

clutch = Clutch(
    "pipeline",
    on_step=async_step_callback,  # existing
    on_metric=metric_callback,  # NEW
)
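As a sketch of what a more useful callback might do, the example below tallies step errors per agent. `MetricEvent` is modelled here as a dataclass with the three attributes used by the callback above (`name`, `value`, `labels`); the `ErrorTally` class and the `agent` label key are hypothetical, and the events are emitted by hand because the library-side wiring does not exist yet.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricEvent:
    """Stand-in for the proposed event type; only these fields are assumed."""

    name: str
    value: float
    labels: dict[str, str] = field(default_factory=dict)


class ErrorTally:
    """Hypothetical on_metric callback that counts step errors per agent."""

    def __init__(self) -> None:
        self.errors: dict[str, float] = {}

    async def __call__(self, event: MetricEvent) -> None:
        # Ignore everything except the error counter.
        if event.name != "clutch_step_errors_total":
            return
        agent = event.labels.get("agent", "unknown")  # "agent" key is assumed
        self.errors[agent] = self.errors.get(agent, 0) + event.value


async def demo() -> dict[str, float]:
    tally = ErrorTally()
    # Simulated events; under the proposal the pipeline would emit these.
    await tally(MetricEvent("clutch_step_errors_total", 1, {"agent": "triage"}))
    await tally(MetricEvent("clutch_step_duration_seconds", 0.12, {"agent": "triage"}))
    await tally(MetricEvent("clutch_step_errors_total", 1, {"agent": "triage"}))
    return tally.errors


errors = asyncio.run(demo())
```

Because the callable object is awaitable like a plain async function, it could presumably be passed as `on_metric=ErrorTally()` if the hook accepts any async callable.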
Exporter Interface
from abc import ABC, abstractmethod
class MetricsExporter(ABC):
    @abstractmethod
    async def record_histogram(self, name: str, value: float, labels: dict) -> None: ...
    @abstractmethod
    async def increment_counter(self, name: str, value: int, labels: dict) -> None: ...
    @abstractmethod
    async def set_gauge(self, name: str, value: float, labels: dict) -> None: ...
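A minimal in-memory implementation can help show how the three methods differ in semantics. In this sketch the interface is restated locally (and marked abstract) so the block is self-contained; `InMemoryExporter`, `series_key`, and the `pipeline` label are hypothetical names chosen for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import defaultdict


class MetricsExporter(ABC):
    """Local copy of the proposed interface so the sketch runs on its own."""

    @abstractmethod
    async def record_histogram(self, name: str, value: float, labels: dict) -> None: ...

    @abstractmethod
    async def increment_counter(self, name: str, value: int, labels: dict) -> None: ...

    @abstractmethod
    async def set_gauge(self, name: str, value: float, labels: dict) -> None: ...


def series_key(name: str, labels: dict) -> tuple:
    """Identify a series by metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


class InMemoryExporter(MetricsExporter):
    """Hypothetical exporter that keeps everything in dicts, e.g. for tests."""

    def __init__(self) -> None:
        self.histograms = defaultdict(list)  # every observation is kept
        self.counters = defaultdict(int)     # values only accumulate
        self.gauges = {}                     # last write wins

    async def record_histogram(self, name, value, labels):
        self.histograms[series_key(name, labels)].append(value)

    async def increment_counter(self, name, value, labels):
        self.counters[series_key(name, labels)] += value

    async def set_gauge(self, name, value, labels):
        self.gauges[series_key(name, labels)] = value


async def demo() -> InMemoryExporter:
    exporter = InMemoryExporter()
    labels = {"pipeline": "customer_support"}  # label name is an assumption
    await exporter.set_gauge("clutch_active_executions", 1, labels)
    await exporter.record_histogram("clutch_step_duration_seconds", 0.25, labels)
    await exporter.increment_counter("clutch_executions_total", 1, labels)
    await exporter.increment_counter("clutch_executions_total", 1, labels)
    await exporter.set_gauge("clutch_active_executions", 0, labels)
    return exporter


exporter = asyncio.run(demo())
```

Keying each series on the name plus sorted label pairs means two calls with the same labels in a different order update the same series.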
Implementation Scope
Phase 1: Core Infrastructure
- MetricsConfig and MetricEvent types
- Timing in _call_handler
- on_metric callback
- Callback exporter
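The timing item in Phase 1 could be approached roughly as follows. This is a sketch under stated assumptions, not the actual `_call_handler` code: `timed_call`, the synchronous `sink` callable, and the `step` and `error` label keys are all hypothetical, and a real implementation would presumably route through the async exporter or `on_metric` hook instead.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

# Hypothetical sink type: receives (metric name, value, labels).
Sink = Callable[[str, float, dict], None]


async def timed_call(
    step: str,
    handler: Callable[[Any], Awaitable[Any]],
    payload: Any,
    sink: Sink,
) -> Any:
    """Run one step handler, reporting its duration and any error."""
    start = time.perf_counter()  # monotonic clock, suited to intervals
    try:
        return await handler(payload)
    except Exception as exc:
        # Count the failure by exception type, then let it propagate.
        sink("clutch_step_errors_total", 1, {"step": step, "error": type(exc).__name__})
        raise
    finally:
        # Runs on success and on failure, so failed steps are timed too.
        sink("clutch_step_duration_seconds", time.perf_counter() - start, {"step": step})


async def demo() -> tuple:
    recorded = []

    def sink(name, value, labels):
        recorded.append((name, value, labels))

    async def ok(payload):
        return payload.upper()

    async def broken(payload):
        raise ValueError("bad input")

    result = await timed_call("classify", ok, "hello", sink)
    try:
        await timed_call("route", broken, "hello", sink)
    except ValueError:
        pass  # the wrapper re-raises; the demo swallows it to continue
    return result, recorded


result, recorded = asyncio.run(demo())
```

Recording the duration in `finally` keeps slow failures visible in the latency histogram, which may or may not be the desired behaviour; excluding failed steps would be an equally reasonable choice.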
Phase 2: Exporters
- Prometheus exporter with HTTP endpoint
- StatsD exporter
- OpenTelemetry exporter (optional dependency)
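For the StatsD exporter, one open question is how labels and histogram values map onto the wire format. The sketch below formats payload lines only (no socket code). It assumes DogStatsD-style tags (`|#key:value`), which plain StatsD does not define, and it assumes histograms are sent as timers converted from seconds to milliseconds; `statsd_line` and `format_tags` are hypothetical helper names.

```python
def format_tags(labels: dict) -> str:
    """Render labels as a DogStatsD-style tag suffix (empty when no labels)."""
    if not labels:
        return ""
    pairs = ",".join(f"{key}:{value}" for key, value in sorted(labels.items()))
    return f"|#{pairs}"


def statsd_line(kind: str, name: str, value: float, labels: dict) -> str:
    """Build one StatsD payload line for a counter, gauge, or histogram."""
    if kind == "counter":
        body = f"{name}:{int(value)}|c"
    elif kind == "gauge":
        body = f"{name}:{value}|g"
    elif kind == "histogram":
        # Assumption: send as a timer. StatsD timers are in milliseconds,
        # while the proposed metric names are in seconds.
        body = f"{name}:{value * 1000:.3f}|ms"
    else:
        raise ValueError(f"unsupported metric kind: {kind!r}")
    return body + format_tags(labels)


lines = [
    statsd_line("counter", "clutch_executions_total", 1, {"service": "support", "env": "prod"}),
    statsd_line("gauge", "clutch_active_executions", 3, {}),
    statsd_line("histogram", "clutch_step_duration_seconds", 0.25, {"env": "prod"}),
]
```

If the seconds-to-milliseconds conversion is kept, renaming the metric on the StatsD side (for example dropping the `_seconds` suffix) might avoid confusion, but that naming decision is outside what the issue specifies.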
Phase 3: Dashboard Examples
- Grafana dashboard templates
- Documentation
Related Issues
Reference
See eggai-clutch-rfcs/RFC-002-metrics-hooks.md for full specification.