Kaspre/openclaw-observability-plugin

OpenClaw Observability

OpenTelemetry observability for OpenClaw AI agents.

📖 Full Documentation: Setup guides, configuration reference, and backend examples.

Two Approaches to Observability

This repository documents two complementary approaches to monitoring OpenClaw:

| Approach | Best For | Setup Complexity |
| --- | --- | --- |
| Official Plugin | Operational metrics, Gateway health, cost tracking | Simple config |
| Custom Plugin | Deep tracing, tool call visibility, request lifecycle | Plugin installation |

Recommendation: Use both for complete observability.


Approach 1: Official Diagnostics Plugin (Built-in)

OpenClaw v2026.2+ includes built-in OpenTelemetry support. To enable it, add the following to openclaw.json:

{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}

Then restart:

openclaw gateway restart
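
To check that the collector is listening where you expect, it helps to know which URLs the exporter will call. Under the OTLP/HTTP convention, each signal is posted to `/v1/traces`, `/v1/metrics`, or `/v1/logs` beneath the base endpoint. The sketch below derives those URLs from a config shaped like the one above. `signalUrls` is a hypothetical helper, not part of OpenClaw, and it assumes the Gateway appends the standard paths. It also treats a signal as enabled only when the flag is set explicitly.

```typescript
// Hypothetical helper (not an OpenClaw API): derive per-signal OTLP/HTTP URLs.
type Signal = "traces" | "metrics" | "logs";

interface OtelConfig {
  endpoint: string;
  traces?: boolean;
  metrics?: boolean;
  logs?: boolean;
}

function signalUrls(cfg: OtelConfig): Partial<Record<Signal, string>> {
  // Drop trailing slashes so we never emit "//v1/...".
  const base = cfg.endpoint.replace(/\/+$/, "");
  const urls: Partial<Record<Signal, string>> = {};
  const signals: Signal[] = ["traces", "metrics", "logs"];
  for (const signal of signals) {
    // Only signals that are explicitly set to true get a URL.
    if (cfg[signal]) {
      urls[signal] = `${base}/v1/${signal}`;
    }
  }
  return urls;
}
```

With the local configuration above, this yields one URL per enabled signal under `http://localhost:4318`, which you can compare against the receivers configured in your collector.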

What It Captures

Metrics:

  • openclaw.tokens: Token usage by type (input/output/cache)
  • openclaw.cost.usd: Estimated model cost
  • openclaw.run.duration_ms: Agent run duration
  • openclaw.context.tokens: Context window usage
  • openclaw.webhook.*: Webhook processing stats
  • openclaw.message.*: Message processing stats
  • openclaw.queue.*: Queue depth and wait times
  • openclaw.session.*: Session state transitions

Traces: Model usage, webhook processing, message processing, stuck sessions

Logs: All Gateway logs via OTLP with severity, subsystem, and code location


Approach 2: Custom Hook-Based Plugin (This Repo)

For deeper observability, install the custom plugin from this repo. It uses OpenClaw's typed plugin hooks to capture the full agent lifecycle.

What It Adds

Connected Traces:

openclaw.request (root span)
├── openclaw.agent.turn
│   ├── tool.Read (file read)
│   ├── tool.exec (shell command)
│   ├── tool.Write (file write)
│   └── tool.web_search
└── (child spans connected via trace context)
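
The hierarchy above depends on each span recording the ID of its parent. The following is a minimal in-memory sketch of that linkage. It is not the plugin's implementation and does not use the OpenTelemetry SDK. `TraceSketch` and `SpanNode` are hypothetical names introduced only for illustration.

```typescript
// Illustrative model of parent-child span linkage (not the plugin's code).
interface SpanNode {
  name: string;
  spanId: string;
  parentId: string | undefined; // undefined marks the root span
}

class TraceSketch {
  private spans: SpanNode[] = [];
  private nextId = 1;

  // Start a span; passing a parent links the new span beneath it.
  start(name: string, parent?: SpanNode): SpanNode {
    const span: SpanNode = {
      name,
      spanId: String(this.nextId++),
      parentId: parent ? parent.spanId : undefined,
    };
    this.spans.push(span);
    return span;
  }

  // Render the tree as indented lines, depth-first.
  render(parentId?: string, indent = ""): string[] {
    const out: string[] = [];
    for (const span of this.spans) {
      if (span.parentId === parentId) {
        out.push(indent + span.name);
        out.push(...this.render(span.spanId, indent + "  "));
      }
    }
    return out;
  }
}

const trace = new TraceSketch();
const request = trace.start("openclaw.request");
const turn = trace.start("openclaw.agent.turn", request);
trace.start("tool.Read", turn);
trace.start("tool.exec", turn);
const renderedLines = trace.render();
```

A span started without a parent would appear as a separate root, which matches the "Separate spans" behavior noted for the official plugin in the comparison table.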

Per-Tool Visibility:

  • Individual spans for each tool call
  • Tool execution time
  • Result size (characters)
  • Error tracking per tool
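
The per-tool data listed above can be sketched as a wrapper around a tool invocation that records execution time, result size in characters, and any error. This is a simplified synchronous illustration. `traceToolCall` and `ToolSpanRecord` are hypothetical names, and the actual plugin presumably works through OpenClaw's asynchronous hooks rather than a wrapper like this.

```typescript
// Illustrative only: what a per-tool span might record.
interface ToolSpanRecord {
  name: string;
  durationMs: number;
  resultChars: number;
  error?: string;
}

function traceToolCall(
  toolName: string,
  run: () => string,
  sink: ToolSpanRecord[]
): string {
  const started = Date.now();
  try {
    const result = run();
    sink.push({
      name: `tool.${toolName}`,
      durationMs: Date.now() - started,
      resultChars: result.length,
    });
    return result;
  } catch (err) {
    // Record the failure on the span, then rethrow so callers still see it.
    sink.push({
      name: `tool.${toolName}`,
      durationMs: Date.now() - started,
      resultChars: 0,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

Rethrowing after recording keeps the wrapper transparent: the tool's caller behaves the same whether or not tracing is active.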

Request Lifecycle:

  • Full message → response tracing
  • Session context propagation
  • Agent turn duration with token breakdown

Installation

  1. Clone this repository:

    git clone https://github.com/henrikrexed/openclaw-observability-plugin.git

  2. Add to your openclaw.json:

    {
      "plugins": {
        "load": {
          "paths": ["/path/to/openclaw-observability-plugin"]
        },
        "entries": {
          "otel-observability": {
            "enabled": true
          }
        }
      }
    }

  3. Clear cache and restart:

    rm -rf /tmp/jiti
    systemctl --user restart openclaw-gateway
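
If you manage openclaw.json with a script, step 2 can be expressed as a pure merge that leaves existing settings in place. The sketch below assumes the config shape shown in step 2. `withObservabilityPlugin` is a hypothetical helper, not something shipped with OpenClaw or this plugin.

```typescript
// Hypothetical helper: add the plugin path and entry without clobbering other settings.
interface PluginsConfig {
  load?: { paths?: string[] };
  entries?: Record<string, { enabled: boolean }>;
}

interface OpenClawConfig {
  plugins?: PluginsConfig;
  [key: string]: unknown;
}

function withObservabilityPlugin(
  config: OpenClawConfig,
  pluginPath: string
): OpenClawConfig {
  const plugins: PluginsConfig = config.plugins ? config.plugins : {};
  const paths = plugins.load && plugins.load.paths ? plugins.load.paths : [];
  // Avoid duplicate paths when the function is applied more than once.
  const nextPaths = paths.indexOf(pluginPath) === -1 ? [...paths, pluginPath] : paths;
  return {
    ...config,
    plugins: {
      ...plugins,
      load: { ...plugins.load, paths: nextPaths },
      // Only "enabled" is set: the plugin entry must not carry a config block.
      entries: { ...plugins.entries, "otel-observability": { enabled: true } },
    },
  };
}
```

The function returns a new object rather than mutating its input, so it can be applied repeatedly and yields the same result each time.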

Comparing the Two Approaches

| Feature | Official Plugin | Custom Plugin |
| --- | --- | --- |
| Token metrics | ✅ Per model | ✅ Per session + model |
| Cost tracking | ✅ Yes | ✅ Yes (from diagnostics) |
| Gateway health | ✅ Webhooks, queues, sessions | ❌ Not focused |
| Session state | ✅ State transitions | ❌ Not tracked |
| Tool call tracing | ❌ No | ✅ Individual tool spans |
| Request lifecycle | ❌ No | ✅ Full request → response |
| Connected traces | ❌ Separate spans | ✅ Parent-child hierarchy |
| Setup complexity | 🟢 Config only | 🟡 Plugin installation |

Backend Examples

Dynatrace (Direct)

{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://{env-id}.live.dynatrace.com/api/v2/otlp",
      "headers": {
        "Authorization": "Api-Token {your-token}"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}

Grafana Cloud

{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://otlp-gateway-{region}.grafana.net/otlp",
      "headers": {
        "Authorization": "Basic {base64-credentials}"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true
    }
  }
}
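
The `{base64-credentials}` placeholder is a Base64-encoded credential pair. The sketch below builds the header on the assumption that the pair is your Grafana Cloud instance ID and an API token joined by a colon. Check your Grafana Cloud OTLP settings to confirm the exact values. `grafanaBasicAuth` is a hypothetical helper name.

```typescript
// Assumption: credentials are "<instanceId>:<apiToken>", Base64-encoded.
function grafanaBasicAuth(
  instanceId: string,
  apiToken: string
): { Authorization: string } {
  // btoa is a global in modern Node.js and browsers; inputs must be ASCII/Latin-1.
  const credentials = btoa(`${instanceId}:${apiToken}`);
  return { Authorization: `Basic ${credentials}` };
}
```

The returned object can be pasted into the `headers` field of the configuration above. Avoid committing the encoded value to version control, since Base64 is an encoding, not encryption.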

Local OTel Collector

{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}

Configuration Reference

Official Plugin Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| diagnostics.enabled | boolean | false | Enable diagnostics system |
| diagnostics.otel.enabled | boolean | false | Enable OTel export |
| diagnostics.otel.endpoint | string | (none) | OTLP endpoint URL |
| diagnostics.otel.protocol | string | "http/protobuf" | Protocol |
| diagnostics.otel.headers | object | (none) | Custom headers |
| diagnostics.otel.serviceName | string | "openclaw" | Service name |
| diagnostics.otel.traces | boolean | true | Enable traces |
| diagnostics.otel.metrics | boolean | true | Enable metrics |
| diagnostics.otel.logs | boolean | false | Enable logs |
| diagnostics.otel.sampleRate | number | 1.0 | Trace sampling (0-1) |
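
To see how the defaults in this table combine with a partial configuration, here is a small sketch that fills in missing options and checks the `sampleRate` range. `resolveOtelOptions` is a hypothetical function, not OpenClaw's own loader. The rule that `endpoint` is required when export is enabled is an assumption, based on the table listing no default for it.

```typescript
// Sketch of default resolution for diagnostics.otel (not OpenClaw's implementation).
interface OtelOptions {
  enabled: boolean;
  endpoint?: string;
  protocol: string;
  headers?: Record<string, string>;
  serviceName: string;
  traces: boolean;
  metrics: boolean;
  logs: boolean;
  sampleRate: number;
}

// Defaults copied from the table above.
const OTEL_DEFAULTS: OtelOptions = {
  enabled: false,
  protocol: "http/protobuf",
  serviceName: "openclaw",
  traces: true,
  metrics: true,
  logs: false,
  sampleRate: 1.0,
};

function resolveOtelOptions(user: Partial<OtelOptions>): OtelOptions {
  const resolved: OtelOptions = { ...OTEL_DEFAULTS, ...user };
  // Written as a negated range check so NaN is rejected too.
  if (!(resolved.sampleRate >= 0 && resolved.sampleRate <= 1)) {
    throw new RangeError(`sampleRate must be between 0 and 1, got ${resolved.sampleRate}`);
  }
  // Assumption: an endpoint is needed once export is turned on.
  if (resolved.enabled && !resolved.endpoint) {
    throw new Error("diagnostics.otel.endpoint is required when export is enabled");
  }
  return resolved;
}
```

For example, passing only `enabled` and `endpoint` resolves to traces and metrics on and logs off, which is why the examples earlier set `"logs": true` explicitly.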

Custom Plugin Options

Important: Do NOT add a config block inside the plugin entry, because OpenClaw's plugin framework rejects unknown properties. The plugin reads its configuration from the diagnostics.otel section instead.

The following settings are controlled via the diagnostics.otel config block:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint | string | http://localhost:4318 | OTLP endpoint URL |
| serviceName | string | openclaw-gateway | Service name |
| protocol | string | http/protobuf | OTLP protocol |
| traces | boolean | true | Enable traces |
| metrics | boolean | true | Enable metrics |
| logs | boolean | true | Enable logs |


Optional: Kernel-Level Security with Tetragon

For defense in depth, add Tetragon eBPF-based monitoring. While the plugins above capture application-level telemetry, Tetragon sees what happens at the kernel level: file access, process execution, network connections, and privilege changes.

Why Tetragon?

  • Tamper-proof: Even a compromised agent can't hide its kernel-level actions
  • Sensitive file detection: Alert when .env, SSH keys, or credentials are accessed
  • Dangerous command detection: Catch rm, curl | sh, chmod 777, etc.
  • Privilege escalation: Detect setuid/setgid attempts

Quick Setup

# Install Tetragon
curl -LO https://github.com/cilium/tetragon/releases/download/v1.6.0/tetragon-v1.6.0-amd64.tar.gz
tar -xzf tetragon-v1.6.0-amd64.tar.gz && cd tetragon-v1.6.0-amd64
sudo ./install.sh

# Create OpenClaw policies directory
sudo mkdir -p /etc/tetragon/tetragon.tp.d/openclaw

# Add policies (see docs/security/tetragon.md for full examples)
# Start Tetragon
sudo systemctl enable --now tetragon

Tetragon events are exported to /var/log/tetragon/tetragon.log and can be ingested by the OTel Collector using the filelog receiver.
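
Each line in that log is a JSON event, so it can also be inspected directly before a collector pipeline is in place. The sketch below summarizes process execution events and flags a few binaries. It assumes the events carry `process_exec.process.binary` and `process_exec.process.arguments` fields, so verify this against your own log output. `summarizeExecEvent` and the watch list are hypothetical, and real detection should rely on Tetragon policies rather than post-processing like this.

```typescript
// Illustrative post-processing of one Tetragon JSON log line.
// Assumption: exec events look like {"process_exec":{"process":{"binary":...,"arguments":...}}}.
interface ExecSummary {
  binary: string;
  args: string;
  flagged: boolean;
}

// Hypothetical watch list, loosely based on the commands mentioned above.
const WATCHED_BINARIES = ["rm", "curl", "chmod"];

function summarizeExecEvent(line: string): ExecSummary | null {
  let event: unknown;
  try {
    event = JSON.parse(line);
  } catch (e) {
    return null; // not a JSON line
  }
  type ExecEvent = {
    process_exec?: { process?: { binary?: unknown; arguments?: unknown } };
  };
  const proc = (event as ExecEvent | null)?.process_exec?.process;
  if (!proc || typeof proc.binary !== "string") {
    return null; // some other event type, or an unexpected shape
  }
  // Compare on the basename so "/usr/bin/curl" matches "curl".
  const name = proc.binary.split("/").pop() ?? proc.binary;
  return {
    binary: proc.binary,
    args: typeof proc.arguments === "string" ? proc.arguments : "",
    flagged: WATCHED_BINARIES.indexOf(name) !== -1,
  };
}
```

Lines that are not exec events return `null`, so the function can be mapped over a whole log file and the results filtered.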

Complete Observability Stack

| Layer | Source | What It Shows |
| --- | --- | --- |
| Application | Custom Plugin | Tool calls, tokens, request flow |
| Gateway | Official Plugin | Session health, queues, costs |
| Kernel | Tetragon | System calls, file access, network |

See Security: Tetragon for the full installation and configuration guide.


Known Limitations

Auto-instrumentation not possible: OpenLLMetry/IITM breaks @mariozechner/pi-ai named exports due to ESM/CJS module isolation. All telemetry is captured via hooks, not direct SDK instrumentation.

No per-LLM-call spans: Individual API calls to Claude/OpenAI cannot be traced. Token usage is aggregated per agent turn.

See Limitations for details.


License

MIT
