
nkcodec/zabbix-kafka


Zabbix → Kafka Streaming Collector

Continuous micro-interval poller that streams Zabbix history data into Kafka, with checkpoint recovery and producer-queue backpressure handling.


Architecture Overview

            ┌─────────────────────────────┐
            │   Zabbix API (host groups)  │
            └──────────────┬──────────────┘
                           │
                 host_groups() / host.get()
                           │
        ┌──────────────────▼───────────────────┐
        │    Streaming Loop (`main.rs`)        │
        │  • Ticker (micro_interval_ms)        │
        │  • Per-group time window selection   │
        │  • CheckpointStore update/persist    │
        └──────────────┬──────────────▲────────┘
                       │              │
                       │        load()/update()
                       │              │
             ┌─────────▼─────────┐    │
             │ Checkpoint Store  │    │
             │ (groupid → ts)    │    │
             └─────────┬─────────┘    │
                       │              │
                       │   collect_group()
                       │              │
        ┌──────────────▼──────────────┐
        │  DataCollector (`collector`)│
        │  • Host discovery           │
        │  • Parallel host polling    │
        │  • history.get batching     │
        └──────────────┬──────────────┘
                       │ batches of MonitoringRecord
                       │
               ┌───────▼────────┐
               │ Kafka Channel  │ (bounded buffer)
               └───────┬────────┘
                       │
              ┌─────────▼──────────┐
              │ KafkaPublisher     │
              │ • Idempotent send  │
              │ • QueueFull backoff│
              └─────────┬──────────┘
                       │
                       ▼
                  Kafka Topic

  • Streaming loop drives the micro-interval ticker, walks host groups, and pushes records into the Kafka channel.
  • Checkpoint store persists the highest UNIX timestamp seen for each host group to JSON, enabling restarts and start-mode control. Supports three modes: EARLIEST, LATEST, and CHECKPOINT with intelligent priority handling.
  • Data collector fans out host/item history requests with a semaphore-limited concurrency pool and merges records.
  • Kafka publisher drains the channel, performs idempotent publishes, and applies exponential backoff when the broker queue returns QueueFull.

Configuration

integration.ini drives runtime behavior. Key sections:

[ZABBIX]
server = http://localhost:8080
user = Admin
password = zabbix

[KAFKA]
queue_max_messages = 100000     ; optional producer buffer limit
bootstrap_servers = localhost:9092
topic = rust.metrics

[STREAM]
micro_interval_ms = 250        ; ticker cadence
buffer_capacity   = 512        ; bounded channel depth
checkpoint_mode   = LATEST     ; EARLIEST, LATEST, or CHECKPOINT
checkpoint_path   = checkpoint_state.json
host_groups       = NONE       ; comma list of group names (or NONE for all)
  • checkpoint_mode = CHECKPOINT loads previously persisted timestamps; LATEST clears them so the collector begins from “now”.
  • Adjust micro_interval_ms for faster/slower polling and buffer_capacity to tune upstream buffering relative to Kafka throughput.
  • Set queue_max_messages to bound the librdkafka internal queue; smaller values trigger backpressure sooner and surface QueueFull warnings during tests.
  • host_groups accepts a comma-separated list of group names to include (comparison is case-insensitive). Omit the key or set it to NONE to stream every discovered group.

Enhanced Checkpoint Modes

The streaming collector supports three checkpoint modes with intelligent priority handling:

Checkpoint Modes

  • EARLIEST - Start from timestamp 0 to retrieve all available historical data from Zabbix
  • LATEST - Start from current time to get new data only (default behavior)
  • CHECKPOINT - Resume from existing checkpoint file if it exists

Priority System

Checkpoint files always take priority over configuration settings:

| Configuration | Checkpoint File | Effective Mode | Behavior                |
|---------------|-----------------|----------------|-------------------------|
| EARLIEST      | ❌ No           | EARLIEST       | Start from timestamp 0  |
| LATEST        | ❌ No           | LATEST         | Start from current time |
| ANY           | ✅ Yes          | CHECKPOINT     | Resume from checkpoint  |

This ensures reliable restarts in production environments.

Configuration Examples

# Historical data import
[STREAM]
checkpoint_mode = EARLIEST

# Real-time monitoring (default)
[STREAM]  
checkpoint_mode = LATEST

# Flexible (will resume from checkpoint if it exists)
[STREAM]
checkpoint_mode = CHECKPOINT

Environment Variables

All configuration can also be set via environment variables:

export ZABBIX_KAFKA_CHECKPOINT_MODE=EARLIEST
export ZABBIX_KAFKA_CHECKPOINT_PATH=/custom/path/checkpoint.json

Environment variables take precedence over configuration file settings.

Enhanced Logging

The collector logs both the configured and effective modes:

configured_mode: Latest
effective_mode: Checkpoint
"Resuming from existing checkpoint file"

Running the Collector Locally

  1. Ensure the integration stack is up (docker-compose.integration.yml provides Zabbix, Kafka, Postgres, etc.).
  2. Build once for release performance:
    & "$env:USERPROFILE\.cargo\bin\cargo.exe" build --release
  3. Launch the streaming collector:
    & "$env:USERPROFILE\.cargo\bin\cargo.exe" run --release -- `
      --config integration.ini --workers 20

The process will log discovered host groups, enter the continuous loop, and stream Zabbix history records to Kafka while persisting checkpoints.

Automated Scenario Tests

Two PowerShell harnesses under scripts/ exercise end-to-end recovery flows, including synthetic data injection and Kafka verification:

  • checkpoint_resume_test.ps1

    1. Starts a background job that sends Zabbix metrics for ~140 seconds.
    2. Runs the collector for a first interval, then stops it and waits (DowntimeSeconds).
    3. Restarts the collector to confirm it resumes from the prior checkpoint.
    4. Captures Kafka offsets/messages and writes artifacts under artifacts/checkpoint_resume.
  • fresh_start_test.ps1

    1. Backs up and deletes the existing checkpoint file.
    2. Generates synthetic metrics while a single collector run executes.
    3. Confirms new offsets and records appear, storing artifacts under artifacts/fresh_start.

Execute them after the stack is running:

powershell.exe -ExecutionPolicy Bypass -File .\scripts\checkpoint_resume_test.ps1
powershell.exe -ExecutionPolicy Bypass -File .\scripts\fresh_start_test.ps1

Each script emits the Kafka offset delta plus paths to captured logs and message samples, making it easy to diff runs or hand results to reviewers.

Path-by-Path Validation Cheatsheet

  • Streaming Happy Path: run the collector with existing checkpoints. Key checks: info logs listing discovered groups; Kafka offsets grow steadily. Expected artifacts: checkpoint_state.json updated, Kafka topic advancing.
  • Checkpoint Resume (checkpoint_resume_test.ps1): validate that a restart resumes at the last per-group timestamp. Key checks: offset delta > 0 on the second run with no duplicate replay in Kafka. Expected artifacts: artifacts/checkpoint_resume/summary.txt, message sample JSON.
  • Fresh Start (fresh_start_test.ps1): confirm that wiping the checkpoint restarts from wall-clock time. Key checks: offset delta reflects only data produced after the restart. Expected artifacts: artifacts/fresh_start/summary.txt, backup and post-run checkpoint copies.
  • Backpressure (backpressure_test.ps1): simulate a slow Kafka broker (pause the container, shrink queue_max_messages). Key checks: warn logs "Kafka queue full, backing off" followed by recovery without crashes. Expected artifacts: continuous logs with no panic; offsets eventually catch up; artifacts/backpressure/ summary and logs.
  • Graceful Shutdown: hit Ctrl+C while streaming. Key checks: collector logs “Shutdown signal received, draining”, flushes Kafka, and persists checkpoints. Expected artifacts: no lost offsets; the Kafka task reports a clean exit.

Implementation Notes

  • Checkpoint granularity is per host group; if Zabbix emits late samples with identical second timestamps, consider extending the store to track per-item high-water marks.
  • Backpressure strategy relies on FutureProducer idempotence plus bounded channels—the polling loop naturally slows when Kafka is congested.
  • Graceful shutdown: Ctrl+C (or container stop) triggers a drain path that persists outstanding checkpoints, flushes Kafka, and joins the publisher task before exit.
  • Extensibility: add new history types by extending zabbix::collect_host_records batching list and adjusting downstream filtering.

For deeper troubleshooting, examine the artifacts created by the automation scripts or enable --log_level debug to view individual batches and timestamps.
