Skip to content

feat(pipeline): structured execution logging — per-node success/error audit trail #468

@axisrow

Description

@axisrow

Problem

Pipeline runs complete with status=completed and result_count=3, but there is no observable trace of what actually happened. The only structured data saved is action_counts in generation_runs.metadata — e.g. {"react": 3}. There is no way to answer:

  • Which messages were reacted to (message_id, channel_id)?
  • Which messages were filtered out and why?
  • How many context messages were retrieved, and via what method?
  • What prompt/model was used for LLM generation?
  • Did image generation succeed or fail?
  • How long did each node take?

This makes debugging almost impossible and the UI shows nothing meaningful after a run.


Root Cause Analysis

1. No logger.info on success in any handler

Handler Success log Error log
ReactHandler ❌ only increments action_counts context.record_error()
ForwardHandler context.record_error()
DeleteMessageHandler context.record_error()
FilterHandler
FetchMessagesHandler
LlmGenerateHandler
LlmRefineHandler
ImageGenerateHandler ❌ (only skip)
RetrieveContextHandler ❌ (only warning)
NotifyHandler ⚠️ warning only

2. UnifiedDispatcher._handle_pipeline_run logs exceptions only — no success line

3. ContentGenerationService has no logger.info at all

4. context.record_error() is used only in action handlers (react/forward/delete) — LLM and retrieval errors are silently swallowed

5. No DB-side audit trail — generation_runs metadata is a single unstructured JSON blob with no per-node breakdown


What Should Be Added

A. Per-node logger.info on success (minimum viable)

# ReactHandler — after send_reaction succeeds
logger.info("ReactHandler[%s]: reacted %s to message_id=%s channel_id=%s phone=%s",
            node_id, chosen_emoji, message.message_id, message.channel_id, acquired_phone)

# FilterHandler — after filtering
logger.info("FilterHandler[%s]: %d → %d messages (filtered %d)", node_id, before, after, before - after)

# FetchMessagesHandler — after fetch
logger.info("FetchMessagesHandler[%s]: fetched %d messages from channel_id=%s", node_id, count, channel_id)

# LlmGenerateHandler — after generation
logger.info("LlmGenerateHandler[%s]: generated %d chars via model=%s", node_id, len(text), model)

# ImageGenerateHandler — after image
logger.info("ImageGenerateHandler[%s]: image generated url=%s", node_id, url)

# RetrieveContextHandler — after retrieval
logger.info("RetrieveContextHandler[%s]: retrieved %d messages via %s", node_id, count, method)

B. UnifiedDispatcher — log successful pipeline run completion

logger.info(
    "PIPELINE_RUN completed: pipeline_id=%d run_id=%d result_kind=%s result_count=%d elapsed=%.1fs",
    pipeline_id, run_id, result_kind, result_count, elapsed,
)

C. Extend context.record_error() usage to LLM and retrieval handlers

Currently only react/forward/delete handlers write structured errors. LLM timeouts, retrieval failures, image generation errors are all untracked.

D. (Optional / future) Per-node execution events in DB

Store a pipeline_run_events table with (run_id, node_id, event_type, payload JSON, created_at) to enable UI-level drill-down into a run. This is a bigger change and can be a separate follow-up.


Acceptance Criteria

  • Each handler logs at INFO level on every successful operation with enough context to identify the affected entity (message_id, channel_id, emoji, model, etc.)
  • UnifiedDispatcher logs successful PIPELINE_RUN completion with run_id, result_count, and elapsed time
  • LLM and retrieval handlers use context.record_error() for failures (consistent with action handlers)
  • No new DB schema required for the base fix — DB-side audit trail is a separate follow-up
  • Existing tests pass; no new test required unless behaviour changes

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions