Problem
Pipeline runs complete with status=completed and result_count=3, but there is no observable trace of what actually happened. The only structured data saved is action_counts in generation_runs.metadata — e.g. {"react": 3}. There is no way to answer:
- Which messages were reacted to (message_id, channel_id)?
- Which messages were filtered out and why?
- How many context messages were retrieved, and via what method?
- What prompt/model was used for LLM generation?
- Did image generation succeed or fail?
- How long did each node take?
This makes debugging almost impossible and the UI shows nothing meaningful after a run.
Root Cause Analysis
1. No logger.info on success in any handler
| Handler |
Success log |
Error log |
ReactHandler |
❌ only increments action_counts |
✅ context.record_error() |
ForwardHandler |
❌ |
✅ context.record_error() |
DeleteMessageHandler |
❌ |
✅ context.record_error() |
FilterHandler |
❌ |
❌ |
FetchMessagesHandler |
❌ |
❌ |
LlmGenerateHandler |
❌ |
❌ |
LlmRefineHandler |
❌ |
❌ |
ImageGenerateHandler |
❌ (only skip) |
❌ |
RetrieveContextHandler |
❌ |
❌ (only warning) |
NotifyHandler |
❌ |
⚠️ warning only |
2. UnifiedDispatcher._handle_pipeline_run logs exceptions only — no success line
3. ContentGenerationService has no logger.info at all
4. context.record_error() is used only in action handlers (react/forward/delete) — LLM and retrieval errors are silently swallowed
5. No DB-side audit trail — generation_runs metadata is a single unstructured JSON blob with no per-node breakdown
What Should Be Added
A. Per-node logger.info on success (minimum viable)
# ReactHandler — after send_reaction succeeds
logger.info("ReactHandler[%s]: reacted %s to message_id=%s channel_id=%s phone=%s",
node_id, chosen_emoji, message.message_id, message.channel_id, acquired_phone)
# FilterHandler — after filtering
logger.info("FilterHandler[%s]: %d → %d messages (filtered %d)", node_id, before, after, before - after)
# FetchMessagesHandler — after fetch
logger.info("FetchMessagesHandler[%s]: fetched %d messages from channel_id=%s", node_id, count, channel_id)
# LlmGenerateHandler — after generation
logger.info("LlmGenerateHandler[%s]: generated %d chars via model=%s", node_id, len(text), model)
# ImageGenerateHandler — after image
logger.info("ImageGenerateHandler[%s]: image generated url=%s", node_id, url)
# RetrieveContextHandler — after retrieval
logger.info("RetrieveContextHandler[%s]: retrieved %d messages via %s", node_id, count, method)
B. UnifiedDispatcher — log successful pipeline run completion
logger.info(
"PIPELINE_RUN completed: pipeline_id=%d run_id=%d result_kind=%s result_count=%d elapsed=%.1fs",
pipeline_id, run_id, result_kind, result_count, elapsed,
)
C. Extend context.record_error() usage to LLM and retrieval handlers
Currently only react/forward/delete handlers write structured errors. LLM timeouts, retrieval failures, image generation errors are all untracked.
D. (Optional / future) Per-node execution events in DB
Store a pipeline_run_events table with (run_id, node_id, event_type, payload JSON, created_at) to enable UI-level drill-down into a run. This is a bigger change and can be a separate follow-up.
Acceptance Criteria
Problem
Pipeline runs complete with
status=completedandresult_count=3, but there is no observable trace of what actually happened. The only structured data saved isaction_countsingeneration_runs.metadata— e.g.{"react": 3}. There is no way to answer:This makes debugging almost impossible and the UI shows nothing meaningful after a run.
Root Cause Analysis
1. No
logger.infoon success in any handlerReactHandleraction_countscontext.record_error()ForwardHandlercontext.record_error()DeleteMessageHandlercontext.record_error()FilterHandlerFetchMessagesHandlerLlmGenerateHandlerLlmRefineHandlerImageGenerateHandlerRetrieveContextHandlerNotifyHandler2.
UnifiedDispatcher._handle_pipeline_runlogs exceptions only — no success line3.
ContentGenerationServicehas nologger.infoat all4.
context.record_error()is used only in action handlers (react/forward/delete) — LLM and retrieval errors are silently swallowed5. No DB-side audit trail —
generation_runsmetadata is a single unstructured JSON blob with no per-node breakdownWhat Should Be Added
A. Per-node
logger.infoon success (minimum viable)B.
UnifiedDispatcher— log successful pipeline run completionC. Extend
context.record_error()usage to LLM and retrieval handlersCurrently only
react/forward/deletehandlers write structured errors. LLM timeouts, retrieval failures, image generation errors are all untracked.D. (Optional / future) Per-node execution events in DB
Store a
pipeline_run_eventstable with(run_id, node_id, event_type, payload JSON, created_at)to enable UI-level drill-down into a run. This is a bigger change and can be a separate follow-up.Acceptance Criteria
INFOlevel on every successful operation with enough context to identify the affected entity (message_id, channel_id, emoji, model, etc.)UnifiedDispatcherlogs successfulPIPELINE_RUNcompletion withrun_id,result_count, and elapsed timecontext.record_error()for failures (consistent with action handlers)