@derodero24

Summary

Extract text from Slack Block Kit structures in messages retrieved via conversations_history and conversations_replies, so that block-only messages (emails, bot notifications, app messages) are no longer returned empty.

Closes #186

Problem

Messages using Block Kit format (e.g., Slack Email integration, Grafana/Datadog alerts) have an empty Text field. The existing AttachmentsTo2CSV extracts text from attachment fields but ignores:

  • Top-level blocks array
  • attachments[].blocks array

Solution

  1. BlocksToText function in pkg/text/text_processor.go extracts text from common block types:
    • header.text
    • section.text and section.fields
    • rich_text (sections, lists, quotes, preformatted)
    • context.elements[].text
  2. Top-level blocks: When msg.Text is empty, fall back to BlocksToText(msg.Blocks) — avoids duplication since Slack typically populates text as a plaintext fallback of blocks
  3. Attachment blocks: AttachmentToText now also extracts att.Blocks
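The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it decodes raw Block Kit JSON with the standard library instead of the Slack client types the project uses, the names `blocksJSONToText`, `block`, and `textObject` are hypothetical, and `rich_text` handling is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// textObject mirrors a Block Kit text object: {"type": "mrkdwn", "text": "..."}.
type textObject struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// block is a hypothetical, simplified view of a Block Kit block.
// Elements stay raw because their shape depends on the block type.
type block struct {
	Type     string            `json:"type"`
	Text     *textObject       `json:"text"`
	Fields   []textObject      `json:"fields"`
	Elements []json.RawMessage `json:"elements"`
}

// blocksJSONToText (hypothetical name) collects text from header, section,
// and context blocks and collapses all whitespace to single spaces.
func blocksJSONToText(raw []byte) (string, error) {
	var rawBlocks []json.RawMessage
	if err := json.Unmarshal(raw, &rawBlocks); err != nil {
		return "", err
	}
	var parts []string
	for _, rb := range rawBlocks {
		var b block
		if err := json.Unmarshal(rb, &b); err != nil {
			continue // unexpected shape: skip this block, keep the rest
		}
		switch b.Type {
		case "header":
			if b.Text != nil {
				parts = append(parts, b.Text.Text)
			}
		case "section":
			if b.Text != nil {
				parts = append(parts, b.Text.Text)
			}
			for _, f := range b.Fields {
				parts = append(parts, f.Text)
			}
		case "context":
			for _, re := range b.Elements {
				var t textObject
				if json.Unmarshal(re, &t) == nil && t.Text != "" {
					parts = append(parts, t.Text)
				}
			}
		}
		// divider, image, actions, and other block types are skipped.
	}
	return strings.Join(strings.Fields(strings.Join(parts, " ")), " "), nil
}

func main() {
	payload := []byte(`[
	  {"type": "header", "text": {"type": "plain_text", "text": "Deploy Complete"}},
	  {"type": "divider"},
	  {"type": "section", "text": {"type": "mrkdwn", "text": "Service *my-api* deployed to production"}},
	  {"type": "context", "elements": [{"type": "mrkdwn", "text": "Deployed by CI/CD pipeline"}]}
	]`)
	out, err := blocksJSONToText(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Decoding each block separately means one block with an unexpected shape does not discard the text of the others.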

Changes

  • pkg/text/text_processor.go: Add BlocksToText and rich text helper functions; call BlocksToText in AttachmentToText
  • pkg/handler/conversations.go: Use block text as fallback in convertMessagesFromHistory and convertMessagesFromSearch
  • pkg/text/text_processor_test.go: Add 15 test cases covering all supported block types, edge cases, and attachment integration

Testing

  • All existing unit tests pass (go test -run TestUnit ./...)
  • New tests cover: empty blocks, header, section (with text/fields/nil text), context, rich_text (text/link/broadcast), multiple blocks combined, non-text blocks skipped, attachment with blocks only, attachment with both text and blocks
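For the rich_text cases listed above (text, link, broadcast), an extractor could walk the nested elements recursively. The sketch below is an assumption-laden illustration rather than the PR's helper functions: `richElem`, `richToPlain`, and `richTextJSONToText` are hypothetical names, and the rendering choices (`label (url)` for links, `@range` for broadcasts) are guesses, not confirmed behavior of this change.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// richElem is a hypothetical, simplified node of a rich_text block.
// Containers (rich_text, rich_text_section, rich_text_list, rich_text_quote,
// rich_text_preformatted) carry Elements; leaves carry Text, URL, or Range.
type richElem struct {
	Type     string     `json:"type"`
	Text     string     `json:"text"`
	URL      string     `json:"url"`
	Range    string     `json:"range"`
	Elements []richElem `json:"elements"`
}

// richToPlain renders one node. Link and broadcast formats are assumptions.
func richToPlain(e richElem) string {
	switch e.Type {
	case "text":
		return e.Text
	case "link":
		if e.Text != "" {
			return e.Text + " (" + e.URL + ")"
		}
		return e.URL
	case "broadcast":
		return "@" + e.Range
	}
	// Inline children are concatenated as-is; sibling sections and list
	// items are separated by a space.
	sep := ""
	if e.Type == "rich_text" || e.Type == "rich_text_list" {
		sep = " "
	}
	parts := make([]string, 0, len(e.Elements))
	for _, c := range e.Elements {
		if s := richToPlain(c); s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, sep)
}

// richTextJSONToText decodes one rich_text block and normalizes whitespace.
func richTextJSONToText(raw []byte) (string, error) {
	var root richElem
	if err := json.Unmarshal(raw, &root); err != nil {
		return "", err
	}
	return strings.Join(strings.Fields(richToPlain(root)), " "), nil
}

func main() {
	payload := []byte(`{"type": "rich_text", "elements": [
	  {"type": "rich_text_section", "elements": [
	    {"type": "broadcast", "range": "here"},
	    {"type": "text", "text": " deploy finished, see "},
	    {"type": "link", "url": "https://example.com/run/1", "text": "the run"}
	  ]}
	]}`)
	out, err := richTextJSONToText(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Element types this sketch does not know (for example emoji or user mentions) produce no output, which matches the "skip what is not text" approach described for non-text blocks.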

Messages using Slack's Block Kit format (emails forwarded via Slack,
bot notifications from Grafana/Datadog, app messages) return empty
text when retrieved via conversations_history. This extracts text
from block structures so these messages are no longer empty.

Closes korotovsky#186
@korotovsky
Owner

Could you please provide a couple of screenshots/examples that illustrate before/after results?

@derodero24
Author

@korotovsky Sure! Here are before/after examples based on real Slack messages.

Example 1: CloudWatch Alarm (attachment with Block Kit only)

This is a real message from an AWS monitoring bot. The message has text: "" and attachments[0] contains only Block Kit blocks (no legacy title/text fields).

Attachment JSON (simplified):

{
  "text": "",
  "attachments": [{
    "title": "",
    "text": "",
    "blocks": [
      {"type": "section", "text": {"type": "mrkdwn", "text": "*<URL|:rotating_light: CloudWatch Alarm | MyAlarm | ap-northeast-1>*"}},
      {"type": "section", "text": {"type": "mrkdwn", "text": "Threshold Crossed: 1 out of the last 1 datapoints [1.0] was greater than or equal to the threshold (1.0)"}},
      {"type": "actions", "elements": [...]},
      {"type": "section", "fields": [{"type": "mrkdwn", "text": "*Namespace*\nMyNamespace"}, {"type": "mrkdwn", "text": "*Metric*\nMyMetric"}]},
      {"type": "section", "fields": [{"type": "mrkdwn", "text": "*Alarm State*\nALARM"}]},
      {"type": "image", "...": "..."},
      {"type": "context", "elements": [{"type": "mrkdwn", "text": "<!date^1738900000^{date_short_pretty} at {time}|2025-02-07>"}]}
    ]
  }]
}

Before (conversations_history CSV text column):

(empty)

After — Block Kit content is extracted from attachments[].blocks:

Blocks: *URL - :rotating_light: CloudWatch Alarm | MyAlarm | ap-northeast-1* Threshold Crossed: 1 out of the last 1 datapoints [1.0] was greater than or equal to the threshold [1.0] *Namespace* MyNamespace *Metric* MyMetric *Alarm State* ALARM <!date^1738900000^{date_short_pretty} at {time}|2025-02-07>

Example 2: Datadog Alert (legacy attachment format — no change)

A Datadog alert uses legacy attachment fields (title + text) without Block Kit blocks. Behavior is unchanged:

{
  "text": "",
  "attachments": [{
    "title": "Triggered: ServerError - /ecs/my-service",
    "text": "Host: /ecs/my-service\nLog status: error\nMore than 1 log event matched..."
  }]
}

Before and After — Same output (already extracted via legacy fields):

Title: Triggered: ServerError - /ecs/my-service; Text: Host: /ecs/my-service Log status: error More than 1 log event matched...

Example 3: Top-level Block Kit message (bot with text: "")

Some bots send messages with text: "" and content only in top-level blocks:

{
  "text": "",
  "blocks": [
    {"type": "header", "text": {"type": "plain_text", "text": "Deploy Complete"}},
    {"type": "section", "text": {"type": "mrkdwn", "text": "Service *my-api* deployed to production"}},
    {"type": "context", "elements": [{"type": "mrkdwn", "text": "Deployed by CI/CD pipeline"}]}
  ]
}

Before (text column):

(empty)

After — Block Kit text extracted as fallback (only when msg.Text is empty):

Deploy Complete Service *my-api* deployed to production Deployed by CI/CD pipeline

Design notes

  • Top-level msg.Blocks: Extracted only when msg.Text is empty (Slack usually provides a plaintext fallback in text, so this avoids duplication)
  • attachments[].Blocks: Always extracted as supplementary info (prefixed with Blocks:)
  • Non-text blocks (divider, image, actions) are skipped
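The two rules in these design notes can be sketched as follows. This is an illustration under stated assumptions: `message` and `attachment` are hypothetical stand-ins for the Slack client types, `BlocksText` represents block text that has already been extracted, and the `"; "` separator and `Title:`/`Text:` labels are modeled on the example output above rather than taken from the code.

```go
package main

import (
	"fmt"
	"strings"
)

// attachment and message are hypothetical, simplified stand-ins.
// BlocksText holds text already extracted from the corresponding blocks.
type attachment struct {
	Title      string
	Text       string
	BlocksText string
}

type message struct {
	Text        string
	BlocksText  string
	Attachments []attachment
}

// messageText uses block text only as a fallback, so a message that
// already carries a plaintext version of its blocks is not duplicated.
func messageText(m message) string {
	if m.Text != "" {
		return m.Text
	}
	return m.BlocksText
}

// attachmentText always appends block text as a labeled extra part.
func attachmentText(a attachment) string {
	var parts []string
	if a.Title != "" {
		parts = append(parts, "Title: "+a.Title)
	}
	if a.Text != "" {
		parts = append(parts, "Text: "+a.Text)
	}
	if a.BlocksText != "" {
		parts = append(parts, "Blocks: "+a.BlocksText)
	}
	return strings.Join(parts, "; ")
}

func main() {
	fmt.Println(messageText(message{Text: "hello", BlocksText: "hello"}))
	fmt.Println(messageText(message{BlocksText: "Deploy Complete"}))
	fmt.Println(attachmentText(attachment{BlocksText: "*Alarm State* ALARM"}))
}
```

The asymmetry is deliberate in this sketch: top-level text usually already mirrors the blocks, while attachment blocks often carry content that the legacy title and text fields do not.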

@korotovsky
Owner

@derodero24 thank you. What does the serialized CSV message look like in such cases?

@derodero24
Author

@korotovsky Here's the actual serialized CSV output generated by running the code against representative Block Kit messages:

Example 1: CloudWatch Alarm (attachment with blocks only)

Before:

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770433087.532289,U001,aws-bot,Amazon Q Developer,C001,,,2026-02-07T02:58:02Z,Amazon Q Developer,

After:

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770433087.532289,U001,aws-bot,Amazon Q Developer,C001,,"Blocks: https://console.aws.amazon.com/cloudwatch - :rotating_light: CloudWatch Alarm MyAlarm ap-northeast-1 Account: 123456789012, Threshold Crossed: 1 out of the last 1 datapoints 1.0 07/02/26 02:57:00 was greater than or equal to the threshold 1.0 Namespace MyNamespace Metric MyMetric Alarm State ALARM",2026-02-07T02:58:02Z,Amazon Q Developer,

Example 2: Bot message with top-level blocks

Before:

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770400000.000001,U002,deploy-bot,Deploy Bot,C002,,,2026-02-07T10:00:00Z,Deploy Bot,

After:

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770400000.000001,U002,deploy-bot,Deploy Bot,C002,,Deploy Complete Service my-api deployed to production Deployed by CI/CD pipeline,2026-02-07T10:00:00Z,Deploy Bot,

Example 3: Legacy attachment (no blocks) — unchanged

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770433155.658769,U003,datadog,Datadog,C001,,Title: Triggered: ServerError Text: Host: /ecs/my-service Log status: error More than 1 log event matched,2026-02-07T02:59:15Z,Datadog,
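In the rows above, only the CloudWatch Text value is wrapped in quotes, which is consistent with standard CSV quoting of fields that contain a comma. The sketch below reproduces that behavior with Go's `encoding/csv`; whether the project serializes through this package directly or through a wrapper is an assumption here, and `toCSV` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// toCSV (hypothetical helper) serializes rows with the standard library writer.
func toCSV(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// WriteAll writes every record, flushes, and reports any write error.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := toCSV([][]string{
		{"Text", "Time"},
		// Contains a comma, so the writer quotes the field.
		{"Blocks: Account: 123456789012, Threshold Crossed", "2026-02-07T02:58:02Z"},
		// No comma, quote, or newline, so the field is written bare.
		{"Deploy Complete Service my-api deployed to production", "2026-02-07T10:00:00Z"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Consumers that split rows on commas without a CSV parser would mis-handle the quoted row, so extracted block text is one more reason to read this output with a real CSV reader.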

Development

Successfully merging this pull request may close these issues.

Feature Request: Extract text content from message blocks in conversations_history
