
Track reply-to-tweet timing delay for engagement analysis #11

@DarlingtonDeveloper

Description

Problem

Echo's evolve analyser cannot measure how quickly a reply was sent after the original tweet was posted. This is critical because X's algorithm heavily rewards engagement velocity — the first 30 minutes of a tweet's life are when replies get the most algorithmic boost.

Currently, baseline reply nodes store posted_at (from X Analytics CSV, date-only granularity) and post_id (which is the reply's own tweet ID, not the parent tweet ID). There is no in_reply_to_status_id or parent tweet timestamp stored.

What We Have

  • Reply snowflake ID (node.title) — encodes the exact timestamp the reply was created (millisecond precision via (id >> 22) + 1288834974657)
  • Parent tweet ID — NOT stored. The post_id field on baseline replies is the reply's own ID, not the parent's

What We Need

  1. Store in_reply_to_id on reply nodes — the parent tweet's snowflake ID
  2. Compute time_to_reply_seconds from the two snowflake IDs (no API call needed, pure math: ((reply_id >> 22) - (parent_id >> 22)) / 1000); see the sketch after this list
  3. Backfill existing replies — scrape in_reply_to_id for the ~311 baseline replies already in Cortex
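A minimal sketch of the snowflake arithmetic, assuming the standard X/Twitter epoch offset of 1288834974657 ms (the same constant quoted above); no API call is involved:

TWITTER_EPOCH_MS = 1288834974657  # offset added to the ID's timestamp bits


def snowflake_to_unix_ms(snowflake_id: int) -> int:
    """Millisecond Unix timestamp at which a tweet was created."""
    return (snowflake_id >> 22) + TWITTER_EPOCH_MS


def time_to_reply_seconds(reply_id: int, parent_id: int) -> int:
    """Seconds between the parent tweet and the reply, from the two IDs alone."""
    delta_ms = (reply_id >> 22) - (parent_id >> 22)  # the epoch offset cancels out
    return delta_ms // 1000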

Approaches to Get Parent Tweet ID

Option A: Browser scrape via xbot-browser

For each reply URL (e.g., https://x.com/DarlingtonDev/status/2028479083399516226), navigate to it and extract the parent tweet link from the conversation thread. Slow (~3s per reply) but works without API access.

Option B: X API v2

GET /2/tweets/:id?tweet.fields=conversation_id,in_reply_to_user_id,referenced_tweets returns a referenced_tweets entry whose type is "replied_to" and whose id is the parent tweet ID (don't rely on index 0, since a quoted tweet can also appear in that array). Fast and batch-capable via GET /2/tweets?ids=... (100 tweets per request), but requires API access. A rough sketch follows.
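A hedged sketch of the batch lookup, assuming a bearer token in the environment and the requests library; field names follow the public v2 docs, but error handling and pagination are omitted:

import os
import requests

BEARER = os.environ["X_BEARER_TOKEN"]  # assumes app-level API access is available


def fetch_parent_ids(reply_ids: list[str]) -> dict[str, str | None]:
    """Map each reply ID to its parent tweet ID via GET /2/tweets (<=100 IDs per call)."""
    resp = requests.get(
        "https://api.twitter.com/2/tweets",
        params={"ids": ",".join(reply_ids[:100]), "tweet.fields": "referenced_tweets"},
        headers={"Authorization": f"Bearer {BEARER}"},
        timeout=30,
    )
    resp.raise_for_status()
    parents: dict[str, str | None] = {}
    for tweet in resp.json().get("data", []):
        refs = tweet.get("referenced_tweets", [])
        # The parent is the referenced tweet of type "replied_to", not necessarily refs[0]
        parents[tweet["id"]] = next((r["id"] for r in refs if r["type"] == "replied_to"), None)
    return parents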

Option C: Embed in csv_import pipeline

X Analytics CSV doesn't include in_reply_to_id, so this can't be done at import time without a supplementary API/scrape call.

Recommendation: Option B if API access is available, Option A as fallback. Either way, build as a one-time backfill script + ongoing enrichment in the csv_import pipeline.

Impact on Evolve Analysis

Once timing data is available:

  • Correlate reply speed with engagement — do replies within 5 minutes of the original tweet get more impressions than replies sent hours later? (see the bucketing sketch after this list)
  • Optimal reply window — find the sweet spot (e.g., "replies sent 2-15 min after original tweet average 3x more impressions")
  • Feed into compose prioritisation — Echo should prioritise replying to tweets that are fresh (< 30 min old) over older ones
  • Add time_to_reply_seconds to the digest prompt so Claude can factor timing into its pattern analysis
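A rough sketch of the bucketed correlation the analyser could run, assuming ReplyRecord carries an impressions field alongside the new time_to_reply_seconds; the bucket edges are illustrative, not final:

from collections import defaultdict

# Illustrative delay buckets in seconds: <5 min, 5-15 min, 15-30 min, 30 min-2 h, >2 h
BUCKETS = [(0, 300), (300, 900), (900, 1800), (1800, 7200), (7200, float("inf"))]


def impressions_by_reply_delay(replies):
    """Average impressions per reply-delay bucket; `replies` are ReplyRecord-like objects."""
    grouped = defaultdict(list)
    for r in replies:
        if r.time_to_reply_seconds is None:
            continue  # no timing data yet (e.g. not backfilled)
        for lo, hi in BUCKETS:
            if lo <= r.time_to_reply_seconds < hi:
                grouped[(lo, hi)].append(r.impressions)
                break
    return {bucket: sum(vals) / len(vals) for bucket, vals in grouped.items() if vals}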

Data Model Changes

# On reply nodes in Cortex:
{
    "in_reply_to_id": "2028475000000000000",  # parent tweet snowflake ID
    "time_to_reply_seconds": 342,              # computed from snowflake delta
    # ... existing fields
}
# In ReplyRecord (echo/evolve/collector.py):
@dataclass
class ReplyRecord:
    # ... existing fields
    time_to_reply_seconds: int | None  # already exists, just always None
    in_reply_to_id: str | None = None  # new

Files Affected

  • echo/analytics/csv_import.py — enrich with in_reply_to_id during import (if API available)
  • echo/evolve/collector.py — pass time_to_reply_seconds through to ReplyRecord
  • echo/evolve/analyser.py — add timing correlation analysis
  • echo/evolve/digest.py — include timing data in Claude digest prompt
  • New: echo/scripts/backfill_reply_timing.py — one-time backfill for existing replies (a skeleton is sketched below)
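A skeleton for the backfill script. The Cortex helpers (load_baseline_replies, update_node) and the import path for fetch_parent_ids are hypothetical placeholders for whatever the real client exposes; the snowflake arithmetic matches the formula above:

"""One-time backfill of in_reply_to_id / time_to_reply_seconds for baseline replies."""

# Hypothetical imports; the real Cortex client and module layout may differ.
from cortex_client import load_baseline_replies, update_node  # placeholder names
from echo.scripts.x_api_lookup import fetch_parent_ids  # hypothetical home for the Option B sketch


def backfill() -> None:
    replies = load_baseline_replies()  # ~311 nodes; node.title is the reply snowflake ID
    reply_ids = [node.title for node in replies]

    parent_by_reply: dict[str, str | None] = {}
    for i in range(0, len(reply_ids), 100):  # v2 lookup accepts up to 100 IDs per call
        parent_by_reply.update(fetch_parent_ids(reply_ids[i : i + 100]))

    for node in replies:
        parent_id = parent_by_reply.get(node.title)
        if parent_id is None:
            continue  # deleted parent or lookup miss; leave the node untouched
        delta_s = ((int(node.title) >> 22) - (int(parent_id) >> 22)) // 1000
        update_node(node, in_reply_to_id=parent_id, time_to_reply_seconds=delta_s)


if __name__ == "__main__":
    backfill()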
