Fix remaining OSINT signal text truncation #68

Merged
calesthio merged 5 commits into calesthio:master from schergr:fix/osint-signal-truncation
Mar 25, 2026

Conversation

@schergr schergr commented Mar 21, 2026

Summary

  • Remove 120-char truncation in delta engine when building OSINT signals
  • Remove 80-char truncation in memory snapshots for urgent Telegram posts
  • Remove 120-char truncation in ideas/LLM context for OSINT posts
  • Improve signal formatting in Telegram alerts (bulleted list instead of inline)

The prior fix (753c676) removed truncation at source ingestion and alert formatting, but signals were still arriving at the alerter pre-truncated from upstream. The sendMessage chunker already handles Telegram's 4096-char API limit.
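
The chunking behaviour this relies on can be sketched as follows. This is a minimal illustration, not the repository's `sendMessage` code: the name `chunkMessage` and the newline-preferring split are assumptions, and only the 4096-character limit comes from the description above.

```javascript
// Hypothetical helper (not the project's actual chunker): split text into
// pieces of at most `limit` characters, preferring to break at a newline so
// that bulleted signal lines stay intact.
const TELEGRAM_MAX_LENGTH = 4096;

function chunkMessage(text, limit = TELEGRAM_MAX_LENGTH) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    // Find the last newline inside the window; fall back to a hard cut.
    // Note: a hard cut counts UTF-16 code units and may split a surrogate pair.
    let cut = rest.lastIndexOf('\n', limit);
    if (cut <= 0) cut = limit;
    chunks.push(rest.slice(0, cut));
    // Drop the newline we split on so the next chunk does not start with it.
    rest = rest.slice(cut).replace(/^\n/, '');
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned piece would then be sent as its own message, so removing truncation upstream only increases the number of messages, not the size of any single one.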

Test plan

  • Trigger a sweep with urgent OSINT posts and verify full text appears in Telegram alert
  • Confirm alert messages are properly chunked if they exceed 4096 chars
  • Verify delta engine correctly deduplicates signals with full-length text

🤖 Generated with Claude Code

Greg Scher and others added 2 commits March 20, 2026 16:49
Posts were being cut to 300 chars (source ingestion) and 150 chars
(alert evaluation), losing valuable OSINT context. The sendMessage
chunker already handles the 4096-char Telegram API limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The prior fix (753c676) only removed truncation at source ingestion and
alert formatting. Signals were still being cut to 120 chars in the delta
engine, 80 chars in memory snapshots, and 120 chars in the ideas LLM
context — so OSINT posts arrived at the alerter already truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@schergr schergr requested a review from calesthio as a code owner March 21, 2026 17:01
Copilot AI review requested due to automatic review settings March 21, 2026 17:01

Copilot AI left a comment

Pull request overview

This PR removes remaining upstream truncation of urgent Telegram/OSINT signal text so full post content can flow through delta computation, memory snapshots, LLM context, and Telegram alert rendering (with improved “Signals” formatting).

Changes:

  • Removed substring/slice truncation in Telegram source ingestion, delta engine signal construction, and memory snapshot compaction.
  • Updated LLM “ideas” sweep compaction to include full urgent OSINT post text.
  • Improved Telegram alert formatting for signals (more items + bulleted list output).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| lib/llm/ideas.mjs | Stops truncating urgent OSINT post text included in LLM ideas context. |
| lib/delta/memory.mjs | Stores full urgent post text in compacted memory snapshots. |
| lib/delta/engine.mjs | Emits full urgent post text in newly-detected OSINT signals. |
| lib/alerts/telegram.mjs | Expands/reshapes OSINT signal text shown in alerts and formats signals as bullets. |
| apis/sources/telegram.mjs | Stops truncating Telegram message text extracted via Bot API and web preview parsing. |

@calesthio
Owner

Thanks for opening this. The direction makes sense, but there are two issues I think should be fixed before this is merged:

  1. Telegram alert formatting now sends full raw OSINT post text through parse_mode: Markdown without escaping. In the rule-based OSINT surge path, evaluation.signals can now contain full Telegram post bodies, and _formatTieredAlert() renders them as bullet lines. Real post text commonly contains _, brackets, parentheses, and similar Markdown-significant characters. That means alerts can render incorrectly or be rejected by the Bot API altogether. Please either escape Markdown-sensitive characters before formatting or send this section without Markdown parsing.

  2. The ideas LLM context no longer has a length bound for urgent OSINT posts. Keeping full text in storage/delta/memory is reasonable, but compactSweepForLLM() is supposed to stay compact and now it can be dominated by a handful of long Telegram posts. That creates regression risk for latency, cost, and provider-side input-limit failures. Please keep full text upstream, but add an overall size/token cap when building the ideas prompt.

Once those two are addressed, this looks much closer to mergeable.
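
One way to address the first point, assuming the bot keeps `parse_mode: Markdown` (Telegram's legacy mode), is to backslash-escape the four characters that mode treats as entity delimiters before rendering each bullet. The function names below are illustrative and are not taken from `lib/alerts/telegram.mjs`.

```javascript
// Legacy Telegram Markdown treats _ * ` and [ as entity delimiters.
// Prefixing each with a backslash makes it render literally.
function escapeLegacyMarkdown(text) {
  return String(text).replace(/([_*`\[])/g, '\\$1');
}

// Hypothetical formatter: one escaped bullet line per signal.
function formatSignalBullets(signals) {
  return signals.map((s) => `• ${escapeLegacyMarkdown(s)}`).join('\n');
}
```

MarkdownV2 reserves a much larger character set, so this escaping is only appropriate while the legacy parse mode is in use.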

Greg Scher and others added 2 commits March 23, 2026 12:57
Addresses PR review: escape Markdown-sensitive characters in
_formatTieredAlert signal bullets to prevent Telegram Bot API
rejections, and add a 1500-char budget for URGENT_OSINT in
compactSweepForLLM to bound prompt size while keeping full text upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
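
A character budget like the one described in this commit could be applied as sketched below. Only the 1500 figure comes from the commit message; the function name, the `{ channel, text }` post shape, and the first-come allocation are assumptions made for illustration.

```javascript
const URGENT_OSINT_BUDGET = 1500; // figure taken from the commit message

// Hypothetical helper: keep posts in order until the shared character budget
// is spent, truncating the post that crosses the limit. Input objects are
// copied, so full text stays available upstream.
function capUrgentPosts(posts, budget = URGENT_OSINT_BUDGET) {
  const kept = [];
  let remaining = budget;
  for (const post of posts) {
    if (remaining <= 0) break;
    const text = post.text.length > remaining
      ? post.text.slice(0, Math.max(0, remaining - 1)) + '…'
      : post.text;
    kept.push({ ...post, text });
    remaining -= text.length;
  }
  return kept;
}
```

A token-based cap would track provider limits more closely; a character count is a cheap approximation that needs no tokenizer.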
- Replace single &#39; handler with generic numeric/hex entity decoder
  so &#39; and other unpadded entities are properly converted
- Dedup urgent OSINT posts against all hot memory runs (last 3 sweeps)
  instead of only the previous sweep, preventing posts that drop out
  of one sweep from reappearing as "new" in the next

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
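
A generic numeric/hex entity decoder of the kind this commit describes might look like the sketch below. It is an illustration rather than the code in `apis/sources/telegram.mjs`; the function name is hypothetical, and named entities such as `&amp;` are deliberately left alone.

```javascript
// Decode decimal (&#39;, &#039;) and hexadecimal (&#x27;) character references
// in a single pass so already-decoded output is never re-interpreted.
function decodeNumericEntities(html) {
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) => {
    const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the entity untouched if it is outside the Unicode range.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}
```

Because `parseInt` ignores leading zeros, padded and unpadded forms of the same entity decode identically.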
@schergr
Author

schergr commented Mar 24, 2026

anything else you need?

@calesthio
Owner

calesthio commented Mar 25, 2026

Added a follow-up commit on top of this branch to close the remaining review issues:

  • switched broad OSINT dedup in lib/delta/engine.mjs to prefer stable post identity (postId, or channel/chat + date + text) instead of only the lossy semantic hash
  • preserved channel/post identity in lib/delta/memory.mjs so cross-run dedup has enough information to suppress exact reposts without hiding genuinely new updates
  • aligned signal escaping in lib/alerts/telegram.mjs with the bot's existing legacy Markdown parse mode instead of MarkdownV2-style escaping

Rechecked the branch after the patch: sweep still completes, dashboard inject still runs, the new-post dedup false negative is fixed, and this stays scoped to Telegram/delta/ideas paths without touching jarvis core UI code.
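
The identity-based dedup described above can be sketched roughly as follows. The field names (`postId`, `channel`, `chat`, `date`, `text`, `urgentPosts`) and both function names are assumptions for illustration and may not match `lib/delta/engine.mjs` or `lib/delta/memory.mjs`.

```javascript
// Hypothetical identity key: prefer a stable post id, otherwise fall back to
// origin + date + text so exact reposts collide but edited updates do not.
function postIdentity(post) {
  if (post.postId != null) return `id:${post.postId}`;
  const origin = post.channel ?? post.chat ?? '';
  return `post:${origin}|${post.date ?? ''}|${post.text ?? ''}`;
}

// hotRuns: earlier sweeps kept in hot memory, each with an `urgentPosts`
// array (assumed field name). Returns only posts not seen in any of them.
function findNewPosts(currentPosts, hotRuns) {
  const seen = new Set();
  for (const run of hotRuns) {
    for (const post of run.urgentPosts ?? []) seen.add(postIdentity(post));
  }
  const fresh = [];
  for (const post of currentPosts) {
    const key = postIdentity(post);
    if (seen.has(key)) continue;
    seen.add(key); // also suppress duplicates within the current sweep
    fresh.push(post);
  }
  return fresh;
}
```

Checking every hot run rather than only the previous sweep is what keeps a post that briefly drops out of one sweep from resurfacing as new in the next.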

Owner

@calesthio calesthio left a comment

Reviewed the updated branch including the follow-up fix commit. The truncation removal adds real value, and with the dedup identity + Markdown escaping fixes in place I don’t see a remaining blocker.

@schergr
Author

schergr commented Mar 25, 2026

This PR is ready to merge — can you merge it when you get a chance? We're currently running the unpatched code on master.

@calesthio calesthio merged commit 53f6d81 into calesthio:master Mar 25, 2026
1 check passed
mdrong22 pushed a commit to mdrong22/Crucix-Main that referenced this pull request Mar 25, 2026
commit 53f6d81
Merge: 8c1ea37 5c08355
Author: Calesthio <[email protected]>
Date:   Wed Mar 25 10:21:02 2026 -0700

    Merge pull request calesthio#68 from schergr/fix/osint-signal-truncation

    Fix remaining OSINT signal text truncation

commit 5c08355
Author: calesthio <[email protected]>
Date:   Tue Mar 24 18:48:55 2026 -0700

    Fix Telegram dedup identity and legacy Markdown escaping

commit b7322f1
Author: Greg Scher <[email protected]>
Date:   Mon Mar 23 13:01:32 2026 -0400

    Fix HTML entity decoding and broaden OSINT dedup window

    - Replace single &#39; handler with generic numeric/hex entity decoder
      so &#39; and other unpadded entities are properly converted
    - Dedup urgent OSINT posts against all hot memory runs (last 3 sweeps)
      instead of only the previous sweep, preventing posts that drop out
      of one sweep from reappearing as "new" in the next

    Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

commit 31c305c
Author: Greg Scher <[email protected]>
Date:   Mon Mar 23 12:57:37 2026 -0400

    Escape Markdown in alert signals and cap OSINT text in ideas prompt

    Addresses PR review: escape Markdown-sensitive characters in
    _formatTieredAlert signal bullets to prevent Telegram Bot API
    rejections, and add a 1500-char budget for URGENT_OSINT in
    compactSweepForLLM to bound prompt size while keeping full text upstream.

    Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

commit 2d166c2
Author: Greg Scher <[email protected]>
Date:   Sat Mar 21 12:59:30 2026 -0400

    Remove remaining text truncation across delta engine, memory, and ideas

    The prior fix (753c676) only removed truncation at source ingestion and
    alert formatting. Signals were still being cut to 120 chars in the delta
    engine, 80 chars in memory snapshots, and 120 chars in the ideas LLM
    context — so OSINT posts arrived at the alerter already truncated.

    Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

commit 753c676
Author: Greg Scher <[email protected]>
Date:   Fri Mar 20 16:49:58 2026 -0400

    Remove text truncation limits from Telegram posts

    Posts were being cut to 300 chars (source ingestion) and 150 chars
    (alert evaluation), losing valuable OSINT context. The sendMessage
    chunker already handles the 4096-char Telegram API limit.

    Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
mdrong22 pushed a commit to mdrong22/Crucix-Main that referenced this pull request Mar 25, 2026
commit 8d99875
Author: Matt <[email protected]>
Date:   Wed Mar 25 14:19:07 2026 -0500

    updated commodities

commit a9d77d9
Author: Matt-Drong <[email protected]>
Date:   Wed Mar 25 13:44:05 2026 -0500

    Squashed commit of the following:

    commit 53f6d81
    Merge: 8c1ea37 5c08355
    Author: Calesthio <[email protected]>
    Date:   Wed Mar 25 10:21:02 2026 -0700

        Merge pull request calesthio#68 from schergr/fix/osint-signal-truncation

        Fix remaining OSINT signal text truncation

    commit 5c08355
    Author: calesthio <[email protected]>
    Date:   Tue Mar 24 18:48:55 2026 -0700

        Fix Telegram dedup identity and legacy Markdown escaping

    commit b7322f1
    Author: Greg Scher <[email protected]>
    Date:   Mon Mar 23 13:01:32 2026 -0400

        Fix HTML entity decoding and broaden OSINT dedup window

        - Replace single &#39; handler with generic numeric/hex entity decoder
          so &#39; and other unpadded entities are properly converted
        - Dedup urgent OSINT posts against all hot memory runs (last 3 sweeps)
          instead of only the previous sweep, preventing posts that drop out
          of one sweep from reappearing as "new" in the next

        Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

    commit 31c305c
    Author: Greg Scher <[email protected]>
    Date:   Mon Mar 23 12:57:37 2026 -0400

        Escape Markdown in alert signals and cap OSINT text in ideas prompt

        Addresses PR review: escape Markdown-sensitive characters in
        _formatTieredAlert signal bullets to prevent Telegram Bot API
        rejections, and add a 1500-char budget for URGENT_OSINT in
        compactSweepForLLM to bound prompt size while keeping full text upstream.

        Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

    commit 2d166c2
    Author: Greg Scher <[email protected]>
    Date:   Sat Mar 21 12:59:30 2026 -0400

        Remove remaining text truncation across delta engine, memory, and ideas

        The prior fix (753c676) only removed truncation at source ingestion and
        alert formatting. Signals were still being cut to 120 chars in the delta
        engine, 80 chars in memory snapshots, and 120 chars in the ideas LLM
        context — so OSINT posts arrived at the alerter already truncated.

        Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

    commit 753c676
    Author: Greg Scher <[email protected]>
    Date:   Fri Mar 20 16:49:58 2026 -0400

        Remove text truncation limits from Telegram posts

        Posts were being cut to 300 chars (source ingestion) and 150 chars
        (alert evaluation), losing valuable OSINT context. The sendMessage
        chunker already handles the 4096-char Telegram API limit.

        Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

commit b54ddce
Author: Matt <[email protected]>
Date:   Wed Mar 25 02:22:48 2026 -0500

    sure why not

3 participants