Skip to content

feat: add configurable discord message_limit with code-fence-preserving chunks#140

Open
the3mi wants to merge 1 commit intoopenabdev:mainfrom
the3mi:feat/message-limit-config
Open

feat: add configurable discord message_limit with code-fence-preserving chunks#140
the3mi wants to merge 1 commit intoopenabdev:mainfrom
the3mi:feat/message-limit-config

Conversation

@the3mi
Copy link
Copy Markdown
Contributor

@the3mi the3mi commented Apr 8, 2026

Summary

Implements configurable message_limit per platform, with an extensible config structure. Currently only Discord is wired up. Telegram placeholder is present but not yet implemented.

Config

[platforms.discord]
message_limit = 2000  # default

[platforms.telegram]  # placeholder — not yet implemented
message_limit = 4096  # default

Reference limits for future platforms:

  • Discord = 2000
  • Telegram = 4096
  • Slack = 39000
  • Signal = 8000
  • Matrix = 4000
  • DingTalk = 20000
  • WeCom = 4000

Behavior

  • Streaming edits: message_limit - 100 buffer (Discord edit overhead)
  • Final chunking: full message_limit
  • Streaming uses split_message(..., false) — no (1/N) indicators to avoid live-edit noise
  • Final chunking uses split_message(..., true) — includes (1/N) indicators

Code changes

  • config.rs: PlatformSettings + PlatformsConfig replaces the old flat discord.message_limit field. default_limit_for_platform() centralizes per-platform defaults so PlatformSettings::default() and PlatformsConfig::default() stay consistent.
  • format.rs: split_message(text, limit, add_indicator) — third arg controls indicator injection. split_message_for_streaming() wraps it with add_indicator=false.
  • discord.rs: Both call sites updated to pass the third arg.

Bug fixes in this PR

  1. Streaming indicator bug: split_message_for_streaming() was calling split_message() directly with no indicator flag, so it still added (1/N) markers despite the name implying otherwise. Fixed by making the indicator behavior explicit.
  2. Hardcoded test buffer: test_very_long_line used 2000 + 20 instead of limit + indicator_len. Now uses dynamic values.
  3. Telegram default mismatch: PlatformSettings::default() returned 2000, but PlatformsConfig::default() hardcoded telegram at 4096. Now both use default_limit_for_platform().

Quality checks

  • cargo test: 6/6 passing
  • cargo clippy: warnings only (no errors)

@the3mi the3mi requested a review from thepagent as a code owner April 8, 2026 07:24
@the3mi the3mi force-pushed the feat/message-limit-config branch 2 times, most recently from eff3f0d to 8da5032 Compare April 8, 2026 07:42
- New [platforms] config section; each platform has its own message_limit
  Reference limits: Discord=2000, Telegram=4096, Slack=39000, Signal=8000
- Discord streaming uses limit-100 buffer; final chunking uses full limit
- split_message(text, limit, add_indicator): third param controls (N/N) indicators
- split_message_for_streaming(): calls split_message(..., false) — no indicators
- test_very_long_line: uses dynamic indicator_len instead of hardcoded +20
- PlatformSettings: default_message_limit() consistent with PlatformsConfig defaults
- Discord and Telegram both use their platform-specific defaults (2000 / 4096)
- Fix clippy: remove unnecessary trim() before split_whitespace()
- All 6 tests pass, clippy clean
Copy link
Copy Markdown
Collaborator

@chaodu-agent chaodu-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Changes Requested — Good idea but has compile-breaking issues and a Unicode regression. Needs rebase + significant rework.

Baseline Check (Step 0)
Field Value
State OPEN
Mergeable CONFLICTING
Created 2026-04-08 (23 days ago)
Last commit 2026-04-08
Author @brettchien
Labels closing-soon

Main has: split_message() in format.rs with Unicode-safe .chars().count() splitting, 12 thorough tests including fence-reopening and hard-split edge cases. Message limit is hardcoded at 2000 in discord.rs. No PlatformsConfig or configurable limits.

Net-new: PlatformsConfig struct, configurable message_limit per platform, add_indicator flag on split_message, split_message_for_streaming helper.

四問框架

1. 解決什麼問題

Message limit is hardcoded at 2000 (Discord). Adding Telegram/Slack/other platforms requires per-platform configurable limits.

2. 怎麼解決

Adds PlatformsConfig with per-platform defaults, threads message_limit from config through to split_message(), adds add_indicator flag to control (1/N) markers.

3. 考慮過什麼

Streaming edits use limit - 100 buffer for Discord edit overhead. Final chunking uses full limit. Indicator suppressed during streaming.

4. 最佳方案嗎

The config design is clean, but the implementation has critical issues (see below).

🟢 INFO

  • Making message_limit configurable is the right direction — hardcoded 2000 is fragile.
  • PlatformsConfig with per-platform defaults (discord=2000, telegram=4096) is a clean design.
  • The add_indicator flag is a nice touch — streaming edits shouldn't show "(1/N)" markers.

🟡 NIT

  1. config.toml.example not updated — Users won't discover [platforms.discord] without reading source.
  2. Telegram placeholder — Defined in config but no Telegram adapter exists. Consider removing until needed.
  3. Indicator length hardcoded to 10 — Breaks for >99 chunks. Minor but worth a comment.
  4. default_message_limit() returns 2000 — Consider a larger generic default (e.g. 4096) so new platforms don't silently truncate.

🔴 SUGGESTED CHANGES

  1. Compile erroradapter.rs calls format::split_message(&final_content, message_limit) with 2 args. This PR changes the signature to 3 args (text, limit, add_indicator) but does NOT update adapter.rs. This will not compile.

  2. Unicode regression — Main's split_message uses .chars().count() for Unicode-safe splitting. The PR replaces this with .len() (byte length). For CJK/emoji content (common in Discord), a 3-byte UTF-8 char counts as 3 toward the limit, producing unnecessarily small chunks. Discord counts Unicode codepoints, not bytes.

  3. Byte-boundary split can panicremaining[..headroom.min(remaining.len())] slices by byte offset. If headroom falls in the middle of a multi-byte UTF-8 character, this panics at runtime. Need remaining.char_indices() or similar.

  4. Existing tests deleted — Main has 12 thorough tests (including fence-reopening, hard-split lines, Unicode invariants). The PR replaces them with 6 simpler tests that don't verify the chunk-size invariant with .chars().count().

  5. Merge conflict — Needs rebase onto current main.

@chaodu-agent
Copy link
Copy Markdown
Collaborator

超渡法師 Review — PR #140

1. What problem does it solve?

The message character limit is hardcoded at 2000 (Discord's limit). As the project adds multi-platform support (Telegram=4096, Slack=39000), each platform needs its own configurable limit. The PR also adds an add_indicator flag to suppress "(1/N)" markers during streaming edits.

2. How does it solve it?

  • PlatformsConfig / PlatformSettings structs in config.rs with per-platform message_limit fields and serde defaults.
  • default_limit_for_platform() centralizes platform-specific defaults.
  • Threads message_limit from config → Handlerstream_prompt().
  • Rewrites split_message() with a third add_indicator: bool parameter.

3. What was considered?

  • Streaming edits use message_limit - 100 buffer for Discord edit overhead.
  • Indicator suppression during streaming to avoid "(1/N)" noise.
  • Reference limits documented for 7 platforms.

4. Is this the best approach?

The config design is sound, but the implementation has critical issues against current main. The codebase has evolved significantly since this PR was opened.


Traffic Light

🟢 INFO

  • Configurable limits are the right direction — hardcoded 2000 is fragile as multi-platform support grows.
  • PlatformsConfig design is clean with serde defaults.
  • add_indicator flag is a good UX improvement for streaming.

🔴 SUGGESTED CHANGES

  1. Won't compile — adapter.rs not updated — Main's adapter.rs calls format::split_message(&final_content, message_limit) with 2 args. This PR changes the signature to 3 args but does NOT update adapter.rs.

  2. Unicode regression — .len() vs .chars().count() — Main's split_message uses .chars().count() for Unicode-safe splitting. The PR replaces this with .len() (byte length). Discord counts Unicode codepoints, not bytes. For CJK/emoji content, a 3-byte UTF-8 char counts as 3 toward the limit, producing incorrect splitting.

  3. Potential panic on byte-boundary sliceremaining[..headroom.min(remaining.len())] slices by byte offset. If headroom falls in the middle of a multi-byte UTF-8 character, this panics with "byte index is not a char boundary".

  4. 12 existing tests deleted, replaced with 6 weaker ones — Main has 12 thorough tests including assert_length_invariant (verifying .chars().count() <= limit), fence-reopening, hard-split edge cases, and Unicode safety. The PR replaces all of them.

  5. Merge conflicts — needs rebase — Main has evolved significantly. The discord.rs structure changed, and adapter.rs now owns the split_message call site.

  6. Architectural mismatch — Main already has ChatAdapter::message_limit() trait method (hardcoded to 2000 in DiscordAdapter). The PR should make DiscordAdapter::message_limit() read from PlatformsConfig instead of adding a parallel message_limit field on Handler.


Verdict

🔴 Changes requested — The feature direction is correct, but the implementation needs a full rebase onto current main. Key actions:

  1. Rebase and resolve conflicts with adapter.rs architecture
  2. Use ChatAdapter::message_limit() trait method instead of parallel plumbing
  3. Preserve main's Unicode-safe .chars().count() splitting
  4. Preserve main's 12 existing tests and add the add_indicator tests on top

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

closing-soon PR missing Discord Discussion URL — will auto-close in 3 days needs-rebase pending-contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants