
stripReasoningTags mistreats literal <think> in inline code as XML reasoning tag #488

@HarryLee1900


Summary

stripReasoningTags() and extractThinkingContent() in src/card/builder.ts do not protect inline code spans before scanning for XML reasoning tags. When the answer text contains `<think>` as a literal code reference (inside backticks), the regex <\s*(?:think|...)\s*>[\s\S]*$ treats it as an unclosed reasoning tag and strips everything from <think> to the end of the text, so the rest of the answer is lost.

Root Cause

The two regex-based functions scan the raw text directly:

// stripReasoningTags — current code
function stripReasoningTags(text: string): string {
  let result = text.replace(
    /<\s*(?:think(?:ing)?|thought|antthinking)\s*>[\s\S]*?<\s*\/\s*(?:...)\s*>/gi, ''
  );
  // This line is the problem:
  result = result.replace(
    /<\s*(?:think(?:ing)?|thought|antthinking)\s*>[\s\S]*$/gi, ''
  );
  result = result.replace(/<\s*\/\s*(?:think(?:ing)?|thought|antthinking)\s*>/gi, '');
  return result.trim();
}

The second regex ([\s\S]*$) exists to handle streaming content in which a <think> tag has been opened but not yet closed. However, it cannot distinguish between:

  • A real <think> XML tag: <think>reasoning content...
  • A code reference: `<think>` inside backticks
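A minimal stand-alone copy of the current logic makes the ambiguity concrete. It assumes that the closing-tag alternation elided as (?:...) above lists the same tag names as the opening one; `stripReasoningTagsCurrent` is a hypothetical name for this copy, not a function exported by the package. Both calls below hit the same unclosed-tag rule, but only the first is the case that rule was written for.

```typescript
// Tag names taken from the regexes quoted in this issue.
// Assumption: the elided closing-tag alternation uses the same names.
const TAGS = "(?:think(?:ing)?|thought|antthinking)";

// Hypothetical stand-alone copy of the current, unprotected logic.
function stripReasoningTagsCurrent(text: string): string {
  // 1. Remove complete <think>...</think> pairs.
  let result = text.replace(
    new RegExp(`<\\s*${TAGS}\\s*>[\\s\\S]*?<\\s*\\/\\s*${TAGS}\\s*>`, "gi"),
    "",
  );
  // 2. Remove an unclosed opening tag and everything after it (streaming case).
  result = result.replace(new RegExp(`<\\s*${TAGS}\\s*>[\\s\\S]*$`, "gi"), "");
  // 3. Remove any stray closing tags.
  result = result.replace(new RegExp(`<\\s*\\/\\s*${TAGS}\\s*>`, "gi"), "");
  return result.trim();
}

// A real reasoning tag that is still open: dropping the tail is intended.
console.log(stripReasoningTagsCurrent("Answer.<think>still reasoning"));
// prints: Answer.

// A literal tag inside inline code: the same rule truncates the answer.
console.log(stripReasoningTagsCurrent("Use `<think>` to open a block."));
// prints: Use `
```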

Steps to Reproduce

  1. Generate a response that includes `<think>` as a code reference in the answer text
  2. stripReasoningTags is called during onPartialReply
  3. The regex matches <think> and strips everything from it to the end of the text:
const text = "Cite `` `<think>` `` as a code reference";
stripReasoningTags(text);
// Expected: "Cite `` `<think>` `` as a code reference"
// Actual:   "Cite `` `"
// Everything after <think> is gone

Impact

  • Content loss: any mention of <think> as code (in backticks) in the answer truncates the answer from that point to the end
  • The extractThinkingContent function has the same vulnerability
  • Affects both streaming display (via onPartialReply) and final card rendering

Suggested Fix

Protect inline code spans (backtick-quoted content) before running the reasoning regexes: replace each span with a placeholder, run the regexes, then restore the spans:

function stripReasoningTags(text: string): string {
  // Protect inline code blocks
  const codeBlocks: string[] = [];
  const MARK = '___CB_';
  let result = text.replace(/(`+)(.+?)\1/g, (m) => {
    const idx = codeBlocks.push(m) - 1;
    return `${MARK}${idx}___`;
  });
  // Run reasoning regex on protected text
  result = result.replace(
    /<\s*(?:think(?:ing)?|thought|antthinking)\s*>[\s\S]*?<\s*\/\s*(?:...)\s*>/gi, ''
  );
  result = result.replace(
    /<\s*(?:think(?:ing)?|thought|antthinking)\s*>[\s\S]*$/gi, ''
  );
  result = result.replace(/<\s*\/\s*(?:think(?:ing)?|thought|antthinking)\s*>/gi, '');
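  // Restore code blocks. A replacer function is used so that `$&`, `$1`, etc.
  // inside a code block are inserted literally instead of being expanded as
  // replacement patterns.
  codeBlocks.forEach((block, i) => {
    result = result.replace(`${MARK}${i}___`, () => block);
  });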
  return result.trim();
}
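The placeholder approach above may still leave gaps. (.+?) cannot cross a newline, so a <think> inside a fenced code block (a block delimited by ``` lines) stays exposed and is still truncated, and (`+)(.+?)\1 can pair backtick runs of different lengths, which CommonMark does not. One possible refinement, sketched under stated assumptions: protect fenced blocks first (an unclosed fence runs to the end of the text, as it would while streaming), then scan inline spans with the CommonMark same-length rule, and restore with a replacer function. The names `stripReasoningTagsProtected` and `protectInlineCode`, the U+0000 placeholder, and the full closing-tag alternation are all assumptions for illustration, not the package's code.

```typescript
// Assumption: the closing-tag alternation elided as (?:...) in this issue
// uses the same tag names as the opening one.
const TAGS = "(?:think(?:ing)?|thought|antthinking)";
const PAIR = new RegExp(`<\\s*${TAGS}\\s*>[\\s\\S]*?<\\s*\\/\\s*${TAGS}\\s*>`, "gi");
const UNCLOSED = new RegExp(`<\\s*${TAGS}\\s*>[\\s\\S]*$`, "gi");
const STRAY_CLOSE = new RegExp(`<\\s*\\/\\s*${TAGS}\\s*>`, "gi");

// Fenced block: ``` or ~~~ at a line start, up to a closing fence of the same
// length or, while streaming, to the end of the text. Approximation: indented
// fences and longer closing fences are not recognised.
const FENCE = /^(`{3,}|~{3,})[^\n]*\n[\s\S]*?(?:^\1[ \t]*$|(?![\s\S]))/gm;

// Hypothetical placeholder; assumes model output never contains U+0000.
const SENTINEL = "\u0000";
const placeholder = (i: number): string => `${SENTINEL}CB${i}${SENTINEL}`;

// Inline code spans per the CommonMark rule: an opening backtick run closes
// only at the next run of exactly the same length; otherwise it is literal.
function protectInlineCode(text: string, saved: string[]): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text.charAt(i) !== "`") {
      out += text.charAt(i);
      i++;
      continue;
    }
    let j = i;
    while (text.charAt(j) === "`") j++;
    const run = j - i;
    let close = -1;
    let k = j;
    // Stop at a sentinel so a span never swallows a fenced-block placeholder.
    while (k < text.length && text.charAt(k) !== SENTINEL) {
      if (text.charAt(k) !== "`") {
        k++;
        continue;
      }
      let m = k;
      while (text.charAt(m) === "`") m++;
      if (m - k === run) {
        close = k;
        break;
      }
      k = m;
    }
    if (close === -1) {
      out += text.slice(i, j); // unmatched run: keep the backticks as text
      i = j;
    } else {
      out += placeholder(saved.push(text.slice(i, close + run)) - 1);
      i = close + run;
    }
  }
  return out;
}

// Hypothetical protected variant of stripReasoningTags().
function stripReasoningTagsProtected(text: string): string {
  const saved: string[] = [];
  let result = text.replace(FENCE, (m: string) => placeholder(saved.push(m) - 1));
  result = protectInlineCode(result, saved);
  result = result.replace(PAIR, "").replace(UNCLOSED, "").replace(STRAY_CLOSE, "");
  // A replacer function inserts code verbatim, so `$&` or `$1` inside a
  // code span is not interpreted as a replacement pattern.
  result = result.replace(/\u0000CB(\d+)\u0000/g, (_m: string, i: string) => saved[Number(i)] ?? "");
  return result.trim();
}

console.log(stripReasoningTagsProtected("Cite `` `<think>` `` as a code reference"));
// prints: Cite `` `<think>` `` as a code reference
console.log(stripReasoningTagsProtected("Answer.<think>still reasoning"));
// prints: Answer.
```

One trade-off applies here and to the original suggestion alike: a stray backtick inside real reasoning can pair with a later backtick and hide the closing tag, so masking code reduces false positives at the cost of occasionally missing a real tag.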

The same protection should be applied to extractThinkingContent().
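For extraction, masking alone may not be enough: the extracted reasoning and the remaining answer can each contain placeholders, so both halves need restoring. A minimal sketch, assuming a { thinking, answer } return shape (the issue does not show extractThinkingContent()'s actual signature) and reusing the inline-span pattern from the suggested fix; `withInlineCodeProtected` and `extractThinkingSketch` are hypothetical names.

```typescript
// Assumption: same tag names as the reasoning regexes quoted in this issue.
const TAGS = "(?:think(?:ing)?|thought|antthinking)";
// Captures a reasoning block's body; an unclosed block runs to the end of the text.
const BLOCK = new RegExp(`<\\s*${TAGS}\\s*>([\\s\\S]*?)(?:<\\s*\\/\\s*${TAGS}\\s*>|$)`, "gi");

// Hypothetical helper: masks inline code spans (same pattern as the suggested
// fix), runs `fn`, and passes a `restore` for any string derived from the mask.
function withInlineCodeProtected<T>(
  text: string,
  fn: (masked: string, restore: (s: string) => string) => T,
): T {
  const spans: string[] = [];
  const masked = text.replace(/(`+)(.+?)\1/g, (m: string) => `\u0000${spans.push(m) - 1}\u0000`);
  const restore = (s: string): string =>
    s.replace(/\u0000(\d+)\u0000/g, (_m: string, i: string) => spans[Number(i)] ?? "");
  return fn(masked, restore);
}

// Hypothetical shape: not the real extractThinkingContent() signature.
function extractThinkingSketch(text: string): { thinking: string; answer: string } {
  return withInlineCodeProtected(text, (masked, restore) => {
    const parts: string[] = [];
    const answer = masked.replace(BLOCK, (_m: string, inner: string) => {
      parts.push(inner);
      return "";
    });
    // A placeholder can end up in either half, so restore both.
    return {
      thinking: restore(parts.join("\n")).trim(),
      answer: restore(answer).trim(),
    };
  });
}

const demo = extractThinkingSketch("<think>check `x`</think>Use `<think>` here.");
// demo.thinking is "check `x`" and demo.answer is "Use `<think>` here."
console.log(demo.thinking, "|", demo.answer);
```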

Environment

  • Package: @larksuite/openclaw-lark
  • Version: 2026.4.1 (main branch, confirmed still present)
  • File: src/card/builder.ts (stripReasoningTags() and extractThinkingContent())

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working), messaging (src/messaging/ + src/card/: message rendering, cards, streaming), tracked (Issue is being tracked by the team)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests