ServeurpersoCom (Collaborator) commented Oct 10, 2025

fix: add remark plugin to render raw HTML as literal text

Implemented a missing MDAST stage to neutralize raw HTML, as major LLM WebUIs
do, ensuring consistent and safe Markdown rendering

Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the
Markdown AST into plain-text equivalents while preserving indentation and
line breaks. This ensures consistent rendering and prevents unintended HTML
execution, without altering valid Markdown structure
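A minimal sketch of what such an MDAST stage can look like, in TypeScript (the WebUI's language). This is not the code from this PR: the local `MdNode` type, the `remarkLiteralHtmlSketch` name, the tab-to-four-spaces rule and the list of phrasing parents are assumptions for illustration. Each raw `html` node becomes `text` nodes separated by `break` nodes, leading whitespace becomes non-breaking spaces so indentation survives, and block-level HTML is wrapped in a paragraph so the tree stays valid MDAST.

```typescript
// Minimal stand-in for mdast nodes (assumption: only the fields this sketch needs).
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

const NBSP = '\u00a0';

// Parents whose children are phrasing (inline) content, so an inline `html`
// node can be swapped for text in place. Block-level `html` nodes elsewhere
// (e.g. directly under `root` or a `listItem`) get wrapped in a paragraph.
const PHRASING_PARENTS = new Set([
  'paragraph', 'heading', 'emphasis', 'strong', 'delete',
  'link', 'linkReference', 'tableCell',
]);

// Convert raw HTML source into literal phrasing nodes: one `text` per line,
// a `break` between lines, and leading spaces/tabs turned into non-breaking
// spaces so indentation survives HTML whitespace collapsing.
function htmlToLiteral(raw: string): MdNode[] {
  const nodes: MdNode[] = [];
  raw.split('\n').forEach((line, i) => {
    if (i > 0) nodes.push({ type: 'break' });
    const indent = /^[ \t]*/.exec(line)![0];
    const visibleIndent = indent.replace(/\t/g, '    ').replace(/ /g, NBSP);
    const text = visibleIndent + line.slice(indent.length);
    if (text.length > 0) nodes.push({ type: 'text', value: text });
  });
  return nodes;
}

// Recursively replace every `html` child with its literal-text equivalent.
function literalizeHtml(parent: MdNode): void {
  if (!parent.children) return;
  const next: MdNode[] = [];
  for (const child of parent.children) {
    if (child.type === 'html') {
      const literal = htmlToLiteral(child.value ?? '');
      if (PHRASING_PARENTS.has(parent.type)) next.push(...literal);
      else next.push({ type: 'paragraph', children: literal });
    } else {
      literalizeHtml(child); // recurse into non-html children
      next.push(child);
    }
  }
  parent.children = next;
}

// unified-style attacher: returns a transformer that rewrites the tree in place.
function remarkLiteralHtmlSketch() {
  return (tree: MdNode): void => literalizeHtml(tree);
}
```

Because the output is ordinary `text`, it flows through `remarkRehype` and `rehypeStringify` as escaped text. By default remark-rehype drops raw `html` nodes unless `allowDangerousHtml` is enabled, so converting them to text before that stage keeps the content visible instead of silently omitting it.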

Kept 'remarkRehype' in the pipeline since it performs the required conversion
from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization

Refined the link-enhancement logic to skip unnecessary DOM rewrites,
fixing a subtle bug where extra paragraphs were injected after the first
line due to full innerHTML reconstruction, and ensuring links open in new
tabs only when required
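The idea behind the link fix, patching attributes on existing anchors instead of rebuilding `innerHTML`, can be sketched as a pass over a HAST-like tree. This swaps in a tree walk for the PR's DOM post-processing so it runs without a browser; the `HastNode` shape (with properties simplified to strings), `enhanceLinks`, and the rule that only absolute http(s) links without `target="_blank"` need changing are all assumptions for illustration.

```typescript
// Minimal stand-in for hast nodes (assumption: properties simplified to strings).
interface HastNode {
  type: string;
  tagName?: string;
  properties?: Record<string, string | undefined>;
  value?: string;
  children?: HastNode[];
}

// Hypothetical rule for "only when required": absolute http(s) links
// that do not already open in a new tab.
function needsNewTab(el: HastNode): boolean {
  const href = el.properties?.href ?? '';
  return /^https?:\/\//i.test(href) && el.properties?.target !== '_blank';
}

// Walk the tree and patch anchor attributes in place. Nothing is re-serialized
// or rebuilt, so the surrounding structure (paragraphs, breaks) is untouched.
// Returns how many anchors were changed.
function enhanceLinks(node: HastNode): number {
  let changed = 0;
  if (node.type === 'element' && node.tagName === 'a' && needsNewTab(node)) {
    node.properties = { ...node.properties, target: '_blank', rel: 'noopener noreferrer' };
    changed++;
  }
  for (const child of node.children ?? []) changed += enhanceLinks(child);
  return changed;
}
```

In the browser the equivalent is calling `setAttribute` on each matching `<a>` element; since nothing is re-parsed, no stray paragraphs can appear, and running the pass twice changes nothing the second time.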

Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml
-> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify

Close #16417

ServeurpersoCom (Collaborator, Author) commented Oct 10, 2025

Test sheet

reasoning_content:
all must render as plain text

Final content:
Markdown -> must render normally
Markdown link -> must be clickable and open in a new tab
HTML without code block -> must render as plain text
HTML link tag -> must render as plain text, with the URL not clickable
HTML in code block -> must render as plain text + syntax-highlighted
LaTeX in code block -> must render as plain text
LaTeX outside Markdown -> must render as plain text
LaTeX inside Markdown (nominal LLM case) -> must render normally

This patch aligns the WebUI Markdown pipeline with industry-standard LLM renderers (OpenAI ChatGPT, Hugging Face Spaces, Anthropic...) by ensuring raw HTML safety without sacrificing formatting fidelity.

This patch doesn't just "sanitize HTML": it neutralizes raw XML-like output (e.g. <think>, <tool>, <meta>, <response>, <step>, <node>, <data>). These symbolic or structural tags, whether produced by LLMs or part of generic XML fragments, are displayed as plain text rather than parsed as DOM, which preserves their structure while keeping the UI safe and consistent.
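Why literal text is safe can be shown with a toy serializer: only element nodes emit tags, and text content is always escaped, so a `<think>` that reached the tree as text comes out as `&lt;think&gt;` and the browser shows it verbatim. This is a simplified illustration with a hypothetical `TreeNode` type and `serialize` function, not rehype-stringify's exact escaping rules.

```typescript
// Minimal text/element node shape (an assumption for this sketch, not the hast typings).
type TreeNode =
  | { type: 'text'; value: string }
  | { type: 'element'; tagName: string; children: TreeNode[] };

// Escape markup-significant characters in text content ('&' first, so
// entities produced by the later replacements are not double-escaped).
function escapeText(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Only element nodes produce real tags; text is always escaped,
// so literal "<think>" text can never become a DOM element.
function serialize(node: TreeNode): string {
  if (node.type === 'text') return escapeText(node.value);
  const inner = node.children.map(serialize).join('');
  return `<${node.tagName}>${inner}</${node.tagName}>`;
}
```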

ServeurpersoCom (Collaborator, Author) commented:

[Screenshot: "Untitled"]

zzokkolma commented:

I tested this PR and it seems to solve the missing HTML output part; however, I did run into a formatting issue.
[Screenshot: format-1]

This is probably caused by the model outputting an empty, indented line before that tag.
[Screenshot: format-2]

ServeurpersoCom (Collaborator, Author) commented Oct 11, 2025

Interesting case! I just ran a quick test in ChatGPT's own WebUI, and it fails in exactly the same way 😅
Their renderer also neutralizes raw HTML and collapses line breaks when HTML is escaped inside Markdown.
So what you're seeing is consistent with how most production-safe Markdown pipelines behave when remark-rehype or rehype-sanitize flatten the tree.

Here's a screenshot from that test:
[Screenshot: GPT]

I'll dig a bit deeper, but this confirms that our remarkLiteralHtml stage is the right approach: it keeps structure visible while neutralizing unsafe tags.

ServeurpersoCom (Collaborator, Author) commented:

To reproduce the issue now, you need to explicitly ask the model to output XML-like tags in the stream, which is already a bit of a hack, since LLMs naturally know they’re emitting Markdown.
So this goes slightly beyond normal usage, and as long as it’s only a rendering glitch, it’s probably better to keep the code simple rather than over-engineering for rare edge cases that could degrade the quality of conventional Markdown rendering.

ServeurpersoCom (Collaborator, Author) commented Oct 11, 2025

Test prompt:

Write HTML with real blank lines and indentation inside a code block and then output the same HTML outside a code block, so we can compare the rendering.

https://chatgpt.com/share/68ea480b-5c3c-8012-9201-62cfb687dc67

And the same prompt on llama.cpp with this PR:

conversation_6b69f066-c32c-4b0b-9f0d-92dad9c31764_tu_peux_crire_exacte.json

At this point, we’re actually doing slightly better than some major LLM WebUIs, so that’s a good sign 😄
Let’s stop here before over-tuning it; the current behavior is safe, consistent, and covers all realistic use cases.

allozaur (Collaborator) left a review comment:

Great stuff overall! Just left a few architectural remarks that need to be addressed.



Development

Successfully merging this pull request may close these issues.

Misc. bug: webui is dropping or omitting model's HTML output

3 participants