Conversation

@ServeurpersoCom (Collaborator) commented Oct 10, 2025:

webui: remove client-side context pre-check and rely on backend for limits

Removed the client-side context window pre-check; messages are now simply sent, with dialog imports limited to core components, eliminating the maximum-context alert path.

Simplified streaming and non-streaming chat error handling to surface a generic 'No response received from server' error whenever the backend returns no content.

Removed the obsolete maxContextError plumbing from the chat store, so state management now focuses on the core message flow without special context-limit cases.

  • fix: make SSE client resilient to premature [DONE] events in multi-turn agentic proxy chains (as in the legacy WebUI), ensuring all SSE chunks are displayed until the TCP stream fully closes (a sketch of this follows below).

Closes #16437
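A minimal TypeScript sketch of the SSE-resilience idea from the bullet above; the function name and parsing details are illustrative, not the actual webui code:

```ts
// Sketch: keep consuming SSE chunks until the stream actually closes,
// instead of bailing out on the first "data: [DONE]" marker, which a
// multi-turn agentic proxy may emit before the final chunks arrive.
async function* readSseChunks(response: Response): AsyncGenerator<string> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // the TCP stream closed -- the only real stop condition

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the trailing partial line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice('data: '.length).trim();
      // Skip [DONE] instead of returning: a proxy in the chain may
      // send it early while later turns are still streaming.
      if (payload === '[DONE]') continue;
      yield payload;
    }
  }
}
```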

Master branch:
https://github.com/user-attachments/assets/edc1337d-2e19-4f99-a7ba-78f40146022f

This PR (ignore the Model Selector):
https://github.com/user-attachments/assets/e9952e04-e189-434f-8536-84184193d704

@allozaur (Collaborator) left a comment:

@ServeurpersoCom

Just a few cosmetic changes 😄 Also, could you add screenshots/video to the PR description comparing before/after? It will be great context for future lookback.

@ServeurpersoCom (Collaborator, Author) replied:

> Just a few cosmetic changes 😄 Also, could you add screenshots/video to the PR description comparing before/after? It will be great context for future lookback.

I’d love to make a temporary mini version of the model selector: just a simple field in Settings to declare the model in the JSON request. That way my llama-swap would work on master, and I could make videos of the master branch more easily!

@ServeurpersoCom (Collaborator, Author) commented:

I’ve added two videos, running on my Raspberry Pi 5 (16 GB) with Qwen3 30B A3B, fully synced with the master branch. You can see the bug where I got stuck: once the context overflows, the interface is completely blocked until you hit F5.

With the current PR build, it’s much better: if a message block is too large, it can still slip into the context and needs to be deleted manually. But since the backend decides, it never fully blocks. We could still improve it a bit by preventing oversized messages from being sent into the context in the first place.
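For illustration, a rough TypeScript sketch of the "let the backend decide" flow this PR moves to; the endpoint and the error-body shape are assumptions, not the actual store code:

```ts
// Sketch: no client-side context pre-check -- just send the request and
// surface whatever the backend reports, so the UI can never hard-block.
async function sendChatMessage(messages: unknown[]): Promise<string> {
  const res = await fetch('/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, stream: false })
  });

  if (!res.ok) {
    // Backend-reported failures (including context overflow) become
    // ordinary, recoverable errors instead of a blocking pre-check.
    const body = await res.json().catch(() => null);
    throw new Error(body?.error?.message ?? `HTTP ${res.status}`);
  }

  const data = await res.json();
  const content = data?.choices?.[0]?.message?.content;
  if (!content) throw new Error('No response received from server');
  return content;
}
```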

@ServeurpersoCom (Collaborator, Author) commented:

Toolcall testing (Node.js proxy)

Google.what.the.weather-AVC-750kbps.mp4

@ggerganov (Member) commented:

@ServeurpersoCom Curious: are you doing some OCR in the last video to detect text elements in the screenshots? Would love to learn more, but maybe after the PR is reviewed, to avoid getting off-topic.

@allozaur (Collaborator) left a comment:

@ServeurpersoCom let's just rebuild a fresh webui static output and we're good to go :)

ServeurpersoCom and others added 7 commits on October 12, 2025.

@ServeurpersoCom force-pushed the ctxsize-rely-on-backend branch from 28badc5 to be85c24 on October 12, 2025 at 10:21.
@allozaur (Collaborator) commented:

@ServeurpersoCom actually I will improve the UI/UX of the new Alert Dialog in a separate PR so that we don't block this change :)

@allozaur merged commit 81d54bb into ggml-org:master on Oct 12, 2025 (14 checks passed).
@ServeurpersoCom (Collaborator, Author) replied:

> @ServeurpersoCom Curious: are you doing some OCR in the last video to detect text elements in the screenshots? Would love to learn more, but maybe after the PR is reviewed, to avoid getting off-topic.

Not OCR: the proxy just parses streamed text and DOM elements in real time.
I can still use OCR separately when needed (reading screenshots, captchas, etc.).

The model actually sees the entire page: it can analyze the full DOM and reach elements outside the viewport through an abstraction layer that simulates human actions (scroll, click, type).
That makes it effectively undetectable by anti-bot systems, while keeping inference fully streamed through the SSE proxy.

ToolCall
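For context on how a proxy like this can parse streamed output in real time, here is a hypothetical TypeScript sketch that accumulates tool-call fragments from the streamed deltas; the field names follow the OpenAI-compatible streaming format, but the code itself is illustrative only:

```ts
// Sketch: streamed tool calls arrive as fragments spread across many
// SSE deltas; the proxy stitches them together per tool-call index
// while still forwarding every chunk to the client untouched.
interface ToolCallDelta {
  index: number;
  function?: { name?: string; arguments?: string };
}

const pending = new Map<number, { name: string; args: string }>();

function onDelta(delta: { tool_calls?: ToolCallDelta[] }): void {
  for (const tc of delta.tool_calls ?? []) {
    const entry = pending.get(tc.index) ?? { name: '', args: '' };
    entry.name += tc.function?.name ?? '';
    entry.args += tc.function?.arguments ?? ''; // JSON arguments stream in pieces
    pending.set(tc.index, entry);
  }
}
```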

@ServeurpersoCom (Collaborator, Author) replied:

> @ServeurpersoCom actually I will improve the UI/UX of the new Alert Dialog in a separate PR so that we don't block this change :)

Awesome, can’t wait to see your pure Svelte touch on that dialog 😄

@ggerganov (Member) commented:

Nice. So this seems like some sort of ingenious way to control a headless? browser with an LLM. And the images in the WebUI are just a "progress report" from the browser. It's a bit over my head, but definitely looks interesting.

@ServeurpersoCom (Collaborator, Author) commented Oct 12, 2025:

Exactly, but not headless: a full real browser with GPU capability (inside a software box)! The goal is to convert the DOM (with all bounding boxes) into labeled text tokens for the LLM.
When the model wants to interact with a label, it also gets the list of possible actions on it.
The ToolCall + abstraction layer then takes care of reaching the right area (scrolling if needed) and performing what the LLM asked: typing or clicking.
It’s all heuristic logic, way simpler than the CUDA kernels in llama.cpp 😄
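A hypothetical TypeScript sketch of the DOM-to-labeled-tokens idea described above; every name here is invented for illustration, not taken from the actual proxy:

```ts
// Sketch: each interactive element becomes a short labeled token the
// LLM can refer to, together with its bounding box and the actions
// available on it.
interface LabeledElement {
  label: string;                                  // token the model uses, e.g. "B3"
  role: string;                                   // button, link, input, ...
  text: string;                                   // visible text, truncated
  box: { x: number; y: number; w: number; h: number };
  actions: ('click' | 'type' | 'scroll_to')[];
}

function serializeForLlm(elements: LabeledElement[]): string {
  return elements
    .map(e => `[${e.label}] ${e.role} "${e.text}" (${e.actions.join('/')})`)
    .join('\n');
}
```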

Idea: we could add a small module in llama.cpp that exposes every ToolCall event through a user-defined HTTP hook; that would let anyone easily connect their model to external actions or systems!
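Purely as a sketch of that idea, in TypeScript; no such hook exists in llama.cpp today, and the payload shape is an assumption:

```ts
// Hypothetical hook: whenever a tool call completes, POST it to a
// user-configured URL so external systems can react to it.
async function notifyToolCallHook(
  hookUrl: string,
  toolCall: { name: string; arguments: string } // arguments = raw JSON string from the model
): Promise<void> {
  await fetch(hookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ event: 'tool_call', ...toolCall })
  });
}
```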


Successfully merging this pull request may close these issues.

Misc. bug: SvelteKit WebUI blocks prompts that are >~1/3 the max context size
