
Misc. bug: SvelteKit WebUI blocks prompts that are >~1/3 the max context size #16437

Description

Name and Version

6586 (835b2b9)
built with clang version 19.1.5 for x86_64-pc-windows-msvc

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m "Qwen_Qwen3-30B-A3B-Q6_K.gguf" --port 7861 -c 16384 -b 2048 --gpu-layers 99 --flash-attn on --no-mmap --main-gpu 1 --tensor-split 0,100

Problem description & steps to reproduce

The Web UI (over)estimates the prompt's token count, yet it entirely blocks prompts whose estimated token count exceeds the context size. For example, with a 16,384-token context window, I provided an 8,210-token prompt; the UI estimated it at 23,199 tokens and showed the "Message Too Long" dialog without sending an HTTP request. After a few retries the prompt did send successfully, but once a chat had reached 16,384 tokens, my next new chat estimated the same prompt at exactly 16,384 tokens and showed the warning dialog again. (That happened on build 6692.)

Easiest solution: make the dialog just a warning and send the request anyway, as in the sketch below. (I considered suggesting a "Send Anyway" button, but since sending is an easy-to-cancel operation, it's probably better to just attempt it.)
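
A rough TypeScript sketch of the warning-only behavior; estimatedTokens, contextSize, showWarning, sendPrompt, and message are hypothetical stand-ins, not the actual WebUI code:

// Hypothetical stand-ins for the WebUI's real state and helpers.
declare const estimatedTokens: number;
declare const contextSize: number;
declare const message: string;
declare function showWarning(text: string): void;
declare function sendPrompt(text: string): Promise<void>;

// Warn instead of blocking the send.
if (estimatedTokens > contextSize) {
  // Non-blocking notice; the user can still cancel the generation if needed.
  showWarning(`Estimated ${estimatedTokens} tokens may exceed the ${contextSize}-token context window.`);
}
// Send regardless; the estimate is known to run high.
await sendPrompt(message);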

Better solution: if the estimate exceeds the context window (but not by more than ~5x), send the prompt plus system prompt to /tokenize for a more accurate count, along the lines of the sketch below. Alternatively, don't call /tokenize automatically, but provide a button in the dialog that does so.
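
A minimal sketch of that fallback, assuming llama-server's /tokenize endpoint (a POST with a JSON "content" field, returning { tokens: [...] }). Note that simply concatenating the system prompt and user prompt ignores chat-template overhead, so the result is still an approximation:

// Hypothetical fallback: only called when the rough client-side estimate
// exceeds the context window by less than ~5x.
async function getServerTokenCount(systemPrompt: string, prompt: string): Promise<number> {
  const res = await fetch('/tokenize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // llama-server's /tokenize accepts { content } and returns { tokens: number[] }.
    body: JSON.stringify({ content: systemPrompt + '\n' + prompt }),
  });
  if (!res.ok) throw new Error(`/tokenize failed with HTTP ${res.status}`);
  const { tokens } = (await res.json()) as { tokens: number[] };
  return tokens.length;
}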

First Bad Commit

a7a98e0
