Name and Version
6586 (835b2b9)
built with clang version 19.1.5 for x86_64-pc-windows-msvc
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m "Qwen_Qwen3-30B-A3B-Q6_K.gguf" --port 7861 -c 16384 -b 2048 --gpu-layers 99 --flash-attn on --no-mmap --main-gpu 1 --tensor-split 0,100
Problem description & steps to reproduce
The Web UI overestimates the prompt's token count, yet it entirely blocks prompts whose estimated token count exceeds the context size. For example, with a 16,384-token context window, I provided an 8,210-token prompt; the UI estimated 23,199 tokens and showed the "Message Too Long" dialog without sending any HTTP request. After a few retries the prompt did send, but once that chat reached 16,384 tokens, my next new chat estimated the same prompt at exactly 16,384 tokens and showed the warning dialog again. (That happened on build 6692.)
Easiest solution: make the dialog just a warning and send the request anyway. (I considered a "Send Anyway" button, but since sending is easy to cancel, it's probably better to just attempt it.)
Better solution: if the estimate exceeds the context window (but not by more than ~5x), send the prompt plus system prompt to /tokenize for an accurate count before deciding to block. Alternatively, don't call /tokenize automatically, but provide a button in the dialog to do so.
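A minimal sketch of what that check could look like on the Web UI side, assuming the server's POST /tokenize endpoint accepts {"content": "..."} and returns {"tokens": [...]} (shape taken from the server docs; the helper names and the 5x cutoff are just illustrative):

```ts
// Hypothetical helper: ask llama-server for an exact token count
// instead of trusting the client-side estimate.
async function countTokens(baseUrl: string, text: string): Promise<number> {
  const res = await fetch(`${baseUrl}/tokenize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: text }),
  });
  if (!res.ok) throw new Error(`tokenize failed: ${res.status}`);
  const data = await res.json();
  return data.tokens.length;
}

// Only fall back to the server when the rough estimate says "too long"
// but is not absurdly over budget (illustrative 5x cutoff from above).
async function shouldBlockPrompt(
  baseUrl: string,
  prompt: string,
  estimate: number,
  ctxSize: number,
): Promise<boolean> {
  if (estimate <= ctxSize) return false;       // estimate fits, send as usual
  if (estimate > 5 * ctxSize) return true;     // clearly too long, keep the dialog
  const exact = await countTokens(baseUrl, prompt);
  return exact > ctxSize;                      // block only if the real count exceeds the context
}
```

This keeps the dialog for genuinely oversized prompts while avoiding false positives like the 8,210-token prompt above.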