
Misc. bug: llama-server integration with Firefox's AI chatbot feature breaks with overly long queries #16830

@chansikpark

Description


Name and Version

$ llama-server --version
version: 6869 (851553ea6)
built with cc (GCC) 15.2.1 20250808 (Red Hat 15.2.1-1) for x86_64-redhat-linux
$ snap list | grep firefox
firefox            144.0-2                         7084   latest/stable  mozilla**       -

Operating Systems

Linux

Which llama.cpp Modules Do You Know To Be Affected?

llama-server

Command Line

llama-server -m gpt-oss-20b-mxfp4.gguf

Problem Description & Steps To Reproduce

Background

Follow-up to #16722, closed by PR #16728, per @allozaur's request.
Mozilla Firefox apparently announced a "gradual rollout" of an "AI Chatbot" feature in January this year, following an earlier "initial soft launch". As of writing, the feature does not support out-of-the-box integration with llama.cpp, but it can be enabled and customized in about:config via the browser.ml keys. A look at Bugzilla/Core/MachineLearning suggests explicit support is in the works (e.g. BZ1970183). 🤷🏼

Issue Description

llama-server integration with Firefox's AI chatbot feature breaks on overly long queries. Furthermore, when the AI chatbot panel is left open, llama-server appears to hang the first time the "Summarize Page" button is pressed on a page that is too long. On subsequent overly long queries, llama-server appears to correctly return an HTTP 414 error code. Once the AI chatbot panel is closed after making one or more overly long requests, llama-server appears to behave correctly, returning the error code thereafter.

Issue Reproduction

NOTE: There appears to be a sneaky quirk in Firefox where browser.ml.chat.provider must be changed (re-set) in order for browser.ml.chat.maxLength to kick in.

Integration Limitation: Summarize Long Page Without Changing Max Length

  1. Run server: llama-server -m gpt-oss-20b-mxfp4.gguf
  2. In about:config, set browser.ml.chat.provider to localhost:8080
  3. Browse to an article and click on "Summarize page"

Errant Behaviour: Summarize Long Page After Changing Max Length

  1. Restart llama-server to reset prompt cache.
  2. Change browser.ml.chat.maxLength to, e.g., 100000
  3. Close AI chatbot panel and set browser.ml.chat.provider to blank
  4. Change browser.ml.chat.provider back to localhost:8080
  5. Clicking on the "Summarize page" button the first time produces no logs (even with -DCMAKE_BUILD_TYPE=Debug and llama-server --verbose)
  6. Clicking on it a second and third time shows srv log_server_r: request: GET / 414 in llama-server logs each time
  7. Interrupting the llama-server process hangs on "cleaning up before exit..."
  8. Closing the AI chatbot panel outputs a final 414 before llama-server exits
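For what it's worth, the 414 can also be triggered without Firefox by sending an over-long q parameter directly to llama-server. A minimal sketch (the curl line is left commented and assumes the default 127.0.0.1:8080; the port and endpoint are assumptions, not part of the Firefox repro above):

```shell
# Build a query parameter longer than cpp-httplib's default
# CPPHTTPLIB_REQUEST_URI_MAX_LENGTH of 8192 bytes.
Q=$(printf 'a%.0s' $(seq 1 9000))
echo "${#Q}"   # → 9000, so "GET /?q=$Q" exceeds the 8192-byte URI limit

# Against a running llama-server this should print 414:
# curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:8080/?q=$Q"
```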

Initial Investigation

Details

It looks like the external library cpp-httplib caps the request URI length with a compile-time constant, CPPHTTPLIB_REQUEST_URI_MAX_LENGTH, set to 8192. Increasing this value allows for longer query parameters.

It looks like tools/server/utils.hpp might be the right place for a change if there is to be one.

diff --git a/tools/server/utils.hpp b/tools/server/utils.hpp
--- a/tools/server/utils.hpp
+++ b/tools/server/utils.hpp
@@ -13,6 +13,8 @@
 #define CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH 1048576
 // increase backlog size to avoid connection resets for >> 1 slots
 #define CPPHTTPLIB_LISTEN_BACKLOG 512
+// increase max URI length to allow for prompts in q URL parameter
+#define CPPHTTPLIB_REQUEST_URI_MAX_LENGTH 16384
 // disable Nagle's algorithm
 #define CPPHTTPLIB_TCP_NODELAY true
 #include <cpp-httplib/httplib.h>

Alternatively, this can be passed to cmake:

cmake -B build -DCMAKE_CXX_FLAGS="-DCPPHTTPLIB_REQUEST_URI_MAX_LENGTH=16384"
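Either way, a rough sanity check after rebuilding (a sketch, assuming the patched server is running on the default 127.0.0.1:8080) is to send a query longer than the old 8192-byte limit but under the new 16384-byte one:

```shell
# 12000-byte query: over the old 8192-byte default, under the new 16384 limit
Q=$(printf 'b%.0s' $(seq 1 12000))
echo "${#Q}"   # → 12000

# With the patched build this request should no longer be rejected with 414:
# curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:8080/?q=$Q"
```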

It is unclear why nothing happens on just the first overly long query, and whether the problem lies in Firefox or in llama-server.

Also, a mildly notable Bugzilla discussion: "are Firefox AI summaries vulnerable to prompt injection?"

Causative Commits

69e9ff0 (Oct 24: q param)
0f2bbe6 (Feb 16: 414 hangs)

Relevant Log Output

main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
srv  log_server_r: request: GET / 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  update_slots: all slots are idle
srv  log_server_r: request: GET /slots 127.0.0.1 200
srv  log_server_r: request: GET /  414
srv  log_server_r: request: GET /  414
^Csrv    operator(): operator(): cleaning up before exit...
srv  log_server_r: request: GET /  414
