Description
Name and Version
$ llama-server --version
version: 6869 (851553ea6)
built with cc (GCC) 15.2.1 20250808 (Red Hat 15.2.1-1) for x86_64-redhat-linux
$ snap list | grep firefox
firefox 144.0-2 7084 latest/stable mozilla** -
Operating Systems
Linux
Which llama.cpp Modules Do You Know To Be Affected?
llama-server
Command Line
llama-server -m gpt-oss-20b-mxfp4.gguf
Problem Description & Steps To Reproduce
Background
Pursuant to #16722 closed by PR #16728 as per @allozaur's request.
Mozilla Firefox apparently announced the "gradual rollout" of an "AI Chatbot" feature in January this year following an earlier "initial soft launch". At the time of writing, the feature does not support out-of-the-box integration with llama.cpp, but it can be enabled and customized in about:config with the browser.ml keys. A look at Bugzilla/Core/MachineLearning suggests explicit support is in the works (e.g. BZ1970183). 🤷🏼
Issue Description
llama-server integration with Firefox's AI chatbot feature breaks on overly long queries. Furthermore, when the AI chatbot panel is left open, llama-server appears to hang only the first time the "Summarize Page" button is pressed on a page that is too long. On subsequent overly long queries, llama-server appears to correctly return HTTP 414 (URI Too Long). Once the AI chatbot panel has been closed after one or more overly long requests, llama-server appears to behave correctly, returning the error code thereafter.
Issue Reproduction
NOTE: It appears that there's an annoyingly sneaky quirk in Firefox where browser.ml.chat.provider must be changed in order for browser.ml.chat.maxLength to kick in.
Integration Limitation: Summarize Long Page Without Changing Max Length
- Run server: llama-server -m gpt-oss-20b-mxfp4.gguf
- In about:config, set browser.ml.chat.provider to localhost:8080
- Browse to an article and click on "Summarize page"
Errant Behaviour: Summarize Long Page After Changing Max Length
- Restart llama-server to reset prompt cache.
- Change browser.ml.chat.maxLength to e.g. 100,000
- Close AI chatbot panel and set browser.ml.chat.provider to blank
- Change browser.ml.chat.provider back to localhost:8080
- Clicking on the "Summarize page" button the first time produces no logs (even with -DCMAKE_BUILD_TYPE=Debug and llama-server --verbose)
- Clicking on it a second and third time shows srv log_server_r: request: GET / 414 in llama-server logs each time
- Interrupting the llama-server process hangs on "cleaning up before exit..."
- Closing the AI chatbot panel outputs a final 414 before llama-server exits
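For anyone wanting to poke at this without Firefox, here is a rough Python sketch of a probe. It assumes, based on the `GET / 414` log lines and the `q` parameter mentioned in this report, that the chatbot panel issues a plain `GET /?q=<prompt>` against the provider URL. The helper names `build_chat_url` and `probe` are made up for illustration, and the exact encoding Firefox uses is an assumption.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_chat_url(prompt: str, base: str = "http://localhost:8080/") -> str:
    """Return `base` with the prompt percent-encoded into a `q` query parameter.

    Assumption: spaces are encoded as %20 (not '+'), which may or may not
    match what Firefox actually sends.
    """
    query = urllib.parse.urlencode({"q": prompt}, quote_via=urllib.parse.quote)
    return base + "?" + query


def probe(url: str, timeout: float = 10.0) -> int:
    """Send one GET request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the code itself is what we want here.
        return err.code
```

With llama-server running, calling `probe(build_chat_url("x" * 100_000))` would be expected to return 414 if the server rejects the request cleanly. If the request stalls instead, `probe` raises a timeout-related exception rather than returning a status. Note that this sends a single stateless request, so it may not reproduce whatever the open chatbot panel does on the first click.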
Initial Investigation
It looks like the external library cpp-httplib sets a compile-time value called CPPHTTPLIB_REQUEST_URI_MAX_LENGTH to 8192. Increasing this value allows for longer query parameters.
It looks like tools/server/utils.hpp might be the right place for a change if there is to be one.
diff --git a/tools/server/utils.hpp b/tools/server/utils.hpp
--- a/tools/server/utils.hpp
+++ b/tools/server/utils.hpp
@@ -13,6 +13,8 @@
#define CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH 1048576
// increase backlog size to avoid connection resets for >> 1 slots
#define CPPHTTPLIB_LISTEN_BACKLOG 512
+// increase max URI length to allow for prompts in q URL parameter
+#define CPPHTTPLIB_REQUEST_URI_MAX_LENGTH 16384
// disable Nagle's algorithm
#define CPPHTTPLIB_TCP_NODELAY true
#include <cpp-httplib/httplib.h>
Alternatively, this can be passed to cmake:
cmake -B build -DCMAKE_CXX_FLAGS="-DCPPHTTPLIB_REQUEST_URI_MAX_LENGTH=16384"
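To get a feel for how quickly a prompt overruns these values, here is a small Python sketch. It assumes the limit is compared against the length of the request target (`/?q=<encoded prompt>`); exactly what cpp-httplib measures and whether the comparison is inclusive was not checked, so treat the boundary as approximate. The function names are hypothetical.

```python
from urllib.parse import quote

DEFAULT_LIMIT = 8192    # cpp-httplib default cited in this report
PATCHED_LIMIT = 16384   # value proposed in the diff above


def request_target_length(prompt: str) -> int:
    """Length of `/?q=<prompt>` after percent-encoding the prompt."""
    return len("/?q=" + quote(prompt, safe=""))


def fits(prompt: str, limit: int) -> bool:
    """True if the encoded request target is no longer than `limit` (assumed inclusive)."""
    return request_target_length(prompt) <= limit


# Percent-encoding inflates the prompt: each space becomes three bytes ("%20").
spaces = " " * 3000
print(request_target_length(spaces))   # 4 + 3 * 3000
print(fits(spaces, DEFAULT_LIMIT), fits(spaces, PATCHED_LIMIT))
```

Under these assumptions, a prompt anywhere near the 100,000-character browser.ml.chat.maxLength used in the reproduction steps would exceed both the default and the patched value, so the limit and maxLength would need to be chosen together.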
Unclear why nothing happens on just the first overly long query. Unclear whether problem is in Firefox or llama-server.
Also, a mildly notable Bugzilla discussion: "are Firefox AI summaries vulnerable to prompt injection?"
Causative Commits
69e9ff0 (Oct 24: q param)
0f2bbe6 (Feb 16: 414 hangs)
Relevant Log Output
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
srv log_server_r: request: GET / 127.0.0.1 200
srv log_server_r: request: GET /props 127.0.0.1 200
srv log_server_r: request: GET /props 127.0.0.1 200
srv log_server_r: request: GET /props 127.0.0.1 200
srv update_slots: all slots are idle
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: GET / 414
srv log_server_r: request: GET / 414
^Csrv operator(): operator(): cleaning up before exit...
srv log_server_r: request: GET / 414