
Misc. bug: llama-server integration with Firefox's AI chatbot feature breaks with overly long queries #16830

@chansikpark

Description


Name and Version

$ llama-server --version
version: 6869 (851553ea6)
built with cc (GCC) 15.2.1 20250808 (Red Hat 15.2.1-1) for x86_64-redhat-linux
$ snap list | grep firefox
firefox            144.0-2                         7084   latest/stable  mozilla**       -

Operating Systems

Linux

Which llama.cpp Modules Do You Know To Be Affected?

llama-server

Command Line

llama-server -m gpt-oss-20b-mxfp4.gguf

Problem Description & Steps To Reproduce

Background

Follow-up to #16722, closed by PR #16728, per @allozaur's request.
Mozilla Firefox apparently announced a "gradual rollout" of an "AI Chatbot" feature in January this year, following an earlier "initial soft launch". As of writing, the feature does not support out-of-the-box integration with llama.cpp, but it can be enabled and customized in about:config via the browser.ml keys. A look at Bugzilla/Core/MachineLearning suggests explicit support is in the works (e.g. BZ1970183). 🤷🏼

Issue Description

llama-server integration with Firefox's AI chatbot feature breaks on overly long queries. Furthermore, when the AI chatbot panel is left open, llama-server appears to hang the first time the "Summarize Page" button is pressed on a page that is too long. On subsequent overly long queries, llama-server appears to correctly return an HTTP 414 error code. Once the AI chatbot panel is closed after making one or more overly long requests, llama-server appears to behave correctly, returning the error code thereafter.

Issue Reproduction

NOTE: There appears to be a sneaky quirk in Firefox where browser.ml.chat.provider must be changed (re-set) in order for browser.ml.chat.maxLength to kick in.

Integration Limitation: Summarize Long Page Without Changing Max Length

  1. Run server: llama-server -m gpt-oss-20b-mxfp4.gguf
  2. In about:config, set browser.ml.chat.provider to localhost:8080
  3. Browse to an article and click on "Summarize page"

Errant Behaviour: Summarize Long Page After Changing Max Length

  1. Restart llama-server to reset prompt cache.
  2. Change browser.ml.chat.maxLength to, e.g., 100000
  3. Close AI chatbot panel and set browser.ml.chat.provider to blank
  4. Change browser.ml.chat.provider back to localhost:8080
  5. Clicking on the "Summarize page" button the first time produces no logs (even with -DCMAKE_BUILD_TYPE=Debug and llama-server --verbose)
  6. Clicking on it a second and third time shows srv log_server_r: request: GET / 414 in llama-server logs each time
  7. Interrupting the llama-server process hangs on "cleaning up before exit..."
  8. Closing the AI chatbot panel outputs a final 414 before llama-server exits
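For what it's worth, the 414 can also be triggered without Firefox by sending an over-long q parameter directly to llama-server. A minimal sketch (the curl line is left commented and assumes the default 127.0.0.1:8080; the port and endpoint are assumptions, not part of the Firefox repro above):

```shell
# Build a query parameter longer than cpp-httplib's default
# CPPHTTPLIB_REQUEST_URI_MAX_LENGTH of 8192 bytes.
Q=$(printf 'a%.0s' $(seq 1 9000))
echo "${#Q}"   # → 9000, so "GET /?q=$Q" exceeds the 8192-byte URI limit

# Against a running llama-server this should print 414:
# curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:8080/?q=$Q"
```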

Initial Investigation

Details

It looks like the external library cpp-httplib caps the request URI length with a compile-time constant, CPPHTTPLIB_REQUEST_URI_MAX_LENGTH, set to 8192. Increasing this value allows for longer query parameters.

It looks like tools/server/utils.hpp might be the right place for a change if there is to be one.

diff --git a/tools/server/utils.hpp b/tools/server/utils.hpp
--- a/tools/server/utils.hpp
+++ b/tools/server/utils.hpp
@@ -13,6 +13,8 @@
 #define CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH 1048576
 // increase backlog size to avoid connection resets for >> 1 slots
 #define CPPHTTPLIB_LISTEN_BACKLOG 512
+// increase max URI length to allow for prompts in q URL parameter
+#define CPPHTTPLIB_REQUEST_URI_MAX_LENGTH 16384
 // disable Nagle's algorithm
 #define CPPHTTPLIB_TCP_NODELAY true
 #include <cpp-httplib/httplib.h>

Alternatively, this can be passed to cmake:

cmake -B build -DCMAKE_CXX_FLAGS="-DCPPHTTPLIB_REQUEST_URI_MAX_LENGTH=16384"
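Either way, a rough sanity check after rebuilding (a sketch, assuming the patched server is running on the default 127.0.0.1:8080) is to send a query longer than the old 8192-byte limit but under the new 16384-byte one:

```shell
# 12000-byte query: over the old 8192-byte default, under the new 16384 limit
Q=$(printf 'b%.0s' $(seq 1 12000))
echo "${#Q}"   # → 12000

# With the patched build this request should no longer be rejected with 414:
# curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:8080/?q=$Q"
```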

It is unclear why nothing happens on just the first overly long query, and whether the problem lies in Firefox or in llama-server.

Also, a mildly notable Bugzilla discussion: "are Firefox AI summaries vulnerable to prompt injection?"

Causative Commits

69e9ff0 (Oct 24: q param)
0f2bbe6 (Feb 16: 414 hangs)

Relevant Log Output

main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
srv  log_server_r: request: GET / 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  update_slots: all slots are idle
srv  log_server_r: request: GET /slots 127.0.0.1 200
srv  log_server_r: request: GET /  414
srv  log_server_r: request: GET /  414
^Csrv    operator(): operator(): cleaning up before exit...
srv  log_server_r: request: GET /  414
