Eval bug: llama-serve ignores SIGINT and SIGTERM when running within a container. #11742
Comments
Should be related to #11731
Running the container with `docker run -i image`
I know how to stop an app that is ignoring signals inside of a container; the issue here is that llama-serve should not be ignoring these signals. If you run an app like `top` and then press ^C, it exits instantly. Running with `--init` does not change the behavior.
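A minimal way to observe this from outside the container (image, model path, and container name here are placeholders for illustration, not taken from the issue):

```sh
# Sketch: check whether the server reacts to signals sent to PID 1.
docker run -d --name llama-test quay.io/ramalama/vulkan \
  llama-server -m /mnt/models/model.file --host 0.0.0.0 --port 8080
docker kill --signal=TERM llama-test   # deliver SIGTERM to PID 1 in the container
docker ps --filter name=llama-test     # still listed => the signal was ignored
```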
@rhatdan I think there could be something Dockerfile-related and not llama-server itself. I have usually seen the same mistake when people use a shell script as the Dockerfile entrypoint that calls another binary, which results in the signal not being properly forwarded. I noticed that we're using
Also, your command line runs with `-i`.
`-i` and `--init` are different flags.
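For reference, this is the usual entrypoint pitfall being described: if a shell script is PID 1 and launches the server as a plain child, SIGTERM/SIGINT never reach the server. A minimal sketch of the common fix, assuming a wrapper script like this (illustrative, not the actual RamaLama entrypoint):

```sh
#!/bin/sh
# Hypothetical entrypoint wrapper, shown only to illustrate the pitfall.
# Without `exec`, this shell stays as PID 1 and does not forward
# SIGTERM/SIGINT to its child; with `exec`, the shell is replaced and
# llama-server becomes PID 1, receiving signals directly.
exec llama-server "$@"
```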
All containers run by RamaLama run with `--init`.
I don't get why.
Copied from https://docs.docker.com/reference/cli/docker/container/run/#interactive
But a signal is not delivered via stdin... some apps get terminated via Ctrl+D because it signifies EOF, but that does not emit SIGTERM.
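The distinction can be sketched with a trivial container (the busybox image and container name are assumptions used only for illustration):

```sh
# Ctrl+D closes stdin; cat exits because read() returns 0 (EOF) -- no signal involved.
docker run -i --name eof-demo busybox cat
# By contrast (from another terminal, while cat is still running),
# SIGTERM is delivered by the kernel, independent of stdin:
docker kill --signal=TERM eof-demo
```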
@magicse the last comment in this post mentioned ENTRYPOINT, which aligns with my speculation above. We should test this theory instead.
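One quick way to test the ENTRYPOINT theory would be to bypass the image's entrypoint entirely (image name and flags echo the ones used elsewhere in this issue; treat the exact values as assumptions):

```sh
# Run llama-server directly as PID 1, skipping any wrapper script.
docker run --rm -p 8080:8080 --entrypoint llama-server quay.io/ramalama/vulkan \
  -m /mnt/models/model.file -c 2048 --host 0.0.0.0 --port 8080
# If Ctrl+C now stops the container immediately, the entrypoint wrapper
# (not llama-server) was swallowing the signal.
```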
when I run
then run
it immediately stops with
Try without `--init`, and after stopping the container, try refreshing the browser page with the open llama server, or simply close it.
I tested this locally and will attempt to make a similar change in RamaLama. Thanks. Need to make sure we have the latest llama.cpp in our containers.
Name and Version
llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = llvmpipe (LLVM 19.1.3, 256 bits) (llvmpipe) | uma: 0 | fp16: 1 | warp size: 8 | matrix cores: none
ggml_vulkan: Warning: Device type is CPU. This is probably not the device you want.
version: 4607 (aa6fb13)
built with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2) for x86_64-redhat-linux
Operating systems
Linux
GGML backends
Vulkan
Hardware
When we run llama-serve in a podman container, it ignores `kill -TERM` and `kill -INT`, whether sent from inside the container or from the outside.
Models
Granite, but I believe this has nothing to do with the model.
Problem description & steps to reproduce
llama-server --port 8080 -m /mnt/models/model.file -c 2048 --temp 0.8 -ngl -1 --host 0.0.0.0
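To reproduce the hang, signals can be sent after starting the server as above (the container name is a placeholder):

```sh
# From the host; sends the signal to PID 1 inside the container.
podman kill --signal=INT  <container>
podman kill --signal=TERM <container>
# Observed: llama-server keeps running. Expected: a clean shutdown.
```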
First Bad Commit
Relevant log output
/bin/ramalama --image quay.io/ramalama/vulkan bench granite
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (RPL-S) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none