Eval bug: llama-serve ignores SIGINT and SIGTERM when running within a container. #11742

Closed · rhatdan opened this issue Feb 7, 2025 · 13 comments

rhatdan commented Feb 7, 2025

Name and Version

llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = llvmpipe (LLVM 19.1.3, 256 bits) (llvmpipe) | uma: 0 | fp16: 1 | warp size: 8 | matrix cores: none
ggml_vulkan: Warning: Device type is CPU. This is probably not the device you want.
version: 4607 (aa6fb13)
built with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2) for x86_64-redhat-linux

Operating systems

Linux

GGML backends

Vulkan

Hardware

When we run llama-serve in a Podman container, it ignores kill -TERM and kill -INT, whether the signal is sent from inside the container or from the outside.
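
For reference, a sketch of how such signals can be sent (the container name and PID are placeholders):

podman kill --signal SIGTERM <container>    # from the host
kill -TERM <llama-server-pid>               # from a shell inside the container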

Models

Granite, but I believe this has nothing to do with the model.

Problem description & steps to reproduce

llama-server --port 8080 -m /mnt/models/model.file -c 2048 --temp 0.8 -ngl -1 --host 0.0.0.0

First Bad Commit

/bin/ramalama --image quay.io/ramalama/vulkan bench granite
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (RPL-S) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none

| model | size | params | backend | ngl | test | t/s |
^C^C

Relevant log output

None.

ngxson (Collaborator) commented Feb 7, 2025

Should be related to #11731

kth8 commented Feb 8, 2025

Running the container with --init will properly forward the signals.
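
A minimal sketch of what that looks like (the image name is a placeholder): --init tells the container runtime to start a small init process as PID 1, which reaps children and forwards SIGTERM/SIGINT to the actual workload.

podman run --rm --init -p 8080:8080 <your-llama-server-image>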

magicse (Contributor) commented Feb 9, 2025

rhatdan (Author) commented Feb 10, 2025

I know how to stop an app that is ignoring signals inside of a container. The issue here is that llama-serve should not be ignoring these signals. If you run an app like top and then press ^C, it exits instantly.

Running with --init does not change the behavior.

ngxson (Collaborator) commented Feb 10, 2025

@rhatdan I think this could be something Dockerfile-related and not llama-server itself. I have often seen the same mistake when people use a shell script as the Dockerfile entrypoint that calls another binary, which results in signals not being properly forwarded.

I noticed that we're using ENTRYPOINT ["/app/tools.sh"] in https://github.com/ggerganov/llama.cpp/blob/master/.devops/vulkan.Dockerfile#L89 , so this is potentially the cause. Can you try overriding the entrypoint to bypass the tools.sh script, something like docker run --entrypoint ...?
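
For example, a sketch of such an override (the binary path /app/llama-server, the mount path, and the image name here are assumptions for illustration, not verified against the image):

podman run --rm -p 8080:8080 -v /path/to/models:/mnt/models \
    --entrypoint /app/llama-server quay.io/ramalama/vulkan \
    -m /mnt/models/model.file --host 0.0.0.0 --port 8080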

ngxson (Collaborator) commented Feb 10, 2025

Also, your command line runs bench, not server:

/bin/ramalama --image quay.io/ramalama/vulkan bench

magicse (Contributor) commented Feb 11, 2025

> I know how to stop an app that is ignoring signals inside of a container. The issue here is that llama-serve should not be ignoring these signals. If you run an app like top and then press ^C, it exits instantly.
>
> Running with --init does not change the behavior.

-i and --init are different options.

rhatdan (Author) commented Feb 11, 2025

All containers run by RamaLama run with -i

kth8 commented Feb 11, 2025

-i is short for --interactive, not --init

ngxson (Collaborator) commented Feb 11, 2025

I don't get why -i is important here.

Copied from https://docs.docker.com/reference/cli/docker/container/run/#interactive

> The --interactive (or -i) flag keeps the container's STDIN open, and lets you send input to the container through standard input.

But signals are not delivered via stdin... some apps terminate on Ctrl+D because it signifies EOF, but that does not emit SIGTERM.

https://forums.docker.com/t/docker-run-cannot-be-killed-with-ctrl-c/13108/11

@magicse the last comment in this post mentions ENTRYPOINT, which aligns with my speculation above. We should test this theory instead.
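
If the entrypoint script is indeed the culprit, the usual fix is for the script to exec the final binary so it replaces the shell as PID 1 and receives signals directly. A hypothetical wrapper in the style of tools.sh (an illustration under that assumption, not the actual script):

#!/bin/sh
# Without "exec", the shell stays alive as PID 1 and does not
# forward SIGTERM/SIGINT to the llama-server child process.
exec /app/llama-server "$@"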

kth8 commented Feb 11, 2025

When I run llama-server inside a container with

podman run -d --name llama1b --init -p 8001:8080/tcp ghcr.io/kth8/llama-server:llama-3.2-1b-instruct

then run

podman stop llama1b

it immediately stops with Exited (143) (128 + SIGTERM). Without --init it hangs for 10 seconds before getting killed with Exited (137) (128 + SIGKILL):

WARN[0010] StopSignal SIGTERM failed to stop container llama1b in 10 seconds, resorting to SIGKILL

magicse (Contributor) commented Feb 11, 2025

> WARN[0010] StopSignal SIGTERM failed to stop container llama1b in 10 seconds, resorting to SIGKILL

Try without --init; after stopping the container, try refreshing the browser page with the llama server open, or simply close it.

rhatdan (Author) commented Feb 11, 2025

I tested this locally and will attempt to make a similar change in RamaLama. Thanks. We need to make sure we have the latest llama.cpp in our containers.

rhatdan closed this as completed Feb 11, 2025