Misc. bug: RPC attempt fails with a specific error, but I cannot find any info on troubleshooting it #11929

Open
maglore9900 opened this issue Feb 17, 2025 · 3 comments

Comments

@maglore9900

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3070, compute capability 8.6, VMM: yes
version: 4735 (73e2ed3)
built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu

The GPU system above is built per the RPC instructions and launches fine.

The other system attempting to use it is also Ubuntu, but has no GPU:
version: 4735 (73e2ed3)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

`bin/llama-cli -m ../models/llama-3.2-3b-instruct-q4_k_m.gguf` runs with no issues
`bin/llama-cli -m ../models/llama-3.2-3b-instruct-q4_k_m.gguf --rpc 10.0.0.125:52415` fails with the error listed below

Problem description & steps to reproduce

It was built with `cmake -B build -DGGML_RPC=ON` and compiles fine.
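
For reference, the backend side of the RPC setup described in the llama.cpp docs looks roughly like this. This is only a sketch: the listen address and port below are placeholders, and the exact `rpc-server` flags may differ by version (check `rpc-server --help`).

```sh
# On the backend host: build with the RPC backend enabled.
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release

# Start the RPC server, binding to an address the client can reach
# (0.0.0.0 and port 50052 are placeholders; the docs use 50052 as the example port).
bin/rpc-server -H 0.0.0.0 -p 50052
```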

I can run `bin/llama-cli -m ../models/llama-3.2-3b-instruct-q4_k_m.gguf` without any issues, and the same command also runs without any issues on the other system.

But when I run
`bin/llama-cli -m ../models/llama-3.2-3b-instruct-q4_k_m.gguf --rpc 10.0.0.125:52415`
I get the following error:

build: 4735 (73e2ed3c) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
/mnt/test_zone/llama.cpp/ggml/src/ggml-rpc/ggml-rpc.cpp:755: GGML_ASSERT(status) failed
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

I have tried rebuilding, googling, and reviewing the docs, but I cannot figure out why I get this error.

I have confirmed that both systems can communicate with each other on my network.
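
A quick way to confirm that the RPC endpoint itself (not just the host) is reachable, assuming `ss` and `nc` are available on the two machines:

```sh
# On the server (10.0.0.125): check that something is actually listening on the RPC port.
ss -ltn | grep 52415

# On the client: check that a TCP connection to that port can be opened.
nc -vz 10.0.0.125 52415
```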

First Bad Commit

No response

Relevant log output

@maglore9900 (Author)

Here is the output with the command run with sudo, so that ptrace is permitted:

build: 4735 (73e2ed3c) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
/mnt/test_zone/llama.cpp/ggml/src/ggml-rpc/ggml-rpc.cpp:755: GGML_ASSERT(status) failed
[New LWP 380474]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x0000706684aea42f in __GI___wait4 (pid=380475, stat_loc=0x7ffc128ba924, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x0000706684aea42f in __GI___wait4 (pid=380475, stat_loc=0x7ffc128ba924, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000070668505bd5a in ggml_abort () from /mnt/test_zone/llama.cpp/build/bin/libggml-base.so
#2  0x0000706684c32f30 in ggml_backend_rpc_get_device_memory () from /mnt/test_zone/llama.cpp/build/bin/libggml-rpc.so
#3  0x000070668518369f in llama_model_load_from_file_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, llama_model_params) () from /mnt/test_zone/llama.cpp/build/bin/libllama.so
#4  0x00007066851843c6 in llama_model_load_from_file () from /mnt/test_zone/llama.cpp/build/bin/libllama.so
#5  0x000058448083d111 in common_init_from_params(common_params&) ()
#6  0x00005844807e11f0 in main ()
[Inferior 1 (process 380473) detached]
Aborted


br00t4c commented Feb 19, 2025

I run into the same issue when attempting to offload model layers to a CPU-only RPC backend.
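
For context, the client side of layer offload normally points `--rpc` at a running `rpc-server` and requests offload with `-ngl`. A minimal sketch, with placeholder model path, host, and port:

```sh
# Offload up to 99 layers to the remote RPC backend; model path and host:port are placeholders.
bin/llama-cli -m model.gguf --rpc 192.168.1.10:50052 -ngl 99 -p "Hello" -n 64
```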

@maglore9900 (Author)

Is it necessary to run the llama-cli command with the --rpc flag ONLY on a system that is also running the RPC server?

If so, this would indicate that only a system with a GPU can take advantage of the RPC option, which would account for the error I am receiving.
