Replies: 2 comments
-
When you see "offloaded X/Y layers to GPU" for LLMs using llama.cpp, remember the model config itself has
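As a rough check, that offload count can be parsed out of the llama.cpp startup log. A minimal sketch (the exact log wording and the `gpu_offload_ratio` helper name are assumptions; the message format can vary between llama.cpp versions):

```python
import re

def gpu_offload_ratio(log_text):
    """Return offloaded/total layer ratio from a llama.cpp log, or None if absent."""
    # llama.cpp typically prints a line like:
    #   llm_load_tensors: offloaded 35/35 layers to GPU
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_text)
    if not m:
        return None
    offloaded, total = int(m.group(1)), int(m.group(2))
    return offloaded / total

sample = "llm_load_tensors: offloaded 35/35 layers to GPU"
print(gpu_offload_ratio(sample))  # 1.0 means every layer went to the GPU
```

A ratio well below 1.0 means most layers still run on the CPU, which would explain GPU setups running slower than expected.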
-
Use the nvtop command.
-
I'm using LocalAI on a ZBOX barebone system with an RTX 4070 GPU with 8 GB of VRAM. I have configured the docker-compose file to pass GPU access through to the container, and I have also configured the use of cuBLAS. But it is actually slower than the CPU setup with 14 cores.
How can I check whether the GPU is actually used by LocalAI? I installed the NVIDIA driver and Docker Desktop on the host. Do I also have to install other libraries on the host? Do I have to configure the Docker service?
When I run docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi I get this output:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.01 Driver Version: 546.01 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4070 ... WDDM | 00000000:01:00.0 On | N/A |
| N/A 36C P8 6W / 115W | 731MiB / 8188MiB | 63% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 4884 C+G ...\Docker\frontend\Docker Desktop.exe N/A |
| 0 N/A N/A 6752 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 9972 C+G ...2txyewy\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 10724 C+G ...CBS_cw5n1h2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 11608 C+G ....Search_cw5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 12428 C+G ...oogle\Chrome\Application\chrome.exe N/A |
| 0 N/A N/A 14848 C+G ...crosoft\Edge\Application\msedge.exe N/A |
| 0 N/A N/A 16708 C+G ...5n1h2txyewy\ShellExperienceHost.exe N/A |
+---------------------------------------------------------------------------------------+
So it seems to me that the Docker Desktop service on the host is configured correctly (at least for the standard nvidia image).
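For comparison, GPU passthrough in a Compose file is typically declared with a device reservation like the sketch below (the service name and image tag are placeholders; this assumes the NVIDIA Container Toolkit is set up on the host, which the working `docker run --gpus all` test above suggests it is):

```yaml
services:
  localai:
    image: quay.io/go-skynet/local-ai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

If this reservation is missing or misplaced, the container starts fine but silently falls back to the CPU, matching the slowdown described above.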