Commit d128114

Update how-to-2.md

1 parent 88e066f
1 file changed: +1 -1 lines changed

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-2.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ For this demonstration, the experimental setup includes:
 
 - Total number of instances: 3
 - Instance type: c8g.4xlarge
-- Model: model.gguf (llama-3.1-70B_Q4_0, ~38GB when quantized to 4 bits)
+- Model: model.gguf (Llama-3.1-70B_Q4_0, ~38GB when quantized to 4 bits)
 
 One of the three nodes serves as the master node, which physically hosts the model file. The other two nodes act as worker nodes. In `llama.cpp`, remote procedure calls (RPC) offload both the model and the computation over TCP connections between nodes. The master node forwards inference requests to the worker nodes, where computation is performed.
 
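The context paragraph above describes the RPC topology but not how it is launched. As a minimal sketch of what that three-node setup looks like in practice, the commands below start an RPC backend on each worker node and point the master at both of them. The flags follow `llama.cpp`'s upstream `examples/rpc` README; the worker IP addresses, the port, the build options, and the prompt are placeholder assumptions rather than values taken from this commit, so verify them against your checkout.

```bash
# On each of the two worker nodes: build llama.cpp with RPC support
# and start an RPC server (placeholder port 50052, the upstream default).
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
# Binding to 0.0.0.0 exposes the server to the network; do this only on a
# trusted private subnet, since the RPC protocol is unauthenticated.
./build/bin/rpc-server -H 0.0.0.0 -p 50052

# On the master node, which physically hosts model.gguf: register both
# workers as RPC backends (placeholder private IPs) and offload all layers.
./build/bin/llama-cli -m model.gguf \
  --rpc 192.168.0.11:50052,192.168.0.12:50052 \
  -ngl 99 \
  -p "Explain distributed inference in one paragraph."
```

With the `--rpc` list in place, the master distributes the model's layers across the registered backends at load time and dispatches their computation to the workers during inference, which matches the paragraph's description of offloading both the model and the computation over TCP.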