Commit fee356f

Merge pull request #2251 from madeline-underwood/distrib_int_update
Distrib_int_PV to sign off
2 parents 5535b79 + d128114

File tree

  • content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp

2 files changed: +7 -7 lines changed

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/_index.md

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 ---
-title: Distributed inference using llama.cpp
+title: Run distributed inference with llama.cpp on Arm-based AWS Graviton4 instances

 minutes_to_complete: 30

-who_is_this_for: This introductory topic is for developers with some experience using llama.cpp who want to learn distributed inference.
+who_is_this_for: This introductory topic is for developers with some experience using llama.cpp who want to learn how to run distributed inference on Arm-based servers.

 learning_objectives:
 - Set up a main host and worker nodes with llama.cpp

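The retitled Learning Path centers on llama.cpp's distributed (RPC) mode, in which one main host drives inference and offloads work to `rpc-server` processes on the worker nodes. As a minimal sketch of the setup the objectives describe, assuming llama.cpp is built with the RPC backend and that the worker hostnames, port, and GGUF filename are placeholders:

```bash
# On each worker node: build llama.cpp with the RPC backend enabled,
# then start an RPC server the main host can reach
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the main host: list the workers with --rpc and offload model
# layers to them (hostnames and model filename are placeholders)
./build/bin/llama-cli -m llama-3.1-70b-q4_0.gguf \
  --rpc worker1:50052,worker2:50052 -ngl 99 \
  -p "Explain distributed inference in one paragraph."
```

Note that binding `rpc-server` to `0.0.0.0` exposes it to the network, so in practice you would keep the instances on a private subnet.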
content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-1.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -13,16 +13,16 @@ This example runs on three AWS Graviton4 `c8g.4xlarge` instances. Each instance
 In this Learning Path, you will:

 - Download Meta's [Llama 3.1 70B parameter model](https://huggingface.co/meta-llama/Llama-3.1-70B).
-- Download and build `llama.cpp`, a C++ library for efficient CPU inference of LLaMA and similar large language models on CPUs, optimized for local and embedded environments.
+- Download and build `llama.cpp`, a C++ library for efficient CPU inference of Llama and similar large language models on CPUs, optimized for local and embedded environments.
 - Convert Meta's `safetensors` files to a single GGUF file.
 - Quantize the 16-bit GGUF weights file to 4-bit weights.
 - Load and run the model.

 {{% notice Note %}}
-The **Reading time** shown on the **Introduction** page does not include downloading, converting, and quantizing the model. These steps can take 1-2 hours. If you already have a quantized GGUF file, you can skip the download and quantization.
+The **Reading time** shown on the **Introduction** page does not include downloading, converting, and quantizing the model. These steps can take several hours depending on bandwidth and system resources. If you already have a quantized GGUF file, you can skip the download and quantization.
 {{% /notice %}}

-## Set up dependencies
+## Install dependencies

 Before you start, make sure you have permission to access Meta's [Llama 3.1 70B parameter model](https://huggingface.co/meta-llama/Llama-3.1-70B).

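The convert-then-quantize steps in the list above map onto llama.cpp's `convert_hf_to_gguf.py` script and `llama-quantize` tool. A rough sketch, assuming the model was downloaded to `./Llama-3.1-70B` and that the output filenames are placeholders:

```bash
# Convert Meta's safetensors checkpoint into a single 16-bit GGUF file
python convert_hf_to_gguf.py ./Llama-3.1-70B \
  --outfile llama-3.1-70b-f16.gguf --outtype f16

# Quantize the 16-bit GGUF down to 4-bit (Q4_0) weights
./build/bin/llama-quantize llama-3.1-70b-f16.gguf llama-3.1-70b-q4_0.gguf Q4_0
```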
@@ -35,7 +35,7 @@ You must repeat the install steps on each device. However, only run the download
 ```bash
 apt update
 apt install -y python3.12-venv
-python3 -m venv myenv
+python3.12 -m venv myenv
 source myenv/bin/activate
 ```

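The virtual environment created here is presumably what the later conversion step runs in; the GGUF conversion script has its own Python dependencies, which (assuming a llama.cpp checkout) would be installed with something like:

```bash
# Inside the activated venv, from the llama.cpp source directory
pip install -r requirements.txt
```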
@@ -188,4 +188,4 @@ Allowed quantization types:
 32 or BF16 : 14.00G, -0.0050 ppl @ Mistral-7B
 0 or F32 : 26.00G @ 7B
 COPY : only copy tensors, no quantizing
-```
+```
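The listing in this hunk is the tail of `llama-quantize`'s usage text, which enumerates the supported quantization types with their approximate sizes and perplexity impact; `Q4_0` is the 4-bit type this Learning Path targets. To reproduce the full table (assuming the build directory used earlier), run the tool with no arguments:

```bash
# Prints usage, including the "Allowed quantization types" table
./build/bin/llama-quantize
```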
