[sycl-free-inference-for-llms] Port and evaluate Llama3-8B and Granite-8B
Meta has published the following PyTorch blog post: https://pytorch.org/blog/cuda-free-inference-for-llms
They have evaluated Llama3-8B using Triton on A100 and H100 GPUs. We should do the same for PVC after porting the code.
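Before porting, it may be worth confirming that Triton kernels launch on PVC at all. Below is a minimal vector-add smoke test, a sketch based on the standard Triton tutorial kernel (not taken from the blog post), assuming a PyTorch build with XPU support and the Intel XPU backend for Triton are installed:

# Minimal Triton smoke test for PVC (Intel XPU).
# Assumptions: PyTorch with XPU support and the Intel XPU backend
# for Triton are installed; this is the standard tutorial kernel,
# not code from the blog post or the foundation-model-stack repo.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def main():
    device = "xpu"  # PVC; use "cuda" to reproduce the A100/H100 runs
    n = 98432
    x = torch.rand(n, device=device)
    y = torch.rand(n, device=device)
    out = torch.empty_like(x)
    # One program per BLOCK_SIZE-sized chunk of the input.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    torch.testing.assert_close(out, x + y)
    print("Triton kernel ran correctly on", device)

if __name__ == "__main__":
    main()

If this runs, the Triton toolchain on PVC is functional and the failure surface for the Llama3-8B port narrows to the model's own kernels.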
The instructions are as follows:
1 - Get the code:
git clone https://github.com/AdnanHoque/foundation-model-stack.git
cd foundation-model-stack
git checkout amd_attn
pip install -e .
cd scripts/
2 - Download the weights and tokenizer from: https://huggingface.co/meta-llama/Meta-Llama-3-8B/tree/main
3 - To run (update the model path and tokenizer to your local drive; a possible PVC variant is sketched after this list):
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python inference.py --architecture=llama --variant=3-8b --tokenizer="/net/storage149/autofs/css22/nmg/models/llama3-8b/base" --model_path="/net/storage149/autofs/css22/nmg/models/llama3-8b/base" --device_type cuda --model_source hf --compile
4 - Script options are controlled in: https://github.com/.../blob/amd_attn/scripts/inference.py
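Once the port lands, the PVC invocation would presumably look like the following. The xpu device type and the ZE_AFFINITY_MASK device-selection variable (the Level Zero counterpart of CUDA_VISIBLE_DEVICES) are assumptions on my part; the ported script may expose different flags:

# Assumed PVC run command, not yet verified against the ported script:
ZE_AFFINITY_MASK=0 python inference.py --architecture=llama --variant=3-8b --tokenizer="/net/storage149/autofs/css22/nmg/models/llama3-8b/base" --model_path="/net/storage149/autofs/css22/nmg/models/llama3-8b/base" --device_type xpu --model_source hf --compile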