[sycl-free-inference-for-llms] Port and evaluate LLama3-8B and Granite-8B #2170

Open
etiotto opened this issue Sep 9, 2024 · 1 comment

etiotto commented Sep 9, 2024

Meta has published the following PyTorch blog post: https://pytorch.org/blog/cuda-free-inference-for-llms
It evaluates Llama3-8B using Triton on A100 and H100 GPUs.
We should run the same evaluation on PVC once the code has been ported.

The instructions are as follows:

1 - Get the code:
git clone https://github.com/AdnanHoque/foundation-model-stack.git
cd foundation-model-stack
git checkout amd_attn
pip install -e .
cd scripts/

2 - Download the weights and tokenizer from: https://huggingface.co/meta-llama/Meta-Llama-3-8B/tree/main

3 - Run inference (update the tokenizer and model paths to point to your local copy):
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python inference.py --architecture=llama --variant=3-8b --tokenizer="/net/storage149/autofs/css22/nmg/models/llama3-8b/base" --model_path="/net/storage149/autofs/css22/nmg/models/llama3-8b/base" --device_type cuda --model_source hf --compile

4 - Script options are defined in: https://github.com/.../blob/amd_attn/scripts/inference.py
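
Since step 3 hard-codes a storage path that will differ on every machine, a small wrapper can make the command easier to reuse. The following is only a sketch: the helper names `build_inference_command` and `build_environment`, the `LLAMA3_MODEL_DIR` variable, and the placeholder path are hypothetical and not part of the repository. The flags and environment variables are copied from the command in step 3. The `device_type` parameter is exposed only so the value can be changed during porting; the issue does not state which value, if any, `inference.py` accepts for PVC.

```python
import os
import shlex


def build_inference_command(model_dir, device_type="cuda", compile_model=True):
    """Assemble the inference.py invocation from step 3 for a local model directory.

    Hypothetical helper: the flags are copied from the command in the issue,
    and the same directory is used for both the tokenizer and the model.
    """
    cmd = [
        "python", "inference.py",
        "--architecture=llama",
        "--variant=3-8b",
        f"--tokenizer={model_dir}",
        f"--model_path={model_dir}",
        "--device_type", device_type,
        "--model_source", "hf",
    ]
    if compile_model:
        cmd.append("--compile")
    return cmd


def build_environment(base_env=None):
    """Return a copy of the environment with the variables set in step 3."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    env["CUDA_VISIBLE_DEVICES"] = "0"
    return env


if __name__ == "__main__":
    # LLAMA3_MODEL_DIR and the fallback path are placeholders (assumptions).
    model_dir = os.environ.get("LLAMA3_MODEL_DIR", "/path/to/llama3-8b/base")
    # Print the command so it can be inspected before running it from scripts/.
    print(shlex.join(build_inference_command(model_dir)))
```

To launch the run rather than print it, the list and environment could be passed to `subprocess.run(cmd, env=env, check=True)` from inside the `scripts/` directory.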

vlad-penkin changed the title from "Port and evaluate amd_attn" to "[sycl-free-inference-for-llms] Port and evaluate amd_attn" on Sep 11, 2024
vlad-penkin changed the title from "[sycl-free-inference-for-llms] Port and evaluate amd_attn" to "[sycl-free-inference-for-llms] Port and evaluate Llama3-8B" on Sep 11, 2024
vlad-penkin changed the title from "[sycl-free-inference-for-llms] Port and evaluate Llama3-8B" to "[sycl-free-inference-for-llms] Port and evaluate LLama3-8B and Granite-8B" on Sep 11, 2024