- Launch the batch inference template
- Set the Head Node Type to the machine type listed for your model:
Model | Node Type |
---|---|
neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 | g6e.12xlarge |
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 | g6e.xlarge |
- Run the following script on the template:
bash run_70b.sh
# For the 8B model, run this instead: bash run_8b.sh
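
The model-to-node-type pairing and the choice of run script can be captured in a small helper, sketched below. The function names `node_type_for` and `run_script_for` are hypothetical and not part of the template. The sketch assumes the smaller model is the 8B variant that `run_8b.sh` targets, and it only prints the values rather than launching anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the template): look up the node type
# and run script for a model name, based on the table and commands above.

node_type_for() {
  case "$1" in
    *70B*) echo "g6e.12xlarge" ;;
    *8B*)  echo "g6e.xlarge" ;;
    *)     echo "unknown model: $1" >&2; return 1 ;;
  esac
}

run_script_for() {
  case "$1" in
    *70B*) echo "run_70b.sh" ;;
    *8B*)  echo "run_8b.sh" ;;
    *)     echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Example: print what to configure and run for the 70B model.
model="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8"
echo "Head node type: $(node_type_for "$model")"
echo "Run with: bash $(run_script_for "$model")"
```

Printing the command instead of executing it keeps the helper safe to run before the head node has been resized.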