Run llama.cpp on RunPod

Description

RunPod provides a cheap serverless GPU service that makes it simple to serve AI models. It handles queuing and auto-scaling for you.

You just have to provide a Docker image. This repository contains instructions to build your own image for any model.

Steps

  1. Clone this repository
  2. Choose a model and download it to the workspace directory. Here we use TheBloke's 4-bit quantized Llama-2-7B model in GGUF format.
wget -P workspace https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_K_M.gguf
  3. Build the Docker image. Create a llama-runpod-finetune repository on Docker Hub and replace your-docker-hub-login with your login.
docker build -t llama-runpod-finetune .
docker tag llama-runpod-finetune your-docker-hub-login/llama-runpod-finetune:latest
docker push your-docker-hub-login/llama-runpod-finetune:latest
  4. Go to RunPod's serverless console and create a template:

(Screenshot: RunPod template configuration)

You can pass arguments to llama_cpp through the LLAMA_ARGS environment variable. Here are mine:

{"model": "llama-2-7b.Q4_K_M.gguf", "n_gpu_layers": -1}

n_gpu_layers is set to -1 to offload all layers to the GPU.
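For orientation, here is a minimal sketch of what the serverless handler might look like, assuming the llama-cpp-python bindings and the runpod SDK. The actual handler in this repository may differ; the mapping of the model key to model_path, the /workspace path inside the image, and the prompt/max_tokens input fields are assumptions for illustration.

import json
import os

import runpod
from llama_cpp import Llama

# LLAMA_ARGS is the JSON blob set in the RunPod template, e.g.
# {"model": "llama-2-7b.Q4_K_M.gguf", "n_gpu_layers": -1}
llama_args = json.loads(os.environ.get("LLAMA_ARGS", "{}"))
model_file = llama_args.pop("model")  # assumed key, mapped to model_path below

# Load the GGUF model once at startup so it is reused across requests.
llm = Llama(model_path=os.path.join("/workspace", model_file), **llama_args)

def handler(job):
    # The "prompt" and "max_tokens" input fields are assumptions for illustration.
    prompt = job["input"]["prompt"]
    result = llm(prompt, max_tokens=job["input"].get("max_tokens", 128))
    return result["choices"][0]["text"]

runpod.serverless.start({"handler": handler})

Loading the model at module level means each worker initializes it once and reuses it across jobs.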

  5. Create the endpoint:

(Screenshot: RunPod endpoint configuration)

  6. Profit!

Replace ENDPOINT_ID and API_KEY with your own values. You can find your API key under Settings in the RunPod console.
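For example, here is a minimal Python client using the requests library and RunPod's /runsync endpoint. The prompt and max_tokens input fields are assumptions about the handler's input schema.

import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-api-key"

# /runsync blocks until the job completes; use /run for asynchronous submission.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {"Authorization": f"Bearer {API_KEY}"}
payload = {"input": {"prompt": "Tell me a joke about llamas.", "max_tokens": 128}}

response = requests.post(url, headers=headers, json=payload, timeout=300)
response.raise_for_status()
print(response.json())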

TODO
