From 0f9df0229ee3e81ab408606bbfb1f1b14b787143 Mon Sep 17 00:00:00 2001
From: Annie Sexton
Date: Wed, 28 Aug 2024 12:02:06 -0700
Subject: [PATCH 1/3] added ollama blueprint

---
 blueprints/ollama-fly-gpus.html.md | 93 ++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 blueprints/ollama-fly-gpus.html.md

diff --git a/blueprints/ollama-fly-gpus.html.md b/blueprints/ollama-fly-gpus.html.md
new file mode 100644
index 0000000000..cef22ef595
--- /dev/null
+++ b/blueprints/ollama-fly-gpus.html.md
@@ -0,0 +1,93 @@
---
title: Running Ollama on Fly GPUs
layout: docs
nav: firecracker
---

Self-hosting an LLM on Fly.io only takes a few steps. In this blueprint, we'll show you how to deploy an app with Ollama that uses Fly GPUs. This app is configured to scale to zero when not in use.

## Getting started

If you don't already have a Fly.io account, sign up first (https://fly.io/app/sign-up), and make sure you have the Fly CLI installed (https://fly.io/docs/flyctl/install/).

First, clone [this repository](https://github.com/fly-apps/self-host-ollama), and feel free to change the `app` name in the `fly.toml`. Then launch it as a new Fly App with:

```cmd
fly launch --flycast
```

The `--flycast` flag will make your application private. To access your application, you'll use the address `http://<your-app-name>.flycast` and ensure you're connected over your Wireguard VPN. This will work in both production and local development. More on this later.

Now that your app is launched, let's download some Ollama models. First, SSH into one of your Fly Machines like so:

```cmd
fly ssh console
```

Next, set the value of your `OLLAMA_HOST`, so your `ollama` commands know which host to use:

```cmd
export OLLAMA_HOST=<your-app-name>.flycast
```

Finally, pull down the model of your choice:

```cmd
ollama pull llava
```

And you're done! Your Ollama app is now available for use.

## Connecting to your app

This Ollama app will exist separate from whatever app you're building. Since our app is private (we don't randos eating up our Fly GPU usage), we'll need to connect to it over a secure Wireguard connection. When developing locally, the easiest method is to run:

```cmd
fly proxy 11434:80 -a <your-app-name>
# you don't need -a if you're in the Ollama app directory
```

This command proxies requests from a local port (`11434`) to port `80` on your Ollama Fly Machine, over a secure Wireguard tunnel.

When using Ollama in your app **locally**, you'll set the host to `http://localhost:11434`. Note that while `11434` is the standard port used by Ollama, since this is just a proxy, that number can really be anything.

In **production**, you'll use the host `http://<your-app-name>.flycast` instead.

## Examples

The following code would live in an app separate from your Ollama app. This allows you to auto-start and stop your Ollama app, so you're not paying for GPUs when not in use.
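One thing to keep in mind with scale-to-zero: the first request after the Ollama app has been idle has to wait for a Fly Machine to boot and the model to load, so calls from your other app should tolerate a slow first response. Here's a minimal sketch of one way to handle that with a retry loop; the `OLLAMA_APP_URL` variable, model name, and retry delays are illustrative assumptions rather than something defined by this blueprint:

```typescript
// Minimal sketch (illustrative): retry an /api/generate call so the first
// request can ride out a cold start while the Ollama Machine boots.
// OLLAMA_APP_URL is assumed to be http://localhost:11434 locally,
// or http://<your-app-name>.flycast in production.
const OLLAMA_URL = process.env.OLLAMA_APP_URL ?? "http://localhost:11434";

async function generateWithRetry(prompt: string, retries = 3): Promise<string> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const resp = await fetch(`${OLLAMA_URL}/api/generate`, {
        method: "POST",
        body: JSON.stringify({ model: "llama3.1", prompt, stream: false }),
      });
      if (!resp.ok) throw new Error(`Ollama returned HTTP ${resp.status}`);
      const data = await resp.json();
      return data.response; // /api/generate returns the generated text in `response`
    } catch (err) {
      if (attempt === retries) throw err;
      // Give the Machine a few seconds to start (and the model to load) before retrying.
      await new Promise((resolve) => setTimeout(resolve, 5000 * attempt));
    }
  }
  throw new Error("unreachable");
}
```

The two examples below show the basic calls without any of that retry handling.
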
**JavaScript (npm `ollama` package)**

```typescript
import { Ollama } from "ollama";

const ollama = new Ollama({
  host: process.env.OLLAMA_APP_URL // either http://localhost:11434 locally or http://<your-app-name>.flycast in production
});

const response = await ollama.generate({
  model: 'llama3.1',
  prompt: "Give me a week's worth of healthy vegetarian meal ideas and their recipes.",
  stream: false,
});

console.log(response.response); // the generated text
```

**JavaScript (basic fetch request)**

```typescript
const params = {
  model: 'llama3.1',
  prompt: "Give me a week's worth of healthy vegetarian meal ideas and their recipes.",
  stream: false,
};

const resp = await fetch("http://sparkling-violet-709.flycast/api/generate", {
  method: "POST",
  body: JSON.stringify(params),
});

const data = await resp.json(); // the generated text is in data.response
```

## Watch the video

Check out the accompanying video here: https://youtu.be/xkWcGmbhZRQ
\ No newline at end of file

From 2e83be97cad841c502bec64eb4d23f9aaa3bc511 Mon Sep 17 00:00:00 2001
From: Annie Sexton
Date: Wed, 28 Aug 2024 12:03:56 -0700
Subject: [PATCH 2/3] fixed typo

---
 blueprints/ollama-fly-gpus.html.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/blueprints/ollama-fly-gpus.html.md b/blueprints/ollama-fly-gpus.html.md
index cef22ef595..2f1f2b8dfc 100644
--- a/blueprints/ollama-fly-gpus.html.md
+++ b/blueprints/ollama-fly-gpus.html.md
@@ -16,7 +16,7 @@ First, clone [this repository](https://github.com/fly-apps/self-host-ollama), an
 fly launch --flycast
 ```
 
-The `--flycast` flag will make your application private. To access your application, you'll use the address `http://<your-app-name>.flycast` and ensure you're connected over your Wireguard VPN. This will work in both production and local development. More on this later.
+The `--flycast` flag will make your application private. To access your application, you'll use the address `http://<your-app-name>.flycast` and ensure you're connected over your WireGuard VPN. This will work in both production and local development. More on this later.
 
 Now that your app is launched, let's download some Ollama models. First, SSH into one of your Fly Machines like so:
 
@@ -40,14 +40,14 @@ And you're done! Your Ollama app is now available for use.
 
 ## Connecting to your app
 
-This Ollama app will exist separate from whatever app you're building. Since our app is private (we don't randos eating up our Fly GPU usage), we'll need to connect to it over a secure Wireguard connection. When developing locally, the easiest method is to run:
+This Ollama app will exist separate from whatever app you're building. Since our app is private (we don't randos eating up our Fly GPU usage), we'll need to connect to it over a secure WireGuard connection. When developing locally, the easiest method is to run:
 
 ```cmd
 fly proxy 11434:80 -a <your-app-name>
 # you don't need -a if you're in the Ollama app directory
 ```
 
-This command proxies requests from a local port (`11434`) to port `80` on your Ollama Fly Machine, over a secure Wireguard tunnel.
+This command proxies requests from a local port (`11434`) to port `80` on your Ollama Fly Machine, over a secure WireGuard tunnel.
 
 When using Ollama in your app **locally**, you'll set the host to `http://localhost:11434`. Note that while `11434` is the standard port used by Ollama, since this is just a proxy, that number can really be anything.
From e3d499b2807b70cf017201fe3465a07f6c32927d Mon Sep 17 00:00:00 2001
From: Annie Sexton
Date: Wed, 28 Aug 2024 12:12:18 -0700
Subject: [PATCH 3/3] fixed typo

---
 blueprints/ollama-fly-gpus.html.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blueprints/ollama-fly-gpus.html.md b/blueprints/ollama-fly-gpus.html.md
index 2f1f2b8dfc..900b626d8e 100644
--- a/blueprints/ollama-fly-gpus.html.md
+++ b/blueprints/ollama-fly-gpus.html.md
@@ -40,7 +40,7 @@ And you're done! Your Ollama app is now available for use.
 
 ## Connecting to your app
 
-This Ollama app will exist separate from whatever app you're building. Since our app is private (we don't randos eating up our Fly GPU usage), we'll need to connect to it over a secure WireGuard connection. When developing locally, the easiest method is to run:
+This Ollama app will exist separate from whatever app you're building. Since our app is private (we don't want random people eating up our Fly GPU usage), we'll need to connect to it over a secure WireGuard connection. When developing locally, the easiest method is to run:
 
 ```cmd
 fly proxy 11434:80 -a <your-app-name>