Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

2024-05-07 09:25:01 -07:00

Deploy Ollama to Fly.io

Note: this example exposes a public endpoint and does not configure authentication. Use with care.

Prerequisites

Ollama: https://ollama.com/download
Fly.io account. Sign up for a free account: https://fly.io/app/sign-up

Steps

Create a new Fly app

fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now

Pull and run orca-mini:3b

OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
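
To confirm the deployed server is reachable and see which models it holds, a minimal sketch along these lines may help. It assumes `curl` is installed locally and uses Ollama's `/api/tags` endpoint, which lists the models stored on the server. The function names `ollama_base_url` and `list_remote_models`, and the app name `my-ollama-app`, are hypothetical placeholders, not part of this guide.

```shell
#!/bin/sh
# Hypothetical helper: build the public URL for a given Fly app name.
ollama_base_url() {
  printf 'https://%s.fly.dev\n' "$1"
}

# Hypothetical helper: ask the remote Ollama server for its stored models.
# --fail makes curl return a non-zero status on HTTP errors.
list_remote_models() {
  curl --fail --silent --show-error "$(ollama_base_url "$1")/api/tags"
}

# Example (replace my-ollama-app with the name passed to `fly launch`):
#   list_remote_models my-ollama-app
```

A JSON response that includes orca-mini:3b after the run step suggests the pull succeeded. Because the endpoint is unauthenticated, anyone who knows the URL can make the same request.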

shared-cpu-8x is a free-tier eligible machine type. For better performance, switch to a performance or dedicated machine type, or attach a GPU for hardware acceleration (see below).

(Optional) Persistent Volume

By default, Fly Machines use ephemeral storage, so downloaded models are lost on restart and must be pulled again. To keep models across restarts, create and attach a persistent volume:

Create the Fly Volume
```
fly volume create ollama
```

Update fly.toml and add [mounts]

[mounts]
  source = "ollama"
  destination = "/mnt/ollama/models"

Update fly.toml and add [env]

[env]
  OLLAMA_MODELS = "/mnt/ollama/models"

Deploy your app
```
fly deploy
```
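
The two fly.toml edits in this section could also be scripted and run before `fly deploy`. The following is a sketch only: the function name `add_ollama_volume_config` is hypothetical, and it assumes a fly.toml whose table headers sit at the start of a line. It appends each table only when it is missing, since a duplicate table would make the TOML invalid.

```shell
#!/bin/sh
# Hypothetical helper: append the [mounts] and [env] tables from this guide
# to the fly.toml given as $1, skipping any table that already exists.
add_ollama_volume_config() {
  file="$1"

  # Match either [mounts] or [[mounts]], since both spellings may appear.
  if ! grep -Eq '^\[\[?mounts\]\]?' "$file"; then
    printf '\n[mounts]\n  source = "ollama"\n  destination = "/mnt/ollama/models"\n' >> "$file"
  fi

  if ! grep -q '^\[env\]' "$file"; then
    printf '\n[env]\n  OLLAMA_MODELS = "/mnt/ollama/models"\n' >> "$file"
  else
    # An existing [env] table is left untouched to avoid clobbering it.
    echo "note: $file already has [env]; add OLLAMA_MODELS there by hand" >&2
  fi
}

# Example:
#   add_ollama_volume_config fly.toml && fly deploy
```

Running the helper twice leaves the file unchanged the second time, so it is safe to re-run.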

(Optional) Hardware Acceleration

Fly.io GPUs are currently available through a waitlist. Sign up here: https://fly.io/gpu

Once you've been accepted, create the app with the additional flags --vm-gpu-kind a100-pcie-40gb or --vm-gpu-kind a100-pcie-80gb.
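
As an illustration, the launch command from the first step with a GPU flag added might be assembled as below. This is a sketch under assumptions: the function name `gpu_launch_command` is hypothetical, and it keeps every flag from the original command unchanged, although this guide does not say whether `--vm-size` should change for GPU machines. It only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: print the `fly launch` command from this guide
# with one of the two GPU kinds it names appended.
gpu_launch_command() {
  name="$1"
  kind="$2"

  case "$kind" in
    a100-pcie-40gb|a100-pcie-80gb) ;;
    *)
      echo "GPU kind not named in this guide: $kind" >&2
      return 1
      ;;
  esac

  # Assumption: all original flags are kept as-is and the GPU flag is added.
  printf 'fly launch --name %s --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --vm-gpu-kind %s --now\n' \
    "$name" "$kind"
}

# Example:
#   gpu_launch_command my-ollama-app a100-pcie-40gb
```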
