Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Deploy Ollama to Fly.io
Note: this example exposes a public endpoint and does not configure authentication. Use with care.
Prerequisites
- Ollama: https://ollama.com/download
- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up
Steps
- Log in to Fly.io
    fly auth login
- Create a new Fly app
    fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
- Pull and run orca-mini:3b
    OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
shared-cpu-8x is a free-tier eligible machine type. For better performance, switch to a performance or dedicated machine type, or attach a GPU for hardware acceleration (see below).
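Once the app is running, the public endpoint can also be reached over Ollama's HTTP API instead of the ollama CLI. The following is a minimal sketch, assuming curl is installed and that the model has already been pulled by the step above. The app name my-ollama-app and the helper names list_models and generate are hypothetical placeholders, not part of the original example.

```shell
# Hypothetical app name; replace with the <name> passed to `fly launch`.
APP_NAME="${APP_NAME:-my-ollama-app}"
BASE_URL="https://${APP_NAME}.fly.dev"

# List the models currently stored on the server (GET /api/tags).
list_models() {
  curl --silent --fail "${BASE_URL}/api/tags"
}

# Send one non-streaming prompt to orca-mini:3b (POST /api/generate).
# The prompt is inserted into JSON as-is, so it must not contain double quotes.
generate() {
  curl --silent --fail "${BASE_URL}/api/generate" \
    -d "{\"model\": \"orca-mini:3b\", \"prompt\": \"$1\", \"stream\": false}"
}

# Print the endpoint so it can be checked before any request is sent.
echo "Ollama endpoint: ${BASE_URL}"
```

After sourcing the snippet, `list_models` should return a JSON list of models and `generate "Why is the sky blue?"` should return a single JSON response. Because the endpoint has no authentication, anyone who knows the URL can make the same calls.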
(Optional) Persistent Volume
By default, Fly Machines use ephemeral storage, so downloaded models are lost on restart and must be pulled again. To keep models across restarts, create and attach a persistent volume to store them:
- Create the Fly Volume
    fly volume create ollama
- Update fly.toml and add [mounts]
    [mounts]
      source = "ollama"
      destination = "/mnt/ollama/models"
- Update fly.toml and add [env]
    [env]
      OLLAMA_MODELS = "/mnt/ollama/models"
- Deploy your app
    fly deploy
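The two fly.toml changes can also be applied from the shell. This is a rough sketch, assuming the fly.toml generated by fly launch does not already contain [mounts] or [env] sections (appending would otherwise create duplicate tables, which TOML does not allow). The FLY_TOML variable is a hypothetical convenience for pointing at a different file.

```shell
# Path to the app's config file; defaults to fly.toml in the current directory.
FLY_TOML="${FLY_TOML:-fly.toml}"

# Append the volume mount and the model directory setting.
# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat >> "${FLY_TOML}" <<'EOF'

[mounts]
  source = "ollama"
  destination = "/mnt/ollama/models"

[env]
  OLLAMA_MODELS = "/mnt/ollama/models"
EOF

# Show the result so the appended sections can be reviewed before deploying.
cat "${FLY_TOML}"
```

The mount destination and OLLAMA_MODELS need to point at the same path, since that is what makes Ollama write downloaded models onto the volume rather than onto ephemeral storage.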
(Optional) Hardware Acceleration
Fly.io GPUs are currently available through a waitlist. Sign up for the waitlist: https://fly.io/gpu
Once you've been accepted, create the app with the additional flag --vm-gpu-kind a100-pcie-40gb or --vm-gpu-kind a100-pcie-80gb.
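As an illustration, the launch command from the Steps section might be combined with the GPU flag as follows. This sketch only prints the command for review rather than running it. It assumes the remaining flags carry over unchanged and that --vm-size is dropped in favor of the GPU kind, neither of which the original text confirms. The app name my-ollama-app is a hypothetical placeholder.

```shell
# Hypothetical app name; replace with your own.
APP_NAME="${APP_NAME:-my-ollama-app}"

# One of the two GPU kinds named above: a100-pcie-40gb or a100-pcie-80gb.
GPU_KIND="${GPU_KIND:-a100-pcie-40gb}"

# Assumed combination of the earlier launch flags with the GPU flag.
LAUNCH_CMD="fly launch --name ${APP_NAME} --image ollama/ollama --internal-port 11434 --vm-gpu-kind ${GPU_KIND} --now"

# Print the command instead of executing it, so it can be checked first.
echo "${LAUNCH_CMD}"
```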