diff --git a/examples/flyio/.gitignore b/examples/flyio/.gitignore
new file mode 100644
index 00000000..0501d092
--- /dev/null
+++ b/examples/flyio/.gitignore
@@ -0,0 +1 @@
+fly.toml
diff --git a/examples/flyio/README.md b/examples/flyio/README.md
new file mode 100644
index 00000000..9ab8b801
--- /dev/null
+++ b/examples/flyio/README.md
@@ -0,0 +1,96 @@
+# Deploy Ollama to Fly.io
+
+> Note: this example exposes a public endpoint and does not configure authentication. Use with care.
+
+## Prerequisites
+
+- Ollama: https://ollama.ai/download
+- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up
+
+## Steps
+
+1. Login to Fly.io
+
+    ```bash
+    fly auth login
+    ```
+
+1. Create a new Fly app, replacing `<name>` with your app name
+
+    ```bash
+    fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
+    ```
+
+1. Pull and run `orca-mini:3b`
+
+    ```bash
+    OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
+    ```
+
+`shared-cpu-8x` is a free-tier-eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type, or attach a GPU for hardware acceleration (see below).
+
+## (Optional) Persistent Volume
+
+By default, Fly Machines use ephemeral storage, which is problematic if you want to reuse the same model across restarts without pulling it again. Create and attach a persistent volume to store the downloaded models:
+
+1. Create the Fly Volume
+
+    ```bash
+    fly volume create ollama
+    ```
+
+1. Update `fly.toml` and add `[mounts]`
+
+    ```toml
+    [mounts]
+      source = "ollama"
+      destination = "/mnt/ollama/models"
+    ```
+
+1. Update `fly.toml` and add `[env]`
+
+    ```toml
+    [env]
+      OLLAMA_MODELS = "/mnt/ollama/models"
+    ```
+
+1. Deploy your app
+
+    ```bash
+    fly deploy
+    ```
+
+## (Optional) Hardware Acceleration
+
+Fly.io GPUs are currently waitlist-only. Sign up for the waitlist: https://fly.io/gpu
+
+Once you've been accepted, create the app with the additional flag `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`, as sketched below.
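+
+For illustration, the GPU flag slots into the launch command from the steps above. This is a sketch, not a verified command line; `<name>` is still a placeholder for your app name:
+
+```bash
+# Sketch: combines the earlier launch flags with a GPU kind (assumes waitlist access)
+fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-gpu-kind a100-pcie-40gb --now
+```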
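+
+## (Optional) Scale Up the Machine
+
+If `shared-cpu-8x` is too slow, one option is to resize the machine after launch. The size and memory values below are placeholders; check `fly platform vm-sizes` for current offerings:
+
+```bash
+# Example values only; pick a size and memory that fit your model
+fly scale vm performance-8x
+fly scale memory 8192
+```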
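+
+## (Optional) Verify the Endpoint
+
+After a model has been pulled, the deployment can be smoke-tested from any machine via Ollama's generate API. The request below assumes the `orca-mini:3b` pull from the steps above; substitute your app name for `<name>`:
+
+```bash
+# Streams JSON responses from the public endpoint; no auth is configured (see note above)
+curl https://<name>.fly.dev/api/generate -d '{
+  "model": "orca-mini:3b",
+  "prompt": "Why is the sky blue?"
+}'
+```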