# Deploy Ollama to Fly.io > Note: this example exposes a public endpoint and does not configure authentication. Use with care. ## Prerequisites - Ollama: https://ollama.com/download - Fly.io account. Sign up for a free account: https://fly.io/app/sign-up ## Steps 1. Login to Fly.io ```bash fly auth login ``` 1. Create a new Fly app ```bash fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now ``` 1. Pull and run `orca-mini:3b` ```bash OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b ``` `shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type or attach a GPU for hardware acceleration (see below). ## (Optional) Persistent Volume By default Fly Machines use ephemeral storage which is problematic if you want to use the same model across restarts without pulling it again. Create and attach a persistent volume to store the downloaded models: 1. Create the Fly Volume ```bash fly volume create ollama ``` 1. Update `fly.toml` and add `[mounts]` ```toml [mounts] source = "ollama" destination = "/mnt/ollama/models" ``` 1. Update `fly.toml` and add `[env]` ```toml [env] OLLAMA_MODELS = "/mnt/ollama/models" ``` 1. Deploy your app ```bash fly deploy ``` ## (Optional) Hardware Acceleration Fly.io GPU is currently in waitlist. Sign up for the waitlist: https://fly.io/gpu Once you've been accepted, create the app with the additional flags `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.