# Deploy Ollama to Fly.io

> Note: this example exposes a public endpoint and does not configure authentication. Use with care.

## Prerequisites

- A Fly.io account and the `fly` CLI (flyctl) installed
- [Ollama](https://ollama.com/download) installed locally

## Steps

1. Log in to Fly.io:

   ```shell
   fly auth login
   ```

2. Create a new Fly app:

   ```shell
   fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
   ```

3. Pull and run `orca-mini:3b`:

   ```shell
   OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
   ```
    

`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type, or attach a GPU for hardware acceleration (see below).
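Besides `ollama run`, the deployed app can be queried over Ollama's HTTP API. A minimal sketch using `curl` (replace `<name>` with your app name; the prompt is just an example):

```shell
# Send a single non-streaming generation request to the deployed Ollama instance
curl https://<name>.fly.dev/api/generate -d '{
  "model": "orca-mini:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The response is a JSON object whose `response` field contains the generated text.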

## (Optional) Persistent Volume

By default, Fly Machines use ephemeral storage, so downloaded models are lost when the machine restarts and must be pulled again. Create and attach a persistent volume to store them across restarts:

1. Create the Fly volume:

   ```shell
   fly volume create ollama
   ```

2. Update `fly.toml` and add `[mounts]`:

   ```toml
   [mounts]
     source = "ollama"
     destination = "/mnt/ollama/models"
   ```

3. Update `fly.toml` and add `[env]`:

   ```toml
   [env]
     OLLAMA_MODELS = "/mnt/ollama/models"
   ```

4. Deploy your app:

   ```shell
   fly deploy
   ```
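Putting steps 2 and 3 together, the storage-related portion of `fly.toml` looks like this (a sketch; the rest of the file generated by `fly launch` stays as-is):

```toml
# Mount the "ollama" volume into the machine's filesystem
[mounts]
  source = "ollama"
  destination = "/mnt/ollama/models"

# Point Ollama's model directory at the mounted volume
[env]
  OLLAMA_MODELS = "/mnt/ollama/models"
```

With this in place, models pulled once survive machine restarts.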
    

## (Optional) Hardware Acceleration

Fly.io GPUs are currently waitlisted. Sign up for the waitlist at https://fly.io/gpu.

Once you've been accepted, create the app with the additional flag `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.
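For example, step 2's launch command with a 40&nbsp;GB A100 attached (a sketch; `<name>` is a placeholder, GPU machines may require a GPU-enabled region, and `--vm-size` is omitted since GPU kinds imply a machine size):

```shell
# Launch the app on a GPU machine instead of shared CPU
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-gpu-kind a100-pcie-40gb --now
```

Ollama's container image detects the NVIDIA GPU at startup and uses it for inference automatically.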