# Deploy Ollama to Fly.io

> Note: this example exposes a public endpoint and does not configure authentication. Use with care.

## Prerequisites

- Ollama: https://ollama.com/download
- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up

## Steps

1. Log in to Fly.io

    ```bash
    fly auth login
    ```

1. Create a new Fly app

    ```bash
    fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
    ```

1. Pull and run `orca-mini:3b`

    ```bash
    OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
    ```

`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type, or attach a GPU for hardware acceleration (see the Hardware Acceleration section).
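
Once the app is running, you can also talk to it over Ollama's HTTP API instead of the `ollama` CLI. The following is a minimal sketch, assuming the app is reachable at `https://<name>.fly.dev` as in the step above and that `curl` is installed. The function names are hypothetical helpers introduced only for this example.

```bash
# Hypothetical helper: build the public base URL from the Fly app name.
ollama_base_url() {
  printf 'https://%s.fly.dev' "$1"
}

# List the models currently stored on the remote instance.
list_models() {
  curl -sS "$(ollama_base_url "$1")/api/tags"
}

# Request a single, non-streamed completion from orca-mini:3b.
# The model must already have been pulled (see step 3 above).
generate_once() {
  curl -sS "$(ollama_base_url "$1")/api/generate" \
    -d '{"model": "orca-mini:3b", "prompt": "Why is the sky blue?", "stream": false}'
}
```

Call `list_models <name>` or `generate_once <name>` with your own app name. Because the endpoint is public and unauthenticated, anyone who knows the URL can issue the same requests.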

## (Optional) Persistent Volume

By default, Fly Machines use ephemeral storage, so downloaded models are lost when a machine restarts and must be pulled again. To keep models across restarts, create and attach a persistent volume to store them:

1. Create the Fly Volume

    ```bash
    fly volume create ollama
    ```

1. Update `fly.toml` and add `[mounts]`

    ```toml
    [mounts]
      source = "ollama"
      destination = "/mnt/ollama/models"
    ```

1. Update `fly.toml` and add `[env]`

    ```toml
    [env]
      OLLAMA_MODELS = "/mnt/ollama/models"
    ```

1. Deploy your app

    ```bash
    fly deploy
    ```

## (Optional) Hardware Acceleration

Fly.io GPUs are currently available through a waitlist. Sign up here: https://fly.io/gpu

Once you've been accepted, create the app with one of the additional flags `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.