
# Development

Install required tools:

- cmake version 3.24 or higher
- go version 1.22 or higher
- gcc version 11.4.0 or higher
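
The minimum versions above can be checked with a short script. This is only a sketch: `version_ge` and `report` are hypothetical helpers (not part of the Ollama repository), and the comparison assumes your `sort` supports `-V` (version sort), as GNU coreutils and recent BSD `sort` do.

```bash
# version_ge (hypothetical helper): succeeds if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# report (hypothetical helper): report NAME FOUND_VERSION MINIMUM_VERSION
report() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: ok"
  else
    echo "$1 $2: older than $3"
  fi
}

# Only check tools that are actually on PATH.
if command -v go >/dev/null 2>&1; then
  # `go version` prints e.g. "go version go1.22.3 linux/amd64"
  report go "$(go version | awk '{print $3}' | sed 's/^go//')" 1.22
fi
if command -v cmake >/dev/null 2>&1; then
  # `cmake --version` prints "cmake version X.Y.Z" on its first line
  report cmake "$(cmake --version | awk 'NR==1 {print $3}')" 3.24
fi
```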

### macOS

```bash
brew install go cmake gcc
```

Optionally enable debugging and more verbose logging:

```bash
# At build time
export CGO_CFLAGS="-g"

# At runtime
export OLLAMA_DEBUG=1
```

Get the required libraries and build the native LLM code:

```bash
go generate ./...
```

Then build ollama:

```bash
go build .
```

Now you can run `ollama`:

```bash
./ollama
```

### Linux

#### Linux CUDA (NVIDIA)

_Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_

Install `cmake` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads)
development and runtime packages.

Typically the build scripts will auto-detect CUDA. However, if your Linux distro
or installation approach uses unusual paths, you can set the environment variable
`CUDA_LIB_DIR` to the location of the shared libraries and `CUDACXX` to the
location of the nvcc compiler. You can also customize the set of target CUDA
architectures by setting `CMAKE_CUDA_ARCHITECTURES` (e.g. "50;60;70").
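
As a rough sketch of the overrides described above, the following sets all three variables in the current shell before generating. The `/usr/local/cuda` paths are an assumption about a common install layout, not something the build requires; substitute the paths used on your system.

```bash
# Assumed install location; adjust to wherever your CUDA toolkit lives.
export CUDA_LIB_DIR=/usr/local/cuda/lib64
export CUDACXX=/usr/local/cuda/bin/nvcc

# Semicolon-separated list of target architectures (example value from this guide).
export CMAKE_CUDA_ARCHITECTURES="50;60;70"
```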

Then generate dependencies:

```
go generate ./...
```

Then build the binary:

```
go build .
```

#### Linux ROCm (AMD)

_Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_

Install [CLBlast](https://github.com/CNugteren/CLBlast/blob/master/doc/installation.md) and [ROCm](https://rocm.docs.amd.com/en/latest/) development packages first, as well as `cmake` and `golang`.

Typically the build scripts will auto-detect ROCm. However, if your Linux distro
or installation approach uses unusual paths, you can set the environment variable
`ROCM_PATH` to the location of the ROCm install (typically `/opt/rocm`) and
`CLBlast_DIR` to the location of the CLBlast install (typically
`/usr/lib/cmake/CLBlast`). You can also customize the AMD GPU targets by setting
`AMDGPU_TARGETS` (e.g. `AMDGPU_TARGETS="gfx1101;gfx1102"`).
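
For illustration, these overrides could be set in the current shell as follows. The paths are the typical locations mentioned above and the GPU targets are the example values from this guide; treat all of them as placeholders for your own system.

```bash
# Typical locations; change these if your distro installs ROCm or CLBlast elsewhere.
export ROCM_PATH=/opt/rocm
export CLBlast_DIR=/usr/lib/cmake/CLBlast

# Semicolon-separated list of AMD GPU targets to build for (example values).
export AMDGPU_TARGETS="gfx1101;gfx1102"
```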

Then generate dependencies:

```
go generate ./...
```

Then build the binary:

```
go build .
```

ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.
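
One way to check group membership before reaching for root is sketched below. `has_group` is a hypothetical helper written for this example; the suggested `usermod` command is the standard Linux way to append a supplementary group, and a new login session is normally needed before it takes effect.

```bash
# has_group (hypothetical helper): succeeds if group $2 appears in the
# space-separated group list $1.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# `id -nG` prints the names of all groups the current user belongs to.
if has_group "$(id -nG)" render; then
  echo "current user is in the render group"
else
  # -a appends, so existing group memberships are kept.
  echo "not in the render group; consider: sudo usermod -aG render \$USER"
fi
```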

#### Advanced CPU Settings

By default, running `go generate ./...` will compile a few different variations
of the LLM library based on common CPU families and vector math capabilities,
including a lowest-common-denominator variation that should run on almost any
64-bit CPU, somewhat slowly. At runtime, Ollama will auto-detect the optimal
variation to load. If you would like a CPU-based build customized for your
processor, you can set `OLLAMA_CUSTOM_CPU_DEFS` to the llama.cpp flags you would
like to use. For example, to compile an optimized binary for an Intel i9-9880H,
you might use:

```
OLLAMA_CUSTOM_CPU_DEFS="-DGGML_AVX=on -DGGML_AVX2=on -DGGML_F16C=on -DGGML_FMA=on" go generate ./...
go build .
```
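
If you are unsure which of these flags apply to your processor, a sketch like the following can suggest a starting value on Linux. `cpu_defs` is a hypothetical helper, not part of the Ollama build scripts; it only knows the four flags used in the example above and assumes the CPU feature names reported in `/proc/cpuinfo` (`avx`, `avx2`, `f16c`, `fma`).

```bash
# cpu_defs (hypothetical helper): given a space-separated CPU feature list,
# print the matching llama.cpp defines for the features it recognizes.
cpu_defs() {
  local flags=" $1 " defs="" pair feature
  for pair in avx:GGML_AVX avx2:GGML_AVX2 f16c:GGML_F16C fma:GGML_FMA; do
    feature="${pair%%:*}"
    case "$flags" in
      # Match whole words only, so "avx" does not match "avx2".
      *" $feature "*) defs="$defs -D${pair#*:}=on" ;;
    esac
  done
  echo "${defs# }"
}

# On x86 Linux, the "flags" line of /proc/cpuinfo lists supported extensions.
if [ -r /proc/cpuinfo ]; then
  cpu_defs "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

The printed string can then be passed as `OLLAMA_CUSTOM_CPU_DEFS` in the command shown above.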

#### Containerized Linux Build

If you have Docker available, you can build Linux binaries with `./scripts/build_linux.sh`, which has the CUDA and ROCm dependencies included. The resulting binary is placed in `./dist`.

### Windows

Note: The Windows build for Ollama is still under development.

First, install required tools:

- MSVC toolchain - C/C++ and cmake as minimal requirements
- Go version 1.22 or higher
- MinGW (pick one variant) with GCC.
  - [MinGW-w64](https://www.mingw-w64.org/)
  - [MSYS2](https://www.msys2.org/)
- The `ThreadJob` PowerShell module: `Install-Module -Name ThreadJob -Scope CurrentUser`

Then, build the `ollama` binary:

```powershell
$env:CGO_ENABLED="1"
go generate ./...
go build .
```

#### Windows CUDA (NVIDIA)

In addition to the common Windows development tools described above, install CUDA after installing MSVC.

- [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)


#### Windows ROCm (AMD Radeon)

In addition to the common Windows development tools described above, install AMD's HIP package after installing MSVC.

- [AMD HIP](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
- [Strawberry Perl](https://strawberryperl.com/)

Lastly, add `ninja.exe` included with MSVC to the system path (e.g. `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja`).

#### Windows arm64

The default `Developer PowerShell for VS 2022` may default to x86, which is not what you want. To ensure you get an arm64 development environment, start a plain PowerShell terminal and run:

```powershell
Import-Module 'C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -Arch arm64 -VsInstallPath 'C:\Program Files\Microsoft Visual Studio\2022\Community' -SkipAutomaticLocation
```

You can confirm the target architecture with `write-host $env:VSCMD_ARG_TGT_ARCH`.

Follow the instructions at https://www.msys2.org/wiki/arm64/ to set up an arm64 MSYS2 environment. Ollama requires gcc and mingw32-make to compile; native gcc is not currently available on Windows arm64, but a gcc compatibility adapter is available via `mingw-w64-clang-aarch64-gcc-compat`. At a minimum you will need to install the following:

```
pacman -S mingw-w64-clang-aarch64-clang mingw-w64-clang-aarch64-gcc-compat mingw-w64-clang-aarch64-make make
```

To build Ollama from source, ensure your PATH includes go, cmake, gcc, clang, and mingw32-make. The MSYS2 tools are typically found in `C:\msys64\clangarm64\bin\`.
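
A quick way to see which of these tools are not yet on your PATH, run from the MSYS2 shell, might look like the following. `missing_tools` is a hypothetical helper written for this example.

```bash
# missing_tools (hypothetical helper): print each named tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# No output means every tool was found.
missing_tools go cmake gcc clang mingw32-make
```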