# Development
- Install cmake and, optionally, the required tools for GPU support
- Run `go generate ./...`
- Run `go build .`
Install required tools:
- cmake version 3.24 or higher
- go version 1.20 or higher
- gcc version 11.4.0 or higher
```bash
brew install go cmake gcc
```
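You can check the installed versions against the minimums above, for example:
```bash
cmake --version
go version
gcc --version
```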
Optionally enable debugging and more verbose logging:
```bash
export CGO_CFLAGS="-g"
```
Get the required libraries and build the native LLM code:
```bash
go generate ./...
```
Then build ollama:
```bash
go build .
```
Now you can run `ollama`:
```bash
./ollama
```
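For example, you can start the server and then talk to a model from a second terminal (`llama2` is just an example model name; it will be downloaded on first use):
```bash
./ollama serve
# in a second terminal:
./ollama run llama2
```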
### Linux
#### Linux CUDA (NVIDIA)
*Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!*
Install `cmake` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) development and runtime packages.
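Optionally, you can confirm the toolkit and driver are visible before building (a quick sanity check, not required; depending on your install, `nvcc` may require `/usr/local/cuda/bin` on your `PATH`):
```
nvcc --version
nvidia-smi
```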
Then generate dependencies:
```
go generate ./...
```
Then build the binary:
```
go build .
```
#### Linux ROCm (AMD)
*Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!*
Install [CLBlast](https://github.com/CNugteren/CLBlast/blob/master/doc/installation.md) and [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) development packages first, as well as `cmake` and `golang`.
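Optionally, you can confirm ROCm detects your GPU before building; `rocminfo` ships with ROCm and lists the detected agents (the install path may vary by distro):
```
/opt/rocm/bin/rocminfo
```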
Adjust the paths below (correct for Arch Linux) as appropriate for your distribution's install locations, then generate dependencies:
```
CLBlast_DIR=/usr/lib/cmake/CLBlast ROCM_PATH=/opt/rocm go generate ./...
```
Then build the binary:
```
go build .
```
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.
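For example, on distros that use the `render` group (some use `video` instead), something like the following should work; log out and back in afterwards for the new group membership to take effect:
```
sudo usermod -aG render $USER
```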
#### Advanced CPU Settings
By default, running `go generate ./...` compiles a few different variations of
the LLM library based on common CPU families and vector math capabilities,
including a lowest-common-denominator build which should run on almost any
64-bit CPU, albeit somewhat slowly. At runtime, Ollama auto-detects the optimal
variation to load. If you would like a CPU build customized for your processor,
you can set `OLLAMA_CUSTOM_CPU_DEFS` to the llama.cpp flags you would like to
use. For example, to compile an optimized binary for an Intel i9-9880H, you
might use:
```
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_F16C=on -DLLAMA_FMA=on" go generate ./...
go build .
```
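If you are unsure which vector extensions your processor supports, the CPU flags reported by the kernel are a reasonable guide (a Linux-only sketch using GNU grep):
```
grep -o 'avx512[a-z]*\|avx2\|avx\|f16c\|fma' /proc/cpuinfo | sort -u
```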
#### Containerized Linux Build
If you have Docker available, you can build Linux binaries with `./scripts/build_linux.sh`, which has the CUDA and ROCm dependencies included. The resulting binary is placed in `./dist`.
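For example (assuming Docker is installed and the daemon is running):
```
./scripts/build_linux.sh
ls ./dist
```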
### Windows
Note: The Windows build for Ollama is still under development.
Install required tools:
- MSVC toolchain - C/C++ and cmake as minimal requirements
- go version 1.20 or higher
- MinGW (pick one variant) with GCC.
- <https://www.mingw-w64.org/>
- <https://www.msys2.org/>
Then, from a PowerShell prompt, build the binary:
```powershell
$env:CGO_ENABLED="1"
go generate ./...
go build .
```
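If `go generate` fails to find a C compiler, check which `gcc` and `cmake` the PowerShell session resolves (this assumes a MinGW-w64 or MSYS2 install that adds them to `PATH`):
```powershell
Get-Command gcc, cmake
gcc --version
```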
#### Windows CUDA (NVIDIA)
In addition to the common Windows development tools described above, install:
- [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)