Development
- Install cmake and, optionally, the tools required for GPU support
- Run go generate ./...
- Run go build .
Install required tools:
- cmake version 3.24 or higher
- go version 1.20 or higher
- gcc version 11.4.0 or higher
brew install go cmake gcc
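To confirm the minimum versions listed above before building, compare dotted version numbers one component at a time. The Go sketch below is illustrative and not part of Ollama. The helper names atLeast and parse are hypothetical, the "installed" versions in main are made-up sample values, and it does not extract the version number from each tool's output (for example, stripping the go prefix from go version).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether the dotted version have is >= the dotted version want,
// comparing numeric components left to right. Missing components count as zero,
// so "3.24" and "3.24.0" compare equal.
func atLeast(have, want string) bool {
	h, w := parse(have), parse(want)
	for i := 0; i < len(h) || i < len(w); i++ {
		var a, b int
		if i < len(h) {
			a = h[i]
		}
		if i < len(w) {
			b = w[i]
		}
		if a != b {
			return a > b
		}
	}
	return true
}

// parse splits "11.4.0" into [11 4 0], keeping only the leading digits of each
// component, so a distro suffix such as "0-1ubuntu1" is read as 0.
func parse(v string) []int {
	parts := strings.Split(v, ".")
	out := make([]int, 0, len(parts))
	for _, p := range parts {
		end := 0
		for end < len(p) && p[end] >= '0' && p[end] <= '9' {
			end++
		}
		n, _ := strconv.Atoi(p[:end]) // an empty digit run yields 0
		out = append(out, n)
	}
	return out
}

func main() {
	// Minimums come from the list above; the "have" values are sample inputs only.
	checks := []struct{ tool, have, want string }{
		{"cmake", "3.27.1", "3.24"},
		{"go", "1.19.5", "1.20"},
		{"gcc", "11.4.0", "11.4.0"},
	}
	for _, c := range checks {
		fmt.Printf("%s %s >= %s: %v\n", c.tool, c.have, c.want, atLeast(c.have, c.want))
	}
}
```

Only the leading digits of each component are read. This keeps the sketch short but ignores pre-release tags such as rc1.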
Optionally enable debugging and more verbose logging:
export CGO_CFLAGS="-g"
Get the required libraries and build the native LLM code:
go generate ./...
Then build ollama:
go build .
Now you can run ollama:
./ollama
Linux
Linux CUDA (NVIDIA)
Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!
Install cmake and golang, as well as the NVIDIA CUDA development and runtime packages.
Then generate dependencies:
go generate ./...
Then build the binary:
go build .
Linux ROCm (AMD)
Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!
Install the CLBlast and ROCm development packages first, as well as cmake and golang.
Adjust the paths below (correct for Arch) to match your distribution's install locations, then generate dependencies:
CLBlast_DIR=/usr/lib/cmake/CLBlast ROCM_PATH=/opt/rocm go generate ./...
Then build the binary:
go build .
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the render group, or run as root.
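To check which case applies before launching ollama, a process can inspect its effective user ID and its group list. The Go sketch below uses only the standard library (os.Geteuid, os.Getgroups, os/user.LookupGroupId). The helper names inGroup and processGroupNames are hypothetical, and the sketch assumes the group is literally named render, as described above for most distros.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strconv"
)

// inGroup reports whether target appears in the list of group names.
func inGroup(groups []string, target string) bool {
	for _, g := range groups {
		if g == target {
			return true
		}
	}
	return false
}

// processGroupNames returns the names of this process's effective group and
// supplementary groups. Supplementary groups are fixed at login, so a newly
// added membership only appears in a new login session.
func processGroupNames() ([]string, error) {
	gids, err := os.Getgroups()
	if err != nil {
		return nil, err
	}
	gids = append(gids, os.Getegid())
	names := make([]string, 0, len(gids))
	for _, gid := range gids {
		g, err := user.LookupGroupId(strconv.Itoa(gid))
		if err != nil {
			continue // skip IDs without a name in the group database
		}
		names = append(names, g.Name)
	}
	return names, nil
}

func main() {
	if os.Geteuid() == 0 {
		fmt.Println("running as root")
		return
	}
	names, err := processGroupNames()
	if err != nil {
		fmt.Println("could not read group list:", err)
		return
	}
	if inGroup(names, "render") {
		fmt.Println("process is in the render group")
	} else {
		fmt.Println("process is not in the render group; ROCm may be unable to access the GPU")
	}
}
```

The sketch checks the process's own groups rather than the account's entry in /etc/group. After you add yourself to render, it keeps reporting "not in the render group" until you log in again, which matches what the GPU driver will see.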
Advanced CPU Settings
By default, running go generate ./... compiles a few different variations of the LLM library based on common CPU families and vector math capabilities, including a lowest-common-denominator variation that should run on almost any 64-bit CPU, albeit somewhat slowly. At runtime, Ollama auto-detects the optimal variation to load. If you would like a CPU-based build customized for your processor, set OLLAMA_CUSTOM_CPU_DEFS to the llama.cpp flags you would like to use. For example, to compile an optimized binary for an Intel i9-9880H, you might use:
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_F16C=on -DLLAMA_FMA=on" go generate ./...
go build .
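The runtime selection described above amounts to picking the most capable variation whose required CPU features are present, and otherwise falling back to the lowest-common-denominator build. The Go sketch below illustrates only that idea and is not Ollama's loader. The type cpuFeatures, the function selectVariant, and the variant names cpu_avx2, cpu_avx and cpu are all hypothetical. On x86-64, the feature flags could be filled in from golang.org/x/sys/cpu (cpu.X86.HasAVX, cpu.X86.HasAVX2).

```go
package main

import "fmt"

// cpuFeatures is a hypothetical summary of the detected vector extensions.
// The flags are passed in explicitly so the selection logic stays self-contained.
type cpuFeatures struct {
	AVX  bool
	AVX2 bool
}

// selectVariant returns the most capable variant that the CPU supports and that
// was actually built. Variant names are illustrative placeholders only.
func selectVariant(f cpuFeatures, built map[string]bool) string {
	candidates := []struct {
		name      string
		supported bool
	}{
		{"cpu_avx2", f.AVX2},
		{"cpu_avx", f.AVX},
		{"cpu", true}, // lowest common denominator: no vector extensions required
	}
	for _, c := range candidates {
		if c.supported && built[c.name] {
			return c.name
		}
	}
	return "" // nothing usable was built
}

func main() {
	built := map[string]bool{"cpu": true, "cpu_avx": true, "cpu_avx2": true}
	fmt.Println(selectVariant(cpuFeatures{AVX: true, AVX2: true}, built))
	fmt.Println(selectVariant(cpuFeatures{AVX: true}, built))
	fmt.Println(selectVariant(cpuFeatures{}, built))
}
```

Ordering the candidates from most to least capable makes the fallback explicit. A CPU without AVX still ends up on the plain build instead of failing with an illegal-instruction error.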
Containerized Linux Build
If you have Docker available, you can build Linux binaries with ./scripts/build_linux.sh, which includes the CUDA and ROCm dependencies. The resulting binary is placed in ./dist.
Windows
Note: The Windows build for Ollama is still under development.
Install required tools:
- MSVC toolchain - C/C++ and cmake as minimal requirements
- go version 1.20 or higher
- MinGW (pick one variant) with GCC.
$env:CGO_ENABLED="1"
go generate ./...
go build .
Windows CUDA (NVIDIA)
In addition to the common Windows development tools described above, install: