Development
Important
The llm package that loads and runs models is being updated to use a new Go runner. This should only impact a small set of PRs; however, it does change how the project is built.
Install required tools:
- cmake version 3.24 or higher
- go version 1.22 or higher
- gcc version 11.4.0 or higher
macOS
brew install go cmake gcc
Optionally enable debugging and more verbose logging:
# At build time
export CGO_CFLAGS="-g"
# At runtime
export OLLAMA_DEBUG=1
Get the required libraries and build the native LLM code:
go generate ./...
Then build ollama:
go build .
Now you can run ollama:
./ollama
Linux
Linux CUDA (NVIDIA)
Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!
Install cmake and golang as well as NVIDIA CUDA development and runtime packages.
Typically the build scripts will auto-detect CUDA. However, if your Linux distro or installation approach uses unusual paths, you can set the environment variable CUDA_LIB_DIR to the location of the shared libraries and CUDACXX to the location of the nvcc compiler. You can customize the set of target CUDA architectures by setting CMAKE_CUDA_ARCHITECTURES (e.g. "50;60;70").
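As a minimal sketch of the overrides described above, the following sets the three variables before running the build. The /usr/local/cuda prefix and the CUDA_ROOT helper variable are assumptions for illustration only (the build does not read CUDA_ROOT); substitute the paths your distro actually uses.

```shell
# CUDA_ROOT is a hypothetical helper variable for this sketch; the build
# itself only reads the three exported variables below.
CUDA_ROOT="${CUDA_ROOT:-/usr/local/cuda}"   # assumed install prefix

export CUDA_LIB_DIR="$CUDA_ROOT/lib64"      # location of the CUDA shared libraries
export CUDACXX="$CUDA_ROOT/bin/nvcc"        # location of the nvcc compiler
export CMAKE_CUDA_ARCHITECTURES="50;60;70"  # example architecture list from the text above

# Warn early if the assumed compiler path does not exist.
if [ ! -x "$CUDACXX" ]; then
  echo "warning: nvcc not found at $CUDACXX" >&2
fi
```

Run the generate and build steps from the same shell so the exported values are visible to them.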
Then generate dependencies:
go generate ./...
Then build the binary:
go build .
Linux ROCm (AMD)
Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!
Install CLBlast and ROCm development packages first, as well as cmake and golang.
Typically the build scripts will auto-detect ROCm. However, if your Linux distro or installation approach uses unusual paths, you can set the environment variable ROCM_PATH to the location of the ROCm install (typically /opt/rocm) and CLBlast_DIR to the location of the CLBlast install (typically /usr/lib/cmake/CLBlast). You can also customize the AMD GPU targets by setting AMDGPU_TARGETS (e.g. AMDGPU_TARGETS="gfx1101;gfx1102").
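The ROCm overrides can be sketched the same way. The defaults below are the typical locations quoted above, and the GPU target list is only the example from the text; the directory check is an illustrative addition, not part of the build scripts.

```shell
# Keep any value already set in the environment, otherwise fall back to the
# typical locations mentioned in the text.
export ROCM_PATH="${ROCM_PATH:-/opt/rocm}"
export CLBlast_DIR="${CLBlast_DIR:-/usr/lib/cmake/CLBlast}"
export AMDGPU_TARGETS="gfx1101;gfx1102"   # example targets; pick the ones for your GPU

# Illustrative sanity check: warn about paths that do not exist.
for dir in "$ROCM_PATH" "$CLBlast_DIR"; do
  [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```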
go generate ./...
Then build the binary:
go build .
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the render group, or run as root.
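One way to check whether the current account is already in the render group is sketched below. The in_group helper is a hypothetical name introduced for this example; it only inspects a space-separated list of group names.

```shell
# usage: in_group "<space-separated group names>" <group>
# Returns 0 if <group> appears as a whole word in the list, 1 otherwise.
in_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the names of all groups the current user belongs to.
if in_group "$(id -nG)" render; then
  echo "current user is in the render group"
else
  echo "current user is not in the render group"
fi
```

If the account is missing from the group, a command such as sudo usermod -aG render "$USER" usually adds it on distros that ship usermod; the new membership takes effect at the next login.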
Advanced CPU Settings
By default, running go generate ./... will compile a few different variations of the LLM library based on common CPU families and vector math capabilities, including a lowest-common-denominator which should run on almost any 64 bit CPU somewhat slowly. At runtime, Ollama will auto-detect the optimal variation to load. If you would like to build a CPU-based build customized for your processor, you can set OLLAMA_CUSTOM_CPU_DEFS to the llama.cpp flags you would like to use. For example, to compile an optimized binary for an Intel i9-9880H, you might use:
OLLAMA_CUSTOM_CPU_DEFS="-DGGML_AVX=on -DGGML_AVX2=on -DGGML_F16C=on -DGGML_FMA=on" go generate ./...
go build .
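On Linux, the flag list can be derived from the CPU features reported in /proc/cpuinfo rather than written by hand. The sketch below is illustrative only: cpu_defs_from_flags is a hypothetical helper, and it covers just the four flags used in the example above.

```shell
# usage: cpu_defs_from_flags "<space-separated CPU feature flags>"
# Prints the matching llama.cpp definitions for the four features used above.
cpu_defs_from_flags() {
  flags=" $1 "
  defs=""
  case "$flags" in *" avx "*)  defs="$defs -DGGML_AVX=on" ;; esac
  case "$flags" in *" avx2 "*) defs="$defs -DGGML_AVX2=on" ;; esac
  case "$flags" in *" f16c "*) defs="$defs -DGGML_F16C=on" ;; esac
  case "$flags" in *" fma "*)  defs="$defs -DGGML_FMA=on" ;; esac
  echo "${defs# }"   # drop the leading space
}

# Example with a literal flag list:
cpu_defs_from_flags "fpu sse2 avx avx2 f16c fma"

# On a Linux host, the real flags could be fed in like this:
#   OLLAMA_CUSTOM_CPU_DEFS="$(cpu_defs_from_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)")" go generate ./...
```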
Containerized Linux Build
If you have Docker available, you can build Linux binaries with ./scripts/build_linux.sh which has the CUDA and ROCm dependencies included. The resulting binary is placed in ./dist.
Windows
Note: The Windows build for Ollama is still under development.
First, install required tools:
- MSVC toolchain - C/C++ and cmake as minimal requirements
- Go version 1.22 or higher
- MinGW (pick one variant) with GCC.
- The ThreadJob PowerShell module: Install-Module -Name ThreadJob -Scope CurrentUser
Then, build the ollama binary:
$env:CGO_ENABLED="1"
go generate ./...
go build .
Windows CUDA (NVIDIA)
In addition to the common Windows development tools described above, install CUDA after installing MSVC.
Windows ROCm (AMD Radeon)
In addition to the common Windows development tools described above, install AMD's HIP package after installing MSVC.
Lastly, add ninja.exe included with MSVC to the system path (e.g. C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja).
Windows arm64
The default Developer PowerShell for VS 2022 may default to x86 which is not what you want. To ensure you get an arm64 development environment, start a plain PowerShell terminal and run:
import-module 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\Common7\\Tools\\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -Arch arm64 -vsinstallpath 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community' -skipautomaticlocation
You can confirm with write-host $env:VSCMD_ARG_TGT_ARCH
Follow the instructions at https://www.msys2.org/wiki/arm64/ to set up an arm64 msys2 environment. Ollama requires gcc and mingw32-make to compile, which are not currently available on Windows arm64, but a gcc compatibility adapter is available via mingw-w64-clang-aarch64-gcc-compat. At a minimum you will need to install the following:
pacman -S mingw-w64-clang-aarch64-clang mingw-w64-clang-aarch64-gcc-compat mingw-w64-clang-aarch64-make make
You will need to ensure your PATH includes go, cmake, gcc, clang, and mingw32-make to build ollama from source (the MSYS2 tools are typically in C:\msys64\clangarm64\bin\).
Transition to Go runner
The Ollama team is working on moving to a new Go based runner that loads and runs models in a subprocess to replace the previous code under ext_server. During this transition period, this new Go runner is "opt in" at build time, and requires using a different approach to build.
After the transition to use the Go server exclusively, both make and go generate will build the Go runner.
Install required tools:
- go version 1.22 or higher
- gcc version 11.4.0 or higher
macOS
Optionally enable debugging and more verbose logging:
# At build time
export CGO_CFLAGS="-g"
# At runtime
export OLLAMA_DEBUG=1
Get the required libraries and build the native LLM code: (Adjust the job count based on your number of processors for a faster build)
make -C llama -j 5
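Rather than hard-coding the job count, it can be taken from the number of online processors. This is a small sketch; njobs is a name introduced for the example, and the fallback to 1 is an assumption for systems where getconf cannot report the value.

```shell
# Ask the system how many processors are online; fall back to 1 on failure.
njobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
echo "building with $njobs parallel jobs"

# From the repository root, the build would then be started with:
#   make -C llama -j "$njobs"
```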
Then build ollama:
go build .
Now you can run ollama:
./ollama
Xcode 15 warnings
If you are using Xcode newer than version 14, you may see a warning during go build about ld: warning: ignoring duplicate libraries: '-lobjc' due to Golang issue https://github.com/golang/go/issues/67799 which can be safely ignored. You can suppress the warning with export CGO_LDFLAGS="-Wl,-no_warn_duplicate_libraries".
Linux
Linux CUDA (NVIDIA)
Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!
Install make, gcc and golang as well as NVIDIA CUDA development and runtime packages.
Typically the build scripts will auto-detect CUDA. However, if your Linux distro or installation approach uses unusual paths, you can set the environment variable CUDA_LIB_DIR to the location of the shared libraries and CUDACXX to the location of the nvcc compiler. You can customize the set of target CUDA architectures by setting CMAKE_CUDA_ARCHITECTURES (e.g. "50;60;70").
Then generate dependencies: (Adjust the job count based on your number of processors for a faster build)
make -C llama -j 5
Then build the binary:
go build .
Linux ROCm (AMD)
Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!
Install CLBlast and ROCm development packages first, as well as make, gcc, and golang.
Typically the build scripts will auto-detect ROCm. However, if your Linux distro or installation approach uses unusual paths, you can set the environment variable ROCM_PATH to the location of the ROCm install (typically /opt/rocm) and CLBlast_DIR to the location of the CLBlast install (typically /usr/lib/cmake/CLBlast). You can also customize the AMD GPU targets by setting AMDGPU_TARGETS (e.g. AMDGPU_TARGETS="gfx1101;gfx1102").
Then generate dependencies: (Adjust the job count based on your number of processors for a faster build)
make -C llama -j 5
Then build the binary:
go build .
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the render group, or run as root.
Advanced CPU Settings
By default, running make will compile a few different variations of the LLM library based on common CPU families and vector math capabilities, including a lowest-common-denominator which should run on almost any 64 bit CPU somewhat slowly. At runtime, Ollama will auto-detect the optimal variation to load.
Custom CPU settings are not currently supported in the new Go server build but will be added back after we complete the transition.
Containerized Linux Build
If you have Docker available, you can build Linux binaries with OLLAMA_NEW_RUNNERS=1 ./scripts/build_linux.sh which has the CUDA and ROCm dependencies included. The resulting binary is placed in ./dist.
Windows
The following tools are required as a minimal development environment to build CPU inference support.
- Go version 1.22 or higher
- Git
- clang with gcc compat and Make. There are multiple options for installing these tools on Windows. We have verified the following, but others may work as well:
  - MSYS2
    - After installing, from an MSYS2 terminal, run pacman -S mingw-w64-clang-x86_64-gcc-compat mingw-w64-clang-x86_64-clang make to install the required tools
  - Assuming you used the default install prefix for msys2 above, add C:\msys64\clang64\bin and c:\msys64\usr\bin to your environment variable PATH where you will perform the build steps below (e.g. system-wide, account-level, powershell, cmd, etc.)
Note
Due to bugs in the GCC C++ library for unicode support, Ollama requires clang on windows. If the gcc executable in your path is not the clang compatibility wrapper, the build will error.
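A quick way to confirm which compiler the gcc on your PATH resolves to is to inspect its version banner, for example from an MSYS2 terminal. The sketch below assumes the clang compatibility wrapper reports "clang" in its --version output; is_clang is a hypothetical helper, and this is not necessarily how the build itself performs the check.

```shell
# usage: is_clang "<version banner text>"
# Returns 0 if the banner mentions clang, 1 otherwise.
is_clang() {
  case "$1" in
    *clang*) return 0 ;;
    *) return 1 ;;
  esac
}

# Inspect whatever gcc is first on PATH (empty banner if gcc is missing).
if is_clang "$(gcc --version 2>/dev/null)"; then
  echo "gcc on PATH is the clang compatibility wrapper"
else
  echo "gcc on PATH does not report clang; check your PATH order" >&2
fi
```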
Then, build the ollama binary:
$env:CGO_ENABLED="1"
make -C llama -j 8
go build .
GPU Support
The GPU tools require the Microsoft native build tools. To build either CUDA or ROCm, you must first install MSVC via Visual Studio:
- Make sure to select Desktop development with C++ as a Workload during the Visual Studio install
- You must complete the Visual Studio install and run it once BEFORE installing CUDA or ROCm for the tools to properly register
- Add the location of the 64 bit (x64) compiler (cl.exe) to your PATH
- Note: the default Developer Shell may configure the 32 bit (x86) compiler which will lead to build failures. Ollama requires a 64 bit toolchain.
Windows CUDA (NVIDIA)
In addition to the common Windows development tools and MSVC described above:
Windows ROCm (AMD Radeon)
In addition to the common Windows development tools and MSVC described above:
Windows arm64
The default Developer PowerShell for VS 2022 may default to x86 which is not what you want. To ensure you get an arm64 development environment, start a plain PowerShell terminal and run:
import-module 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\Common7\\Tools\\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -Arch arm64 -vsinstallpath 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community' -skipautomaticlocation
You can confirm with write-host $env:VSCMD_ARG_TGT_ARCH
Follow the instructions at https://www.msys2.org/wiki/arm64/ to set up an arm64 msys2 environment. Ollama requires gcc and mingw32-make to compile, which are not currently available on Windows arm64, but a gcc compatibility adapter is available via mingw-w64-clang-aarch64-gcc-compat. At a minimum you will need to install the following:
pacman -S mingw-w64-clang-aarch64-clang mingw-w64-clang-aarch64-gcc-compat mingw-w64-clang-aarch64-make make
You will need to ensure your PATH includes go, cmake, gcc, clang, and mingw32-make to build ollama from source (the MSYS2 tools are typically in C:\msys64\clangarm64\bin\).