diff --git a/Dockerfile b/Dockerfile
index 8e0fd010..59cca725 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,5 +1,6 @@
 ARG GOLANG_VERSION=1.22.1
 ARG CMAKE_VERSION=3.22.1
+# this CUDA_VERSION corresponds with the one specified in docs/gpu.md
 ARG CUDA_VERSION=11.3.1
 ARG ROCM_VERSION=6.0
 
diff --git a/docs/faq.md b/docs/faq.md
index 7e891eac..42738a14 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -14,6 +14,10 @@ curl -fsSL https://ollama.com/install.sh | sh
 
 Review the [Troubleshooting](./troubleshooting.md) docs for more about using logs.
 
+## Is my GPU compatible with Ollama?
+
+Please refer to the [GPU docs](./gpu.md).
+
 ## How can I specify the context window size?
 
 By default, Ollama uses a context window size of 2048 tokens.
diff --git a/docs/gpu.md b/docs/gpu.md
new file mode 100644
index 00000000..65124e5a
--- /dev/null
+++ b/docs/gpu.md
@@ -0,0 +1,74 @@
+# GPU
+## Nvidia
+Ollama supports Nvidia GPUs with compute capability 5.0+.
+
+Check your compute capability to see if your card is supported:
+[https://developer.nvidia.com/cuda-gpus](https://developer.nvidia.com/cuda-gpus)
+
+| Compute Capability | Family | Cards |
+| ------------------ | ------------------- | ----------------------------------------------------------------------------------------------------------- |
+| 9.0 | NVIDIA | `H100` |
+| 8.9 | GeForce RTX 40xx | `RTX 4090` `RTX 4080` `RTX 4070 Ti` `RTX 4060 Ti` |
+| | NVIDIA Professional | `L4` `L40` `RTX 6000` |
+| 8.6 | GeForce RTX 30xx | `RTX 3090 Ti` `RTX 3090` `RTX 3080 Ti` `RTX 3080` `RTX 3070 Ti` `RTX 3070` `RTX 3060 Ti` `RTX 3060` |
+| | NVIDIA Professional | `A40` `RTX A6000` `RTX A5000` `RTX A4000` `RTX A3000` `RTX A2000` `A10` `A16` `A2` |
+| 8.0 | NVIDIA | `A100` `A30` |
+| 7.5 | GeForce GTX/RTX | `GTX 1650 Ti` `TITAN RTX` `RTX 2080 Ti` `RTX 2080` `RTX 2070` `RTX 2060` |
+| | NVIDIA Professional | `T4` `RTX 5000` `RTX 4000` `RTX 3000` `T2000` `T1200` `T1000` `T600` `T500` |
+| | Quadro | `RTX 8000` `RTX 6000` `RTX 5000` `RTX 4000` |
+| 7.0 | NVIDIA | `TITAN V` `V100` `Quadro GV100` |
+| 6.1 | NVIDIA TITAN | `TITAN Xp` `TITAN X` |
+| | GeForce GTX | `GTX 1080 Ti` `GTX 1080` `GTX 1070 Ti` `GTX 1070` `GTX 1060` `GTX 1050` |
+| | Quadro | `P6000` `P5200` `P4200` `P3200` `P5000` `P4000` `P3000` `P2200` `P2000` `P1000` `P620` `P600` `P500` `P520` |
+| | Tesla | `P40` `P4` |
+| 6.0 | NVIDIA | `Tesla P100` `Quadro GP100` |
+| 5.2 | GeForce GTX | `GTX TITAN X` `GTX 980 Ti` `GTX 980` `GTX 970` `GTX 960` `GTX 950` |
+| | Quadro | `M6000 24GB` `M6000` `M5000` `M5500M` `M4000` `M2200` `M2000` `M620` |
+| | Tesla | `M60` `M40` |
+| 5.0 | GeForce GTX | `GTX 750 Ti` `GTX 750` `NVS 810` |
+| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |
+
+
+## AMD Radeon
+Ollama supports the following AMD GPUs:
+| Family | Cards and accelerators |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
+| AMD Radeon RX | `7900 XTX` `7900 XT` `7900 GRE` `7800 XT` `7700 XT` `7600 XT` `7600` `6950 XT` `6900 XTX` `6900 XT` `6800 XT` `6800` `Vega 64` `Vega 56` |
+| AMD Radeon PRO | `W7900` `W7800` `W7700` `W7600` `W7500` `W6900X` `W6800X Duo` `W6800X` `W6800` `V620` `V420` `V340` `V320` `Vega II Duo` `Vega II` `VII` `SSG` |
+| AMD Instinct | `MI300X` `MI300A` `MI300` `MI250X` `MI250` `MI210` `MI200` `MI100` `MI60` `MI50` |
+
+### Overrides
+Ollama leverages the AMD ROCm library, which does not support all AMD GPUs. In
+some cases you can force the system to use a similar LLVM target that is close.
+For example, the Radeon RX 5400 is `gfx1034` (also known as 10.3.4); however,
+ROCm does not currently support this target. The closest supported target is
+`gfx1030`. You can use the environment variable `HSA_OVERRIDE_GFX_VERSION` with
+`x.y.z` syntax. For example, to force the system to run on the RX 5400, you
+would set `HSA_OVERRIDE_GFX_VERSION="10.3.0"` as an environment variable for the
+server. If you have an unsupported AMD GPU, you can experiment using the list
+of supported types below.
+
+At this time, the known supported GPU types are the following LLVM targets.
+This table shows some example GPUs that map to these targets:
+| **LLVM Target** | **Example GPU** |
+|-----------------|---------------------|
+| gfx900 | Radeon RX Vega 56 |
+| gfx906 | Radeon Instinct MI50 |
+| gfx908 | Radeon Instinct MI100 |
+| gfx90a | Radeon Instinct MI210 |
+| gfx940 | Radeon Instinct MI300 |
+| gfx941 | |
+| gfx942 | |
+| gfx1030 | Radeon PRO V620 |
+| gfx1100 | Radeon PRO W7900 |
+| gfx1101 | Radeon PRO W7700 |
+| gfx1102 | Radeon RX 7600 |
+
+AMD is working on enhancing ROCm v6 to broaden support for more families of
+GPUs in a future release.
+
+Reach out on [Discord](https://discord.gg/ollama) or file an
+[issue](https://github.com/ollama/ollama/issues) for additional help.
+
+## Metal (Apple GPUs)
+Ollama supports GPU acceleration on Apple devices via the Metal API.
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 983613fe..7103be4d 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -67,40 +67,6 @@ You can see what features your CPU has with the following.
 ```
 cat /proc/cpuinfo| grep flags | head -1
 ```
-## AMD Radeon GPU Support
-
-Ollama leverages the AMD ROCm library, which does not support all AMD GPUs. In
-some cases you can force the system to try to use a similar LLVM target that is
-close. For example The Radeon RX 5400 is `gfx1034` (also known as 10.3.4)
-however, ROCm does not currently support this target. The closest support is
-`gfx1030`. You can use the environment variable `HSA_OVERRIDE_GFX_VERSION` with
-`x.y.z` syntax. So for example, to force the system to run on the RX 5400, you
-would set `HSA_OVERRIDE_GFX_VERSION="10.3.0"` as an environment variable for the
-server. If you have an unsupported AMD GPU you can experiment using the list of
-supported types below.
-
-At this time, the known supported GPU types are the following LLVM Targets.
-This table shows some example GPUs that map to these LLVM targets:
-| **LLVM Target** | **An Example GPU** |
-|-----------------|---------------------|
-| gfx900 | Radeon RX Vega 56 |
-| gfx906 | Radeon Instinct MI50 |
-| gfx908 | Radeon Instinct MI100 |
-| gfx90a | Radeon Instinct MI210 |
-| gfx940 | Radeon Instinct MI300 |
-| gfx941 | |
-| gfx942 | |
-| gfx1030 | Radeon PRO V620 |
-| gfx1100 | Radeon PRO W7900 |
-| gfx1101 | Radeon PRO W7700 |
-| gfx1102 | Radeon RX 7600 |
-
-AMD is working on enhancing ROCm v6 to broaden support for families of GPUs in a
-future release which should increase support for more GPUs.
-
-Reach out on [Discord](https://discord.gg/ollama) or file an
-[issue](https://github.com/ollama/ollama/issues) for additional help.
-
 ## Installing older or pre-release versions on Linux
 
 If you run into problems on Linux and want to install an older version, or you'd
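
A minimal sketch of the override that the new `docs/gpu.md` describes: `HSA_OVERRIDE_GFX_VERSION` is the real ROCm environment variable; `10.3.0` is the `x.y.z` form of the closest supported target (`gfx1030`) for a `gfx1034` card such as the RX 5400. Whether a given card works with this override is not guaranteed; it is something to experiment with.

```shell
# Force ROCm to treat an unsupported GPU (e.g. RX 5400, gfx1034)
# as the closest supported LLVM target, gfx1030 -> "10.3.0" in x.y.z form.
export HSA_OVERRIDE_GFX_VERSION="10.3.0"

# The server must see the variable, e.g. when started from this shell:
# ollama serve
```

If Ollama runs as a systemd service instead, the variable would need to be added to the service environment (for example via `systemctl edit`) rather than an interactive shell.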