Merge pull request #3282 from dhiltgen/gpu_docs
Add docs for GPU selection and nvidia uvm workaround
commit 2c390a73ac
2 changed files with 28 additions and 10 deletions

docs/faq.md (10 lines changed)
@@ -228,13 +228,3 @@ To unload the model and free up memory use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```

## Controlling which GPUs to use

By default, on Linux and Windows, Ollama will attempt to use NVIDIA GPUs or
Radeon GPUs, and will use all the GPUs it can find. You can limit which GPUs
are used by setting the environment variable `CUDA_VISIBLE_DEVICES` for
NVIDIA cards, or `HIP_VISIBLE_DEVICES` for Radeon GPUs, to a comma-delimited list
of GPU IDs. You can see the list of devices with GPU tools such as `nvidia-smi` or
`rocminfo`. You can set the variable to an invalid GPU ID (e.g., "-1") to bypass
the GPU and fall back to CPU.

docs/gpu.md (28 lines changed)
@@ -29,6 +29,21 @@ Check your compute compatibility to see if your card is supported:
| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |

### GPU Selection

If you have multiple NVIDIA GPUs in your system and want to limit Ollama to use
a subset, you can set `CUDA_VISIBLE_DEVICES` to a comma-separated list of GPUs.
Numeric IDs may be used; however, ordering may vary, so UUIDs are more reliable.
You can discover the UUIDs of your GPUs by running `nvidia-smi -L`. If you want to
ignore the GPUs and force CPU usage, use an invalid GPU ID (e.g., "-1").
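
The UUID lookup described above can be scripted. The following is a minimal sketch, not part of the original docs: the helper name `list_gpu_uuids` is hypothetical, the sample text imitates `nvidia-smi -L` output with made-up UUIDs, and it assumes the server is started manually with `ollama serve`.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -L` style output on stdin and
# print one GPU UUID per line.
list_gpu_uuids() {
  sed -n 's/.*(UUID: \(GPU-[^)]*\)).*/\1/p'
}

# Sample input with made-up UUIDs, for illustration only.
sample='GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)'

# Pick only the first GPU, or join all UUIDs with commas.
first=$(printf '%s\n' "$sample" | list_gpu_uuids | head -n 1)
all=$(printf '%s\n' "$sample" | list_gpu_uuids | paste -sd, -)

echo "first GPU only: CUDA_VISIBLE_DEVICES=$first"
echo "all GPUs:       CUDA_VISIBLE_DEVICES=$all"

# On a real system (assumption: the server is launched by hand):
#   CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | list_gpu_uuids | head -n 1)" ollama serve
```

The variable has to be visible to the Ollama server process, so it is set on the command that starts the server rather than on the client.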

### Laptop Suspend Resume

On Linux, after a suspend/resume cycle, Ollama will sometimes fail to discover
your NVIDIA GPU and fall back to running on the CPU. You can work around this
driver bug by reloading the NVIDIA UVM driver with `sudo rmmod nvidia_uvm &&
sudo modprobe nvidia_uvm`.
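
If the reload is needed often, it can be wrapped in a small script. This is a sketch under assumptions: the `run` and `reload_nvidia_uvm` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and dry-run mode is on by default so that running the snippet only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# executing it; set DRY_RUN=0 to actually reload the module.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Unload, then reload, the NVIDIA UVM kernel module.
# modprobe is only attempted if rmmod succeeded.
reload_nvidia_uvm() {
  run sudo rmmod nvidia_uvm && run sudo modprobe nvidia_uvm
}

reload_nvidia_uvm
```

Note that `rmmod` refuses to unload a module that is still in use, so any process holding the GPU may need to be stopped first.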

## AMD Radeon
Ollama supports the following AMD GPUs:
| Family | Cards and accelerators |

@@ -70,5 +85,18 @@ future release which should increase support for more GPUs.
Reach out on [Discord](https://discord.gg/ollama) or file an
[issue](https://github.com/ollama/ollama/issues) for additional help.

### GPU Selection

If you have multiple AMD GPUs in your system and want to limit Ollama to use a
subset, you can set `HIP_VISIBLE_DEVICES` to a comma-separated list of GPUs.
You can see the list of devices with `rocminfo`. If you want to ignore the GPUs
and force CPU usage, use an invalid GPU ID (e.g., "-1").
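
As an illustration of the two cases above (a subset of GPUs, or CPU only), the sketch below builds the value for `HIP_VISIBLE_DEVICES`. The `device_list` helper is a hypothetical name, and the IDs `0` and `2` are placeholders for whatever `rocminfo` reports on your system.

```shell
#!/bin/sh
# Hypothetical helper: join the given GPU IDs with commas.
# With no arguments it returns "-1", the invalid ID that forces CPU usage.
device_list() {
  if [ "$#" -eq 0 ]; then
    printf '%s\n' "-1"
  else
    (IFS=,; printf '%s\n' "$*")
  fi
}

# Limit Ollama to two GPUs (placeholder IDs).
HIP_VISIBLE_DEVICES=$(device_list 0 2)
export HIP_VISIBLE_DEVICES
echo "subset:   HIP_VISIBLE_DEVICES=$HIP_VISIBLE_DEVICES"

# Ignore all GPUs and fall back to CPU.
cpu_only=$(device_list)
echo "CPU only: HIP_VISIBLE_DEVICES=$cpu_only"
```

The exported variable only takes effect for an Ollama server started from the same environment afterwards.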

### Container Permission

In some Linux distributions, SELinux can prevent containers from
accessing the AMD GPU devices. On the host system, you can run
`sudo setsebool container_use_devices=1` to allow containers to use devices.
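
Before changing the boolean, it can help to check its current state. The sketch below parses `getsebool`-style output from a sample string rather than calling SELinux tools, so it runs anywhere; the `sebool_is_on` helper is a hypothetical name. The `-P` flag shown in the suggested command is an addition not found in the original text: it makes the setting persist across reboots.

```shell
#!/bin/sh
# Hypothetical helper: succeed if getsebool-style output on stdin
# ("name --> on" / "name --> off") reports the boolean as on.
sebool_is_on() {
  grep -q -- '--> on$'
}

# Sample output, for illustration. On a real host this would come from:
#   getsebool container_use_devices
sample='container_use_devices --> off'

if printf '%s\n' "$sample" | sebool_is_on; then
  action="nothing to do, container_use_devices is already on"
else
  # Only printed here, not executed.
  action="run: sudo setsebool -P container_use_devices=1"
fi
echo "$action"
```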

### Metal (Apple GPUs)
Ollama supports GPU acceleration on Apple devices via the Metal API.