Add docs for GPU selection and nvidia uvm workaround
parent a5ba0fcf78
commit d8fdbfd8da
2 changed files with 28 additions and 10 deletions
10
docs/faq.md
@@ -228,13 +228,3 @@ To unload the model and free up memory use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```

## Controlling which GPUs to use

By default, on Linux and Windows, Ollama attempts to use NVIDIA or Radeon GPUs
and uses every GPU it finds. To limit which GPUs are used, set the environment
variable `CUDA_VISIBLE_DEVICES` for NVIDIA cards, or `HIP_VISIBLE_DEVICES` for
Radeon GPUs, to a comma-delimited list of GPU IDs. You can list the devices with
GPU tools such as `nvidia-smi` or `rocminfo`. To bypass the GPU and fall back to
the CPU, set the variable to an invalid GPU ID (e.g., "-1").

28
docs/gpu.md
@@ -29,6 +29,21 @@ Check your compute compatibility to see if your card is supported:
| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |

### GPU Selection

If you have multiple NVIDIA GPUs in your system and want to limit Ollama to a
subset, set `CUDA_VISIBLE_DEVICES` to a comma-separated list of GPUs. Numeric
IDs may be used, but their ordering may vary, so UUIDs are more reliable. You
can discover the UUIDs of your GPUs by running `nvidia-smi -L`. To ignore the
GPUs and force CPU usage, use an invalid GPU ID (e.g., "-1").
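
As a minimal sketch of the selection described above, the snippet below joins GPU identifiers into the comma-separated form and exports the result. The helper name `join_gpu_ids` and both UUIDs are placeholders invented for illustration; substitute the values that `nvidia-smi -L` reports on your system.

```shell
#!/usr/bin/env bash
# join_gpu_ids is a hypothetical helper (not part of Ollama or the NVIDIA tools).
# It joins its arguments with commas, the list format used for CUDA_VISIBLE_DEVICES.
join_gpu_ids() {
  local IFS=,
  printf '%s\n' "$*"
}

# Placeholder UUIDs; replace them with the ones printed by `nvidia-smi -L`.
CUDA_VISIBLE_DEVICES="$(join_gpu_ids \
  GPU-00000000-0000-0000-0000-000000000001 \
  GPU-00000000-0000-0000-0000-000000000002)"
export CUDA_VISIBLE_DEVICES

# Show the value that a process started from this shell would inherit.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
```

The variable only affects processes that inherit it, so Ollama would need to be started (for example with `ollama serve`) from the same shell.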

### Laptop Suspend Resume

On Linux, after a suspend/resume cycle, Ollama will sometimes fail to discover
your NVIDIA GPU and fall back to running on the CPU. You can work around this
driver bug by reloading the NVIDIA UVM driver with
`sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm`.

## AMD Radeon
Ollama supports the following AMD GPUs:
| Family | Cards and accelerators |
@@ -70,5 +85,18 @@ future release which should increase support for more GPUs.
Reach out on [Discord](https://discord.gg/ollama) or file an
[issue](https://github.com/ollama/ollama/issues) for additional help.

### GPU Selection

If you have multiple AMD GPUs in your system and want to limit Ollama to a
subset, set `HIP_VISIBLE_DEVICES` to a comma-separated list of GPUs. You can
list the devices with `rocminfo`. To ignore the GPUs and force CPU usage, use an
invalid GPU ID (e.g., "-1").
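
To illustrate the CPU fallback described above, here is a small sketch that runs a single command with `HIP_VISIBLE_DEVICES` set to the invalid ID `-1`, leaving the rest of the shell session untouched. The wrapper name `run_without_amd_gpus` is hypothetical, and the usage note assumes the `ollama` CLI is on your `PATH`.

```shell
#!/usr/bin/env bash
# run_without_amd_gpus is a hypothetical wrapper (not an Ollama command).
# It runs the given command with HIP_VISIBLE_DEVICES=-1, an invalid GPU ID,
# so only that command's environment is changed.
run_without_amd_gpus() {
  HIP_VISIBLE_DEVICES=-1 "$@"
}

# Demonstration: print the value the wrapped command sees.
run_without_amd_gpus sh -c 'echo "HIP_VISIBLE_DEVICES=$HIP_VISIBLE_DEVICES"'
```

For example, `run_without_amd_gpus ollama serve` would start the server with the AMD GPUs hidden from it.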

### Container Permission

In some Linux distributions, SELinux can prevent containers from
accessing the AMD GPU devices. On the host system you can run
`sudo setsebool container_use_devices=1` to allow containers to use devices.

### Metal (Apple GPUs)
Ollama supports GPU acceleration on Apple devices via the Metal API.