From 6606e4243c481eb5bcb552c95fa50ee5aa594f3b Mon Sep 17 00:00:00 2001
From: Daniel Hiltgen
Date: Tue, 12 Nov 2024 09:12:50 -0800
Subject: [PATCH] docs: Capture docker cgroup workaround (#7519)

GPU support can break on some systems after a while. This captures a
known workaround to solve the problem.
---
 docs/troubleshooting.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 0a89b87f..cbd73c7b 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -97,6 +97,8 @@ On linux, AMD GPU access typically requires `video` and/or `render` group member
 
 When running in a container, in some Linux distributions and container runtimes, the ollama process may be unable to access the GPU. Use `ls -ld /dev/kfd /dev/dri /dev/dri/*` on the host system to determine the group assignments on your system, and pass additional `--group-add ...` arguments to the container so it can access the required devices.
 
+If Ollama initially works on the GPU in a Docker container but then switches to running on the CPU after some period of time, with errors in the server log reporting GPU discovery failures, this can be resolved by disabling systemd cgroup management in Docker. Edit `/etc/docker/daemon.json` on the host and add `"exec-opts": ["native.cgroupdriver=cgroupfs"]` to the Docker configuration.
+
 If you are experiencing problems getting Ollama to correctly discover or use your GPU for inference, the following may help isolate the failure.
 - `AMD_LOG_LEVEL=3` Enable info log levels in the AMD HIP/ROCm libraries. This can help show more detailed error codes that can help troubleshoot problems
 - `OLLAMA_DEBUG=1` During GPU discovery additional information will be reported
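For reference, a minimal sketch of what the resulting `/etc/docker/daemon.json` might look like, assuming the file contains no other settings (if it already exists, merge the `exec-opts` key into the existing JSON rather than replacing the file):

```json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```

After saving the change, restart the Docker daemon for it to take effect, e.g. `sudo systemctl restart docker` on systemd-based hosts, then recreate the Ollama container.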