Doc container usage and workaround for nvidia errors

2024-05-09 08:49:40 -07:00
commit 8cc0ee2efe
parent d5eec16d23
3 changed files with 92 additions and 2 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -6,7 +6,7 @@
 * [Importing models](./import.md)
 * [Linux Documentation](./linux.md)
 * [Windows Documentation](./windows.md)
-* [Docker Documentation](https://hub.docker.com/r/ollama/ollama)
+* [Docker Documentation](./docker.md)
 ### Reference
--- a/docs/docker.md
+++ b/docs/docker.md
@@ -0,0 +1,71 @@
 # Ollama Docker image
 ### CPU only
 ```bash
 docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
 ```
 ### Nvidia GPU
 Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation).
 #### Install with Apt
 1.  Configure the repository
 ```bash
 curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
 curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
 sudo apt-get update
 ```
 2.  Install the NVIDIA Container Toolkit packages
 ```bash
 sudo apt-get install -y nvidia-container-toolkit
 ```
 #### Install with Yum or Dnf
 1.  Configure the repository
 ```bash
 curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
    | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
 ```
 2. Install the NVIDIA Container Toolkit packages
 ```bash
 sudo yum install -y nvidia-container-toolkit
 ```
 #### Configure Docker to use the NVIDIA driver
 ```bash
 sudo nvidia-ctk runtime configure --runtime=docker
 sudo systemctl restart docker
 ```
 #### Start the container
 ```bash
 docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
 ```
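 To confirm that the container can see the GPU, a check along these lines may help. This is a minimal sketch: it assumes the container is named `ollama` as in the command above, and that the NVIDIA Container Toolkit exposes `nvidia-smi` inside the container, which is its usual behavior but is not stated in this document. The function name `check_container_gpu` is hypothetical.

 ```bash
 # Hypothetical helper: list the GPUs visible inside a running container.
 # Assumes nvidia-smi is available in the container (provided by the toolkit).
 check_container_gpu() {
   local name="${1:-ollama}"
   if docker exec "$name" nvidia-smi -L; then
     echo "GPU visible in container '$name'"
   else
     echo "No GPU visible in container '$name'" >&2
     return 1
   fi
 }
 ```

 Calling `check_container_gpu ollama` after the container starts lists the detected GPUs, or returns a non-zero status if the check fails.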
 ### AMD GPU
 To run Ollama using Docker with AMD GPUs, use the `rocm` tag and the following command:
 ```bash
 docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
 ```
 ### Run a model locally
 With the container running, you can now run a model inside it:
 ```bash
 docker exec -it ollama ollama run llama3
 ```
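 Because the container publishes port 11434, the model can also be queried over HTTP from the host. The sketch below assumes the Ollama REST endpoint `/api/generate` is reachable on `localhost:11434` and that the prompt needs no JSON escaping. The function name `ollama_generate` is hypothetical.

 ```bash
 # Hypothetical helper: send one non-streaming prompt to the Ollama API.
 # Assumes the API listens on localhost:11434 (the port published above)
 # and that the prompt contains no quotes or backslashes.
 ollama_generate() {
   local model="$1" prompt="$2"
   curl -s http://localhost:11434/api/generate \
     -d "{\"model\": \"$model\", \"prompt\": \"$prompt\", \"stream\": false}"
 }
 ```

 For example, `ollama_generate llama3 "Why is the sky blue?"` returns a single JSON response once the model has been pulled.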
 ### Try different models
 More models can be found on the [Ollama library](https://ollama.com/library).
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -83,3 +83,22 @@ If your system is configured with the "noexec" flag where Ollama stores its
 temporary executable files, you can specify an alternate location by setting
 OLLAMA_TMPDIR to a location writable by the user ollama runs as.  For example
 OLLAMA_TMPDIR=/usr/share/ollama/
 ## Container fails to run on NVIDIA GPU
 Make sure you've set up the container runtime first, as described in [docker.md](./docker.md).
 Sometimes the container runtime has difficulty initializing the GPU.
 When you check the server logs, this can show up as various error codes, such
 as "3" (not initialized), "46" (device unavailable), "100" (no device), "999"
 (unknown), or others. The following troubleshooting techniques may help resolve
 the problem:
 - If the UVM driver is not loaded, load it with `sudo nvidia-modprobe -u`
 - Try reloading the nvidia_uvm driver: `sudo rmmod nvidia_uvm`, then `sudo modprobe nvidia_uvm`
 - Try rebooting
 - Make sure you're running the latest NVIDIA drivers
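 The first check in the list can be scripted roughly as follows. This is a sketch, not part of Ollama: it assumes `lsmod` is available to inspect loaded kernel modules, and the function name `ensure_nvidia_uvm` is hypothetical.

 ```bash
 # Hypothetical helper: make sure the nvidia_uvm kernel module is loaded.
 # Assumes lsmod is available on the host.
 ensure_nvidia_uvm() {
   if lsmod | grep -q '^nvidia_uvm'; then
     echo "nvidia_uvm already loaded"
   else
     # Same command as in the list above.
     sudo nvidia-modprobe -u && echo "nvidia_uvm loaded"
   fi
 }
 ```

 Run `ensure_nvidia_uvm` on the host, not inside the container, then restart the container.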
 If none of these steps resolves the problem, gather additional information and file an issue:
 - Set `CUDA_ERROR_LEVEL=50` and try again to get more diagnostic logs
 - Check dmesg for errors with `sudo dmesg | grep -i nvrm` and `sudo dmesg | grep -i nvidia`
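 When filing an issue, a small report such as the one sketched below may be convenient. It pairs the error code seen in the server logs with the meaning given earlier in this section, then prints the matching dmesg lines. The function name `gpu_issue_report` is hypothetical, and only the four codes named in this section are mapped.

 ```bash
 # Hypothetical helper: print a short report to attach to an issue.
 # $1 is the error code seen in the server logs.
 gpu_issue_report() {
   local code="$1" meaning
   case "$code" in
     3)   meaning="not initialized" ;;
     46)  meaning="device unavailable" ;;
     100) meaning="no device" ;;
     999) meaning="unknown" ;;
     *)   meaning="not listed in this document" ;;
   esac
   echo "error code: $code ($meaning)"
   # Combines the two dmesg searches above into one pass.
   sudo dmesg | grep -i -e nvrm -e nvidia
 }
 ```

 For example, `gpu_issue_report 46 > report.txt` writes the report to a file named `report.txt`.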