From 8cc0ee2efe39b5096ab5a86418d3c067b3474db6 Mon Sep 17 00:00:00 2001
From: Daniel Hiltgen
Date: Thu, 9 May 2024 08:49:40 -0700
Subject: [PATCH] Doc container usage and workaround for nvidia errors

---
 docs/README.md          |  2 +-
 docs/docker.md          | 71 +++++++++++++++++++++++++++++++++++++++++
 docs/troubleshooting.md | 21 +++++++++++-
 3 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 docs/docker.md

diff --git a/docs/README.md b/docs/README.md
index a3edb18c..b6221041 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -6,7 +6,7 @@
 * [Importing models](./import.md)
 * [Linux Documentation](./linux.md)
 * [Windows Documentation](./windows.md)
-* [Docker Documentation](https://hub.docker.com/r/ollama/ollama)
+* [Docker Documentation](./docker.md)
 
 ### Reference
 
diff --git a/docs/docker.md b/docs/docker.md
new file mode 100644
index 00000000..0b58562b
--- /dev/null
+++ b/docs/docker.md
@@ -0,0 +1,71 @@
+# Ollama Docker image
+
+### CPU only
+
+```bash
+docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```
+
+### NVIDIA GPU
+Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation).
+
+#### Install with Apt
+1. Configure the repository
+```bash
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
+    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
+    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
+    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+sudo apt-get update
+```
+2. Install the NVIDIA Container Toolkit packages
+```bash
+sudo apt-get install -y nvidia-container-toolkit
+```
+
+#### Install with Yum or Dnf
+1. Configure the repository
+
+```bash
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
+    | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
+```
+
+2. Install the NVIDIA Container Toolkit packages
+
+```bash
+sudo yum install -y nvidia-container-toolkit
+```
+
+#### Configure Docker to use the NVIDIA driver
+```bash
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
+#### Start the container
+
+```bash
+docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```
+
+### AMD GPU
+
+To run Ollama using Docker with AMD GPUs, use the `rocm` tag and the following command:
+
+```bash
+docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
+```
+
+### Run a model locally
+
+Now you can run a model:
+
+```bash
+docker exec -it ollama ollama run llama3
+```
+
+### Try different models
+
+More models can be found in the [Ollama library](https://ollama.com/library).
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index b9038e38..2586e4e4 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -82,4 +82,23 @@ curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION="0.1.29" sh
 
 If your system is configured with the "noexec" flag where Ollama stores its temporary executable files, you can specify an alternate location by setting OLLAMA_TMPDIR to a location writable by the user ollama runs as. For example
-OLLAMA_TMPDIR=/usr/share/ollama/
\ No newline at end of file
+OLLAMA_TMPDIR=/usr/share/ollama/
+
+## Container fails to run on NVIDIA GPU
+
+Make sure you've set up the container runtime first, as described in [docker.md](./docker.md).
+
+Sometimes the container runtime has difficulty initializing the GPU.
+When you check the server logs, this can show up as various error codes, such
+as "3" (not initialized), "46" (device unavailable), "100" (no device), "999"
+(unknown), or others. The following troubleshooting techniques may help resolve
+the problem.
+
+- Load the uvm driver if it isn't loaded: `sudo nvidia-modprobe -u`
+- Try reloading the nvidia_uvm driver: `sudo rmmod nvidia_uvm`, then `sudo modprobe nvidia_uvm`
+- Try rebooting
+- Make sure you're running the latest NVIDIA drivers
+
+If none of those resolve the problem, gather additional information and file an issue:
+- Set `CUDA_ERROR_LEVEL=50` and try again to get more diagnostic logs
+- Check dmesg for errors: `sudo dmesg | grep -i nvrm` and `sudo dmesg | grep -i nvidia`
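The first two checks in the new "Container fails to run on NVIDIA GPU" section can be scripted. The following is a minimal sketch, assuming bash on a Linux host that lists loaded kernel modules in `/proc/modules`. The function names `describe_cuda_error`, `uvm_loaded` and `suggest_uvm_fix` are hypothetical and not part of Ollama or the NVIDIA tooling. The code-to-name mapping uses the CUDA driver API `CUresult` values that correspond to the codes the section lists. The helpers only print the suggested commands and never run `sudo` themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the NVIDIA container troubleshooting steps.
# None of these names come from Ollama or NVIDIA; they are illustrative.

# Print a short meaning for the CUDA error codes mentioned in the docs.
# The names are CUresult values from the CUDA driver API.
describe_cuda_error() {
  case "$1" in
    3)   echo "CUDA_ERROR_NOT_INITIALIZED: driver not initialized" ;;
    46)  echo "CUDA_ERROR_DEVICE_UNAVAILABLE: device unavailable" ;;
    100) echo "CUDA_ERROR_NO_DEVICE: no CUDA-capable device detected" ;;
    999) echo "CUDA_ERROR_UNKNOWN: unknown error" ;;
    *)   echo "unrecognized code $1" ;;
  esac
}

# Succeed if the nvidia_uvm module appears in the module list.
# Defaults to /proc/modules; pass another file to check offline.
uvm_loaded() {
  local modules_file="${1:-/proc/modules}"
  grep -q '^nvidia_uvm ' "$modules_file" 2>/dev/null
}

# Print (but do not run) the matching remediation command:
# reload the module if it is loaded, otherwise load it.
suggest_uvm_fix() {
  if uvm_loaded "${1:-/proc/modules}"; then
    echo "sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm"
  else
    echo "sudo nvidia-modprobe -u"
  fi
}
```

For example, on a host where `nvidia_uvm` is missing from `/proc/modules`, `suggest_uvm_fix` prints `sudo nvidia-modprobe -u`. Printing rather than executing keeps the privileged steps under the reader's control.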