Unverified commit 6606e424, authored by Daniel Hiltgen, committed by GitHub

docs: Capture docker cgroup workaround (#7519)

GPU support can break on some systems after a while.  This captures a
known workaround to solve the problem.
parent 65973ceb
@@ -97,6 +97,8 @@ On linux, AMD GPU access typically requires `video` and/or `render` group member
When running in a container, the ollama process may be unable to access the GPU on some Linux distributions and container runtimes. Use `ls -ld /dev/kfd /dev/dri /dev/dri/*` on the host system to determine the group assignments on your system, and pass additional `--group-add ...` arguments to the container so it can access the required devices.
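For example, if `ls -ld` shows `/dev/kfd` and the `/dev/dri` render nodes owned by the `video` and `render` groups, those groups can be added to the container. A minimal sketch for an AMD GPU, assuming group IDs of 44 (`video`) and 110 (`render`); substitute the IDs reported on your own host:

```shell
# Check device group ownership on the host
ls -ld /dev/kfd /dev/dri /dev/dri/*

# Pass the group IDs reported above so the container can open the devices
# (44 and 110 are placeholders; use the values from your host)
docker run -d --device /dev/kfd --device /dev/dri \
  --group-add 44 --group-add 110 \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
```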
If Ollama initially works on the GPU in a docker container, but then switches to running on CPU after some period of time, with errors in the server log reporting GPU discovery failures, this can be resolved by disabling systemd cgroup management in Docker. Edit `/etc/docker/daemon.json` on the host and add `"exec-opts": ["native.cgroupdriver=cgroupfs"]` to the Docker configuration.
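A minimal `daemon.json` sketch, assuming no other daemon options are configured on the host (if there are, merge the key into the existing JSON object rather than replacing the file):

```json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```

After editing, restart the daemon (for example `sudo systemctl restart docker` on systemd-based hosts) and recreate the container for the change to take effect.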
If you are experiencing problems getting Ollama to correctly discover or use your GPU for inference, the following may help isolate the failure (see the example after this list).
- `AMD_LOG_LEVEL=3` Enables info-level logging in the AMD HIP/ROCm libraries, which can surface more detailed error codes when troubleshooting.
- `OLLAMA_DEBUG=1` Reports additional information during GPU discovery.
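Both variables can be set when launching the server directly, or passed into a container with `-e`; the commands below are illustrative:

```shell
# Standalone server: verbose GPU discovery plus HIP/ROCm logging
OLLAMA_DEBUG=1 AMD_LOG_LEVEL=3 ollama serve

# Containerized server: set the same variables in the environment
docker run -d -e OLLAMA_DEBUG=1 -e AMD_LOG_LEVEL=3 \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
```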