Unverified commit 950d33aa, authored by Daniel Hiltgen, committed by GitHub

docs: show how to debug nvidia init failures (#12216)

This debug setting can help troubleshoot obscure initialization failures.
parent 9714e38d
@@ -92,6 +92,9 @@ If none of those resolve the problem, gather additional information and file an
- Set `CUDA_ERROR_LEVEL=50` and try again to get more diagnostic logs
- Check dmesg for any errors `sudo dmesg | grep -i nvrm` and `sudo dmesg | grep -i nvidia`
You may get more details about initialization failures by enabling debug prints in the `nvidia_uvm` driver. Only use this setting temporarily while troubleshooting.
- `sudo rmmod nvidia_uvm` then `sudo modprobe nvidia_uvm uvm_debug_prints=1`
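
The reload above, and the matching step to turn debug prints off again, can be sketched as a small helper script. This is an illustrative sketch, not part of the commit: the `run`, `uvm_debug_on`, and `uvm_debug_off` function names and the `RUN` variable are hypothetical. It assumes that reloading the module without the parameter restores the default behavior. By default the script only prints the commands so you can review them first.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): toggles nvidia_uvm debug prints.
# Dry-run by default: commands are printed, not executed. Set RUN=1 to execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Reload the UVM module with debug prints enabled (commands from the doc above).
uvm_debug_on() {
    run sudo rmmod nvidia_uvm
    run sudo modprobe nvidia_uvm uvm_debug_prints=1
}

# Reload the module without the parameter once troubleshooting is done
# (assumption: this returns the driver to its default, non-debug behavior).
uvm_debug_off() {
    run sudo rmmod nvidia_uvm
    run sudo modprobe nvidia_uvm
}
```

After sourcing the script, `RUN=1 uvm_debug_on` performs the reload. Reproduce the failure, inspect `sudo dmesg | grep -i nvidia`, then call `RUN=1 uvm_debug_off`. Note that `rmmod` fails while the module is in use, so any process holding the GPU may need to be stopped first.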
## AMD GPU Discovery
......