- 13 Aug, 2025 1 commit
Daniel Hiltgen authored
We prefer the nvcuda library, which reports driver versions. When we dropped CUDA v11, we added a safety check for drivers that are too old. What we missed was that the cudart fallback discovery logic didn't have the driver version wired up. This fixes cudart discovery to expose the driver version as well, so we no longer reject all GPUs when nvcuda didn't work.
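A minimal sketch of the fix's shape in Go, assuming hypothetical names (GpuInfo, cudartGetDriverVersion, minDriverMajor are illustrations, not the project's actual identifiers): the fallback path populates the driver version before the too-old-driver check runs, so GPUs found via cudart are no longer filtered out for having an apparent version of 0.

```go
package main

import "fmt"

// GpuInfo is a hypothetical stand-in for the discovery result type.
type GpuInfo struct {
	ID          string
	DriverMajor int
	DriverMinor int
}

// Minimum supported driver major version, assumed for illustration.
const minDriverMajor = 12

// cudartGetDriverVersion stands in for a cudart call such as
// cudaDriverGetVersion, which reports the version as major*1000 + minor*10.
func cudartGetDriverVersion() (int, int) {
	v := 12040 // pretend the installed driver reports CUDA 12.4
	return v / 1000, (v % 1000) / 10
}

// discoverWithCudart is the fallback path. The bug was leaving
// DriverMajor/DriverMinor at zero; the fix wires the version in here.
func discoverWithCudart() []GpuInfo {
	major, minor := cudartGetDriverVersion()
	return []GpuInfo{{ID: "GPU-0", DriverMajor: major, DriverMinor: minor}}
}

func supported(g GpuInfo) bool {
	return g.DriverMajor >= minDriverMajor
}

func main() {
	for _, g := range discoverWithCudart() {
		fmt.Printf("%s driver %d.%d supported=%v\n", g.ID, g.DriverMajor, g.DriverMinor, supported(g))
	}
}
```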
- 17 Oct, 2024 1 commit
Daniel Hiltgen authored
Cleaning up Go package naming
- 14 Jun, 2024 1 commit
Daniel Hiltgen authored
Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free-memory updates.
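A sketch of the split, with hypothetical names (GpuInfo, Discover, RefreshFreeMemory are illustrations, not the actual API): the expensive enumeration happens once, and only the volatile free-memory field is re-queried afterwards.

```go
package main

import "fmt"

// GpuInfo is a hypothetical discovery record; only FreeMemory changes at runtime.
type GpuInfo struct {
	ID          string
	TotalMemory uint64
	FreeMemory  uint64
}

// Discover performs the one-time, comparatively expensive enumeration:
// loading libraries, probing devices, reading static properties.
func Discover() []GpuInfo {
	return []GpuInfo{{ID: "GPU-0", TotalMemory: 24 << 30, FreeMemory: 24 << 30}}
}

// RefreshFreeMemory updates only the volatile field on already-discovered
// GPUs, so callers can poll frequently without redoing full discovery.
func RefreshFreeMemory(gpus []GpuInfo) {
	for i := range gpus {
		gpus[i].FreeMemory = queryFreeMemory(gpus[i].ID)
	}
}

// queryFreeMemory stands in for a cheap per-device memory query
// (the equivalent of cudaMemGetInfo).
func queryFreeMemory(id string) uint64 {
	return 20 << 30 // pretend 20 GiB is currently free
}

func main() {
	gpus := Discover()
	RefreshFreeMemory(gpus)
	fmt.Printf("%+v\n", gpus)
}
```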
- 01 May, 2024 1 commit
Daniel Hiltgen authored
We're seeing some corner cases with cudart that might be resolved by switching to the driver API, which comes bundled with the driver package.
- 23 Apr, 2024 1 commit
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
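The environment variable names come from the commit; the parsing below is only a sketch of how such settings are typically read with defaults of 1 each, not the project's actual configuration code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer environment variable, falling back to def
// when the variable is unset or unparsable.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return def
}

func main() {
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)    // concurrent requests per model
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1) // models resident at once

	fmt.Printf("parallel=%d max_loaded=%d\n", numParallel, maxLoaded)
}
```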
- 01 Apr, 2024 2 commits
Daniel Hiltgen authored
Leaving the cudart library loaded kept ~30MB of GPU memory pinned by the main process. This change ensures we don't hold GPU resources when idle.
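A sketch of the load-query-unload pattern the commit describes, with hypothetical loadCudart/Close names standing in for dynamic-library handling: the handle is released as soon as the query finishes rather than living for the life of the process.

```go
package main

import "fmt"

// cudartHandle is a hypothetical wrapper around a dynamically loaded cudart.
type cudartHandle struct{ loaded bool }

func loadCudart() (*cudartHandle, error) { return &cudartHandle{loaded: true}, nil }

// Close stands in for unloading the shared library (dlclose / FreeLibrary),
// which releases the GPU resources the runtime pinned while it was loaded.
func (h *cudartHandle) Close() { h.loaded = false }

func (h *cudartHandle) freeMemory() uint64 { return 20 << 30 }

// checkFreeMemory loads cudart only for the duration of the query, so nothing
// stays held in the GPU while the process is otherwise idle.
func checkFreeMemory() (uint64, error) {
	h, err := loadCudart()
	if err != nil {
		return 0, err
	}
	defer h.Close() // unload immediately instead of keeping memory pinned

	return h.freeMemory(), nil
}

func main() {
	free, _ := checkFreeMemory()
	fmt.Printf("free: %d bytes\n", free)
}
```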
Daniel Hiltgen authored
"cudart init failure: 35" isn't particularly helpful in the logs.
- 25 Mar, 2024 1 commit
Jeremy authored