- 13 Aug, 2025 1 commit
Daniel Hiltgen authored
We prefer the nvcuda library, which reports driver versions. When we dropped CUDA v11, we added a safety check for too-old drivers. What we missed was that the cudart fallback discovery logic didn't have the driver version wired up. This fixes cudart discovery to expose the driver version as well, so we no longer reject all GPUs when nvcuda doesn't work.
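A minimal sketch of the idea (the struct, field, and function names below are illustrative, not ollama's actual symbols): the cudart fallback decodes the driver version it already queries and attaches it to the discovered GPU, so the too-old-driver check has real data to compare against instead of rejecting the device outright.

```go
package main

import "fmt"

// GpuInfo mirrors the kind of struct a discovery pass fills in;
// the field names are illustrative, not the real ollama definitions.
type GpuInfo struct {
	ID          string
	FreeMemory  uint64
	DriverMajor int
	DriverMinor int
}

// decodeCudaDriverVersion splits the integer reported by the CUDA runtime
// (e.g. cudaDriverGetVersion returning 12040) into major/minor parts.
// CUDA encodes versions as major*1000 + minor*10.
func decodeCudaDriverVersion(raw int) (major, minor int) {
	return raw / 1000, (raw % 1000) / 10
}

func main() {
	// Pretend the cudart fallback just queried the runtime for the driver version.
	raw := 12040 // stand-in for a real cudaDriverGetVersion result

	gpu := GpuInfo{ID: "GPU-0", FreeMemory: 8 << 30}
	gpu.DriverMajor, gpu.DriverMinor = decodeCudaDriverVersion(raw)

	// With the driver version populated, the "driver too old" check can run
	// against cudart-discovered GPUs as well.
	fmt.Printf("%s driver %d.%d\n", gpu.ID, gpu.DriverMajor, gpu.DriverMinor)
}
```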
- 06 May, 2025 1 commit
Michael Yang authored
- 05 May, 2025 1 commit
Jeffrey Morgan authored
- 17 Oct, 2024 1 commit
Daniel Hiltgen authored
Cleaning up Go package naming.
- 19 Jun, 2024 1 commit
Daniel Hiltgen authored
Pointer dereferences weren't correct for a few libraries, which explains some crashes on older systems or with miswired symlinks to the discovery libraries.
- 14 Jun, 2024 2 commits
Daniel Hiltgen authored
This library will give us the most reliable free VRAM reporting on Windows, enabling concurrent model scheduling.
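As a rough illustration of what reading free VRAM through NVML on Windows involves (a standalone sketch, not the code from this commit): nvml.dll can be loaded directly and queried through the standard NVML entry points. nvmlInit_v2, nvmlDeviceGetHandleByIndex_v2, nvmlDeviceGetMemoryInfo, and nvmlShutdown are real NVML functions; the surrounding wiring is assumed.

```go
//go:build windows

package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

// nvmlMemory mirrors NVML's nvmlMemory_t: three unsigned 64-bit counters.
type nvmlMemory struct {
	Total uint64
	Free  uint64
	Used  uint64
}

func main() {
	// Modern NVIDIA drivers install nvml.dll into System32.
	nvml := windows.NewLazySystemDLL("nvml.dll")
	nvmlInit := nvml.NewProc("nvmlInit_v2")
	getHandle := nvml.NewProc("nvmlDeviceGetHandleByIndex_v2")
	getMemory := nvml.NewProc("nvmlDeviceGetMemoryInfo")
	shutdown := nvml.NewProc("nvmlShutdown")

	if ret, _, _ := nvmlInit.Call(); ret != 0 {
		fmt.Println("nvmlInit_v2 failed:", ret)
		return
	}
	defer shutdown.Call()

	var device uintptr // nvmlDevice_t is an opaque handle
	if ret, _, _ := getHandle.Call(0, uintptr(unsafe.Pointer(&device))); ret != 0 {
		fmt.Println("no device at index 0:", ret)
		return
	}

	var mem nvmlMemory
	if ret, _, _ := getMemory.Call(device, uintptr(unsafe.Pointer(&mem))); ret != 0 {
		fmt.Println("nvmlDeviceGetMemoryInfo failed:", ret)
		return
	}

	// Free VRAM is what the scheduler needs to decide whether another
	// model (or another copy of a model) fits on this GPU.
	fmt.Printf("GPU0 free VRAM: %d MiB of %d MiB\n", mem.Free>>20, mem.Total>>20)
}
```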
Daniel Hiltgen authored
Now that we call the GPU discovery routines many times to update memory, this splits the one-time initial discovery from the recurring free-memory refresh.
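A toy sketch of that split (all names are invented for illustration): an expensive one-time enumeration builds the GPU list, and a cheap refresh only re-reads the free-memory counters on each scheduling pass.

```go
package main

import "fmt"

// GpuInfo carries the fields gathered during discovery (illustrative names).
type GpuInfo struct {
	ID          string
	TotalMemory uint64
	FreeMemory  uint64
}

// discoverGPUs stands in for the expensive one-time pass: loading the
// discovery libraries, enumerating devices, reading static properties.
func discoverGPUs() []GpuInfo {
	return []GpuInfo{{ID: "GPU-0", TotalMemory: 24 << 30, FreeMemory: 22 << 30}}
}

// refreshFreeMemory stands in for the cheap recurring update: only the
// free-VRAM counters are re-read, nothing is re-enumerated.
func refreshFreeMemory(gpus []GpuInfo) {
	for i := range gpus {
		gpus[i].FreeMemory -= 4 << 30 // pretend a model was just loaded
	}
}

func main() {
	gpus := discoverGPUs()  // once, at startup
	refreshFreeMemory(gpus) // many times, before each load decision
	fmt.Printf("%s free: %d GiB\n", gpus[0].ID, gpus[0].FreeMemory>>30)
}
```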
- 23 Apr, 2024 1 commit
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models, by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
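The two environment variable names are from the commit itself; the snippet below is only a hedged sketch of how a server might read them with the defaults described above (the helper function is invented).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer from an environment variable,
// falling back to def when it is unset or malformed.
func envInt(name string, def int) int {
	if v := os.Getenv(name); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return def
}

func main() {
	// Defaults described in the commit: one request per model, one loaded model.
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)

	fmt.Printf("parallel requests per model: %d, max loaded models: %d\n",
		numParallel, maxLoaded)
}
```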
- 01 Apr, 2024 2 commits
Daniel Hiltgen authored
Leaving the cudart library loaded kept ~30 MB of memory pinned on the GPU in the main process. This change ensures we don't hold GPU resources when idle.
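A Linux-flavored sketch of the pattern (not the actual ollama code): load the runtime library only for the duration of a query and drop the handle before returning, so nothing stays resident on the GPU between checks.

```go
package main

/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// queryGPU loads the runtime library, asks its question, and unloads it
// before returning, so no GPU context stays pinned in the server process.
func queryGPU() error {
	name := C.CString("libcudart.so")
	defer C.free(unsafe.Pointer(name))

	handle := C.dlopen(name, C.RTLD_LAZY)
	if handle == nil {
		return fmt.Errorf("libcudart.so not found")
	}
	// Real code would resolve the symbols it needs with dlsym here and
	// read free/total memory before letting go of the handle.
	defer C.dlclose(handle)
	return nil
}

func main() {
	if err := queryGPU(); err != nil {
		fmt.Println(err)
	}
}
```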
Daniel Hiltgen authored
"cudart init failure: 35" isn't particularly helpful in the logs.
- 25 Mar, 2024 1 commit
Jeremy authored