- 13 Aug, 2025 1 commit
Daniel Hiltgen authored
We prefer the nvcuda library, which reports driver versions. When we dropped CUDA v11, we added a safety check for too-old drivers. What we missed was that the cudart fallback discovery logic didn't have the driver version wired up. This fixes cudart discovery to expose the driver version as well, so we no longer reject all GPUs when nvcuda doesn't work.
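A minimal sketch of the idea (the struct, field, and function names below are illustrative, not ollama's actual symbols): the cudart fallback decodes the driver version it already queries and attaches it to the discovered GPU, so the too-old-driver check has real data to compare against instead of rejecting the device outright.

```go
package main

import "fmt"

// GpuInfo mirrors the kind of struct a discovery pass fills in;
// the field names are illustrative, not the real ollama definitions.
type GpuInfo struct {
	ID          string
	FreeMemory  uint64
	DriverMajor int
	DriverMinor int
}

// decodeCudaDriverVersion splits the integer reported by the CUDA runtime
// (e.g. cudaDriverGetVersion returning 12040) into major/minor parts.
// CUDA encodes versions as major*1000 + minor*10.
func decodeCudaDriverVersion(raw int) (major, minor int) {
	return raw / 1000, (raw % 1000) / 10
}

func main() {
	// Pretend the cudart fallback just queried the runtime for the driver version.
	raw := 12040 // stand-in for a real cudaDriverGetVersion result

	gpu := GpuInfo{ID: "GPU-0", FreeMemory: 8 << 30}
	gpu.DriverMajor, gpu.DriverMinor = decodeCudaDriverVersion(raw)

	// With the driver version populated, the "driver too old" check can run
	// against cudart-discovered GPUs as well.
	fmt.Printf("%s driver %d.%d\n", gpu.ID, gpu.DriverMajor, gpu.DriverMinor)
}
```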
- 06 May, 2025 1 commit
Michael Yang authored
- 05 May, 2025 1 commit
Jeffrey Morgan authored
- 17 Oct, 2024 1 commit
Daniel Hiltgen authored
Cleaning up Go package naming.
- 19 Jun, 2024 1 commit
Daniel Hiltgen authored
Pointer dereferences weren't correct for a few libraries, which explains some crashes on older systems or with miswired symlinks to the discovery libraries.
- 14 Jun, 2024 2 commits
Daniel Hiltgen authored
This library will give us the most reliable free VRAM reporting on Windows, enabling concurrent model scheduling.
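As a rough illustration of what reading free VRAM through NVML on Windows involves (a standalone sketch, not the code from this commit): nvml.dll can be loaded directly and queried through the standard NVML entry points. nvmlInit_v2, nvmlDeviceGetHandleByIndex_v2, nvmlDeviceGetMemoryInfo, and nvmlShutdown are real NVML functions; the surrounding wiring is assumed.

```go
//go:build windows

package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

// nvmlMemory mirrors NVML's nvmlMemory_t: three unsigned 64-bit counters.
type nvmlMemory struct {
	Total uint64
	Free  uint64
	Used  uint64
}

func main() {
	// Modern NVIDIA drivers install nvml.dll into System32.
	nvml := windows.NewLazySystemDLL("nvml.dll")
	nvmlInit := nvml.NewProc("nvmlInit_v2")
	getHandle := nvml.NewProc("nvmlDeviceGetHandleByIndex_v2")
	getMemory := nvml.NewProc("nvmlDeviceGetMemoryInfo")
	shutdown := nvml.NewProc("nvmlShutdown")

	if ret, _, _ := nvmlInit.Call(); ret != 0 {
		fmt.Println("nvmlInit_v2 failed:", ret)
		return
	}
	defer shutdown.Call()

	var device uintptr // nvmlDevice_t is an opaque handle
	if ret, _, _ := getHandle.Call(0, uintptr(unsafe.Pointer(&device))); ret != 0 {
		fmt.Println("no device at index 0:", ret)
		return
	}

	var mem nvmlMemory
	if ret, _, _ := getMemory.Call(device, uintptr(unsafe.Pointer(&mem))); ret != 0 {
		fmt.Println("nvmlDeviceGetMemoryInfo failed:", ret)
		return
	}

	// Free VRAM is what the scheduler needs to decide whether another
	// model (or another copy of a model) fits on this GPU.
	fmt.Printf("GPU0 free VRAM: %d MiB of %d MiB\n", mem.Free>>20, mem.Total>>20)
}
```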
Daniel Hiltgen authored
Now that we call the GPU discovery routines many times to update memory, this splits the one-time initial discovery from the recurring free-memory refresh.
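A toy sketch of that split (all names are invented for illustration): an expensive one-time enumeration builds the GPU list, and a cheap refresh only re-reads the free-memory counters on each scheduling pass.

```go
package main

import "fmt"

// GpuInfo carries the fields gathered during discovery (illustrative names).
type GpuInfo struct {
	ID          string
	TotalMemory uint64
	FreeMemory  uint64
}

// discoverGPUs stands in for the expensive one-time pass: loading the
// discovery libraries, enumerating devices, reading static properties.
func discoverGPUs() []GpuInfo {
	return []GpuInfo{{ID: "GPU-0", TotalMemory: 24 << 30, FreeMemory: 22 << 30}}
}

// refreshFreeMemory stands in for the cheap recurring update: only the
// free-VRAM counters are re-read, nothing is re-enumerated.
func refreshFreeMemory(gpus []GpuInfo) {
	for i := range gpus {
		gpus[i].FreeMemory -= 4 << 30 // pretend a model was just loaded
	}
}

func main() {
	gpus := discoverGPUs()  // once, at startup
	refreshFreeMemory(gpus) // many times, before each load decision
	fmt.Printf("%s free: %d GiB\n", gpus[0].ID, gpus[0].FreeMemory>>30)
}
```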
- 23 Apr, 2024 1 commit
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models, by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
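The two environment variable names are from the commit itself; the snippet below is only a hedged sketch of how a server might read them with the defaults described above (the helper function is invented).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer from an environment variable,
// falling back to def when it is unset or malformed.
func envInt(name string, def int) int {
	if v := os.Getenv(name); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return def
}

func main() {
	// Defaults described in the commit: one request per model, one loaded model.
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)

	fmt.Printf("parallel requests per model: %d, max loaded models: %d\n",
		numParallel, maxLoaded)
}
```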
- 01 Apr, 2024 2 commits
Daniel Hiltgen authored
Leaving the cudart library loaded kept ~30 MB of memory pinned on the GPU in the main process. This change ensures we don't hold GPU resources when idle.
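A Linux-flavored sketch of the pattern (not the actual ollama code): load the runtime library only for the duration of a query and drop the handle before returning, so nothing stays resident on the GPU between checks.

```go
package main

/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// queryGPU loads the runtime library, asks its question, and unloads it
// before returning, so no GPU context stays pinned in the server process.
func queryGPU() error {
	name := C.CString("libcudart.so")
	defer C.free(unsafe.Pointer(name))

	handle := C.dlopen(name, C.RTLD_LAZY)
	if handle == nil {
		return fmt.Errorf("libcudart.so not found")
	}
	// Real code would resolve the symbols it needs with dlsym here and
	// read free/total memory before letting go of the handle.
	defer C.dlclose(handle)
	return nil
}

func main() {
	if err := queryGPU(); err != nil {
		fmt.Println(err)
	}
}
```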
Daniel Hiltgen authored
"cudart init failure: 35" isn't particularly helpful in the logs.
- 25 Mar, 2024 1 commit
Jeremy authored