Commits · bc8909fb38525c89dda842d4ecfc86a933089a99 · OpenDAS / ollama

01 Oct, 2025 1 commit

Use runners for GPU discovery (#12090) · bc8909fb

Daniel Hiltgen authored Oct 01, 2025

This revamps how we discover GPUs in the system by leveraging the Ollama
runner. This should eliminate inconsistency between our GPU discovery and the
runner's capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs. Now the runner does that implicitly based on the actual
device list. In some cases free VRAM reporting can be unreliable, which can
lead to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.

bc8909fb

01 Apr, 2025 1 commit

discover: /proc/cpuinfo file open and close. (#9950) · 4059a297

湛露先生 authored Apr 01, 2025

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>

4059a297

30 Oct, 2024 1 commit

Refine default thread selection for NUMA systems (#7322) · 16f4eabe

Daniel Hiltgen authored Oct 30, 2024

Until we have full NUMA support, this adjusts the default thread selection
algorithm to count up the number of performance cores across all sockets.
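
The selection rule described here can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `CPU` type, its field names, and `defaultThreads` are hypothetical, and it assumes each socket reports a total core count and an efficiency-core count.

```go
package main

import "fmt"

// CPU is a hypothetical description of one socket. The field names are
// illustrative and are not taken from the ollama source.
type CPU struct {
	ID                  string
	CoreCount           int // all physical cores in this socket
	EfficiencyCoreCount int // cores reported as efficiency cores
}

// defaultThreads sums the performance cores (total minus efficiency)
// across every socket in the system.
func defaultThreads(cpus []CPU) int {
	total := 0
	for _, c := range cpus {
		total += c.CoreCount - c.EfficiencyCoreCount
	}
	return total
}

func main() {
	cpus := []CPU{
		{ID: "0", CoreCount: 16, EfficiencyCoreCount: 8},
		{ID: "1", CoreCount: 16, EfficiencyCoreCount: 8},
	}
	fmt.Println(defaultThreads(cpus)) // 16
}
```

Summing over sockets treats the machine as one flat pool of cores, which matches the "until we have full NUMA support" caveat: memory locality is not considered.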

16f4eabe

17 Oct, 2024 1 commit

Rename gpu package discover (#7143) · 05cd82ef

Daniel Hiltgen authored Oct 16, 2024

Cleaning up Go package naming

05cd82ef

15 Oct, 2024 1 commit

Discovery CPU details for default thread selection (#6264) · 24636dfa

Daniel Hiltgen authored Oct 15, 2024

On Windows, detect large multi-socket systems and reduce to the number of cores
in one socket for best performance.
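
A rough sketch of that reduction, under stated assumptions: the commit message does not say what counts as "large", so the `largeThreshold` parameter and the value 64 used below are placeholders, and `Socket` and `threadsForHost` are hypothetical names rather than identifiers from the ollama source.

```go
package main

import "fmt"

// Socket is a hypothetical per-socket core count, not an ollama type.
type Socket struct {
	Cores int
}

// threadsForHost returns the total core count, unless the host has several
// sockets and more than largeThreshold cores overall. In that case it
// returns the core count of a single socket (the first one).
func threadsForHost(sockets []Socket, largeThreshold int) int {
	total := 0
	for _, s := range sockets {
		total += s.Cores
	}
	if len(sockets) > 1 && total > largeThreshold {
		return sockets[0].Cores
	}
	return total
}

func main() {
	big := []Socket{{Cores: 48}, {Cores: 48}}
	small := []Socket{{Cores: 8}}
	fmt.Println(threadsForHost(big, 64))   // 48
	fmt.Println(threadsForHost(small, 64)) // 8
}
```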

24636dfa

19 Aug, 2024 1 commit

Refactor linux packaging · 74d45f01

Daniel Hiltgen authored Jul 08, 2024

This adjusts Linux to follow a similar model to Windows with a discrete archive
(zip/tgz) to carry the primary executable, and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the Go binary is fully self-contained.

74d45f01

02 Aug, 2024 1 commit

lint · b732beba

Michael Yang authored Aug 01, 2024

b732beba

11 Jul, 2024 1 commit

llm: avoid loading model if system memory is too small (#5637) · c4cf8ad5

Jeffrey Morgan authored Jul 11, 2024



* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On Linux and Windows, expose how much swap space is available
so we can take that into consideration when scheduling models.

* use `systemSwapFreeMemory` in check
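
The check these bullets describe might look roughly like the sketch below. Only the name `systemSwapFreeMemory` comes from the commit message. The function `checkSystemMemory`, the error value, and the choice to add free RAM and free swap into one budget are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInsufficientMemory is a hypothetical sentinel error for this sketch.
var ErrInsufficientMemory = errors.New("model requires more system memory than is available")

// checkSystemMemory refuses a load when the estimated requirement exceeds
// free RAM plus free swap. All sizes are in bytes.
func checkSystemMemory(required, systemFreeMemory, systemSwapFreeMemory uint64) error {
	available := systemFreeMemory + systemSwapFreeMemory
	if required > available {
		return fmt.Errorf("%w: need %d bytes, have %d bytes",
			ErrInsufficientMemory, required, available)
	}
	return nil
}

func main() {
	const GiB = 1 << 30
	// 12 GiB needed, 8 GiB RAM + 2 GiB swap free: rejected.
	fmt.Println(checkSystemMemory(12*GiB, 8*GiB, 2*GiB) != nil) // true
	// 9 GiB needed, same host: accepted because swap covers the gap.
	fmt.Println(checkSystemMemory(9*GiB, 8*GiB, 2*GiB) != nil) // false
}
```

Failing before the load starts avoids exhausting system memory partway through reading the model.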

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

c4cf8ad5

14 Jun, 2024 1 commit

review comments and coverage · 6f351bf5

Daniel Hiltgen authored Jun 05, 2024

6f351bf5