Commits · 2f36d769aa2db6e7bb41a0dbd079f9ce7a9bdc40 · OpenDAS / ollama

17 Nov, 2025 1 commit

bring back sysfs based VRAM information for AMD (#12871) · 2f36d769

Daniel Hiltgen authored Nov 17, 2025

* build: optimize dockerfile context for iterating

This moves the copy of the source into the layer AFTER
doing software installs so we don't have to go through
the RPM install for cuda, etc. every time you touch a
source file.

* amd: implement linux sysfs based VRAM lookup

This adds a C++ implementation of sysfs DRM VRAM discovery
for more accurate free VRAM data on linux for AMD GPUs.

2f36d769

06 Nov, 2025 2 commits
- Remove unnecessary MacOs 13 and lower Patches (#12656) · d4e0da08
  Thomas Stocker authored Nov 07, 2025
```
* Remove unnecessary macos 13 Patch

* Remove unnecessary MacOs Version Guard patch

* rename patchesw

* remove again macos13 patch

* rename files
```
  d4e0da08
- ggml update to b6840 (#12791) · 544b6739
  Daniel Hiltgen authored Nov 06, 2025
  
  544b6739
04 Nov, 2025 1 commit

discovery: only retry AMD GPUs (#12894) · 27f1fde4

Daniel Hiltgen authored Nov 04, 2025

* discovery: only retry AMD GPUs

CUDA and Vulkan don't crash on unsupported devices, so retry isn't necessary.
This also refactors the code to shift the Library specific logic into the ml
package.

* review comments

27f1fde4

28 Oct, 2025 1 commit

Fix vulkan PCI ID and ID handling (#12775) · 14977a93

Daniel Hiltgen authored Oct 28, 2025

* Fix vulkan PCI ID and ID handling

Intel GPUs may not report PCI IDs which was leading to incorrect overlap
detection.  Switch to using the existing PCI IDs, however AMD GPUs claim not to
report PCI IDs, but actually do, so try anyway, as this is required for ADLX to
find the GPUs on Windows. Numeric IDs lead to scheduling problems, so this also
switches Vulkan to use UUID based IDs. The GPU discovery patches have been
squashed into a single patch to simplify future rebases.

* review comments

14977a93

20 Oct, 2025 1 commit

cuda: get driver version after props (#12707) · 5d22953b

Daniel Hiltgen authored Oct 20, 2025

Users on Windows without GPUs are reporting errors relating to
cudaDriverGetVersion with the device set to -1. This ensures we only grab the
driver once we're enumerating actual devices.

5d22953b

16 Oct, 2025 1 commit
- vulkan: Get FilterID from Backend for Vulkan (#12655) · c7441342
  Thomas Stocker authored Oct 16, 2025
```
* vulkan: Get FilterID from Backend for Vulkan

* Fixing patch
```
  c7441342
13 Oct, 2025 1 commit

Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal... · 4987f13d

Gabe Goodhart authored Oct 13, 2025


Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552)

* feat: Bump llama.cpp to df1b612

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(mtmd): Correctly encode text chunks during mtmd tokenization

There can be text chunks that appear interspersed with the image embeddings
that contain template delimiter tokens for some models. These need to be
correctly translated to text tokens.

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* tests: Use MtmdChunk in image_test

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* style: Fix unnecessary conversion linting

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(ggml): Revert changes to ggml_hip.cpp

These changes were done largely by our code assistant and are likely wrong

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Revert changes in mem_nvml.cpp

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Update sync point to 1deee0

This brings in several more optimization commits and model support for
EmbeddingGemma

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Update patches for 1deee0

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: sync for bump to 1deee0

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Bad patch updates with errant `+`

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Bump llama.cpp/ggml to 7049736

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: format-patches after latest bump

Branch: LlamaCPPBump-GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

4987f13d

10 Oct, 2025 1 commit
- implement nvml for linux (#12517) · aab21904
  Daniel Hiltgen authored Oct 10, 2025
```
* implement nvml for linux

* Improve scheduler logging when VRAM doesn't recover
```
  aab21904
02 Oct, 2025 1 commit

Update GGML to b6646 (#12245) · c68f367e

Daniel Hiltgen authored Oct 02, 2025

Notable EOLs with this change:
- MacOS v12 and v13 are no longer supported (v14+ required)
- AMD gfx900 and gfx906 are no longer supported

c68f367e

01 Oct, 2025 1 commit

Use runners for GPU discovery (#12090) · bc8909fb

Daniel Hiltgen authored Oct 01, 2025

This revamps how we discover GPUs in the system by leveraging the Ollama
runner. This should eliminate inconsistency between our GPU discovery and the
runners capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs. Now the runner does that implicitly based on the actual
device list. In some cases free VRAM reporting can be unreliable which can
leaad to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.

bc8909fb