- 09 Jan, 2024 9 commits
-
-
Daniel Hiltgen authored
If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: `cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported`.
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 08 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch VRAM
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go (Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>)
* Update llm/llm.go (Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>)
* Update llm/llm.go
* add overhead for CUDA memory
* Update llm/llm.go (Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>)
* fix build error on Linux
* address comments

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
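For context on the layer-selection change above, here is a minimal sketch (in Go) of the general idea: reserve the graph/scratch overhead out of free VRAM, then offload only as many whole layers as actually fit. The function name, parameters, and example numbers are illustrative assumptions, not ollama's actual code.

```go
// Illustrative only: names and numbers below are assumptions, not ollama's API.
package main

import "fmt"

// estimateGPULayers reserves a fixed graph/scratch overhead out of free VRAM,
// then fits as many whole layers as the remainder allows, capped at the
// model's total layer count (no "+1" layer beyond what actually fits).
func estimateGPULayers(freeVRAM, bytesPerLayer, graphOverhead uint64, totalLayers int) int {
	if bytesPerLayer == 0 || freeVRAM <= graphOverhead {
		return 0 // not enough memory even for the scratch/graph allocations
	}
	layers := int((freeVRAM - graphOverhead) / bytesPerLayer)
	if layers > totalLayers {
		layers = totalLayers // don't ask for more layers than the model has
	}
	return layers
}

func main() {
	// Example numbers are made up: 8 GiB free, ~200 MiB per layer, 1 GiB overhead, 32 layers.
	fmt.Println(estimateGPULayers(8<<30, 200<<20, 1<<30, 32))
}
```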
-
- 07 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
If we try to load the CUDA library on an old GPU, it panics and crashes the server. This checks the compute capability before we load the library so we can gracefully fall back to CPU mode.
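A rough sketch of the gating idea described above: compare the device's reported compute capability against a minimum before attempting to load the CUDA library, and fall back to CPU mode otherwise. The minimum version and the function names here are assumptions for illustration, not the gpu package's actual API.

```go
// Sketch only: the minimum compute capability and names are assumptions.
package main

import "fmt"

const (
	minCudaMajor = 6 // assumed floor; older cards fall back to CPU
	minCudaMinor = 0
)

// cudaUsable reports whether a device's compute capability meets the minimum
// the CUDA build supports; callers skip loading the CUDA library entirely when
// it returns false, avoiding a hard crash on old GPUs.
func cudaUsable(major, minor int) bool {
	if major > minCudaMajor {
		return true
	}
	return major == minCudaMajor && minor >= minCudaMinor
}

func main() {
	fmt.Println(cudaUsable(5, 2)) // false -> fall back to CPU mode
	fmt.Println(cudaUsable(7, 5)) // true  -> safe to load the CUDA library
}
```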
-
- 06 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 05 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
* gpu: read memory info from all cuda devices
* add `LOOKUP_SIZE` constant
* better constant name
* address comments
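To illustrate the multi-device change above, a small sketch of aggregating memory info across all detected CUDA devices. The `deviceMemory` type is a hypothetical stand-in for the per-device values the gpu package would read through the CUDA library.

```go
// Sketch only: deviceMemory and its fields are illustrative stand-ins.
package main

import "fmt"

type deviceMemory struct {
	free  uint64
	total uint64
}

// aggregateMemory sums free and total memory across every detected CUDA device,
// so layer placement can consider all available VRAM rather than device 0 only.
func aggregateMemory(devices []deviceMemory) (free, total uint64) {
	for _, d := range devices {
		free += d.free
		total += d.total
	}
	return free, total
}

func main() {
	devs := []deviceMemory{{free: 6 << 30, total: 8 << 30}, {free: 10 << 30, total: 12 << 30}}
	free, total := aggregateMemory(devs)
	fmt.Printf("free=%d total=%d\n", free, total)
}
```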
-
- 03 Jan, 2024 2 commits
-
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
This refines error handling in the gpu package and fixes a bug in the system memory lookup on Windows.
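For reference, one common way to do a system memory lookup on Windows is `GlobalMemoryStatusEx` from kernel32.dll; the sketch below shows that approach and is not necessarily the exact code used in the gpu package.

```go
//go:build windows

// Sketch of a Windows system memory lookup via GlobalMemoryStatusEx (kernel32.dll).
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// memoryStatusEx mirrors the Win32 MEMORYSTATUSEX structure.
type memoryStatusEx struct {
	length               uint32
	memoryLoad           uint32
	totalPhys            uint64
	availPhys            uint64
	totalPageFile        uint64
	availPageFile        uint64
	totalVirtual         uint64
	availVirtual         uint64
	availExtendedVirtual uint64
}

func systemMemory() (total, avail uint64, err error) {
	kernel32 := syscall.NewLazyDLL("kernel32.dll")
	proc := kernel32.NewProc("GlobalMemoryStatusEx")

	var m memoryStatusEx
	m.length = uint32(unsafe.Sizeof(m)) // the call fails if dwLength isn't set
	ret, _, callErr := proc.Call(uintptr(unsafe.Pointer(&m)))
	if ret == 0 {
		return 0, 0, callErr
	}
	return m.totalPhys, m.availPhys, nil
}

func main() {
	total, avail, err := systemMemory()
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("total=%d avail=%d\n", total, avail)
}
```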
-
- 02 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on Windows so the base executable has no special dependencies and therefore doesn't require a special PATH.
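A hedged sketch of what a fully dynamic loading model can look like on Windows: resolve the backend library by absolute path next to the executable and load it explicitly, so nothing extra has to be on PATH. The file name `ext_server.dll` and the helper name are assumptions, not the actual artifacts.

```go
//go:build windows

// Sketch only: library name and layout are illustrative. Loading the GPU
// backend by absolute path means the base binary has no load-time DLL
// dependencies and relies on nothing being added to PATH.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

func loadBackend(name string) (*syscall.DLL, error) {
	exe, err := os.Executable()
	if err != nil {
		return nil, err
	}
	// Resolve the library relative to the executable rather than the PATH.
	libPath := filepath.Join(filepath.Dir(exe), name)
	return syscall.LoadDLL(libPath)
}

func main() {
	dll, err := loadBackend("ext_server.dll") // hypothetical file name
	if err != nil {
		fmt.Println("could not load backend, falling back:", err)
		return
	}
	defer dll.Release()
	fmt.Println("loaded", dll.Name)
}
```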
-
- 20 Dec, 2023 1 commit
-
-
Daniel Hiltgen authored
This switches the default llama.cpp build to be CPU-based and builds the GPU variants as dynamically loaded libraries that we can select at runtime. It also bumps the ROCm library to version 6, since 5.7 builds don't work on the latest ROCm release that just shipped.
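As a rough illustration of runtime variant selection, the sketch below picks which dynamically loaded llama.cpp build to use based on the detected accelerator, defaulting to the CPU build. The variant names are placeholders, not the exact libraries produced by the build.

```go
// Sketch only: variant names are placeholders.
package main

import "fmt"

// pickVariant chooses which dynamically loaded llama.cpp build to use at
// runtime, defaulting to the CPU build when no supported GPU is detected.
func pickVariant(hasCUDA, hasROCm bool) string {
	switch {
	case hasCUDA:
		return "cuda"
	case hasROCm:
		return "rocm_v6"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println(pickVariant(false, true)) // rocm_v6
}
```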
-
- 19 Dec, 2023 5 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
If someone checks out the ollama repo and doesn't have the CUDA library installed, this ensures they can still build a CPU-only version.
-
Daniel Hiltgen authored
-