Commits · 6c5ccb11f993ccc88c4761b8c31e0fefcbc1900f · OpenDAS / ollama

07 Mar, 2024 2 commits

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, which defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows
and Linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on Windows, and a
separate download on Linux. The Linux install script now detects the
presence of AMD GPUs, checks whether ROCm v6 is already present, and
if not, downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use Go's Windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.

6c5ccb11
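
As a minimal sketch of the `OLLAMA_HOME` default described in this commit message: the function below returns the variable's value when set and falls back to `~/.ollama` otherwise. The name `resolveHome` and the injected `getenv`/`homeDir` parameters are hypothetical, chosen only to keep the example testable; this is not the project's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveHome returns the directory named by OLLAMA_HOME, or
// <user home>/.ollama when the variable is unset or empty.
// getenv and homeDir are injected so the logic can be exercised in isolation.
func resolveHome(getenv func(string) string, homeDir func() (string, error)) (string, error) {
	if dir := getenv("OLLAMA_HOME"); dir != "" {
		return dir, nil
	}
	home, err := homeDir()
	if err != nil {
		return "", fmt.Errorf("locating home directory: %w", err)
	}
	return filepath.Join(home, ".ollama"), nil
}

func main() {
	dir, err := resolveHome(os.Getenv, os.UserHomeDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(dir)
}
```

Running it as `OLLAMA_HOME=/tmp/ollama go run .` should print the override, while running it with the variable unset should print the per-user default.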

Allow setting max vram for workarounds · be330174

Daniel Hiltgen authored Mar 06, 2024

Until we get all the memory calculations correct, this can provide
an escape valve for users to work around out-of-memory crashes.

be330174
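
One way such an escape valve could look, as a sketch only: a user-supplied byte count replaces the detected VRAM figure, and anything unparsable is ignored. The function name `effectiveVRAM` is hypothetical, and the environment variable name `OLLAMA_MAX_VRAM` is an assumption, since the commit message does not name the setting.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// effectiveVRAM returns the VRAM budget in bytes. A non-empty override that
// parses as a positive integer replaces the detected value; anything else
// leaves the detected value untouched.
func effectiveVRAM(detected uint64, override string) uint64 {
	if override == "" {
		return detected
	}
	limit, err := strconv.ParseUint(override, 10, 64)
	if err != nil || limit == 0 {
		return detected
	}
	return limit
}

func main() {
	// Pretend discovery reported 8 GiB of VRAM.
	const detected = 8 << 30
	// OLLAMA_MAX_VRAM is an assumed variable name used for illustration.
	fmt.Println(effectiveVRAM(detected, os.Getenv("OLLAMA_MAX_VRAM")))
}
```

Replacing the detected value (rather than only capping it) is a design choice of this sketch; a stricter variant could take the minimum of the two.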

17 Feb, 2024 1 commit
- Harden AMD driver lookup logic · 9754c6d9
  Daniel Hiltgen authored Feb 16, 2024
```
It looks like the version file doesn't exist on older(?) drivers
```
  9754c6d9
12 Feb, 2024 1 commit

Detect AMD GPU info via sysfs and block old cards · 6d84f075

Daniel Hiltgen authored Feb 11, 2024

This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fall back to CPU mode.

6d84f075

28 Jan, 2024 2 commits

Don't disable GPUs on arm without AVX · 15562e88
Daniel Hiltgen authored Jan 28, 2024
```
AVX is an x86 feature, so ARM should be excluded from
the check.
```
15562e88

Harden for zero detected GPUs · f07f8b7a

Daniel Hiltgen authored Jan 28, 2024

At least with the ROCm libraries, it's possible to have the library
present with zero GPUs. This fix avoids a divide by zero bug in llm.go
when we try to calculate GPU memory with zero GPUs.

f07f8b7a
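
The kind of guard this fix describes can be sketched as follows. The function name `memoryPerGPU` and its signature are hypothetical and do not reflect the actual code in llm.go; the point is only that the device count is checked before it is used as a divisor.

```go
package main

import "fmt"

// memoryPerGPU splits the total GPU memory evenly across devices.
// It returns 0 when no devices were detected, instead of dividing by zero.
func memoryPerGPU(totalBytes uint64, gpuCount int) uint64 {
	if gpuCount <= 0 {
		return 0
	}
	return totalBytes / uint64(gpuCount)
}

func main() {
	fmt.Println(memoryPerGPU(16<<30, 2)) // two GPUs sharing 16 GiB
	fmt.Println(memoryPerGPU(0, 0))      // library present, zero GPUs
}
```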

26 Jan, 2024 2 commits

Detect lack of AVX and fall back to CPU mode · 667a2ba1

Daniel Hiltgen authored Jan 26, 2024

We build the GPU libraries with AVX enabled to ensure that if not all
layers fit on the GPU we get better performance in a mixed mode.
If the user is running a virtualization/emulation system that lacks AVX,
this used to result in an illegal instruction error and a crash. Now we
report a warning in the server log and use CPU mode so we don't crash.

667a2ba1
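
A rough sketch of this check, combined with the later ARM exclusion listed above: AVX is only required on x86 architectures, and when it is missing the caller gets a warning and stays in CPU mode. The function name `useGPU` and the warning text are hypothetical. The AVX flag is passed in to keep the sketch standard-library only; a real program could obtain it from `golang.org/x/sys/cpu` (`cpu.X86.HasAVX`).

```go
package main

import (
	"fmt"
	"runtime"
)

// useGPU reports whether the GPU library may be loaded. When it may not,
// the second return value carries a warning suitable for the server log.
func useGPU(goarch string, hasAVX bool) (bool, string) {
	// AVX is an x86 feature, so other architectures skip the check.
	if goarch != "amd64" && goarch != "386" {
		return true, ""
	}
	if hasAVX {
		return true, ""
	}
	return false, "CPU does not have AVX; GPU support disabled, using CPU mode"
}

func main() {
	// hasAVX is hard-coded to false here purely to demonstrate the fallback.
	ok, warning := useGPU(runtime.GOARCH, false)
	if !ok {
		fmt.Println("warning:", warning)
	}
	fmt.Println("GPU enabled:", ok)
}
```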

Ignore AMD integrated GPUs · 9d7b5d6c
Daniel Hiltgen authored Jan 25, 2024
```
Detect and ignore integrated GPUs reported by rocm.
```
9d7b5d6c

24 Jan, 2024 1 commit

More logging for gpu management · 013fd071

Daniel Hiltgen authored Jan 24, 2024

Fix an ordering glitch of dlerr/dlclose and add more logging to help
root cause some crashes users are hitting. This also refines the
function pointer names to use the underlying function names instead
of simplified names for readability.

013fd071

23 Jan, 2024 1 commit

Report more information about GPUs in verbose mode · 987c16b2

Daniel Hiltgen authored Jan 22, 2024

This adds additional calls to both CUDA and ROCm management libraries to
discover additional attributes about the GPU(s) detected in the system, and
wires up runtime verbosity selection. When users hit problems with GPUs we can
ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.

987c16b2

20 Jan, 2024 3 commits
- Add compute capability 5.0, 7.5, and 8.0 · a447a083
  Daniel Hiltgen authored Jan 20, 2024
  
  a447a083
- increase minimum overhead to 1024MiB (#2114) · f32ea81b
  Jeffrey Morgan authored Jan 20, 2024
  
  f32ea81b
- Add support for CUDA 5.2 cards · 681a9149
  Daniel Hiltgen authored Jan 20, 2024
  
  681a9149
19 Jan, 2024 2 commits

More WSL paths · 552db98b
Daniel Hiltgen authored Jan 19, 2024

552db98b

Fix CPU-only build under Android Termux environment. · eb76f3e3

Self Denial authored Jan 15, 2024

Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.

Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).

eb76f3e3

18 Jan, 2024 1 commit
- Mechanical switch from log to slog · fedd705a
  Daniel Hiltgen authored Jan 18, 2024
```
A few obvious levels were adjusted, but generally everything mapped to "info" level.
```
  fedd705a
14 Jan, 2024 1 commit
- Let gpu.go and gen_linux.sh also find CUDA on Arch Linux · f4bf1d51
  Alexander F. Rødseth authored Jan 14, 2024
  
  f4bf1d51
11 Jan, 2024 3 commits

Build multiple CPU variants and pick the best · d88c527b

Daniel Hiltgen authored Jan 07, 2024

This reduces the built-in linux version to not use any vector extensions,
which enables the resulting builds to run under Rosetta on macOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available.

d88c527b
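
The runtime selection described here might be sketched like this. The variant names (`cpu`, `cpu_avx`, `cpu_avx2`) and the function `bestCPUVariant` are illustrative assumptions; the commit message does not list which variants are built.

```go
package main

import "fmt"

// bestCPUVariant picks the most capable CPU library variant the processor
// supports, falling back to the baseline build with no vector extensions.
func bestCPUVariant(hasAVX, hasAVX2 bool) string {
	switch {
	case hasAVX2:
		return "cpu_avx2"
	case hasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	// Under an emulator without vector extensions, the baseline is selected.
	fmt.Println(bestCPUVariant(false, false))
	fmt.Println(bestCPUVariant(true, false))
	fmt.Println(bestCPUVariant(true, true))
}
```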

Support multiple variants for a given llm lib type · 8da7bef0

Daniel Hiltgen authored Jan 05, 2024

In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.

8da7bef0
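
Trying several variants in turn could look roughly like the sketch below. The names `loadFirst` and `errIncompatible`, the `load` callback, and the variant strings are all hypothetical; the real loader opens dynamic libraries rather than calling a Go function.

```go
package main

import (
	"errors"
	"fmt"
)

// errIncompatible stands in for whatever error a real loader would report.
var errIncompatible = errors.New("incompatible driver")

// loadFirst tries each variant in order and returns the first one that
// loads without error. If none load, the last error is returned.
func loadFirst(variants []string, load func(string) error) (string, error) {
	var lastErr error
	for _, v := range variants {
		if err := load(v); err != nil {
			lastErr = fmt.Errorf("variant %q: %w", v, err)
			continue
		}
		return v, nil
	}
	if lastErr == nil {
		return "", errors.New("no variants to try")
	}
	return "", lastErr
}

func main() {
	// Fake loader: pretend the v6 build does not match the installed driver.
	load := func(v string) error {
		if v == "rocm_v6" {
			return errIncompatible
		}
		return nil
	}
	v, err := loadFirst([]string{"rocm_v6", "rocm_v5"}, load)
	fmt.Println(v, err)
}
```

Ordering the slice from most to least preferred lets the same loop serve both purposes mentioned in the message: selecting an optimal library and tolerating variants that fail to load.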

Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) · b24e8d17

Jeffrey Morgan authored Jan 10, 2024

* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc

b24e8d17

10 Jan, 2024 1 commit

Harden GPU mgmt library lookup · 3c49c3ab

Daniel Hiltgen authored Jan 10, 2024

When there are multiple management libraries installed on a system,
not every one will be compatible with the current driver. This change
improves our management library lookup to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.

3c49c3ab
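
The discovery half of this approach can be sketched with `filepath.Glob`: expand every pattern, then de-duplicate the matches while keeping their order, so that each candidate can afterwards be tried in turn. The function name `discoverLibs` and the patterns in `main` are placeholders, not the paths the project actually searches.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// discoverLibs expands each glob pattern and returns the unique matches in
// the order they were found. The glob function is injected so the logic can
// be tested without touching the filesystem.
func discoverLibs(patterns []string, glob func(string) ([]string, error)) []string {
	seen := make(map[string]bool)
	var libs []string
	for _, p := range patterns {
		matches, err := glob(p)
		if err != nil {
			continue // malformed pattern: skip it and keep going
		}
		for _, m := range matches {
			if !seen[m] {
				seen[m] = true
				libs = append(libs, m)
			}
		}
	}
	return libs
}

func main() {
	// Illustrative patterns only.
	patterns := []string{
		"/usr/lib*/libexample-mgmt.so*",
		"/opt/example/lib/libexample-mgmt.so*",
	}
	for _, lib := range discoverLibs(patterns, filepath.Glob) {
		fmt.Println("candidate:", lib)
	}
}
```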

09 Jan, 2024 10 commits
- calculate overhead based on number of gpu devices (#1875) · c336693f
  Jeffrey Morgan authored Jan 09, 2024
  
  c336693f
- Set correct CUDA minimum compute capability version · 1961a81f
  Daniel Hiltgen authored Jan 09, 2024
```
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
```
  1961a81f
- update rough cuda overhead estimate to 15% + 384MiB · 6df83e6d
  Jeffrey Morgan authored Jan 09, 2024
  
  6df83e6d
- revert cuda overhead to 20% · 6164f378
  Jeffrey Morgan authored Jan 09, 2024
  
  6164f378
- add `TODO` for cuda overhead · 6566387a
  Jeffrey Morgan authored Jan 09, 2024
  
  6566387a
- update cuda overhead to 20% to fix crashes when switching between models and large context sizes · 37708931
  Jeffrey Morgan authored Jan 09, 2024
  
  37708931
- update cuda overhead to 15% or 400MiB · f6cb0a55
  Jeffrey Morgan authored Jan 08, 2024
  
  f6cb0a55
- fix build on linux · 2680078c
  Jeffrey Morgan authored Jan 08, 2024
  
  2680078c
- update overhead to 15% · f1b7e5f5
  Jeffrey Morgan authored Jan 08, 2024
  
  f1b7e5f5
- use 10% vram overhead for cuda · cb534e6a
  Jeffrey Morgan authored Jan 08, 2024
  
  cb534e6a
08 Jan, 2024 1 commit

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189

07 Jan, 2024 1 commit

Detect very old CUDA GPUs and fall back to CPU · d74ce6bd

Daniel Hiltgen authored Jan 06, 2024

If we try to load the CUDA library on an old GPU, it panics and crashes
the server. This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.

d74ce6bd
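
A check of this shape might look like the following sketch, which compares a device's compute capability against a minimum before choosing a backend. The type `computeCapability`, the function `backendFor`, and the minimum of 5.0 used in `main` are assumptions for illustration; the commit message does not state the threshold.

```go
package main

import "fmt"

// computeCapability is a CUDA compute capability such as 5.2 or 8.0.
type computeCapability struct {
	Major, Minor int
}

// atLeast reports whether c is greater than or equal to min.
func (c computeCapability) atLeast(min computeCapability) bool {
	if c.Major != min.Major {
		return c.Major > min.Major
	}
	return c.Minor >= min.Minor
}

// backendFor returns "cuda" when the device meets the minimum capability,
// and "cpu" otherwise, so the CUDA library is never loaded on old GPUs.
func backendFor(device, min computeCapability) string {
	if device.atLeast(min) {
		return "cuda"
	}
	return "cpu"
}

func main() {
	min := computeCapability{Major: 5, Minor: 0} // assumed threshold
	fmt.Println(backendFor(computeCapability{3, 5}, min))
	fmt.Println(backendFor(computeCapability{8, 0}, min))
}
```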

03 Jan, 2024 1 commit

Fix windows system memory lookup · a2ad9524

Daniel Hiltgen authored Dec 22, 2023

This refines the gpu package error handling and fixes a bug with the
system memory lookup on windows.

a2ad9524

02 Jan, 2024 1 commit

Switch windows build to fully dynamic · d966b730

Daniel Hiltgen authored Dec 23, 2023

Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies and thus
doesn't require a special PATH.

d966b730

20 Dec, 2023 1 commit

Revamp the dynamic library shim · 7555ea44

Daniel Hiltgen authored Dec 20, 2023

This switches the default llama.cpp to be CPU based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.

This also bumps the ROCm library to version 6 given 5.7 builds don't work
on the latest ROCm library that just shipped.

7555ea44

19 Dec, 2023 2 commits
- Refine build to support CPU only · 1b991d0b
  Daniel Hiltgen authored Dec 13, 2023
```
If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version
```
  1b991d0b
- Adapted rocm support to cgo based llama.cpp · 35934b2e
  Daniel Hiltgen authored Nov 29, 2023
  
  35934b2e