Commits · bc13da2bfefe5cfa18b733d404a20209e1c54576 · OpenDAS / ollama

11 Mar, 2024 1 commit

Avoid rocm runner and dependency clash · bc13da2b

Daniel Hiltgen authored Mar 11, 2024

Putting the ROCm symlink next to the runners is risky. This moves
the payloads into a subdirectory to avoid potential clashes.

bc13da2b

10 Mar, 2024 1 commit

Add ollama executable peer dir for rocm · 00ec2693

Daniel Hiltgen authored Mar 10, 2024

This allows people who package ollama themselves to place
the ROCm dependencies in a peer directory of the ollama executable,
much like our Windows install flow.

00ec2693

09 Mar, 2024 2 commits

tidy cleanup logs · 0bd0f4a2

Jeffrey Morgan authored Mar 09, 2024

0bd0f4a2

Finish unwinding idempotent payload logic · 4a5c9b80

Daniel Hiltgen authored Mar 08, 2024

The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for macOS was still
extracted idempotently. This finishes switching to always extracting
the payloads, and now that idempotency is gone, the
version directory is no longer useful.

4a5c9b80

07 Mar, 2024 2 commits

Revamp ROCm support · 6c5ccb11

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows
and Linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on Windows, and a
separate download on Linux. The Linux install script is now smart: it
detects the presence of AMD GPUs, checks whether ROCm v6 is already
present, and if not, downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports, so we can degrade to CPU gracefully instead of having
llama.cpp+ROCm assert/crash on us. For Windows, we now use Go's Windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.

6c5ccb11
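
A minimal sketch of the OLLAMA_HOME lookup described above, assuming an empty value means "unset". Passing `getenv` and the home directory in as parameters is a choice made here to keep the helper testable; it is not taken from the ollama source.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ollamaHome resolves the directory used for extracted LLM libraries.
// It honors OLLAMA_HOME when set and otherwise falls back to ~/.ollama,
// the default named in the commit message.
func ollamaHome(getenv func(string) string, userHome string) string {
	if dir := getenv("OLLAMA_HOME"); dir != "" {
		return dir
	}
	return filepath.Join(userHome, ".ollama")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(ollamaHome(os.Getenv, home))
}
```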

Allow setting max vram for workarounds · be330174

Daniel Hiltgen authored Mar 06, 2024

Until we get all the memory calculations correct, this can provide
an escape valve for users to work around out-of-memory crashes.

be330174
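
The commit message does not name the setting, so the sketch below assumes a hypothetical environment variable `OLLAMA_MAX_VRAM` holding a byte count. It only ever lowers the detected value, on the reasoning that the override exists to avoid out-of-memory crashes; both the name and that policy are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// effectiveVRAM returns the VRAM budget to plan with. A valid, positive
// byte count in override caps the budget when it is below the detected
// amount; an empty or invalid override leaves the detected value unchanged.
func effectiveVRAM(detected uint64, override string) uint64 {
	if override == "" {
		return detected
	}
	limit, err := strconv.ParseUint(override, 10, 64)
	if err != nil || limit == 0 {
		fmt.Fprintf(os.Stderr, "ignoring invalid VRAM override %q\n", override)
		return detected
	}
	if limit < detected {
		return limit
	}
	return detected
}

func main() {
	detected := uint64(8 << 30) // pretend 8 GiB of VRAM was detected
	// OLLAMA_MAX_VRAM is a hypothetical variable name used for illustration.
	fmt.Println(effectiveVRAM(detected, os.Getenv("OLLAMA_MAX_VRAM")))
}
```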

29 Feb, 2024 1 commit

fix: print usedMemory size right (#2827) · fa2f2b35

tylinux authored Mar 01, 2024

fa2f2b35

25 Feb, 2024 1 commit

Determine max VRAM on macOS using `recommendedMaxWorkingSetSize` (#2354) · a189810d

peanut256 authored Feb 26, 2024

* read iogpu.wired_limit_mb on macOS

Fix for https://github.com/ollama/ollama/issues/1826

* improved determination of available VRAM on macOS

read the recommended maximum VRAM on macOS via the Metal API

* Removed macOS-specific logging

* Remove logging from gpu_darwin.go

* release Core Foundation object

fixes a possible memory leak

a189810d

17 Feb, 2024 1 commit

Harden AMD driver lookup logic · 9754c6d9

Daniel Hiltgen authored Feb 16, 2024

It looks like the version file doesn't exist on older(?) drivers.

9754c6d9

12 Feb, 2024 1 commit

Detect AMD GPU info via sysfs and block old cards · 6d84f075

Daniel Hiltgen authored Feb 11, 2024

This wires up new logic that uses sysfs to discover AMD GPU
information and detect old cards we can't yet support, so we can fall back to CPU mode.

6d84f075
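
A sketch of the sysfs approach, assuming each GPU node exposes a `properties` file of `key value` lines that includes `gfx_target_version` (as the amdkfd topology files under `/sys/class/kfd/kfd/topology/nodes/` do), encoded as major*10000 + minor*100 + stepping. The cutoff `minSupportedMajor = 9` for "old" cards is an assumed value, not one stated in the commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gfxVersion holds a decoded gfx_target_version value.
type gfxVersion struct{ Major, Minor, Patch uint64 }

// parseGfxTargetVersion scans a KFD topology "properties" file and decodes
// gfx_target_version, assumed to be major*10000 + minor*100 + patch.
// A missing key or a value of 0 (CPU nodes) reports ok=false.
func parseGfxTargetVersion(properties string) (gfxVersion, bool) {
	scanner := bufio.NewScanner(strings.NewReader(properties))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 || fields[0] != "gfx_target_version" {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil || v == 0 {
			return gfxVersion{}, false
		}
		return gfxVersion{Major: v / 10000, Minor: (v / 100) % 100, Patch: v % 100}, true
	}
	return gfxVersion{}, false
}

// minSupportedMajor is a hypothetical cutoff for cards treated as too old.
const minSupportedMajor = 9

func main() {
	props := "cpu_cores_count 0\nsimd_count 120\ngfx_target_version 100300\n"
	if v, ok := parseGfxTargetVersion(props); ok {
		supported := v.Major >= minSupportedMajor
		// Minor and patch are printed in hex to match names such as gfx90a.
		fmt.Printf("gfx%d%x%x supported=%v\n", v.Major, v.Minor, v.Patch, supported)
	}
}
```

An unsupported card would then be skipped during discovery, leaving the server in CPU mode rather than handing the device to ROCm.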

28 Jan, 2024 2 commits

Don't disable GPUs on arm without AVX · 15562e88

Daniel Hiltgen authored Jan 28, 2024

AVX is an x86 feature, so ARM should be excluded from
the check.

15562e88

Harden for zero detected GPUs · f07f8b7a

Daniel Hiltgen authored Jan 28, 2024

At least with the ROCm libraries, it's possible to have the library
present with zero GPUs. This fix avoids a divide-by-zero bug in llm.go
when we try to calculate GPU memory with zero GPUs.

f07f8b7a
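
The guard can be illustrated with a small hypothetical helper. The even split of VRAM across GPUs is an assumption made for illustration, not the exact calculation in llm.go; the point is that a zero count is rejected before any division.

```go
package main

import "fmt"

// memoryPerGPU splits total VRAM evenly across GPUs. It reports ok=false
// when no GPUs were detected (possible when the ROCm library loads but
// enumerates zero devices), so callers can fall back to CPU mode instead
// of dividing by zero.
func memoryPerGPU(totalVRAM uint64, gpuCount int) (uint64, bool) {
	if gpuCount <= 0 {
		return 0, false
	}
	return totalVRAM / uint64(gpuCount), true
}

func main() {
	if per, ok := memoryPerGPU(16<<30, 2); ok {
		fmt.Println("per-GPU bytes:", per)
	}
	if _, ok := memoryPerGPU(0, 0); !ok {
		fmt.Println("no GPUs detected; using CPU")
	}
}
```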

27 Jan, 2024 1 commit

Update gpu_info_rocm.c · 59d87127

Jagadish Krishnamoorthy authored Jan 26, 2024

59d87127

26 Jan, 2024 3 commits

Detect lack of AVX and fallback to CPU mode · 667a2ba1

Daniel Hiltgen authored Jan 26, 2024

We build the GPU libraries with AVX enabled to ensure that if not all
layers fit on the GPU, we get better performance in a mixed mode.
Before this fix, a user running under a virtualization/emulation system
that lacks AVX would hit an illegal instruction error and crash. Now we
report a warning in the server log and just use CPU mode to ensure we
don't crash.

667a2ba1
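
The selection rule can be sketched as a pure function. Go's standard library has no AVX query, so `hasAVX` is taken as an input; in practice it could come from `cpu.X86.HasAVX` in `golang.org/x/sys/cpu`. The sketch also folds in the ARM exemption from the later commit 15562e88. The "gpu"/"cpu" return values are hypothetical labels.

```go
package main

import (
	"fmt"
	"log"
	"runtime"
)

// chooseLibrary decides between a GPU and a CPU library. GPU builds are
// compiled with AVX, so on x86 a missing AVX flag forces CPU mode. ARM is
// exempt because AVX is an x86-only feature.
func chooseLibrary(goarch string, hasAVX, gpuAvailable bool) string {
	if !gpuAvailable {
		return "cpu"
	}
	isX86 := goarch == "amd64" || goarch == "386"
	if isX86 && !hasAVX {
		log.Println("warning: CPU lacks AVX; falling back to CPU mode")
		return "cpu"
	}
	return "gpu"
}

func main() {
	// hasAVX is hard-coded here; real code would detect it at runtime.
	fmt.Println(chooseLibrary(runtime.GOARCH, false, true))
}
```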

Ignore AMD integrated GPUs · 9d7b5d6c

Daniel Hiltgen authored Jan 25, 2024

Detect and ignore integrated GPUs reported by ROCm.

9d7b5d6c

Fix crash on cuda ml init failure · 5d9c4a5f

Daniel Hiltgen authored Jan 26, 2024

The new driver lookup code was triggering after init failure due to a missing return.

5d9c4a5f

24 Jan, 2024 1 commit

More logging for gpu management · 013fd071

Daniel Hiltgen authored Jan 24, 2024

Fix an ordering glitch of dlerr/dlclose and add more logging to help
root cause some crashes users are hitting. This also refines the
function pointer names to use the underlying function names instead
of simplified names for readability.

013fd071

23 Jan, 2024 1 commit

Report more information about GPUs in verbose mode · 987c16b2

Daniel Hiltgen authored Jan 22, 2024

This adds additional calls to both CUDA and ROCm management libraries to
discover additional attributes about the GPU(s) detected in the system, and
wires up runtime verbosity selection. When users hit problems with GPUs we can
ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.

987c16b2

20 Jan, 2024 3 commits

Add compute capability 5.0, 7.5, and 8.0 · a447a083

Daniel Hiltgen authored Jan 20, 2024

a447a083

increase minimum overhead to 1024MiB (#2114) · f32ea81b

Jeffrey Morgan authored Jan 20, 2024

f32ea81b

Add support for CUDA 5.2 cards · 681a9149

Daniel Hiltgen authored Jan 20, 2024

681a9149

19 Jan, 2024 2 commits

More WSL paths · 552db98b

Daniel Hiltgen authored Jan 19, 2024

552db98b

Fix CPU-only build under Android Termux environment. · eb76f3e3

Self Denial authored Jan 15, 2024

Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.

Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).

eb76f3e3

18 Jan, 2024 1 commit

Mechanical switch from log to slog · fedd705a

Daniel Hiltgen authored Jan 18, 2024

A few obvious levels were adjusted, but generally everything mapped to "info" level.

fedd705a

14 Jan, 2024 1 commit

Let gpu.go and gen_linux.sh also find CUDA on Arch Linux · f4bf1d51

Alexander F. Rødseth authored Jan 14, 2024

f4bf1d51

11 Jan, 2024 6 commits

Fix up the CPU fallback selection · 7427fa13

Daniel Hiltgen authored Jan 11, 2024

The memory changes and multi-variant change had some merge
glitches I missed. This fixes them so we actually get the CPU LLM library
and the best variant for the given system.

7427fa13

Always dynamically load the llm server library · 39928a42

Daniel Hiltgen authored Jan 09, 2024

This switches darwin to dynamic loading, and refactors the code now that
static linking of the library is no longer used on any platform.

39928a42

Build multiple CPU variants and pick the best · d88c527b

Daniel Hiltgen authored Jan 07, 2024

This reduces the built-in Linux version to not use any vector extensions,
which enables the resulting builds to run under Rosetta on macOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available.

d88c527b
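
The runtime pick can be sketched as an ordered preference over detected features. The variant names `cpu_avx2`, `cpu_avx`, and `cpu` are illustrative assumptions, and other extensions (such as AVX-512) are left out for brevity.

```go
package main

import "fmt"

// cpuFeatures records the vector extensions detected at runtime.
type cpuFeatures struct{ AVX, AVX2 bool }

// bestCPUVariant picks the most capable CPU library the host can run,
// preferring AVX2, then AVX, then the baseline build.
func bestCPUVariant(f cpuFeatures) string {
	switch {
	case f.AVX2:
		return "cpu_avx2"
	case f.AVX:
		return "cpu_avx"
	default:
		// The baseline build uses no vector extensions, so it also runs
		// under emulation such as Rosetta in Docker.
		return "cpu"
	}
}

func main() {
	// Feature flags are hard-coded; real code would detect them at runtime.
	fmt.Println(bestCPUVariant(cpuFeatures{AVX: true}))
}
```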

fix gpu_test.go Error (same type) uint64->uint32 (#1921) · 3bc8b983

Fabian Preiß authored Jan 11, 2024

3bc8b983

Support multiple variants for a given llm lib type · 8da7bef0

Daniel Hiltgen authored Jan 05, 2024

In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.

8da7bef0

Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) · b24e8d17

Jeffrey Morgan authored Jan 10, 2024

* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc

b24e8d17

10 Jan, 2024 1 commit

Harden GPU mgmt library lookup · 3c49c3ab

Daniel Hiltgen authored Jan 10, 2024

When there are multiple management libraries installed on a system,
not every one will be compatible with the current driver. This change
improves our management library lookup algorithm to build up a set of discovered
libraries based on glob patterns, and then try each of them until we're able to
load one without error.

3c49c3ab

09 Jan, 2024 8 commits

calculate overhead based on number of gpu devices (#1875) · c336693f

Jeffrey Morgan authored Jan 09, 2024

c336693f

Set correct CUDA minimum compute capability version · 1961a81f

Daniel Hiltgen authored Jan 09, 2024

If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:

```
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
```

1961a81f

update rough cuda overhead estimate to 15% + 384MiB · 6df83e6d

Jeffrey Morgan authored Jan 09, 2024

6df83e6d

revert cuda overhead to 20% · 6164f378

Jeffrey Morgan authored Jan 09, 2024

6164f378

add `TODO` for cuda overhead · 6566387a

Jeffrey Morgan authored Jan 09, 2024

6566387a

update cuda overhead to 20% to fix crashes when switching between models and large context sizes · 37708931

Jeffrey Morgan authored Jan 09, 2024

37708931

update cuda overhead to 15% or 400MiB · f6cb0a55

Jeffrey Morgan authored Jan 08, 2024

f6cb0a55

fix build on linux · 2680078c

Jeffrey Morgan authored Jan 08, 2024

2680078c