- 11 Jan, 2024 10 commits
-
-
Daniel Hiltgen authored
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
-
Eduard van Valkenburg authored
-
Michael Yang authored
add lint and test on pull_request
-
Daniel Hiltgen authored
This switches darwin to dynamic loading and refactors the code now that no static linking of the library is used on any platform.
-
Daniel Hiltgen authored
This reduces the built-in Linux version to not use any vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks for the actual CPU vector extensions and loads the best CPU library available.
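A minimal sketch of that runtime selection in Go, using the real `golang.org/x/sys/cpu` package for feature detection; the variant names and payload layout are assumptions, not the actual build output:

```go
package main

import (
	"fmt"
	"path/filepath"

	"golang.org/x/sys/cpu"
)

// bestCPUVariant picks the most capable CPU build the host actually supports,
// falling back to the plain build (no vector extensions), e.g. under Rosetta.
func bestCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	// Hypothetical payload layout: one shared library per CPU variant.
	lib := filepath.Join("payloads", bestCPUVariant(), "libext_server.so")
	fmt.Println("would load:", lib)
}
```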
-
Fabian Preiß authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic for an optional Variant which we can use to select an optimal library, and also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
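A rough sketch of the fallback idea in Go; the type and function names here are hypothetical, not the actual implementation:

```go
package llm

import "fmt"

// libVariant describes one build of the LLM library, e.g. "rocm_v6",
// "rocm_v5", or a CPU variant such as "cpu_avx2".
type libVariant struct {
	Name string
	Path string
}

// loadFirstWorking tries each variant in priority order and returns the first
// one that loads without error, so a ROCm v6 library that fails on a v5-only
// system simply falls through to the next candidate.
func loadFirstWorking(variants []libVariant, load func(path string) error) (libVariant, error) {
	var lastErr error
	for _, v := range variants {
		if err := load(v.Path); err != nil {
			lastErr = err
			continue
		}
		return v, nil
	}
	return libVariant{}, fmt.Errorf("no usable library variant: %w", lastErr)
}
```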
-
Jeffrey Morgan authored
* increase minimum CUDA overhead and fix minimum overhead for multi-GPU
* fix multi-GPU overhead
* limit overhead to 10% of all GPUs
* better wording
* allocate fixed amount before layers
* fixed amount only includes graph alloc
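A hedged sketch of that bookkeeping in Go; the constants and the exact relationship between the per-GPU minimum and the 10% cap are assumptions, not the actual values:

```go
package llm

const (
	minOverheadPerGPU = 512 << 20 // assumed per-GPU minimum reserve, in bytes
	overheadFraction  = 0.10      // cap total overhead at 10% of combined VRAM
)

// reservedOverhead returns how much VRAM to hold back before placing layers:
// a fixed minimum per GPU, but never more than 10% of all GPU memory combined.
func reservedOverhead(gpuTotalBytes []uint64) uint64 {
	var total uint64
	for _, t := range gpuTotalBytes {
		total += t
	}
	overhead := uint64(len(gpuTotalBytes)) * minOverheadPerGPU
	if limit := uint64(float64(total) * overheadFraction); overhead > limit {
		overhead = limit
	}
	return overhead
}
```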
-
- 10 Jan, 2024 6 commits
-
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Smarter GPU Management library detection
-
Daniel Hiltgen authored
When there are multiple management libraries installed on a system, not every one will be compatible with the current driver. This change improves our management library detection: we build up a set of candidate libraries from glob patterns, then try each one until we're able to load one without error.
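A simplified Go sketch of the glob-then-probe approach; the patterns shown are illustrative, not the exact ones used:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// discoverLibs expands each glob pattern and collects every match, building
// the candidate set of management libraries to probe.
func discoverLibs(patterns []string) []string {
	var found []string
	for _, p := range patterns {
		matches, err := filepath.Glob(p)
		if err != nil {
			continue // malformed pattern; skip it
		}
		found = append(found, matches...)
	}
	return found
}

func main() {
	// Example patterns only; the real list covers the common install locations.
	candidates := discoverLibs([]string{
		"/usr/lib/x86_64-linux-gnu/libnvidia-ml.so*",
		"/usr/lib*/libnvidia-ml.so*",
	})
	// Each candidate would then be loaded in turn until one works cleanly.
	fmt.Println(candidates)
}
```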
-
Daniel Hiltgen authored
This can help speed up incremental builds when you're only testing one architecture, like amd64. E.g. `BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:`
-
Jeffrey Morgan authored
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
-
Jeffrey Morgan authored
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
-
- 09 Jan, 2024 24 commits
-
-
Jeffrey Morgan authored
-
Robin Glauser authored
Fixed assistant in the example response.
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Set correct CUDA minimum compute capability version
-
Daniel Hiltgen authored
If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: `cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported`
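For illustration, a small Go helper for the kind of minimum-capability gate this implies; the 6.0 floor is an assumption for the sketch, the commit only establishes that compute 5.2 cards must be rejected up front rather than failing later in cuBLAS:

```go
package gpu

// computeCapability is a CUDA device's major.minor capability, e.g. 5.2 or 8.6.
type computeCapability struct {
	Major, Minor int
}

// Assumed floor for illustration only.
var minimumCapability = computeCapability{Major: 6, Minor: 0}

// usable reports whether a device meets the minimum compute capability.
func usable(cc computeCapability) bool {
	if cc.Major != minimumCapability.Major {
		return cc.Major > minimumCapability.Major
	}
	return cc.Minor >= minimumCapability.Minor
}
```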
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
fix: set template without triple quotes
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-