Commits · de2fbdec991ac52ff015818b19482fdff22e2deb · OpenDAS / ollama

11 Jan, 2024 5 commits

Always dynamically load the llm server library · 39928a42

Daniel Hiltgen authored Jan 09, 2024

This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform

39928a42

Build multiple CPU variants and pick the best · d88c527b

Daniel Hiltgen authored Jan 07, 2024

This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available

d88c527b
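
The runtime selection described in this commit message could look roughly like the sketch below. It is only an illustration: the `cpuFeatures` type, the `bestCPUVariant` function, and the variant names `cpu`, `cpu_avx`, and `cpu_avx2` are assumptions made for this example and are not taken from the ollama source. Feature detection is passed in as plain booleans so the example stays self-contained.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical summary of the vector extensions
// detected on the host CPU at runtime.
type cpuFeatures struct {
	AVX  bool
	AVX2 bool
}

// bestCPUVariant returns the most capable library variant the CPU can run.
// The variant names are illustrative assumptions, not ollama's real names.
func bestCPUVariant(f cpuFeatures) string {
	switch {
	case f.AVX2:
		return "cpu_avx2"
	case f.AVX:
		return "cpu_avx"
	default:
		// Baseline build with no vector extensions, usable under emulation.
		return "cpu"
	}
}

func main() {
	fmt.Println(bestCPUVariant(cpuFeatures{AVX: true}))
}
```

Keeping a baseline variant with no vector extensions as the default branch mirrors the stated goal: the program still starts on hosts, or emulators, that expose none of the optional instructions.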

revisit memory allocation to account for full kv cache on main gpu · ab6be852

Jeffrey Morgan authored Jan 11, 2024

ab6be852

Support multiple variants for a given llm lib type · 8da7bef0

Daniel Hiltgen authored Jan 05, 2024

In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.

8da7bef0

Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) · b24e8d17

Jeffrey Morgan authored Jan 10, 2024

* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc

b24e8d17

10 Jan, 2024 3 commits
- revert submodule back to `328b83de23b33240e28f4e74900d1d06726f5eb1` · f8388139
  Jeffrey Morgan authored Jan 10, 2024
  
  f8388139
- update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until... · 224fbf27
  Jeffrey Morgan authored Jan 10, 2024
```
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
```
  224fbf27
- Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885) · 2c6e8f52
  Jeffrey Morgan authored Jan 10, 2024
```
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
```
  2c6e8f52

09 Jan, 2024 9 commits
- clean up cmake `build` directory when cross compiling macOS builds · 34344d80
  Jeffrey Morgan authored Jan 09, 2024
  
  34344d80
- only build for metal on `arm64` · 8a8c7e7f
  Jeffrey Morgan authored Jan 09, 2024
  
  8a8c7e7f
- typo · f921e269
  Michael Yang authored Jan 09, 2024
  
  f921e269
- remove unused fields and functions · 4a33cede
  Michael Yang authored Dec 22, 2023
  
  4a33cede
- fix lint · 2bb2bdd5
  Michael Yang authored Dec 15, 2023
  
  2bb2bdd5
- use runner if cuda alloc won't fit · f387e963
  Jeffrey Morgan authored Jan 09, 2024
  
  f387e963
- use 10% vram overhead for cuda · cb534e6a
  Jeffrey Morgan authored Jan 08, 2024
  
  cb534e6a
- better estimate scratch buffer size · 58ce2d82
  Jeffrey Morgan authored Jan 08, 2024
  
  58ce2d82
- fix windows build · 18ddf6d5
  Jeffrey Morgan authored Jan 08, 2024
  
  18ddf6d5

08 Jan, 2024 1 commit

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* dont load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189

07 Jan, 2024 2 commits
- dont use `-Wall` in static build (#1833) · 5feec959
  Jeffrey Morgan authored Jan 07, 2024
  
  5feec959
- add `-DCMAKE_SYSTEM_NAME=Darwin` cmake flag (#1832) · dbdd50b2
  Jeffrey Morgan authored Jan 07, 2024
  
  dbdd50b2

05 Jan, 2024 1 commit
- remove unused generate patches (#1810) · 3367b5f3
  Bruce MacDonald authored Jan 05, 2024
  
  3367b5f3

04 Jan, 2024 9 commits
- Cleanup stale submodule · 9983fa5f
  Daniel Hiltgen authored Jan 04, 2024
```
If the tree has a stale submodule, make sure we clean it up first
```
  9983fa5f
- Init submodule with new path · fac9060d
  Daniel Hiltgen authored Jan 04, 2024
  
  fac9060d
- Code shuffle to clean up the llm dir · 77d96da9
  Daniel Hiltgen authored Jan 04, 2024
  
  77d96da9
- Load dynamic cpu lib on windows · e9ce91e9
  Daniel Hiltgen authored Jan 04, 2024
```
On linux, we link the CPU library in to the Go app and fall back to it
when no GPU match is found. On windows we do not link in the CPU library
so that we can better control our dependencies for the CLI.  This fixes
the logic so we correctly fallback to the dynamic CPU library
on windows.
```
  e9ce91e9
- tweak memory requirements error text · c0285158
  Jeffrey Morgan authored Jan 03, 2024
  
  c0285158
- add macOS memory check for 47B models · 77a66df7
  Jeffrey Morgan authored Jan 03, 2024
  
  77a66df7
- remove unused filetype check · 5b4837f8
  Jeffrey Morgan authored Jan 03, 2024
  
  5b4837f8
- update cmake flags for `amd64` macOS (#1780) · 29340c2e
  Jeffrey Morgan authored Jan 03, 2024
```
* update cmake flags for intel macOS

* remove `LLAMA_K_QUANTS`

* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
```
  29340c2e
- Fix CPU only builds · ddbfa6fe
  Daniel Hiltgen authored Jan 03, 2024
```
Go embed doesn't like when there's no matching files, so put
a dummy placeholder in to allow building without any GPU support
If no "server" library is found, it's safely ignored at runtime.
```
  ddbfa6fe

03 Jan, 2024 2 commits
- Improve maintainability of Radeon card list · 16f4603b
  Daniel Hiltgen authored Jan 03, 2024
```
This moves the list of AMD GPUs to an easier to maintain list which
should make it easier to update over time.
```
  16f4603b
- fix: relay request opts to loaded llm prediction (#1761) · 0b3118e0
  Bruce MacDonald authored Jan 03, 2024
  
  0b3118e0

02 Jan, 2024 4 commits

Get rid of one-line llama.log · 0498f7ce

Daniel Hiltgen authored Dec 30, 2023

This one log line was triggering a single line llama.log to be generated
in the pwd of the server

0498f7ce

Rename the ollama cmakefile · 738a8d12

Daniel Hiltgen authored Dec 24, 2023

738a8d12

Switch windows build to fully dynamic · d966b730

Daniel Hiltgen authored Dec 23, 2023

Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies thus
doesn't require a special PATH.

d966b730

Refactor how we augment llama.cpp · 9a70aecc

Daniel Hiltgen authored Dec 22, 2023

This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.

9a70aecc

27 Dec, 2023 1 commit
- enable `cache_prompt` by default · d4ebdadb
  Jeffrey Morgan authored Dec 27, 2023
  
  d4ebdadb

22 Dec, 2023 3 commits
- Add Cache flag to api (#1642) · 10da41d6
  K0IN authored Dec 22, 2023
  
  10da41d6
- Quiet down llama.cpp logging by default · e5202eb6
  Daniel Hiltgen authored Dec 22, 2023
```
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
```
  e5202eb6
- Remove CPU build, fixup linux build script · fa24e73b
  Daniel Hiltgen authored Dec 21, 2023
  
  fa24e73b