- 22 Jan, 2024 3 commits
-
Daniel Hiltgen authored
This wires up logging in llama.cpp to always go to stderr, and also increases verbosity when OLLAMA_DEBUG is set.
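A minimal Go sketch of the OLLAMA_DEBUG check, assuming an slog-style logger; the function name and wiring are illustrative, not the actual patch:

```go
package main

import (
	"log/slog"
	"os"
)

// newLogger always writes to stderr and raises the level to Debug when
// OLLAMA_DEBUG is set; a Go-side stand-in for where llama.cpp's log
// callback gets registered.
func newLogger() *slog.Logger {
	level := slog.LevelInfo
	if os.Getenv("OLLAMA_DEBUG") != "" {
		level = slog.LevelDebug
	}
	return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: level}))
}

func main() {
	newLogger().Info("llama.cpp logging wired to stderr")
}
```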
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
- 21 Jan, 2024 3 commits
-
Daniel Hiltgen authored
Detect potential error scenarios so we can fall back to CPU mode without hitting asserts.
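A sketch of the shape of that fallback; probeGPU and the mode names are placeholders for the real checks:

```go
package main

import (
	"errors"
	"log/slog"
)

// probeGPU stands in for the real checks (driver present, enough VRAM,
// library loads cleanly); here it is a stub that always fails.
func probeGPU() error {
	return errors.New("no usable GPU detected")
}

// pickMode detects the error scenario up front and falls back to CPU mode
// rather than letting the native library hit an assert.
func pickMode() string {
	if err := probeGPU(); err != nil {
		slog.Warn("falling back to CPU mode", "error", err)
		return "cpu"
	}
	return "gpu"
}

func main() {
	slog.Info("selected runner", "mode", pickMode())
}
```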
-
Daniel Hiltgen authored
The Linux build now supports parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advanced users who want to alter our default set.
-
Jeffrey Morgan authored
-
- 20 Jan, 2024 1 commit
-
Jeffrey Morgan authored
-
- 19 Jan, 2024 4 commits
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
Self Denial authored
-
Self Denial authored
Update gpu.go initGPUHandles() to declare the gpuHandles variable before reading it. This resolves an "invalid memory address or nil pointer dereference" error. Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under __TERMUX__ (Android).
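For illustration, the class of bug the gpu.go change fixes, with simplified names:

```go
package main

import "fmt"

type handles struct {
	deviceCount int
}

// initGPUHandles illustrates the bug class: reading through a pointer that
// was never initialized panics with "invalid memory address or nil pointer
// dereference". Declaring and initializing the value before any field is
// read avoids the panic.
func initGPUHandles() *handles {
	h := &handles{} // initialized before it is read
	h.deviceCount = 0
	return h
}

func main() {
	fmt.Println(initGPUHandles().deviceCount)
}
```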
-
- 18 Jan, 2024 1 commit
-
Daniel Hiltgen authored
A few obvious levels were adjusted, but generally everything was mapped to the "info" level.
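A sketch of such a mapping; the level constants here are placeholders, not llama.cpp's actual enum values:

```go
package main

import (
	"fmt"
	"log/slog"
)

// Level names mirror llama.cpp's log levels; the numeric values are
// placeholders for this sketch, not the upstream enum.
const (
	llamaLogError = iota
	llamaLogWarn
	llamaLogInfo
)

// mapLevel adjusts the few obvious levels and sends everything else to Info.
func mapLevel(level int) slog.Level {
	switch level {
	case llamaLogError:
		return slog.LevelError
	case llamaLogWarn:
		return slog.LevelWarn
	default:
		return slog.LevelInfo
	}
}

func main() {
	fmt.Println(mapLevel(llamaLogWarn)) // WARN
}
```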
-
- 17 Jan, 2024 1 commit
-
Daniel Hiltgen authored
This also refines the build process for the ext_server build.
-
- 16 Jan, 2024 2 commits
-
Daniel Hiltgen authored
Upstream llama.cpp has added a new dependency on the NVIDIA CUDA driver library (libcuda.so). This library is part of the driver distribution, not the general CUDA libraries, and is not available as an archive, so we cannot statically link it. This may introduce some additional compatibility challenges which we'll need to keep an eye on.
-
Bruce MacDonald authored
- prompt cache causes inference to hang after some time
-
- 14 Jan, 2024 3 commits
-
Daniel Hiltgen authored
-
Alexander F. Rødseth authored
-
Jeffrey Morgan authored
-
- 13 Jan, 2024 3 commits
-
Daniel Hiltgen authored
Make sure we're building an x86 ext_server lib when cross-compiling
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 12 Jan, 2024 2 commits
-
Michael Yang authored
-
Fabian Preiss authored
-
- 11 Jan, 2024 9 commits
-
Daniel Hiltgen authored
The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the CPU llm lib and the best variant for the given system.
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Daniel Hiltgen authored
This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform
-
Daniel Hiltgen authored
This reduces the built-in Linux version to not use any vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks the actual CPU vector extensions and loads the best CPU library available.
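A sketch of the runtime selection using golang.org/x/sys/cpu for feature detection; the variant names are illustrative:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// pickCPUVariant sketches the selection: the built-in baseline uses no
// vector extensions (so it also runs under Rosetta), and a faster variant
// is loaded only when the host actually supports it.
func pickCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu" // baseline, no vector extensions
	}
}

func main() {
	fmt.Println("loading CPU library variant:", pickCPUVariant())
}
```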
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
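A sketch of that selection loop; the type and field names are illustrative, not the actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// DynLib pairs a library type with an optional Variant, e.g. a ROCm lib in
// "v5" and "v6" flavors, or CPU libs per vector extension.
type DynLib struct {
	Library string
	Variant string
}

// loadBest walks the candidates in preference order, skipping any that fail
// to load, so an incompatible variant (say ROCm v6 on a v5 system) simply
// falls through to the next one.
func loadBest(candidates []DynLib, load func(DynLib) error) (DynLib, error) {
	var errs error
	for _, c := range candidates {
		if err := load(c); err != nil {
			errs = errors.Join(errs, fmt.Errorf("%s/%s: %w", c.Library, c.Variant, err))
			continue
		}
		return c, nil
	}
	return DynLib{}, errs
}

func main() {
	best, err := loadBest(
		[]DynLib{{"rocm", "v6"}, {"rocm", "v5"}, {"cpu", "avx2"}},
		func(l DynLib) error {
			if l.Variant == "v6" {
				return errors.New("incompatible driver")
			}
			return nil
		})
	fmt.Println(best, err)
}
```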
-
Jeffrey Morgan authored
* increase minimum cuda overhead and fix minimum overhead for multi-gpu
* fix multi gpu overhead
* limit overhead to 10% of all gpus
* better wording
* allocate fixed amount before layers
* fixed only includes graph alloc
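A sketch of the capping rule from the list above; the 10% figure comes from the commit message, while the names and the fixed per-GPU amount are placeholders:

```go
package main

import "fmt"

// reservedOverhead reserves a fixed amount per GPU up front (covering the
// graph allocation), but caps the total at 10% of the combined free GPU
// memory across all GPUs.
func reservedOverhead(gpuFreeBytes []uint64, perGPUFixed uint64) uint64 {
	var total, overhead uint64
	for _, free := range gpuFreeBytes {
		total += free
		overhead += perGPUFixed
	}
	if limit := total / 10; overhead > limit {
		overhead = limit
	}
	return overhead
}

func main() {
	// two 24 GiB GPUs, 3 GiB fixed each: 6 GiB requested,
	// capped at 10% of 48 GiB (~4.8 GiB)
	const gib = uint64(1) << 30
	fmt.Println(reservedOverhead([]uint64{24 * gib, 24 * gib}, 3*gib))
}
```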
-
- 10 Jan, 2024 3 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
-
Jeffrey Morgan authored
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
-
- 09 Jan, 2024 5 commits
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-