1. 11 Jan, 2024 3 commits
      Build multiple CPU variants and pick the best · d88c527b
      Daniel Hiltgen authored
      This restricts the built-in Linux build to use no vector extensions,
      which enables the resulting binaries to run under Rosetta on macOS in
      Docker. At runtime it then checks for the actual CPU vector extensions
      and loads the best CPU library available.
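      A minimal sketch, in Go, of the runtime selection this describes, using
      golang.org/x/sys/cpu for feature detection; the variant names are
      illustrative, not the exact names used in the ollama build:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// bestCPUVariant returns the most capable CPU library variant the host
// supports, falling back to a build with no vector extensions (the same
// baseline that can run under Rosetta).
func bestCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println("selected variant:", bestCPUVariant())
}
```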
      Support multiple variants for a given llm lib type · 8da7bef0
      Daniel Hiltgen authored
      In some cases we may want multiple variants for a given GPU type or CPU.
      This adds an optional Variant, which lets us select the optimal library
      but also try multiple variants in case some fail to load.
      
      This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
      or potentially CPU features.
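      A hedged sketch of the fallback behavior described above; the Variant
      type and tryLoad are hypothetical stand-ins, not ollama's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

type Variant struct {
	Name string // e.g. "rocm_v6", "rocm_v5", "cpu_avx2"
}

// tryLoad is a placeholder for dynamically loading one library variant.
func tryLoad(v Variant) error {
	if v.Name == "rocm_v6" {
		return errors.New("incompatible driver") // simulate a load failure
	}
	return nil
}

// loadBest walks the variants in preference order and returns the first
// one that loads successfully.
func loadBest(variants []Variant) (Variant, error) {
	var lastErr error
	for _, v := range variants {
		if err := tryLoad(v); err != nil {
			lastErr = err
			continue
		}
		return v, nil
	}
	return Variant{}, fmt.Errorf("no variant loaded: %w", lastErr)
}

func main() {
	v, err := loadBest([]Variant{{"rocm_v6"}, {"rocm_v5"}, {"cpu_avx2"}})
	if err != nil {
		panic(err)
	}
	fmt.Println("loaded:", v.Name)
}
```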
      Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) · b24e8d17
      Jeffrey Morgan authored
      * increase minimum cuda overhead and fix minimum overhead for multi-gpu
      
      * fix multi gpu overhead
      
      * limit overhead to 10% of all gpus
      
      * better wording
      
      * allocate fixed amount before layers
      
      * fixed only includes graph alloc
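      A hypothetical sketch of the accounting these changes describe: reserve
      a fixed graph allocation per GPU before placing layers, and cap the
      total overhead at 10% of combined GPU memory. The constant is
      illustrative, not the value ollama uses:

```go
package main

import "fmt"

// graphAlloc is an illustrative fixed allocation reserved per GPU for the
// compute graph, before any layers are placed.
const graphAlloc = 512 << 20 // 512 MiB

// usableMemory returns how much of the GPUs' combined free memory remains
// for layers after the fixed overhead is set aside.
func usableMemory(gpuFree []uint64) uint64 {
	var total uint64
	for _, free := range gpuFree {
		total += free
	}
	overhead := uint64(len(gpuFree)) * graphAlloc
	if limit := total / 10; overhead > limit { // limit overhead to 10% of all GPUs
		overhead = limit
	}
	return total - overhead
}

func main() {
	fmt.Println(usableMemory([]uint64{8 << 30, 8 << 30})) // two 8 GiB GPUs
}
```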
  2. 09 Jan, 2024 3 commits
  3. 08 Jan, 2024 1 commit
  4. 04 Jan, 2024 4 commits
  5. 20 Dec, 2023 1 commit
      Revamp the dynamic library shim · 7555ea44
      Daniel Hiltgen authored
      This switches the default llama.cpp build to be CPU based, and builds
      the GPU variants as dynamically loaded libraries which we can select at
      runtime.
      
      This also bumps the ROCm library to version 6, since 5.7 builds don't
      work with the latest ROCm release that just shipped.
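      A minimal sketch of that runtime selection, assuming hypothetical
      detection helpers and library names; only the ROCm v6 bump comes from
      the commit itself:

```go
package main

import "fmt"

// Placeholders for real GPU detection; the actual checks would probe
// drivers and devices.
func hasCUDA() bool { return false }
func hasROCm() bool { return true }

// pickLibrary prefers a GPU library when one is usable and otherwise
// falls back to the built-in CPU llama.cpp.
func pickLibrary() string {
	switch {
	case hasCUDA():
		return "ext_server_cuda.so" // hypothetical name
	case hasROCm():
		return "ext_server_rocm_v6.so" // ROCm bumped to v6 per the commit
	default:
		return "builtin_cpu" // default CPU-based llama.cpp
	}
}

func main() {
	fmt.Println("loading:", pickLibrary())
}
```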
  6. 19 Dec, 2023 4 commits
  7. 05 Dec, 2023 3 commits
  8. 04 Dec, 2023 1 commit
      chat api (#991) · 7a0899d6
      Bruce MacDonald authored
      - update chat docs
      - add messages chat endpoint
      - remove deprecated context and template generate parameters from docs
      - context and template are still supported for the time being and will continue to work as expected
      - add partial response to chat history
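      For illustration, a small Go client for the new messages chat endpoint,
      assuming a local server on the default port and an already-pulled model;
      adjust both for your setup:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream"`
}

type ChatResponse struct {
	Message Message `json:"message"`
}

func main() {
	body, err := json.Marshal(ChatRequest{
		Model:    "llama2", // assumes this model has been pulled locally
		Messages: []Message{{Role: "user", Content: "Why is the sky blue?"}},
		Stream:   false,
	})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out ChatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Message.Content)
}
```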
  9. 20 Nov, 2023 1 commit
  10. 10 Nov, 2023 1 commit
  11. 02 Nov, 2023 1 commit
  12. 19 Oct, 2023 2 commits
  13. 13 Oct, 2023 4 commits
  14. 11 Oct, 2023 1 commit
  15. 05 Oct, 2023 1 commit
  16. 25 Sep, 2023 1 commit
  17. 21 Sep, 2023 1 commit
  18. 12 Sep, 2023 1 commit
  19. 07 Sep, 2023 1 commit
  20. 30 Aug, 2023 1 commit
      subprocess llama.cpp server (#401) · 42998d79
      Bruce MacDonald authored
      * remove c code
      * pack llama.cpp
      * use request context for llama_cpp
      * let llama_cpp decide the number of threads to use
      * stop llama runner when app stops
      * remove sample count and duration metrics
      * use go generate to get libraries
      * tmp dir for running llm
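      A sketch of the subprocess lifecycle this describes, using
      exec.CommandContext so the runner stops when the app's context is
      cancelled; the binary path and flags are hypothetical:

```go
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
	"os/signal"
)

func main() {
	// Cancel the context (and kill the runner) when the app is interrupted.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	// Run the llm out of a temporary directory, cleaned up on exit.
	dir, err := os.MkdirTemp("", "llm")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	cmd := exec.CommandContext(ctx, "./server", "--port", "8080") // hypothetical runner binary
	cmd.Dir = dir
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// A cancelled context kills the subprocess; only report other failures.
	if err := cmd.Run(); err != nil && ctx.Err() == nil {
		log.Fatal(err)
	}
}
```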
  21. 26 Aug, 2023 2 commits
  22. 17 Aug, 2023 1 commit
  23. 14 Aug, 2023 1 commit