Commits · f4f939de28aa1f9365ef81f837516764046492d0 · OpenDAS / ollama

11 Jan, 2024 4 commits
- Merge pull request #1552 from jmorganca/mxyng/lint-test · f4f939de
  Michael Yang authored Jan 11, 2024
```
add lint and test on pull_request
```
  f4f939de
- fix gpu_test.go Error (same type) uint64->uint32 (#1921) · 3bc8b983
  Fabian Preiß authored Jan 11, 2024
  
  3bc8b983
- revisit memory allocation to account for full kv cache on main gpu · ab6be852
  Jeffrey Morgan authored Jan 11, 2024
  
  ab6be852
- Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) · b24e8d17
  Jeffrey Morgan authored Jan 10, 2024
```
* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc
```
  b24e8d17
10 Jan, 2024 6 commits

revert submodule back to `328b83de23b33240e28f4e74900d1d06726f5eb1` · f8388139
Jeffrey Morgan authored Jan 10, 2024

f8388139
Merge pull request #1914 from dhiltgen/smarter_cuda_detection · ac70ab67
Daniel Hiltgen authored Jan 10, 2024
```
Smarter GPU Management library detection
```
ac70ab67

Harden GPU mgmt library lookup · 3c49c3ab

Daniel Hiltgen authored Jan 10, 2024

When there are multiple management libraries installed on a system
not every one will be compatible with the current driver. This change
improves our management library algorithm to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.

3c49c3ab

Support optional override of the target archictures · 9754ae4c

Daniel Hiltgen authored Jan 10, 2024

This can help speed up incremental builds when you're only testing one
archicture, like amd64. E.g.
BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:

9754ae4c

update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until... · 224fbf27
Jeffrey Morgan authored Jan 10, 2024
```
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
```
224fbf27

Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885) · 2c6e8f52

Jeffrey Morgan authored Jan 10, 2024

* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server

2c6e8f52

09 Jan, 2024 25 commits
- clean up cmake `build` directory when cross compiling macOS builds · 34344d80
  Jeffrey Morgan authored Jan 09, 2024
  
  34344d80
- Update api.md (#1878) · e868c8a5
  Robin Glauser authored Jan 09, 2024
```
Fixed assistant in the example response.
```
  e868c8a5
- calculate overhead based number of gpu devices (#1875) · c336693f
  Jeffrey Morgan authored Jan 09, 2024
  
  c336693f
- Merge pull request #1874 from dhiltgen/correct_cuda_min · e89dc1d5
  Daniel Hiltgen authored Jan 09, 2024
```
Set corret CUDA minimum compute capability version
```
  e89dc1d5
- Set corret CUDA minimum compute capability version · 1961a81f
  Daniel Hiltgen authored Jan 09, 2024
```
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
```
  1961a81f
- only build for metal on `arm64` · 8a8c7e7f
  Jeffrey Morgan authored Jan 09, 2024
  
  8a8c7e7f
- update rough cuda overhead estimate to 15% + 384MiB · 6df83e6d
  Jeffrey Morgan authored Jan 09, 2024
  
  6df83e6d
- typo · f921e269
  Michael Yang authored Jan 09, 2024
  
  f921e269
- remove unused fields and functions · 4a33cede
  Michael Yang authored Dec 22, 2023
  
  4a33cede
- fix temporary history file permissions · f95d2f25
  Michael Yang authored Dec 18, 2023
  
  f95d2f25
- fix(windows): modelpath and list · 2b9892a8
  Michael Yang authored Dec 15, 2023
  
  2b9892a8
- fix lint · 2bb2bdd5
  Michael Yang authored Dec 15, 2023
  
  2bb2bdd5
- add .golangci.yaml · acfc376e
  Michael Yang authored Dec 15, 2023
  
  acfc376e
- add lint and test on pull_request · 99725314
  Michael Yang authored Dec 15, 2023
  
  99725314
- Merge pull request #1614 from jmorganca/mxyng/fix-set-template · 62023177
  Michael Yang authored Jan 09, 2024
```
fix: set template without triple quotes
```
  62023177
- revert cuda overhead to 20% · 6164f378
  Jeffrey Morgan authored Jan 09, 2024
  
  6164f378
- use runner if cuda alloc won't fit · f387e963
  Jeffrey Morgan authored Jan 09, 2024
  
  f387e963
- add `TODO` for cuda overhead · 6566387a
  Jeffrey Morgan authored Jan 09, 2024
  
  6566387a
- update cuda overhead to 20% to fix crashes when switching between models and large context sizes · 37708931
  Jeffrey Morgan authored Jan 09, 2024
  
  37708931
- update cuda overhead to 15% or 400MiB · f6cb0a55
  Jeffrey Morgan authored Jan 08, 2024
  
  f6cb0a55
- fix build on linux · 2680078c
  Jeffrey Morgan authored Jan 08, 2024
  
  2680078c
- update overhead to 15% · f1b7e5f5
  Jeffrey Morgan authored Jan 08, 2024
  
  f1b7e5f5
- use 10% vram overhead for cuda · cb534e6a
  Jeffrey Morgan authored Jan 08, 2024
  
  cb534e6a
- better estimate scratch buffer size · 58ce2d82
  Jeffrey Morgan authored Jan 08, 2024
  
  58ce2d82
- fix windows build · 18ddf6d5
  Jeffrey Morgan authored Jan 08, 2024
  
  18ddf6d5
08 Jan, 2024 4 commits

Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt · 61e65024
Michael Yang authored Jan 08, 2024
```
fix(cmd): history in alt prompt
```
61e65024

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* dont load +1 layers

* better estmation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189

remove ggml automatic re-pull (#1856) · 7e8f7c83
Bruce MacDonald authored Jan 08, 2024

7e8f7c83
document response in modelfile template variables (#1428) · 3f3eb19a
Bruce MacDonald authored Jan 08, 2024

3f3eb19a

07 Jan, 2024 1 commit
- Merge pull request #1834 from dhiltgen/old_cuda · 059ae458
  Daniel Hiltgen authored Jan 07, 2024
```
Detect very old CUDA GPUs and fall back to CPU
```
  059ae458