Commits · 3c49c3ab0da9cccbcca333ae278687734bb5cec8 · OpenDAS / ollama

10 Jan, 2024 4 commits

Harden GPU mgmt library lookup · 3c49c3ab

Daniel Hiltgen authored Jan 10, 2024

When there are multiple management libraries installed on a system
not every one will be compatible with the current driver. This change
improves our management library algorithm to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.

3c49c3ab

Support optional override of the target archictures · 9754ae4c

Daniel Hiltgen authored Jan 10, 2024

This can help speed up incremental builds when you're only testing one
archicture, like amd64. E.g.
BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:

9754ae4c

update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until... · 224fbf27
Jeffrey Morgan authored Jan 10, 2024
```
update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed
```
224fbf27

Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885) · 2c6e8f52

Jeffrey Morgan authored Jan 10, 2024

* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server

2c6e8f52

09 Jan, 2024 18 commits
- clean up cmake `build` directory when cross compiling macOS builds · 34344d80
  Jeffrey Morgan authored Jan 09, 2024
  
  34344d80
- Update api.md (#1878) · e868c8a5
  Robin Glauser authored Jan 09, 2024
```
Fixed assistant in the example response.
```
  e868c8a5
- calculate overhead based number of gpu devices (#1875) · c336693f
  Jeffrey Morgan authored Jan 09, 2024
  
  c336693f
- Merge pull request #1874 from dhiltgen/correct_cuda_min · e89dc1d5
  Daniel Hiltgen authored Jan 09, 2024
```
Set corret CUDA minimum compute capability version
```
  e89dc1d5
- Set corret CUDA minimum compute capability version · 1961a81f
  Daniel Hiltgen authored Jan 09, 2024
```
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
```
  1961a81f
- only build for metal on `arm64` · 8a8c7e7f
  Jeffrey Morgan authored Jan 09, 2024
  
  8a8c7e7f
- update rough cuda overhead estimate to 15% + 384MiB · 6df83e6d
  Jeffrey Morgan authored Jan 09, 2024
  
  6df83e6d
- Merge pull request #1614 from jmorganca/mxyng/fix-set-template · 62023177
  Michael Yang authored Jan 09, 2024
```
fix: set template without triple quotes
```
  62023177
- revert cuda overhead to 20% · 6164f378
  Jeffrey Morgan authored Jan 09, 2024
  
  6164f378
- use runner if cuda alloc won't fit · f387e963
  Jeffrey Morgan authored Jan 09, 2024
  
  f387e963
- add `TODO` for cuda overhead · 6566387a
  Jeffrey Morgan authored Jan 09, 2024
  
  6566387a
- update cuda overhead to 20% to fix crashes when switching between models and large context sizes · 37708931
  Jeffrey Morgan authored Jan 09, 2024
  
  37708931
- update cuda overhead to 15% or 400MiB · f6cb0a55
  Jeffrey Morgan authored Jan 08, 2024
  
  f6cb0a55
- fix build on linux · 2680078c
  Jeffrey Morgan authored Jan 08, 2024
  
  2680078c
- update overhead to 15% · f1b7e5f5
  Jeffrey Morgan authored Jan 08, 2024
  
  f1b7e5f5
- use 10% vram overhead for cuda · cb534e6a
  Jeffrey Morgan authored Jan 08, 2024
  
  cb534e6a
- better estimate scratch buffer size · 58ce2d82
  Jeffrey Morgan authored Jan 08, 2024
  
  58ce2d82
- fix windows build · 18ddf6d5
  Jeffrey Morgan authored Jan 08, 2024
  
  18ddf6d5
08 Jan, 2024 4 commits

Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt · 61e65024
Michael Yang authored Jan 08, 2024
```
fix(cmd): history in alt prompt
```
61e65024

Offload layers to GPU based on new model size estimates (#1850) · 08f1e189

Jeffrey Morgan authored Jan 08, 2024



* select layers based on estimated model memory usage

* always account for scratch vram

* dont load +1 layers

* better estmation for graph alloc

* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

08f1e189

remove ggml automatic re-pull (#1856) · 7e8f7c83
Bruce MacDonald authored Jan 08, 2024

7e8f7c83
document response in modelfile template variables (#1428) · 3f3eb19a
Bruce MacDonald authored Jan 08, 2024

3f3eb19a

07 Jan, 2024 6 commits
- Merge pull request #1834 from dhiltgen/old_cuda · 059ae458
  Daniel Hiltgen authored Jan 07, 2024
```
Detect very old CUDA GPUs and fall back to CPU
```
  059ae458
- Merge pull request #1828 from dhiltgen/fix_llava · 6347f501
  Daniel Hiltgen authored Jan 07, 2024
```
Accept windows paths for image processing
```
  6347f501
- dont use `-Wall` in static build (#1833) · 5feec959
  Jeffrey Morgan authored Jan 07, 2024
  
  5feec959
- add `-DCMAKE_SYSTEM_NAME=Darwin` cmake flag (#1832) · dbdd50b2
  Jeffrey Morgan authored Jan 07, 2024
  
  dbdd50b2
- Detect very old CUDA GPUs and fall back to CPU · d74ce6bd
  Daniel Hiltgen authored Jan 06, 2024
```
If we try to load the CUDA library on an old GPU, it panics and crashes
the server.  This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.
```
  d74ce6bd
- Update README.md - Community Integrations - Ollama for Ruby (#1830) · 57942b46
  Guilherme Baptista authored Jan 07, 2024
  
  57942b46
06 Jan, 2024 5 commits
- Accept windows paths for image processing · e0d05b0f
  Daniel Hiltgen authored Jan 06, 2024
```
This enhances our regex to support windows style paths.  The regex will
match invalid path specifications, but we'll still validate file
existence and filter out mismatches
```
  e0d05b0f
- Merge pull request #1697 from dhiltgen/win_docs · 2d9dd14f
  Daniel Hiltgen authored Jan 05, 2024
```
Add windows native build instructions
```
  2d9dd14f
- add cuda lib path for nvidia container toolkit · 1caa5612
  Jeffrey Morgan authored Jan 05, 2024
  
  1caa5612
- Merge pull request #1797 from... · 0101e76d
  Michael Yang authored Jan 05, 2024
```
Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05

fix: allow extension origins (still needs explicit listing), fixes #1686
```
  0101e76d
- fix(cmd): history in alt mode · 2ef9352b
  Michael Yang authored Jan 05, 2024
  
  2ef9352b
05 Jan, 2024 3 commits
- fix: set template without triple quotes · 5580ae24
  Michael Yang authored Jan 05, 2024
  
  5580ae24
- only pull gguf model if already exists (#1817) · 3a9f4471
  Bruce MacDonald authored Jan 05, 2024
  
  3a9f4471
- switch api for ShowRequest to use the name field (#1816) · 9c2941e6
  Patrick Devine authored Jan 05, 2024
  
  9c2941e6