Commits · 3b19cdba2a090772b2e886dbfbf712992fafe0cd · OpenDAS / ollama

19 Aug, 2024 4 commits

Add windows cuda v12 + v11 support · 927d98a6
Daniel Hiltgen authored Jul 12, 2024

Add Jetson cuda variants for arm · d470ebe7
Daniel Hiltgen authored May 30, 2024
```
This adds new variants for arm64 specific to Jetson platforms
```
Wire up ccache and pigz in the docker based build · c7bcb003
Daniel Hiltgen authored Aug 09, 2024
```
This should help speed things up a little
```

Daniel Hiltgen authored Jul 08, 2024

This adjusts Linux to follow a model similar to Windows, with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model, where the Go binary is fully self-contained.

74d45f01

12 Aug, 2024 1 commit
- add conversion for microsoft phi 3 mini/medium 4k, 128 · 6ffb5cb0
  Michael Yang authored Jun 03, 2024
11 Aug, 2024 2 commits

server: parallelize embeddings in API web handler instead of in subprocess runner (#6220) · 15c2d8fe

Jeffrey Morgan authored Aug 11, 2024

For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.

llm: prevent loading too large models on windows (#5926) · 25906d72

Daniel Hiltgen authored Aug 11, 2024

Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory and disk paging). This check was already applied on Linux and should be applied on Windows as well.

07 Aug, 2024 1 commit
- llm: reserve required number of slots for embeddings (#6219) · de4fc297
  Jeffrey Morgan authored Aug 06, 2024
06 Aug, 2024 1 commit
- update llama.cpp submodule to `1e6f6554` (#6208) · e04c7012
  Jeffrey Morgan authored Aug 06, 2024
05 Aug, 2024 4 commits
- sort batch results (#6189) · 86b907f8
  royjhan authored Aug 05, 2024
- Implement linux NUMA detection · f457d634
  Daniel Hiltgen authored Aug 05, 2024
```
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp.
If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
```
- Catch one more error log · 04210aa6
  Daniel Hiltgen authored Aug 05, 2024
- line feed · 6a073447
  Michael Yang authored Aug 04, 2024
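
The NUMA detection commit (f457d634) above describes a two-step decision. A Go sketch of it, assuming nodes are counted from the `node*` directories under `/sys/devices/system/node` and that the chosen mode is handed to llama.cpp as its `--numa` value; `countNUMANodes` and `numaMode` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// countNUMANodes counts node directories (node0, node1, ...) under a sysfs
// root such as /sys/devices/system/node. A missing directory yields 0.
func countNUMANodes(root string) int {
	matches, err := filepath.Glob(filepath.Join(root, "node[0-9]*"))
	if err != nil { // only returned for a malformed pattern
		return 0
	}
	return len(matches)
}

// numaMode returns "" when NUMA support is not needed (one node or fewer),
// "numactl" when that tool is on PATH, and "distribute" otherwise.
func numaMode(nodes int, haveNumactl bool) string {
	switch {
	case nodes <= 1:
		return ""
	case haveNumactl:
		return "numactl"
	default:
		return "distribute"
	}
}

func main() {
	nodes := countNUMANodes("/sys/devices/system/node")
	_, err := exec.LookPath("numactl")
	fmt.Printf("nodes=%d mode=%q\n", nodes, numaMode(nodes, err == nil))
}
```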
02 Aug, 2024 1 commit
- lint · b732beba
  Michael Yang authored Aug 01, 2024
31 Jul, 2024 5 commits
- comments · df993fa3
  Michael Yang authored Jul 08, 2024
- refactor convert · 5e9db9fb
  Michael Yang authored May 31, 2024
- patches: phi3 default sliding window attention · 0f3271db
  Michael Yang authored Jul 31, 2024
- update convert test to check result data · 6b252918
  Michael Yang authored Jun 03, 2024
- patch gemma support · afa8d6e9
  jmorganca authored Jul 30, 2024
30 Jul, 2024 1 commit

Add Metrics to `/api/embed` response (#5709) · 1b44d873

royjhan authored Jul 30, 2024

* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics

29 Jul, 2024 1 commit
- update llama.cpp submodule to `6eeaeba1` (#6039) · 68ee42f9
  Jeffrey Morgan authored Jul 29, 2024
27 Jul, 2024 1 commit
- feat: add support for min_p (resolve #1142) (#1825) · f3d7a481
  Tibor Schmidt authored Jul 27, 2024
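
The commit title only names the option. For reference, min-p sampling keeps every token whose probability is at least `min_p` times that of the most likely token and discards the rest. A Go sketch of that rule over an already-softmaxed distribution; `minPFilter` is a hypothetical helper, and ollama presumably forwards the option to llama.cpp's sampler rather than implementing it like this.

```go
package main

import "fmt"

// minPFilter zeroes every probability below minP * max(probs) and
// renormalizes the survivors so they sum to 1.
func minPFilter(probs []float64, minP float64) []float64 {
	out := make([]float64, len(probs))
	if len(probs) == 0 {
		return out
	}
	maxP := probs[0]
	for _, p := range probs[1:] {
		if p > maxP {
			maxP = p
		}
	}
	threshold := minP * maxP
	var kept float64
	for i, p := range probs {
		if p >= threshold {
			out[i] = p
			kept += p
		}
	}
	if kept == 0 {
		// Nothing survived (e.g. minP > 1 or all-zero input): leave probs unchanged.
		copy(out, probs)
		return out
	}
	for i := range out {
		out[i] /= kept
	}
	return out
}

func main() {
	// With min_p = 0.2 the cutoff is 0.2 * 0.5 = 0.1, so only 0.05 is dropped.
	fmt.Println(minPFilter([]float64{0.5, 0.3, 0.15, 0.05}, 0.2))
}
```

Unlike a fixed top-k, the cutoff scales with the model's confidence: a peaked distribution keeps few tokens, a flat one keeps many.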
26 Jul, 2024 1 commit
- llm: keep patch for llama 3 rope factors (#5987) · f2a96c7d
  Jeffrey Morgan authored Jul 26, 2024
25 Jul, 2024 1 commit
- Revert "llm(llama): pass rope factors (#5924)" (#5963) · bbf8f102
  Jeffrey Morgan authored Jul 25, 2024
```
This reverts commit bb46bbcf.
```
24 Jul, 2024 1 commit
- llm(llama): pass rope factors (#5924) · bb46bbcf
  Michael Yang authored Jul 24, 2024
22 Jul, 2024 6 commits

Enable windows error dialog for subprocess startup · e12fff88

Daniel Hiltgen authored Jul 15, 2024

Make sure if something goes wrong spawning the process, the user gets
enough info to be able to try to self-correct, or at least file a bug
with details so we can fix it. Once the process starts, we immediately
change back to the recommended setting to prevent the blocking dialog.
This ensures if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr
of the subprocess for the reason to report via API.

string · e2c3f6b3
Michael Yang authored Jul 03, 2024

bool · 55cd3ddc
Michael Yang authored Jul 03, 2024

rfc: dynamic environ lookup · 35b89b2e
Michael Yang authored Jul 03, 2024

Update llama.cpp submodule commit to `d94c6e0c` (#5805) · f8fedbda
Jeffrey Morgan authored Jul 22, 2024

Refine error reporting for subprocess crash · a3c20e3f

Daniel Hiltgen authored Jul 22, 2024

On Windows, the exit status winds up being the term many users search
for, so they end up piling onto unrelated issues. This refines the
reporting so that if we have a more detailed message, we suppress the
exit status portion of the message.

21 Jul, 2024 1 commit
- llm: consider `head_dim` in llama arch (#5817) · 5534f2cc
  Jeffrey Morgan authored Jul 20, 2024
20 Jul, 2024 2 commits

Adjust windows ROCm discovery · 283948c8

Daniel Hiltgen authored Jul 19, 2024

The v5 HIP library returns unsupported GPUs, which won't enumerate at
inference time in the runner, so this makes sure discovery aligns with
the runner. The gfx906 cards are no longer supported, so we shouldn't
compile with that GPU type, as it won't enumerate at runtime.
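
One way to align discovery with the runner, sketched in Go under the assumption that each discovered device reports its gfx target and that the list of targets the runner was compiled for is known. `gpuInfo`, `alignWithRunner` and the target list are hypothetical.

```go
package main

import "fmt"

// gpuInfo is a hypothetical record produced by HIP device discovery.
type gpuInfo struct {
	ID  int
	GFX string // GPU architecture target, e.g. "gfx1030"
}

// alignWithRunner drops devices whose gfx target is not among the targets
// the runner was compiled for, so discovery reports only GPUs the runner
// will actually enumerate at inference time.
func alignWithRunner(found []gpuInfo, compiledTargets []string) []gpuInfo {
	supported := make(map[string]bool, len(compiledTargets))
	for _, t := range compiledTargets {
		supported[t] = true
	}
	var usable []gpuInfo
	for _, g := range found {
		if supported[g.GFX] {
			usable = append(usable, g)
		}
	}
	return usable
}

func main() {
	found := []gpuInfo{{ID: 0, GFX: "gfx906"}, {ID: 1, GFX: "gfx1030"}}
	// Illustrative target list only; gfx906 is left out, as in the commit.
	targets := []string{"gfx1030", "gfx1100"}
	fmt.Println(alignWithRunner(found, targets)) // [{1 gfx1030}]
}
```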

add patch for tekken (#5807) · 1475eab9
Jeffrey Morgan authored Jul 20, 2024

16 Jul, 2024 1 commit
- add chat and generate tests with mock runner · 4a565cbf
  Michael Yang authored Jul 13, 2024
15 Jul, 2024 1 commit

Introduce `/api/embed` endpoint supporting batch embedding (#5127) · b9f5e16c

royjhan authored Jul 15, 2024

* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
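
The squash log above mentions moving normalization into Go and using float32. Assuming the usual L2 (unit-length) normalization, which the log does not actually specify, a minimal sketch; `normalize` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales vec in place to unit L2 length and returns it.
// The sum of squares is accumulated in float64 to limit rounding error;
// a zero vector is returned unchanged to avoid dividing by zero.
func normalize(vec []float32) []float32 {
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v)
	}
	norm := math.Sqrt(sum)
	if norm == 0 {
		return vec
	}
	for i := range vec {
		vec[i] = float32(float64(vec[i]) / norm)
	}
	return vec
}

func main() {
	fmt.Println(normalize([]float32{3, 4})) // [0.6 0.8]
}
```

With unit-length vectors, cosine similarity between two embeddings reduces to a plain dot product.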

13 Jul, 2024 1 commit
- llm: looser checks for minimum memory (#5677) · ef98803d
  Jeffrey Morgan authored Jul 13, 2024
12 Jul, 2024 1 commit
- fix: quant err message (#5616) · 10e76882
  Josh authored Jul 11, 2024
11 Jul, 2024 2 commits

llm: avoid loading model if system memory is too small (#5637) · c4cf8ad5

Jeffrey Morgan authored Jul 11, 2024

* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On Linux and Windows, expose how much swap space is available
so we can take it into consideration when scheduling models.

* use `systemSwapFreeMemory` in check

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

sched: only error when over-allocating system memory (#5626) · 791650dd
Jeffrey Morgan authored Jul 11, 2024