- 21 Sep, 2024 1 commit
Daniel Hiltgen authored
When running the subprocess as a background service, Windows may throttle it, which can lead to thrashing and a very poor token rate.
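
A minimal sketch of one possible mitigation, assuming the runner is started via os/exec on Windows and that raising the child's priority class is the chosen approach; `startRunner` and its arguments are illustrative names, and the actual change may use a different mechanism:

```go
//go:build windows

package runner

import (
	"os/exec"
	"syscall"

	"golang.org/x/sys/windows"
)

// startRunner launches the subprocess with an explicit priority class so
// Windows does not treat it as a throttleable background task.
func startRunner(exe string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(exe, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Hypothetical mitigation: an elevated priority class keeps the
		// runner out of the background/efficiency bucket.
		CreationFlags: windows.ABOVE_NORMAL_PRIORITY_CLASS,
	}
	return cmd, cmd.Start()
}
```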
- 20 Sep, 2024 1 commit
Daniel Hiltgen authored
* Unified arm/x86 windows installer

  This adjusts the installer payloads to be architecture aware so we can carry both amd64 and arm64 binaries in the installer, and install only the applicable architecture at install time.

* Include arm64 in official windows build

* Harden schedule test for slow windows timers

  This test seems to be a bit flaky on windows, so give it more time to converge.
- 18 Sep, 2024 1 commit
Michael Yang authored
- 17 Sep, 2024 1 commit
Michael Yang authored
Raw diffs can be applied using `git apply` but not with `git am`. Git patches, e.g. those produced by `git format-patch`, are both apply-able and am-able.
- 13 Sep, 2024 1 commit
Daniel Hiltgen authored
scripts: fix incremental builds on linux or similar
- 12 Sep, 2024 2 commits
Daniel Hiltgen authored
Corrects x86_64 vs amd64 discrepancy
Daniel Hiltgen authored
* Optimize container images for startup

  This change adjusts how runner payloads are handled to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx based helper scripts

* Use docker buildx action for release
- 11 Sep, 2024 1 commit
Jesse Gross authored
If there are any pending responses (such as text held back while checking for potential stop tokens), we should send them back before ending the sequence. Otherwise, tokens at the end of a response can be lost. Fixes #6707
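
A minimal sketch of the idea, assuming the runner buffers text while checking for stop tokens; the `sequence` type and its fields are illustrative stand-ins, not the actual runner code:

```go
package runner

import "strings"

// sequence is an illustrative stand-in for a generation in progress.
type sequence struct {
	pendingResponses []string    // text withheld while checking for stop tokens
	responses        chan string // text streamed back to the client
}

// finish flushes anything still pending before the stream is closed, so the
// tail of the response is not dropped.
func (s *sequence) finish() {
	if len(s.pendingResponses) > 0 {
		s.responses <- strings.Join(s.pendingResponses, "")
		s.pendingResponses = s.pendingResponses[:0]
	}
	close(s.responses)
}
```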
- 10 Sep, 2024 1 commit
Daniel Hiltgen authored
* Quiet down Docker's new lint warnings

  Docker has recently added lint warnings to builds. This cleans up those warnings.

* Fix go lint regression
- 06 Sep, 2024 1 commit
Daniel Hiltgen authored
When we determine a GPU is too small for any layers, it's not always clear why. This will help troubleshoot those scenarios.
- 05 Sep, 2024 2 commits
Daniel Hiltgen authored
With the new very large parameter models, some users are willing to wait a very long time for models to load.
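
A sketch of how such a timeout override might look, assuming it is exposed through an environment variable; the variable name `OLLAMA_LOAD_TIMEOUT` and the default below are assumptions for illustration, not confirmed by this log:

```go
package config

import (
	"os"
	"time"
)

// loadTimeout returns how long to wait for a model to load, letting users
// with very large models extend the default.
func loadTimeout() time.Duration {
	const fallback = 5 * time.Minute // assumed default, illustrative only
	if v := os.Getenv("OLLAMA_LOAD_TIMEOUT"); v != "" {
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			return d
		}
	}
	return fallback
}
```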
Daniel Hiltgen authored
Provide a mechanism for users to set aside an amount of VRAM on each GPU to make room for other applications they want to start after Ollama, or to work around memory prediction bugs.
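
In terms of the accounting this implies, the scheduler would subtract a user-configured reservation from each GPU's free VRAM before placing layers. A simplified sketch; the function and parameter names are illustrative, and the actual knob and units may differ:

```go
package sched

// usableVRAM returns the VRAM the scheduler may plan against after setting
// aside a user-requested reservation for other applications.
func usableVRAM(freeVRAM, reserved uint64) uint64 {
	if reserved >= freeVRAM {
		return 0 // nothing left for model layers on this GPU
	}
	return freeVRAM - reserved
}
```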
- 04 Sep, 2024 2 commits
Pascal Patry authored
Jeffrey Morgan authored
- 03 Sep, 2024 2 commits
Daniel Hiltgen authored
On systems with low system memory, we can hit allocation failures that are difficult to diagnose without debug logs. This will make it easier to spot.
FellowTraveler authored
/Users/au/src/ollama/llm/ext_server/server.cpp:289:9: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.
- 29 Aug, 2024 1 commit
Michael Yang authored
- 27 Aug, 2024 1 commit
Sean Khatiri authored
- 25 Aug, 2024 1 commit
Daniel Hiltgen authored
The numa flag may have a performance impact on multi-socket systems with GPU loads.
- 23 Aug, 2024 2 commits
Patrick Devine authored
Daniel Hiltgen authored
The define changed recently, and this usage slipped through the cracks with the old name.
- 22 Aug, 2024 1 commit
Daniel Hiltgen authored
* Fix embeddings memory corruption

  The patch was leading to a buffer overrun corruption. Once removed, though, parallelism in server.cpp led to hitting an assert due to slot/seq IDs being >= token count. To work around this, only use slot 0 for embeddings.

* Fix embed integration test assumption

  The token eval count has changed with recent llama.cpp bumps (0.3.5+).
- 21 Aug, 2024 1 commit
Michael Yang authored
- 20 Aug, 2024 1 commit
Daniel Hiltgen authored
We're over budget for github's maximum release artifact size with rocm + 2 cuda versions. This splits rocm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
- 19 Aug, 2024 6 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
This adds new arm64 variants specific to Jetson platforms.
Daniel Hiltgen authored
This should help speed things up a little
Daniel Hiltgen authored
This adjusts Linux to follow a similar model to Windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the Go binary is fully self-contained.
- 12 Aug, 2024 1 commit
Michael Yang authored
- 11 Aug, 2024 2 commits
Jeffrey Morgan authored
For simplicity, parallelize embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler, as it builds on existing parallel requests, similar to the existing text completion functionality.
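
A sketch of what fanning out in the handler could look like, using errgroup; `embedOne` is a placeholder for whatever single-input embedding call the runner already exposes, not an actual API:

```go
package api

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// embedAll runs one embedding request per input concurrently in the API
// layer, reusing the existing single-input path (embedOne) underneath.
func embedAll(ctx context.Context, inputs []string,
	embedOne func(context.Context, string) ([]float32, error)) ([][]float32, error) {
	results := make([][]float32, len(inputs))
	g, ctx := errgroup.WithContext(ctx)
	for i, in := range inputs {
		i, in := i, in // capture loop variables for the goroutine
		g.Go(func() error {
			emb, err := embedOne(ctx, in)
			if err != nil {
				return err
			}
			results[i] = emb
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}
```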
Daniel Hiltgen authored
Don't allow loading models that would lead to memory exhaustion (across vram, system memory and disk paging). This check was already applied on Linux and should be applied on Windows as well.
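
In spirit, the check compares the model's estimated footprint against memory the machine can actually back without paging. A deliberately simplified sketch; the real estimate also accounts for per-GPU splits, KV cache, and graph overhead:

```go
package sched

// fitsInMemory reports whether an estimated model footprint can be backed by
// available VRAM plus free system memory, without relying on disk paging.
func fitsInMemory(estimatedBytes, freeVRAM, freeSystem uint64) bool {
	return estimatedBytes <= freeVRAM+freeSystem
}
```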
- 08 Aug, 2024 1 commit
Michael Yang authored
- 07 Aug, 2024 1 commit
Jeffrey Morgan authored
- 06 Aug, 2024 1 commit
Jeffrey Morgan authored
- 05 Aug, 2024 4 commits
royjhan authored
Daniel Hiltgen authored
If the system has multiple numa nodes, enable numa support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
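
A sketch of the detection described above, assuming the chosen mode is passed through to llama.cpp's `--numa` option (which accepts values such as `distribute`, `isolate`, and `numactl`); how the real code wires this up may differ:

```go
package sched

import "os/exec"

// numaMode picks the numactl-aware mode when the numactl binary is on PATH,
// and falls back to the basic "distribute" mode otherwise.
func numaMode() string {
	if _, err := exec.LookPath("numactl"); err == nil {
		return "numactl"
	}
	return "distribute"
}

// A caller might then append: args = append(args, "--numa", numaMode())
```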
Daniel Hiltgen authored
Michael Yang authored