- 18 Sep, 2024 1 commit
Jeffrey Morgan authored
- 12 Sep, 2024 1 commit
Daniel Hiltgen authored
* Optimize container images for startup
  This change adjusts how runner payloads are handled to support container builds, where they are kept extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx-based helper scripts
* Use docker buildx action for release
- 11 Sep, 2024 1 commit
Patrick Devine authored
- 05 Sep, 2024 3 commits
Daniel Hiltgen authored
This reverts commit a60d9b89.
Daniel Hiltgen authored
Tobias Heinze authored
- 28 Aug, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 27 Aug, 2024 2 commits
Michael Yang authored
Jeffrey Morgan authored
- 23 Aug, 2024 1 commit
Patrick Devine authored
- 22 Aug, 2024 1 commit
Daniel Hiltgen authored
* Fix embeddings memory corruption
  The patch was causing a buffer overrun and memory corruption. Once it was removed, however, parallelism in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To work around this, only use slot 0 for embeddings.
* Fix embed integration test assumption
  The token eval count has changed with recent llama.cpp bumps (0.3.5+).
- 21 Aug, 2024 1 commit
Michael Yang authored
- 19 Aug, 2024 1 commit
Jeffrey Morgan authored
- 18 Aug, 2024 2 commits
Richard Lyons authored
Richard Lyons authored
- 17 Aug, 2024 1 commit
Richard Lyons authored
- 16 Aug, 2024 1 commit
zwwhdls authored
Signed-off-by: zwwhdls <zww@hdls.me>
- 15 Aug, 2024 1 commit
Michael Yang authored
- 14 Aug, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 13 Aug, 2024 3 commits
Blake Mizerany authored
The previous value of 64 was WAY too high and unnecessary; it was well past the point of diminishing returns. This is a more reasonable number for _most_ normal cases. For users on cloud servers with excellent network quality, downloads will still scream along without hitting our CDN limits. For users with relatively poor network quality, it keeps them from saturating their network and causing other issues.
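A minimal sketch of the idea, capping in-flight part downloads with a buffered-channel semaphore; downloadParts and fetchPart are hypothetical names, and the limit below is a placeholder since the message does not quote the new value:

```go
package main

import "sync"

// Placeholder limit: the commit lowers the cap from 64, but the new value is
// not quoted in the message above, so this number is illustrative only.
const maxDownloadParts = 8

// downloadParts caps concurrent part downloads with a buffered-channel
// semaphore; fetchPart is a hypothetical stand-in for the real chunk fetcher.
func downloadParts(parts []int, fetchPart func(part int) error) {
	sem := make(chan struct{}, maxDownloadParts)
	var wg sync.WaitGroup
	for _, p := range parts {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxDownloadParts fetches are in flight
		go func(part int) {
			defer wg.Done()
			defer func() { <-sem }()
			_ = fetchPart(part) // error handling elided in this sketch
		}(p)
	}
	wg.Wait()
}

func main() {
	downloadParts([]int{0, 1, 2, 3}, func(part int) error { return nil })
}
```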
Michael Yang authored
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
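A hypothetical illustration of the first two findings (the real call sites in the repository differ): the printf fix is to pass a constant format string, and SA1032 flags errors.Is calls whose (err, target) arguments appear swapped.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

func main() {
	msg := "user-supplied %s text"

	// printf check: use a constant format string instead of fmt.Printf(msg).
	fmt.Printf("%s\n", msg)

	// SA1032: errors.Is takes (err, target), not (target, err).
	_, err := os.Open("does-not-exist")
	if errors.Is(err, os.ErrNotExist) {
		fmt.Println("file is missing")
	}
}
```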
royjhan authored
* load on empty input
* no load on invalid input
- 12 Aug, 2024 3 commits
- 11 Aug, 2024 1 commit
Jeffrey Morgan authored
For simplicity, parallelize embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler, since it builds on existing parallel request handling, similar to the existing text completion functionality.
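A minimal sketch of handler-side fan-out, assuming an errgroup over per-input calls; embedAll and embedOne are hypothetical names, not the repository's actual functions:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// embedOne is a hypothetical stand-in for a single embedding call into the
// runner; the real function has a different signature.
func embedOne(ctx context.Context, input string) ([]float32, error) {
	_ = ctx
	return []float32{float32(len(input))}, nil
}

// embedAll fans the inputs out from the API handler, mirroring how parallel
// requests are already scheduled for text completion.
func embedAll(ctx context.Context, inputs []string) ([][]float32, error) {
	results := make([][]float32, len(inputs))
	g, ctx := errgroup.WithContext(ctx)
	for i, input := range inputs {
		i, input := i, input // capture loop variables for the goroutine
		g.Go(func() error {
			emb, err := embedOne(ctx, input)
			if err != nil {
				return err
			}
			results[i] = emb
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	embs, err := embedAll(context.Background(), []string{"a", "bb", "ccc"})
	fmt.Println(len(embs), err)
}
```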
- 09 Aug, 2024 1 commit
Daniel Hiltgen authored
It seems this can fail in some cases, but proceed with the download anyway.
- 08 Aug, 2024 2 commits
Jitang Lei authored
Signed-off-by: Jitang Lei <leijitang@outlook.com>
Jesse Gross authored
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up unused files (#5840)") changed the config layer stored in manifests from a pointer to a value. This was done to avoid potential nil pointer dereferences after it is deserialized from JSON in the event that the field is missing. This change stores the Layers slice by value as well, so the two objects are handled consistently.
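A simplified sketch of why storing these fields by value avoids the nil dereference; the Manifest and Layer shapes below are illustrative, not the repository's real definitions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shapes only; the repository's Manifest and Layer types have
// more fields.
type Layer struct {
	Digest string `json:"digest"`
	Size   int64  `json:"size"`
}

type Manifest struct {
	// Stored by value: a manifest with no "config" field decodes to a zero
	// Layer rather than a nil pointer that could be dereferenced later.
	Config Layer   `json:"config"`
	Layers []Layer `json:"layers"`
}

func main() {
	var m Manifest
	_ = json.Unmarshal([]byte(`{"layers":[]}`), &m) // "config" is absent
	fmt.Println(m.Config.Digest == "")              // true, and no panic
}
```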
- 07 Aug, 2024 3 commits
Jesse Gross authored
When creating a model, the config layer is appended to the list of layers and the last layer is then used as the config when writing the manifest. This change uses the config layer directly when writing the manifest. There is no behavior change, but it is less error-prone.
Jesse Gross authored
Currently, if the config field is missing from the manifest file (or corrupted), Ollama will crash when it tries to read it. This can happen at startup or when pulling new models. This data is mostly used for showing model information, so we can tolerate it being absent - it is not required to run the models. Besides avoiding the crash, this also gives us the ability to restructure the config in the future by pulling it into the main manifest file.
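A rough sketch of the tolerant read described above, assuming the config is a separate JSON blob used only for display; modelInfo is a hypothetical helper, not the actual implementation:

```go
package manifest

import (
	"encoding/json"
	"os"
)

// modelInfo is a hypothetical helper: it returns whatever display-only config
// it can read, and an empty map when the blob is missing or corrupt, instead
// of treating either case as fatal.
func modelInfo(path string) map[string]any {
	info := map[string]any{}
	data, err := os.ReadFile(path)
	if err != nil {
		return info // config missing: tolerate it rather than crash
	}
	if err := json.Unmarshal(data, &info); err != nil {
		return info // config corrupt: likewise tolerate it
	}
	return info
}
```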
Jesse Gross authored
If there is an error when opening a manifest file (corrupted, permission denied, etc.), the referenced layers will not be included in the list of active layers. This causes them to be deleted when pruning happens at startup or when a model is pulled. In such a situation, we should prefer to preserve data in the hope that it can be recovered rather than being aggressive about deletion.
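A sketch of the safer pruning policy under these assumptions; pruneLayers and layerDigests are hypothetical names and the real code is organized differently:

```go
package prune

import "fmt"

// layerDigests is a hypothetical stand-in that reads one manifest and returns
// the layer digests it references.
func layerDigests(path string) ([]string, error) { return nil, nil }

// pruneLayers aborts the prune if any manifest cannot be read, so layers that
// might still be referenced are preserved instead of deleted.
func pruneLayers(manifests []string, deleteUnreferenced func(used map[string]bool)) error {
	used := map[string]bool{}
	for _, path := range manifests {
		digests, err := layerDigests(path)
		if err != nil {
			// Cannot tell what this manifest references; prefer keeping data.
			return fmt.Errorf("manifest %s unreadable, skipping prune: %w", path, err)
		}
		for _, d := range digests {
			used[d] = true
		}
	}
	deleteUnreferenced(used)
	return nil
}
```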
- 06 Aug, 2024 1 commit
Daniel Hiltgen authored
The file.Truncate call on Windows will write the whole file unless the sparse flag is set, leading to heavy I/O at the beginning of a download. This should improve our I/O behavior on Windows and put less stress on the user's disk.
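A minimal Windows-only sketch of marking the partially downloaded file sparse before pre-sizing it, assuming golang.org/x/sys/windows; this illustrates the idea, not the repository's implementation:

```go
//go:build windows

package main

import (
	"os"

	"golang.org/x/sys/windows"
)

// FSCTL_SET_SPARSE marks a file as sparse, so extending it only reserves the
// range instead of writing zeroes to disk.
const fsctlSetSparse = 0x000900C4

// setSparse is a sketch of the idea, not the repository's implementation.
func setSparse(f *os.File) error {
	var returned uint32
	return windows.DeviceIoControl(
		windows.Handle(f.Fd()), fsctlSetSparse,
		nil, 0, nil, 0, &returned, nil,
	)
}

func main() {
	f, err := os.Create("part.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := setSparse(f); err != nil {
		panic(err)
	}
	// Pre-sizing is now cheap: no zero-fill I/O up front.
	if err := f.Truncate(1 << 30); err != nil {
		panic(err)
	}
}
```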
- 02 Aug, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 01 Aug, 2024 3 commits
Vyacheslav Moskalev authored
Vyacheslav Moskalev authored
Vyacheslav Moskalev authored