  1. 16 Jul, 2024 7 commits
  2. 15 Jul, 2024 2 commits
    • tools · d02bbebb
      Michael Yang authored
    • Introduce `/api/embed` endpoint supporting batch embedding (#5127) · b9f5e16c
      royjhan authored
      * Initial Batch Embedding
      
      * Revert "Initial Batch Embedding"
      
      This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
      
      * Initial Draft
      
      * mock up notes
      
      * api/embed draft
      
      * add server function
      
      * check normalization
      
      * clean up
      
      * normalization
      
      * playing around with truncate stuff
      
      * Truncation
      
      * Truncation
      
      * move normalization to go
      
      * Integration Test Template
      
      * Truncation Integration Tests
      
      * Clean up
      
      * use float32
      
      * move normalize
      
      * move normalize test
      
      * refactoring
      
      * integration float32
      
      * input handling and handler testing
      
      * Refactoring of legacy and new
      
      * clear comments
      
      * merge conflicts
      
      * touches
      
      * embedding type 64
      
      * merge conflicts
      
      * fix hanging on single string
      
      * refactoring
      
      * test values
      
      * set context length
      
      * clean up
      
      * testing clean up
      
      * testing clean up
      
      * remove function closure
      
      * Revert "remove function closure"
      
      This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
      
      * remove function closure
      
      * remove redundant error check
      
      * clean up
      
      * more clean up
      
      * clean up
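
      The history above mentions batch input handling, float32 vectors, and
      moving normalization into Go. Below is a minimal client-side sketch of
      calling the new endpoint with a batch of inputs; the field names
      (`model`, `input`, `embeddings`) follow the endpoint this PR introduces,
      while the struct names and the model name are illustrative assumptions:

      ```go
      package main

      import (
          "bytes"
          "encoding/json"
          "fmt"
          "net/http"
      )

      // embedRequest mirrors the /api/embed request body: "input" may be a
      // single string or, as exercised here, a batch of strings.
      type embedRequest struct {
          Model string   `json:"model"`
          Input []string `json:"input"`
      }

      // embedResponse carries one float32 vector per input, per the
      // "use float32" step in the history above.
      type embedResponse struct {
          Embeddings [][]float32 `json:"embeddings"`
      }

      func main() {
          body, _ := json.Marshal(embedRequest{
              Model: "all-minilm", // illustrative model name
              Input: []string{"why is the sky blue?", "why is grass green?"},
          })
          resp, err := http.Post("http://localhost:11434/api/embed",
              "application/json", bytes.NewReader(body))
          if err != nil {
              panic(err)
          }
          defer resp.Body.Close()

          var out embedResponse
          if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
              panic(err)
          }
          fmt.Printf("%d embeddings returned\n", len(out.Embeddings))
      }
      ```

      If, as the normalization steps in the history suggest, each returned
      vector is unit-length, cosine similarity between two of them reduces to
      a plain dot product.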
  3. 14 Jul, 2024 1 commit
  4. 13 Jul, 2024 3 commits
  5. 11 Jul, 2024 3 commits
  6. 09 Jul, 2024 1 commit
  7. 07 Jul, 2024 1 commit
  8. 05 Jul, 2024 4 commits
  9. 03 Jul, 2024 3 commits
  10. 02 Jul, 2024 3 commits
  11. 01 Jul, 2024 6 commits
  12. 27 Jun, 2024 1 commit
  13. 25 Jun, 2024 1 commit
    • llm: speed up gguf decoding by a lot (#5246) · cb42e607
      Blake Mizerany authored
      Previously, some costly operations were making the loading of GGUF files
      and their metadata and tensor information VERY slow:

        * Too many allocations when decoding strings
        * Hitting disk for each read of each key and value, resulting in an
          excessive amount of syscalls/disk I/O.
      
      The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro
      M3.
      
      This commit also allows skipping the collection of large arrays of values
      when decoding GGUFs, if desired. When such keys are encountered, their
      values are null and are encoded as such in JSON.
      
      Also, this fixes a broken test that was not encoding valid GGUF.
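
      A minimal sketch of the two read-path fixes named above, buffered reads
      and fewer string allocations, assuming GGUF's uint64 length-prefixed
      string encoding; the `decoder` type and its layout are illustrative,
      not the PR's actual code:

      ```go
      package gguf

      import (
          "bufio"
          "encoding/binary"
          "io"
      )

      // decoder wraps the underlying file in a bufio.Reader so each
      // key/value read hits an in-memory buffer instead of issuing its own
      // syscall, and reuses one scratch buffer across string reads to cut
      // per-string allocations.
      type decoder struct {
          r       *bufio.Reader
          scratch []byte
      }

      func newDecoder(f io.Reader) *decoder {
          return &decoder{r: bufio.NewReaderSize(f, 1<<20)} // 1 MiB buffer
      }

      // readString reads a GGUF string: a uint64 byte length followed by
      // the bytes themselves.
      func (d *decoder) readString() (string, error) {
          var n uint64
          if err := binary.Read(d.r, binary.LittleEndian, &n); err != nil {
              return "", err
          }
          if uint64(cap(d.scratch)) < n {
              d.scratch = make([]byte, n)
          }
          buf := d.scratch[:n]
          if _, err := io.ReadFull(d.r, buf); err != nil {
              return "", err
          }
          // string(buf) is the only allocation left per string.
          return string(buf), nil
      }
      ```

      Reading through a large bufio.Reader turns a syscall per key/value into
      an occasional refill of an in-memory buffer, which is the kind of change
      that takes disk I/O out of the metadata hot path.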
  14. 21 Jun, 2024 4 commits
    • Sort the ps output · 642cee13
      Daniel Hiltgen authored
      Provide consistent ordering for the ps command: longest duration listed first.
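
      A sketch of that ordering rule, assuming each row carries an expiry
      time from which the remaining duration follows; the `modelEntry` type
      is an illustrative stand-in, not the actual type used by ps:

      ```go
      package main

      import (
          "fmt"
          "sort"
          "time"
      )

      // modelEntry stands in for one row of `ollama ps` output.
      type modelEntry struct {
          Name    string
          Expires time.Time
      }

      func main() {
          now := time.Now()
          models := []modelEntry{
              {"llama3", now.Add(2 * time.Minute)},
              {"all-minilm", now.Add(10 * time.Minute)},
          }
          // Longest remaining duration first, so repeated invocations print
          // rows in a stable, predictable order.
          sort.Slice(models, func(i, j int) bool {
              return models[i].Expires.After(models[j].Expires)
          })
          for _, m := range models {
              fmt.Println(m.Name, "until", m.Expires.Format(time.Kitchen))
          }
      }
      ```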
    • Disable concurrency for AMD + Windows · 9929751c
      Daniel Hiltgen authored
      Until ROCm v6.2 ships, we won't be able to get accurate free memory
      reporting on Windows, which makes automatic concurrency too risky.
      Users can still opt in, but they will need to pay attention to model
      sizes; otherwise they may thrash/page VRAM or cause OOM crashes.
      All other platforms and GPUs now have accurate VRAM reporting wired
      up, so we can turn on concurrency by default.
    • Enable concurrency by default · 17b7186c
      Daniel Hiltgen authored
      This adjusts our default settings to enable multiple models and parallel
      requests to a single model. Users can still override these via the same
      env var settings as before. Parallel has a direct impact on num_ctx,
      which in turn can have a significant impact on small-VRAM GPUs, so this
      change also refines the algorithm: when parallel is not explicitly set
      by the user, we try to find a reasonable default that fits the model on
      their GPU(s); see the sketch below. As before, multiple models will only
      load concurrently if they fully fit in VRAM.
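
      A sketch of the refined default described above: when the user has not
      set parallelism explicitly, step down from a preferred value until the
      estimated footprint at num_ctx = base context * parallel fits in free
      VRAM. The function and the toy estimate are assumptions, not the PR's
      actual algorithm:

      ```go
      package main

      import "fmt"

      // pickParallel steps down from a preferred parallelism until the
      // model's estimated footprint, which grows with the effective context
      // length numCtx = baseCtx * parallel, fits in free VRAM.
      func pickParallel(preferred, baseCtx int, vramFree uint64,
          estimate func(numCtx int) uint64) int {
          for p := preferred; p > 1; p-- {
              if estimate(baseCtx*p) <= vramFree {
                  return p
              }
          }
          return 1
      }

      func main() {
          // Toy estimate: 2 GiB of weights plus 1 MiB of KV cache per
          // context token.
          est := func(numCtx int) uint64 { return 2<<30 + uint64(numCtx)<<20 }
          fmt.Println(pickParallel(4, 2048, 8<<30, est)) // prints 3
      }
      ```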
    • fix: quantization with template · e835ef18
      Michael Yang authored