Commits · d290e87513664be8ca3120348614d124991ccb86 · OpenDAS / ollama

16 Jul, 2024 1 commit

add suffix support to generate endpoint · d290e875

Michael Yang authored Jun 20, 2024

This change is triggered by the presence of "suffix" in the request and is
particularly useful for code completion tasks.
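
As a rough illustration of how a client might use this, the sketch below builds a JSON body for the generate endpoint with a "suffix" field, so the model is asked to fill in the text between the prompt and the suffix. The `GenerateRequest` struct is a trimmed-down stand-in written for this example rather than the project's own type, and the model name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GenerateRequest holds only the fields needed for this example.
// It is an illustrative stand-in, not the project's actual request type.
type GenerateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Suffix string `json:"suffix,omitempty"` // text that should follow the completion
	Stream bool   `json:"stream"`
}

// buildFillInMiddle returns a JSON body asking the model to generate the
// text that belongs between `before` and `after`.
func buildFillInMiddle(model, before, after string) ([]byte, error) {
	return json.Marshal(GenerateRequest{Model: model, Prompt: before, Suffix: after})
}

func main() {
	// "example-code-model" is a hypothetical model name.
	body, err := buildFillInMiddle(
		"example-code-model",
		"def add(a, b):\n    result = ",
		"\n    return result\n",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

When the suffix is empty, `omitempty` drops the field, so the body is an ordinary prompt-only request.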

15 Jul, 2024 1 commit
- tools · d02bbebb
  Michael Yang authored Jun 20, 2024
  
05 Jul, 2024 1 commit
- update message processing · 269ed6e6
  Michael Yang authored Jun 17, 2024
  
01 Jul, 2024 4 commits
- use kvs to detect embedding models · da8e2a04
  Michael Yang authored Jun 14, 2024
  
- add capabilities · a30915bd
  Michael Yang authored Jun 11, 2024
  
- rename templates to template · 58e3fff3
  Michael Yang authored Jun 10, 2024
  
- remove ManifestV2 · 3f0b309a
  Michael Yang authored Jun 10, 2024
  
25 Jun, 2024 1 commit

llm: speed up gguf decoding by a lot (#5246) · cb42e607

Blake Mizerany authored Jun 24, 2024

Previously, two costly behaviors made loading GGUF files, along with
their metadata and tensor information, very slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in an
    excessive number of syscalls and disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro
M3.

This commit also makes it possible to skip collecting large arrays of
values when decoding GGUFs (if desired). When such keys are encountered,
their values are null, and are encoded as such in JSON.

This also fixes a broken test that was not encoding valid GGUF.
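
The two fixes described above can be sketched in a few lines. This is not the commit's code: it is a minimal illustration that assumes strings are stored as a little-endian uint64 length followed by that many bytes, wraps the source in a `bufio.Reader` so small reads are served from memory, and reuses one scratch buffer across strings. The `decoder` type, helper names, and `maxStringLen` limit are all hypothetical, and the large-array skipping is not covered.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxStringLen is an arbitrary sanity limit chosen for this sketch.
const maxStringLen = 1 << 20

// decoder reads length-prefixed strings through a buffered reader and
// reuses a single scratch buffer instead of allocating one per read.
type decoder struct {
	r   *bufio.Reader
	buf []byte
}

func newDecoder(r io.Reader) *decoder {
	// A 64 KiB buffer turns many tiny reads into a few large ones.
	return &decoder{r: bufio.NewReaderSize(r, 64<<10)}
}

// readString decodes one string: a uint64 byte count, then the bytes.
func (d *decoder) readString() (string, error) {
	var lenBuf [8]byte
	if _, err := io.ReadFull(d.r, lenBuf[:]); err != nil {
		return "", err // io.EOF here means a clean end of input
	}
	n := binary.LittleEndian.Uint64(lenBuf[:])
	if n > maxStringLen {
		return "", fmt.Errorf("string length %d exceeds limit", n)
	}
	if uint64(cap(d.buf)) < n {
		d.buf = make([]byte, n)
	}
	d.buf = d.buf[:n]
	if _, err := io.ReadFull(d.r, d.buf); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // payload missing after a length prefix
		}
		return "", err
	}
	return string(d.buf), nil // the only per-string allocation made here
}

// encodeString produces the matching encoding; used to build sample input.
func encodeString(s string) []byte {
	out := make([]byte, 8, 8+len(s))
	binary.LittleEndian.PutUint64(out, uint64(len(s)))
	return append(out, s...)
}

// decodeAll reads strings until the input is exhausted.
func decodeAll(data []byte) ([]string, error) {
	d := newDecoder(bytes.NewReader(data))
	var out []string
	for {
		s, err := d.readString()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, s)
	}
}

func main() {
	data := append(encodeString("general.architecture"), encodeString("llama")...)
	values, err := decodeAll(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(values)
}
```

With a real file, passing an `*os.File` to `newDecoder` would give the same effect: the operating system sees a small number of large reads rather than one read per key and value.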

21 Jun, 2024 1 commit
- fix: quantization with template · e835ef18
  Michael Yang authored Jun 21, 2024
  
13 Jun, 2024 1 commit
- server: remove jwt decoding error (#5027) · 1fd236d1
  Jeffrey Morgan authored Jun 13, 2024
  
12 Jun, 2024 1 commit

fix: multiple templates when creating from model · c16f8af9

Michael Yang authored Jun 12, 2024

multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template

07 Jun, 2024 1 commit
- fix create model when template detection errors · 030e765e
  Michael Yang authored Jun 07, 2024
  
06 Jun, 2024 1 commit
- detect chat template from KV · 9b6c2e6e
  Michael Yang authored Jun 03, 2024
  
05 Jun, 2024 1 commit
- server: skip blob verification for already verified blobs · de5beb06
  Blake Mizerany authored May 24, 2024
  
04 Jun, 2024 5 commits
- update create handler to use model.Name · d61ef8b9
  Michael Yang authored May 08, 2024
  
- gofmt, goimports · 6297f856
  Michael Yang authored Jun 04, 2024
  
- lint · e40145a3
  Michael Yang authored May 21, 2024
  
- nolintlint · 8ffb5174
  Michael Yang authored May 21, 2024
  
- replace x/exp/slices with slices · 04f3c12b
  Michael Yang authored May 21, 2024
  
24 May, 2024 1 commit
- Move envconfig and consolidate env vars (#4608) · 4cc3be30
  Patrick Devine authored May 24, 2024
  
20 May, 2024 4 commits
- fix quantize file types · 807d0927
  Michael Yang authored May 17, 2024
  
- tidy intermediate blobs · f36f1d6b
  Michael Yang authored May 20, 2024
  
- cache and reuse intermediate blobs · 3520c0e4
  Michael Yang authored May 10, 2024

  particularly useful for zipfiles and f16s
- Move the parser back + handle utf16 files (#4533) · ccdf0b2a
  Patrick Devine authored May 20, 2024
  
14 May, 2024 1 commit
- remove DeleteModel · b8772a35
  Michael Yang authored May 08, 2024
  
09 May, 2024 1 commit
- prune partial downloads (#4272) · 302d7fdb
  Jeffrey Morgan authored May 09, 2024
  
08 May, 2024 2 commits
- routes: fix show llava models · b25976ae
  Michael Yang authored May 08, 2024
  
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
  
07 May, 2024 1 commit
- update list handler to use model.Name · 548a7df0
  Michael Yang authored Apr 17, 2024
  
06 May, 2024 5 commits
- only quantize language models · d2454603
  Michael Yang authored Apr 25, 2024
  
- no iterator · 4d0d0fa3
  Michael Yang authored Apr 25, 2024
  
- rebase · 7ffe4573
  Michael Yang authored Apr 24, 2024
  
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
05 May, 2024 1 commit

Centralize server config handling · f56aa200

Daniel Hiltgen authored May 04, 2024

This moves all env var reading into one central module and logs the
loaded config once at startup, which should help when troubleshooting
users' server logs.
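
The pattern described here, one module that reads the environment and logs the result once, might look roughly like the sketch below. The `Config` fields, the `EXAMPLE_*` variable names, and the defaults are all invented for illustration and are not the project's actual settings.

```go
package main

import (
	"log"
	"os"
	"strconv"
)

// Config gathers every environment-derived setting in one place.
// The fields are hypothetical examples.
type Config struct {
	Host     string
	Debug    bool
	MaxQueue int
}

// load reads all settings through getenv, falling back to defaults when a
// variable is unset or malformed. Taking getenv as a parameter keeps the
// function easy to test without touching the real environment.
func load(getenv func(string) string) Config {
	cfg := Config{Host: "localhost:8080", MaxQueue: 512}
	if v := getenv("EXAMPLE_HOST"); v != "" {
		cfg.Host = v
	}
	if v, err := strconv.ParseBool(getenv("EXAMPLE_DEBUG")); err == nil {
		cfg.Debug = v
	}
	if v, err := strconv.Atoi(getenv("EXAMPLE_MAX_QUEUE")); err == nil && v > 0 {
		cfg.MaxQueue = v
	}
	return cfg
}

func main() {
	cfg := load(os.Getenv)
	// Log the effective configuration once so it appears at the top of
	// any server log a user shares.
	log.Printf("server config: %+v", cfg)
}
```

Because the rest of the program would read from `Config` instead of calling `os.Getenv` directly, the single log line reflects every setting that can affect behavior.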

01 May, 2024 2 commits
- rename parser to model/file · 119589fc
  Michael Yang authored Apr 30, 2024
  
- use parser.Format instead of templating modelfile · 9cf0f2e9
  Michael Yang authored Apr 26, 2024
  
30 Apr, 2024 1 commit

prompt to display and add local ollama keys to account (#3717) · 0a7fdbe5

Bruce MacDonald authored Apr 30, 2024

- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied

29 Apr, 2024 1 commit
- fix copying model to itself (#4019) · 586672f4
  Jeffrey Morgan authored Apr 28, 2024
  
26 Apr, 2024 1 commit
- types/model: overhaul Name and Digest types (#3924) · 37f9c8ad
  Blake Mizerany authored Apr 26, 2024
  