Commits · b2f00aa9771d44a1423a2e2f23c5218f1bbc834d · OpenDAS / ollama

06 May, 2024 11 commits
- close zip files · b2f00aa9
  Michael Yang authored May 06, 2024
  
  b2f00aa9
- s/DisplayLongest/String/ · f5e8b207
  Michael Yang authored May 01, 2024
  
  f5e8b207
- only quantize language models · d2454603
  Michael Yang authored Apr 25, 2024
  
  d2454603
- no iterator · 4d0d0fa3
  Michael Yang authored Apr 25, 2024
  
  4d0d0fa3
- rebase · 7ffe4573
  Michael Yang authored Apr 24, 2024
  
  7ffe4573
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
  01811c17
- update tests · a7248f6e
  Michael Yang authored Apr 16, 2024
  
  a7248f6e
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
- Skip scheduling cancelled requests, always reload unloaded runners (#4189) · c9f98622
  Jeffrey Morgan authored May 06, 2024
  
  c9f98622
- Fix stale test logic · 0a954e50
  Daniel Hiltgen authored May 06, 2024
```
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
```
  0a954e50
- unload in critical section (#4187) · dfa2f32c
  Jeffrey Morgan authored May 05, 2024
  
  dfa2f32c
05 May, 2024 4 commits
- Centralize server config handling · f56aa200
  Daniel Hiltgen authored May 04, 2024
```
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
```
  f56aa200
- allocate a large enough kv cache for all parallel requests (#4162) · 942c9792
  Jeffrey Morgan authored May 05, 2024
  
  942c9792
- validate the format of the digest when getting the model path (#4175) · 2a21363b
  Patrick Devine authored May 05, 2024
  
  2a21363b
- Make maximum pending request configurable · 20f6c065
  Daniel Hiltgen authored May 03, 2024
```
This also bumps up the default to be 50 queued requests
instead of 10.
```
  20f6c065
03 May, 2024 1 commit
- Soften timeouts on sched unit tests · 9a32c514
  Daniel Hiltgen authored May 03, 2024
```
This gives us more headroom on the scheduler tests to tamp
down some flakes.
```
  9a32c514
01 May, 2024 6 commits
- server: target invalid · 45b6a12e
  Michael Yang authored May 01, 2024
  
  45b6a12e
- log when the waiting for the process to stop to help debug when other tasks... · 63c76368
  Mark Ward authored Apr 29, 2024
```
log when the waiting for the process to stop to help debug when other tasks execute during this wait.
expire timer clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
```
  63c76368
- fix runner expire during active use. Clearing the expire timer as it is used.... · f4a73d57
  Mark Ward authored Apr 28, 2024
```
fix runner expire during active use.  Clearing the expire timer as it is used.  Allowing the finish to assign an expire timer so that the runner will expire after no use.
```
  f4a73d57
- rename parser to model/file · 119589fc
  Michael Yang authored Apr 30, 2024
  
  119589fc
- use parser.Format instead of templating modelfile · 9cf0f2e9
  Michael Yang authored Apr 26, 2024
  
  9cf0f2e9
- refactor modelfile parser · c0a00f68
  Michael Yang authored Apr 22, 2024
  
  c0a00f68
30 Apr, 2024 1 commit

prompt to display and add local ollama keys to account (#3717) · 0a7fdbe5

Bruce MacDonald authored Apr 30, 2024

- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied

0a7fdbe5

29 Apr, 2024 1 commit
- fix copying model to itself (#4019) · 586672f4
  Jeffrey Morgan authored Apr 28, 2024
  
  586672f4
28 Apr, 2024 1 commit

Fix concurrency for CPU mode · d6e3b645

Daniel Hiltgen authored Apr 28, 2024

Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads. This adds that back, along with test coverage.

This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.

d6e3b645

26 Apr, 2024 2 commits
- return code `499` when user cancels request while a model is loading (#3955) · bb31def0
  Jeffrey Morgan authored Apr 26, 2024
  
  bb31def0
- types/model: overhaul Name and Digest types (#3924) · 37f9c8ad
  Blake Mizerany authored Apr 26, 2024
  
  37f9c8ad
25 Apr, 2024 2 commits
- Reload model if `num_gpu` changes (#3920) · 00b0699c
  Jeffrey Morgan authored Apr 25, 2024
```
* reload model if `num_gpu` changes

* dont reload on -1

* fix tests
```
  00b0699c
- Adjust context size for parallelism · b123be5b
  Daniel Hiltgen authored Apr 25, 2024
  
  b123be5b
24 Apr, 2024 4 commits
- Restructure loading conditional chain · 36a6dacc
  Bryce Reitano authored Apr 24, 2024
  
  36a6dacc
- Provide variable ggml for TestLoad · ceb0e26e
  Bryce Reitano authored Apr 24, 2024
  
  ceb0e26e
- Move ggml loading to when we attempt fitting · 284e02be
  Bryce Reitano authored Apr 24, 2024
  
  284e02be
- update copy to use model.Name · 592dae31
  Michael Yang authored Apr 16, 2024
  
  592dae31
23 Apr, 2024 2 commits

Harden sched TestLoad · d8851cb7
Daniel Hiltgen authored Apr 23, 2024
```
Give the go routine a moment to deliver the expired event
```
d8851cb7

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

34b9db5a

21 Apr, 2024 1 commit
- chore: use errors.New to replace fmt.Errorf will much better (#3789) · 62be2050
  Cheng authored Apr 21, 2024
  
  62be2050
15 Apr, 2024 2 commits
- Add llama2 / torch models for `ollama create` (#3607) · 9f8691c6
  Patrick Devine authored Apr 15, 2024
  
  9f8691c6
- Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653) · a0b8a32e
  Jeffrey Morgan authored Apr 15, 2024
```
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
```
  a0b8a32e
10 Apr, 2024 1 commit

server: provide helpful workaround hint when stalling on pull (#3584) · a7b431e7

Blake Mizerany authored Apr 10, 2024

This is a quick fix to help users who are stuck on the "pull" step at
99%.

In the near future we're introducing a new registry client that
should/will hopefully be smarter. In the meantime, this should unblock
the users hitting issue #1736.

a7b431e7

08 Apr, 2024 1 commit
- cgo quantize · 9502e566
  Michael Yang authored Apr 05, 2024
  
  9502e566