Commits · 302d7fdbf3dc59ed8ef6a2e25c0b052435009cb0 · OpenDAS / ollama

09 May, 2024 7 commits
- prune partial downloads (#4272) · 302d7fdb
  Jeffrey Morgan authored May 09, 2024
  
  302d7fdb
- Fix race in shutdown logic · 3ae2f441
  Daniel Hiltgen authored May 09, 2024
```
Ensure the runners are terminated
```
  3ae2f441
- Wait for GPU free memory reporting to converge · 354ad925
  Daniel Hiltgen authored May 09, 2024
```
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
```
  354ad925
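The convergence wait described in this commit can be pictured as a polling loop. The following is a minimal sketch only, not the scheduler's actual code: `waitForFreeMemory`, its parameters, and the simulated readings are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForFreeMemory polls read until the reported free memory is within
// tolerance bytes of expected, or until timeout elapses.
// Hypothetical helper for illustration; not taken from the ollama source.
func waitForFreeMemory(read func() uint64, expected, tolerance uint64, interval, timeout time.Duration) (uint64, error) {
	deadline := time.Now().Add(timeout)
	for {
		got := read()
		// Absolute difference on unsigned values, avoiding underflow.
		var diff uint64
		if got > expected {
			diff = got - expected
		} else {
			diff = expected - got
		}
		if diff <= tolerance {
			return got, nil
		}
		if time.Now().After(deadline) {
			return got, errors.New("free memory did not converge before timeout")
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated driver whose reported free memory catches up over a few polls.
	readings := []uint64{4 << 30, 6 << 30, 8 << 30}
	i := 0
	read := func() uint64 {
		v := readings[i]
		if i < len(readings)-1 {
			i++
		}
		return v
	}
	got, err := waitForFreeMemory(read, 8<<30, 256<<20, 10*time.Millisecond, time.Second)
	fmt.Println(got, err)
}
```

A timeout is included in the sketch so that a driver that never reports the expected value cannot block the caller forever; whether the real change bounds the wait this way is an assumption.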
- Record more GPU information · 8727a9c1
  Daniel Hiltgen authored May 07, 2024
```
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
```
  8727a9c1
- add done_reason to the api (#4235) · cfa84b84
  Bruce MacDonald authored May 09, 2024
  
  cfa84b84
- routes: skip invalid filepaths · a7ee84fc
  Michael Yang authored May 09, 2024
  
  a7ee84fc
- use model defaults for `num_gqa`, `rope_frequency_base` and `rope_frequency_scale` (#1983) · d5eec16d
  Jeffrey Morgan authored May 09, 2024
  
  d5eec16d
08 May, 2024 5 commits
- Add preflight OPTIONS handling and update CORS config (#4086) · cef45fea
  Bruce MacDonald authored May 08, 2024
```
* Add preflight OPTIONS handling and update CORS config

- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

* allow auth, content-type, and user-agent headers

* Update routes.go
```
  cef45fea
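The early return for preflight requests described in this commit might look roughly like the sketch below. It uses only the standard `net/http` package rather than the project's own router, and `preflight`, `statusFor`, the origin value, and the `/example` path are hypothetical names for illustration. The allowed header list follows the commit message (Authorization, Content-Type, User-Agent); the method list is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// preflight wraps next and answers OPTIONS requests early with
// 204 No Content after setting the CORS response headers.
// Hypothetical middleware for illustration only.
func preflight(allowedOrigin string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("Access-Control-Allow-Origin", allowedOrigin)
		h.Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
		h.Set("Access-Control-Allow-Headers", "Authorization, Content-Type, User-Agent")
		if r.Method == http.MethodOptions {
			// Preflight: no body is needed, so stop before the real handler.
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request with the given method through the
// middleware and returns the recorded status code.
func statusFor(method string) int {
	h := preflight("https://example.com", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/example", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodOptions), statusFor(http.MethodGet)) // 204 200
}
```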
- routes: fix show llava models · b25976ae
  Michael Yang authored May 08, 2024
  
  b25976ae
- skip hidden files in list models handler (#4247) · 8cbd3e75
  Bruce MacDonald authored May 07, 2024
  
  8cbd3e75
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
  
  eeb69526
- fix invalid destination error message · dc9b1111
  Bruce MacDonald authored May 07, 2024
  
  dc9b1111
07 May, 2024 1 commit
- update list handler to use model.Name · 548a7df0
  Michael Yang authored Apr 17, 2024
  
  548a7df0
06 May, 2024 12 commits
- close server on receiving signal (#4213) · 39d9d22c
  Jeffrey Morgan authored May 06, 2024
  
  39d9d22c
- close zip files · b2f00aa9
  Michael Yang authored May 06, 2024
  
  b2f00aa9
- s/DisplayLongest/String/ · f5e8b207
  Michael Yang authored May 01, 2024
  
  f5e8b207
- only quantize language models · d2454603
  Michael Yang authored Apr 25, 2024
  
  d2454603
- no iterator · 4d0d0fa3
  Michael Yang authored Apr 25, 2024
  
  4d0d0fa3
- rebase · 7ffe4573
  Michael Yang authored Apr 24, 2024
  
  7ffe4573
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
  01811c17
- update tests · a7248f6e
  Michael Yang authored Apr 16, 2024
  
  a7248f6e
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
- Skip scheduling cancelled requests, always reload unloaded runners (#4189) · c9f98622
  Jeffrey Morgan authored May 06, 2024
  
  c9f98622
- Fix stale test logic · 0a954e50
  Daniel Hiltgen authored May 06, 2024
```
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
```
  0a954e50
- unload in critical section (#4187) · dfa2f32c
  Jeffrey Morgan authored May 05, 2024
  
  dfa2f32c
05 May, 2024 4 commits
- Centralize server config handling · f56aa200
  Daniel Hiltgen authored May 04, 2024
```
This moves all the env var reading into one central module
and logs the loaded config once at startup, which should
help in troubleshooting user server logs.
```
  f56aa200
- allocate a large enough kv cache for all parallel requests (#4162) · 942c9792
  Jeffrey Morgan authored May 05, 2024
  
  942c9792
- validate the format of the digest when getting the model path (#4175) · 2a21363b
  Patrick Devine authored May 05, 2024
  
  2a21363b
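Validating a digest before using it to build a path could be sketched as follows. The accepted format here (`sha256`, a `:` or `-` separator, then 64 lowercase hex characters) is an assumption made for illustration, and `validDigest` is a hypothetical name; the commit title does not state the exact rule.

```go
package main

import (
	"fmt"
	"regexp"
)

// digestPattern encodes the ASSUMED digest format: "sha256", a ':' or '-'
// separator, then exactly 64 lowercase hexadecimal characters.
var digestPattern = regexp.MustCompile(`^sha256[:-][0-9a-f]{64}$`)

// validDigest reports whether s matches the assumed format. Rejecting
// anything else keeps path separators and ".." out of derived file paths.
func validDigest(s string) bool {
	return digestPattern.MatchString(s)
}

func main() {
	const hex = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	fmt.Println(validDigest("sha256:" + hex))     // true
	fmt.Println(validDigest("../../etc/passwd")) // false
}
```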
- Make maximum pending request configurable · 20f6c065
  Daniel Hiltgen authored May 03, 2024
```
This also bumps up the default to be 50 queued requests
instead of 10.
```
  20f6c065
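A configurable limit with a default of 50 could be parsed along these lines. This is a sketch under stated assumptions: `parseMaxQueue` and `defaultMaxQueue` are hypothetical names, the raw string stands in for whatever environment variable the server reads, and the fallback behaviour for invalid values is a guess.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultMaxQueue mirrors the default of 50 queued requests named in the
// commit message. The constant name is hypothetical.
const defaultMaxQueue = 50

// parseMaxQueue turns the raw setting into a queue limit. Empty, non-numeric,
// or non-positive values fall back to the default (an assumed policy).
func parseMaxQueue(raw string) int {
	if raw == "" {
		return defaultMaxQueue
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultMaxQueue
	}
	return n
}

func main() {
	fmt.Println(parseMaxQueue(""), parseMaxQueue("128"), parseMaxQueue("oops")) // 50 128 50
}
```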
03 May, 2024 1 commit
- Soften timeouts on sched unit tests · 9a32c514
  Daniel Hiltgen authored May 03, 2024
```
This gives us more headroom on the scheduler tests to tamp
down some flakes.
```
  9a32c514
01 May, 2024 6 commits
- server: target invalid · 45b6a12e
  Michael Yang authored May 01, 2024
  
  45b6a12e
- log when waiting for the process to stop to help debug when other tasks... · 63c76368
  Mark Ward authored Apr 29, 2024
```
Log when waiting for the process to stop, to help debug when other tasks execute during this wait.
When the expire timer fires, clear the timer reference because it will not be reused.
Close will clean up expireTimer if the calling code has not already done so.
```
  63c76368
- fix runner expire during active use. Clearing the expire timer as it is used.... · f4a73d57
  Mark Ward authored Apr 28, 2024
```
Fix runners expiring during active use. Clear the expire timer while the runner is in use, and let finish assign a new expire timer so that the runner expires after a period of no use.
```
  f4a73d57
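The timer handling described in this commit (clear the timer on use, assign a new one on finish) can be sketched as below. The `runner` type, its fields, and `demoExpiry` are hypothetical and much simpler than a real scheduler; the reference count and the stale-timer check are assumptions added to keep the sketch correct.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runner is a hypothetical stand-in for a loaded model runner.
type runner struct {
	mu          sync.Mutex
	refCount    int
	expireTimer *time.Timer
	expired     bool
}

// use marks the runner active and clears any pending expire timer.
func (r *runner) use() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refCount++
	if r.expireTimer != nil {
		r.expireTimer.Stop()
		r.expireTimer = nil
	}
}

// finish releases one user; when none remain it assigns a new expire timer.
func (r *runner) finish(idle time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refCount--
	if r.refCount > 0 {
		return
	}
	var t *time.Timer
	t = time.AfterFunc(idle, func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		// Ignore stale timers that were cleared or replaced after firing.
		if r.expireTimer == t && r.refCount == 0 {
			r.expired = true
			r.expireTimer = nil
		}
	})
	r.expireTimer = t
}

func (r *runner) isExpired() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.expired
}

// demoExpiry reports whether the runner expired while in use and whether it
// expired after being left idle.
func demoExpiry() (duringUse, afterIdle bool) {
	r := &runner{}
	r.use()
	r.finish(20 * time.Millisecond) // schedules expiry
	r.use()                         // re-use before expiry clears the timer
	time.Sleep(60 * time.Millisecond)
	duringUse = r.isExpired()
	r.finish(20 * time.Millisecond)
	time.Sleep(100 * time.Millisecond)
	afterIdle = r.isExpired()
	return duringUse, afterIdle
}

func main() {
	fmt.Println(demoExpiry())
}
```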
- rename parser to model/file · 119589fc
  Michael Yang authored Apr 30, 2024
  
  119589fc
- use parser.Format instead of templating modelfile · 9cf0f2e9
  Michael Yang authored Apr 26, 2024
  
  9cf0f2e9
- refactor modelfile parser · c0a00f68
  Michael Yang authored Apr 22, 2024
  
  c0a00f68
30 Apr, 2024 1 commit
- prompt to display and add local ollama keys to account (#3717) · 0a7fdbe5
  Bruce MacDonald authored Apr 30, 2024
  
  - return descriptive error messages when unauthorized to create blob or push a model
  - display the local public key associated with the request that was denied
  
  0a7fdbe5
29 Apr, 2024 1 commit
- fix copying model to itself (#4019) · 586672f4
  Jeffrey Morgan authored Apr 28, 2024
  
  586672f4
28 Apr, 2024 1 commit
- Fix concurrency for CPU mode · d6e3b645
  Daniel Hiltgen authored Apr 28, 2024
  
  Prior refactoring passes accidentally removed the logic to bypass VRAM
  checks for CPU loads. This adds that back, along with test coverage.
  
  This also fixes loaded map access in the unit test to be behind the mutex, which was
  likely the cause of various flakes in the tests.
  
  d6e3b645
26 Apr, 2024 1 commit
- return code `499` when user cancels request while a model is loading (#3955) · bb31def0
  Jeffrey Morgan authored Apr 26, 2024
  
  bb31def0
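Returning `499` when the client cancels during a model load might be sketched like this. Status 499 is a non-standard "client closed request" code, so it has no constant in `net/http`. The handler, `loadModel`, `statusWhenCancelled`, and the `/example` path are hypothetical stand-ins written against the standard library, not the project's actual routes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusClientClosedRequest is the non-standard 499 code.
const statusClientClosedRequest = 499

// loadModel stands in for a slow model load that honours cancellation.
func loadModel(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// handler maps a cancelled load to 499 and other failures to 500.
func handler(loadTime time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := loadModel(r.Context(), loadTime); err != nil {
			if errors.Is(err, context.Canceled) {
				w.WriteHeader(statusClientClosedRequest)
				return
			}
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// statusWhenCancelled runs the handler once, optionally with a request
// context that is already cancelled, and returns the recorded status.
func statusWhenCancelled(cancelFirst bool) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	req := httptest.NewRequest(http.MethodPost, "/example", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	handler(10*time.Millisecond)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusWhenCancelled(true), statusWhenCancelled(false)) // 499 200
}
```

Since the client has already gone away, the status is mainly useful in server logs and metrics, where it separates user cancellations from real server errors.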