Commits · a2fc933fed2e05266aff324deb2d35933563a575 · OpenDAS / ollama

14 May, 2024 5 commits
- update delete handler to use model.Name · a2fc933f
  Michael Yang authored Apr 17, 2024
  
  a2fc933f
- Fixed the API endpoint /api/tags when the model list is empty. (#4424) · 798b107f
  Ryo Machida authored May 15, 2024
```
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.

* Update server/routes.go

---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
```
  798b107f
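
The `{models: null}` versus `{models: []}` difference described in this commit matches how Go's `encoding/json` encodes a nil slice versus an empty one. The following is a minimal sketch of that behavior, not the actual change to `server/routes.go`; `listResponse` and `encodeModels` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse is a hypothetical stand-in for the /api/tags response body.
type listResponse struct {
	Models []string `json:"models"`
}

// encodeModels marshals the response, replacing a nil slice with an empty
// one so the field is encoded as [] rather than null.
func encodeModels(models []string) string {
	if models == nil {
		models = []string{}
	}
	b, err := json.Marshal(listResponse{Models: models})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	var none []string // nil slice, as when no models are found

	raw, _ := json.Marshal(listResponse{Models: none})
	fmt.Println(string(raw))        // {"models":null}
	fmt.Println(encodeModels(none)) // {"models":[]}
}
```

Clients that iterate over `models` without a null check are the ones most likely to benefit from the empty-array form.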
- Remove VRAM convergence check for windows · ec231a79
  Daniel Hiltgen authored May 14, 2024
```
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
```
  ec231a79
- don't abort when an invalid model name is used in /save (#4416) · 7ca71a6b
  Patrick Devine authored May 13, 2024
  
  7ca71a6b
- Ollama `ps` command for showing currently loaded models (#4327) · 68459888
  Patrick Devine authored May 13, 2024
  
  68459888
12 May, 2024 2 commits
- Revert "use post token" · 4ec7445a
  jmorganca authored May 11, 2024
```
This reverts commit 0fec3525.
```
  4ec7445a
- use post token · 0fec3525
  Michael Yang authored May 11, 2024
  
  0fec3525
10 May, 2024 5 commits
- Fix envconfig unit test · 824ee544
  Daniel Hiltgen authored May 10, 2024
  
  824ee544
- Always use the sorted list of GPUs · 4142c3ef
  Daniel Hiltgen authored May 10, 2024
```
Make sure the first GPU has the most free space
```
  4142c3ef
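
One way to guarantee that the first GPU has the most free space is to sort the list by free memory in descending order before using it. The sketch below assumes a hypothetical `gpu` type and `byFreeMemory` helper; the real GPU structures in the repository are not shown in this log and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical record holding only the fields needed here.
type gpu struct {
	ID         string
	FreeMemory uint64 // bytes
}

// byFreeMemory returns a copy of gpus ordered from most to least free
// memory, leaving the caller's slice untouched.
func byFreeMemory(gpus []gpu) []gpu {
	sorted := append([]gpu(nil), gpus...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].FreeMemory > sorted[j].FreeMemory
	})
	return sorted
}

func main() {
	gpus := []gpu{{"gpu0", 2 << 30}, {"gpu1", 8 << 30}, {"gpu2", 4 << 30}}
	for _, g := range byFreeMemory(gpus) {
		fmt.Println(g.ID, g.FreeMemory)
	}
}
```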
- Use `--quantize` flag and `quantize` api parameter (#4321) · 6602e793
  Jeffrey Morgan authored May 10, 2024
```
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>

---------
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  6602e793
- Don't clamp ctx size in `PredictServerFit` (#4317) · bb6fd022
  Jeffrey Morgan authored May 10, 2024
```
* dont clamp ctx size in `PredictServerFit`

* minimum 4 context

* remove context warning
```
  bb6fd022
- fix(routes): skip bad manifests · e0363717
  Michael Yang authored May 09, 2024
  
  e0363717
09 May, 2024 7 commits
- prune partial downloads (#4272) · 302d7fdb
  Jeffrey Morgan authored May 09, 2024
  
  302d7fdb
- Fix race in shutdown logic · 3ae2f441
  Daniel Hiltgen authored May 09, 2024
```
Ensure the runners are terminated
```
  3ae2f441
- Wait for GPU free memory reporting to converge · 354ad925
  Daniel Hiltgen authored May 09, 2024
```
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
```
  354ad925
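
The waiting described in this commit can be pictured as a polling loop with a timeout. This is only a sketch under stated assumptions: `waitForFree` and its parameters are hypothetical, and the driver query is replaced by a caller-supplied function rather than any real GPU API.

```go
package main

import (
	"fmt"
	"time"
)

// waitForFree polls readFree until it reports at least want bytes free or
// the timeout elapses. It returns true if the reported value converged.
func waitForFree(readFree func() uint64, want uint64, interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if readFree() >= want {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Fake driver whose reported free memory recovers by 1 GiB per poll.
	free := uint64(0)
	read := func() uint64 { free += 1 << 30; return free }

	fmt.Println(waitForFree(read, 4<<30, time.Millisecond, time.Second)) // true
}
```

A timeout matters here because a driver that never reports the expected value would otherwise block the next runner indefinitely.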
- Record more GPU information · 8727a9c1
  Daniel Hiltgen authored May 07, 2024
```
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
```
  8727a9c1
- add done_reason to the api (#4235) · cfa84b84
  Bruce MacDonald authored May 09, 2024
  
  cfa84b84
- routes: skip invalid filepaths · a7ee84fc
  Michael Yang authored May 09, 2024
  
  a7ee84fc
- use model defaults for `num_gqa`, `rope_frequency_base` and `rope_frequency_scale` (#1983) · d5eec16d
  Jeffrey Morgan authored May 09, 2024
  
  d5eec16d
08 May, 2024 5 commits
- Add preflight OPTIONS handling and update CORS config (#4086) · cef45fea
  Bruce MacDonald authored May 08, 2024
```
* Add preflight OPTIONS handling and update CORS config

- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

* allow auth, content-type, and user-agent headers

* Update routes.go
```
  cef45fea
- routes: fix show llava models · b25976ae
  Michael Yang authored May 08, 2024
  
  b25976ae
- skip hidden files in list models handler (#4247) · 8cbd3e75
  Bruce MacDonald authored May 07, 2024
  
  8cbd3e75
- skip if same quantization · eeb69526
  Michael Yang authored May 07, 2024
  
  eeb69526
- fix invalid destination error message · dc9b1111
  Bruce MacDonald authored May 07, 2024
  
  dc9b1111
07 May, 2024 1 commit
- update list handler to use model.Name · 548a7df0
  Michael Yang authored Apr 17, 2024
  
  548a7df0
06 May, 2024 12 commits
- close server on receiving signal (#4213) · 39d9d22c
  Jeffrey Morgan authored May 06, 2024
  
  39d9d22c
- close zip files · b2f00aa9
  Michael Yang authored May 06, 2024
  
  b2f00aa9
- s/DisplayLongest/String/ · f5e8b207
  Michael Yang authored May 01, 2024
  
  f5e8b207
- only quantize language models · d2454603
  Michael Yang authored Apr 25, 2024
  
  d2454603
- no iterator · 4d0d0fa3
  Michael Yang authored Apr 25, 2024
  
  4d0d0fa3
- rebase · 7ffe4573
  Michael Yang authored Apr 24, 2024
  
  7ffe4573
- comments · 01811c17
  Michael Yang authored Apr 23, 2024
  
  01811c17
- update tests · a7248f6e
  Michael Yang authored Apr 16, 2024
  
  a7248f6e
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
- Skip scheduling cancelled requests, always reload unloaded runners (#4189) · c9f98622
  Jeffrey Morgan authored May 06, 2024
  
  c9f98622
- Fix stale test logic · 0a954e50
  Daniel Hiltgen authored May 06, 2024
```
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
```
  0a954e50
- unload in critical section (#4187) · dfa2f32c
  Jeffrey Morgan authored May 05, 2024
  
  dfa2f32c
05 May, 2024 3 commits
- Centralize server config handling · f56aa200
  Daniel Hiltgen authored May 04, 2024
```
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
```
  f56aa200
- allocate a large enough kv cache for all parallel requests (#4162) · 942c9792
  Jeffrey Morgan authored May 05, 2024
  
  942c9792
- validate the format of the digest when getting the model path (#4175) · 2a21363b
  Patrick Devine authored May 05, 2024
  
  2a21363b