Commits · c2714fcbfd600c2a13efbc42bab95b49b0b4fa33 · OpenDAS / ollama

14 May, 2024 5 commits
- routes: use Manifests for ListHandler · c2714fcb
  Michael Yang authored May 06, 2024
  
  c2714fcb
- update delete handler to use model.Name · a2fc933f
  Michael Yang authored Apr 17, 2024
  
  a2fc933f
- Fixed the API endpoint /api/tags when the model list is empty. (#4424) · 798b107f
  Ryo Machida authored May 15, 2024
```
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.

* Update server/routes.go

---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
```
  798b107f
- don't abort when an invalid model name is used in /save (#4416) · 7ca71a6b
  Patrick Devine authored May 13, 2024
  
  7ca71a6b
- Ollama `ps` command for showing currently loaded models (#4327) · 68459888
  Patrick Devine authored May 13, 2024
  
  68459888
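The `/api/tags` fix in 798b107f comes down to how Go's `encoding/json` treats a nil slice versus an empty one. A minimal sketch, assuming a simplified response type (`listResponse` and `encode` are hypothetical names, not taken from `server/routes.go`):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse is a hypothetical stand-in for the /api/tags response body.
type listResponse struct {
	Models []string `json:"models"`
}

// encode marshals a response built from the given slice.
func encode(models []string) string {
	b, err := json.Marshal(listResponse{Models: models})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var unset []string              // nil slice: never initialized
	fmt.Println(encode(unset))      // {"models":null}
	fmt.Println(encode([]string{})) // {"models":[]}
}
```

Initializing the slice before filling it is one way to guarantee that clients always receive a JSON array.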
10 May, 2024 2 commits
- Use `--quantize` flag and `quantize` api parameter (#4321) · 6602e793
  Jeffrey Morgan authored May 10, 2024
```
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>

---------
Co-authored-by: Michael Yang <mxyng@pm.me>
```
  6602e793
- fix(routes): skip bad manifests · e0363717
  Michael Yang authored May 09, 2024
  
  e0363717
09 May, 2024 5 commits
- Fix race in shutdown logic · 3ae2f441
  Daniel Hiltgen authored May 09, 2024
```
Ensure the runners are terminated
```
  3ae2f441
- Record more GPU information · 8727a9c1
  Daniel Hiltgen authored May 07, 2024
```
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
```
  8727a9c1
- add done_reason to the api (#4235) · cfa84b84
  Bruce MacDonald authored May 09, 2024
  
  cfa84b84
- routes: skip invalid filepaths · a7ee84fc
  Michael Yang authored May 09, 2024
  
  a7ee84fc
- use model defaults for `num_gqa`, `rope_frequency_base` and `rope_frequency_scale` (#1983) · d5eec16d
  Jeffrey Morgan authored May 09, 2024
  
  d5eec16d
08 May, 2024 3 commits

- Add preflight OPTIONS handling and update CORS config (#4086) · cef45fea
  Bruce MacDonald authored May 08, 2024

  * Add preflight OPTIONS handling and update CORS config

  - Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

  - Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

  * allow auth, content-type, and user-agent headers

  * Update routes.go

  cef45fea
- skip hidden files in list models handler (#4247) · 8cbd3e75
  Bruce MacDonald authored May 07, 2024
  
  8cbd3e75
- fix invalid destination error message · dc9b1111
  Bruce MacDonald authored May 07, 2024
  
  dc9b1111

07 May, 2024 1 commit
- update list handler to use model.Name · 548a7df0
  Michael Yang authored Apr 17, 2024
  
  548a7df0
06 May, 2024 2 commits
- close server on receiving signal (#4213) · 39d9d22c
  Jeffrey Morgan authored May 06, 2024
  
  39d9d22c
- quantize any fp16/fp32 model · 9685c345
  Michael Yang authored Apr 12, 2024
```
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
```
  9685c345
05 May, 2024 2 commits

- Centralize server config handling · f56aa200
  Daniel Hiltgen authored May 04, 2024

  This moves all the env var reading into one central module
  and logs the loaded config once at startup which should
  help in troubleshooting user server logs

  f56aa200
- Make maximum pending request configurable · 20f6c065
  Daniel Hiltgen authored May 03, 2024
```
This also bumps up the default to be 50 queued requests
instead of 10.
```
  20f6c065

01 May, 2024 3 commits
- server: target invalid · 45b6a12e
  Michael Yang authored May 01, 2024
  
  45b6a12e
- rename parser to model/file · 119589fc
  Michael Yang authored Apr 30, 2024
  
  119589fc
- use parser.Format instead of templating modelfile · 9cf0f2e9
  Michael Yang authored Apr 26, 2024
  
  9cf0f2e9
26 Apr, 2024 1 commit
- return code `499` when user cancels request while a model is loading (#3955) · bb31def0
  Jeffrey Morgan authored Apr 26, 2024
  
  bb31def0
24 Apr, 2024 1 commit
- update copy to use model.Name · 592dae31
  Michael Yang authored Apr 16, 2024
  
  592dae31
23 Apr, 2024 1 commit

- Request and model concurrency · 34b9db5a
  Daniel Hiltgen authored Mar 30, 2024

  This change adds support for multiple concurrent requests, as well as
  loading multiple models by spawning multiple runners. The default
  settings are currently set at 1 concurrent request per model and only 1
  loaded model at a time, but these can be adjusted by setting
  OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

  34b9db5a

15 Apr, 2024 1 commit
- Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653) · a0b8a32e
  Jeffrey Morgan authored Apr 15, 2024
```
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
```
  a0b8a32e
08 Apr, 2024 2 commits
- cgo quantize · 9502e566
  Michael Yang authored Apr 05, 2024
  
  9502e566
- no blob create if already exists · e1c9a2a0
  Michael Yang authored Apr 05, 2024
  
  e1c9a2a0
02 Apr, 2024 1 commit
- Revert options as a ref in the server · 6589eb8a
  Daniel Hiltgen authored Apr 02, 2024
  
  6589eb8a
01 Apr, 2024 2 commits

- Switch back to subprocessing for llama.cpp · 58d95cc9
  Daniel Hiltgen authored Mar 14, 2024

  This should resolve a number of memory leak and stability defects by allowing
  us to isolate llama.cpp in a separate process and shut down when idle, and
  gracefully restart if it has problems. This also serves as a first step to be
  able to run multiple copies to support multiple models concurrently.

  58d95cc9
- update memory calculations · 91b3e4d2
  Michael Yang authored Mar 18, 2024
```
count each layer independently when deciding gpu offloading
```
  91b3e4d2

27 Mar, 2024 1 commit
- fix: trim quotes on OLLAMA_ORIGINS · af8a8a6b
  Michael Yang authored Mar 27, 2024
  
  af8a8a6b
26 Mar, 2024 1 commit
- change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b
  Patrick Devine authored Mar 26, 2024
  
  1b272d5b
15 Mar, 2024 1 commit

- server: replace blob prefix separator from ':' to '-' (#3146) · 703684a8
  Blake Mizerany authored Mar 14, 2024

  This fixes issues where blob file names that contain ':' characters are rejected by file systems that do not support them.

  703684a8
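The separator change in 703684a8 amounts to a rename like the one sketched here. The function name `blobFileName` and the sample digest are hypothetical; only the ':' to '-' substitution comes from the commit title.

```go
package main

import (
	"fmt"
	"strings"
)

// blobFileName replaces the first ':' in a digest with '-' so the result is
// a valid file name on file systems that reject ':' (hypothetical helper).
func blobFileName(digest string) string {
	return strings.Replace(digest, ":", "-", 1)
}

func main() {
	fmt.Println(blobFileName("sha256:0123abcd")) // sha256-0123abcd
}
```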

13 Mar, 2024 1 commit
- Default Keep Alive environment variable (#3094) · 47cfe58a
  Patrick Devine authored Mar 13, 2024
```
---------
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
```
  47cfe58a
09 Mar, 2024 4 commits
- Finish unwinding idempotent payload logic · 4a5c9b80
  Daniel Hiltgen authored Mar 08, 2024
```
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent.  This finishes switching to always extract
the payloads, and now that idempotency is gone, the
version directory is no longer useful.
```
  4a5c9b80
- separate out `isLocalIP` · 5b3fad96
  Jeffrey Morgan authored Mar 09, 2024
  
  5b3fad96
- simplify host checks · bfec2c6e
  Jeffrey Morgan authored Mar 08, 2024
  
  bfec2c6e
- add additional allowed hosts · 5c143af7
  Jeffrey Morgan authored Mar 08, 2024
  
  5c143af7