- 04 Jun, 2024 1 commit
  - Michael Yang authored
- 24 May, 2024 1 commit
  - Patrick Devine authored
- 20 May, 2024 3 commits
  - Michael Yang authored
  - Michael Yang authored
    Particularly useful for zipfiles and f16s.
  - Patrick Devine authored
- 16 May, 2024 1 commit
  - Daniel Hiltgen authored
- 15 May, 2024 1 commit
  - Patrick Devine authored
- 14 May, 2024 7 commits
  - Patrick Devine authored
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
  - Ryo Machida authored
    * Fixed the API endpoint /api/tags to return {"models": []} instead of {"models": null} when the model list is empty.
    * Update server/routes.go
    Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
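The empty-list fix above hinges on a Go `encoding/json` detail: a nil slice marshals to JSON `null`, while an initialized empty slice marshals to `[]`. A minimal sketch of the distinction (the `ListResponse` type here is simplified to hold strings; the real response carries model metadata):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ListResponse loosely mirrors the /api/tags response shape.
type ListResponse struct {
	Models []string `json:"models"`
}

// marshalModels demonstrates the bug the commit fixes: a nil Go slice
// marshals to JSON null, while an initialized empty slice marshals to [].
func marshalModels(models []string) string {
	b, _ := json.Marshal(ListResponse{Models: models})
	return string(b)
}

func main() {
	fmt.Println(marshalModels(nil))        // {"models":null}  (the old behavior)
	fmt.Println(marshalModels([]string{})) // {"models":[]}    (the fixed behavior)
}
```

The fix, then, amounts to making sure the handler always hands `json.Marshal` an initialized slice rather than a nil one.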
  - Patrick Devine authored
  - Patrick Devine authored
- 10 May, 2024 2 commits
  - Jeffrey Morgan authored
    * Rename `--quantization` to `--quantize`
    * Keep the old flag working for backwards compatibility
    * Update api/types.go
    Co-authored-by: Michael Yang <mxyng@pm.me>
  - Michael Yang authored
- 09 May, 2024 5 commits
  - Daniel Hiltgen authored
    Ensure the runners are terminated.
  - Daniel Hiltgen authored
    This cleans up the logging for GPU discovery a bit, and can serve as a foundation to report GPU information in a future UX.
  - Bruce MacDonald authored
  - Michael Yang authored
  - Jeffrey Morgan authored
- 08 May, 2024 3 commits
  - Bruce MacDonald authored
    * Add preflight OPTIONS handling and update the CORS config:
      - Return HTTP 204 (No Content) early for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
      - Extend the CORS configuration to explicitly allow the 'Authorization' header and the 'OPTIONS' method when the OLLAMA_ORIGINS environment variable is set.
    * Allow auth, content-type, and user-agent headers
    * Update routes.go
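The preflight early return described above can be sketched as ordinary `net/http` middleware. Ollama's real middleware is Gin-based; the handler and header list below are illustrative, not the project's actual code:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// preflight answers CORS preflight OPTIONS requests with 204 No Content
// before the rest of the handler chain runs.
func preflight(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodOptions {
			w.Header().Set("Access-Control-Allow-Headers", "Authorization, Content-Type, User-Agent")
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor drives the middleware with the given method and reports
// the status code the client would see.
func statusFor(method string) int {
	h := preflight(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodOptions)) // 204: answered by the middleware
	fmt.Println(statusFor(http.MethodGet))     // 200: passed through to the handler
}
```

Answering OPTIONS directly with 204 means preflight requests never reach the route handlers at all.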
  - Bruce MacDonald authored
  - Bruce MacDonald authored
- 07 May, 2024 1 commit
  - Michael Yang authored
- 06 May, 2024 2 commits
  - Jeffrey Morgan authored
  - Michael Yang authored
    - FROM /path/to/{safetensors,pytorch}
    - FROM /path/to/fp{16,32}.bin
    - FROM model:fp{16,32}
- 05 May, 2024 2 commits
  - Daniel Hiltgen authored
    This moves all the env var reading into one central module and logs the loaded config once at startup, which should help in troubleshooting user server logs.
  - Daniel Hiltgen authored
    This also bumps the default up to 50 queued requests instead of 10.
- 01 May, 2024 3 commits
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
- 26 Apr, 2024 1 commit
  - Jeffrey Morgan authored
- 24 Apr, 2024 1 commit
  - Michael Yang authored
- 23 Apr, 2024 1 commit
  - Daniel Hiltgen authored
    This change adds support for multiple concurrent requests, as well as loading multiple models, by spawning multiple runners. The default settings are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
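The per-model request limit that OLLAMA_NUM_PARALLEL describes can be sketched with a classic Go counting semaphore: a buffered channel. The `parallelGate` type below is hypothetical, not Ollama's actual scheduler:

```go
package main

import "fmt"

// parallelGate admits at most cap(g) requests into a model at once;
// sends fill slots, receives free them.
type parallelGate chan struct{}

func newParallelGate(n int) parallelGate {
	return make(parallelGate, n)
}

// acquire blocks until a slot is free.
func (g parallelGate) acquire() { g <- struct{}{} }

// release frees a slot for the next waiting request.
func (g parallelGate) release() { <-g }

// inUse reports how many slots are currently held.
func (g parallelGate) inUse() int { return len(g) }

func main() {
	g := newParallelGate(2) // e.g. OLLAMA_NUM_PARALLEL=2
	g.acquire()
	g.acquire() // a third acquire would now block until a release
	fmt.Println(g.inUse()) // 2
	g.release()
	fmt.Println(g.inUse()) // 1
}
```

The same shape, with a gate per loaded model, extends naturally to the OLLAMA_MAX_LOADED_MODELS limit.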
- 15 Apr, 2024 1 commit
  - Jeffrey Morgan authored
    * Terminate the subprocess if a `SIGINT` or `SIGTERM` signal is received while a model is loading
    * Use `unload` in the signal handler
- 08 Apr, 2024 2 commits
  - Michael Yang authored
  - Michael Yang authored
- 02 Apr, 2024 1 commit
  - Daniel Hiltgen authored
- 01 Apr, 2024 1 commit
  - Daniel Hiltgen authored
    This should resolve a number of memory-leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.
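The process isolation described above rests on a simple pattern: spawn the runner with `os/exec`, and kill and reap it when it is idle or misbehaving; a supervisor can then respawn it. A sketch under those assumptions (the binary and supervision logic are placeholders, not Ollama's actual runner management):

```go
package main

import (
	"fmt"
	"os/exec"
)

// startRunner launches the runner as a child process, so a crash or
// memory leak there cannot take the main server down.
func startRunner(path string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(path, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// stopRunner kills the child and reaps it so no zombie is left behind;
// a supervisor loop could then call startRunner again to restart it.
func stopRunner(cmd *exec.Cmd) {
	cmd.Process.Kill()
	cmd.Wait()
}

func main() {
	// "sleep" stands in for the llama.cpp runner binary in this demo.
	cmd, err := startRunner("sleep", "60")
	if err != nil {
		panic(err)
	}
	stopRunner(cmd)
	fmt.Println("runner stopped")
}
```

Keeping the runner behind a process boundary is also what makes "multiple copies for multiple models" a matter of spawning more children rather than restructuring the server.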