- 09 May, 2024 7 commits
-
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Ensure the runners are terminated
-
Daniel Hiltgen authored
The GPU drivers take a while to update their free memory reporting, so we wait until the reported values converge with what we expect before starting another runner. This gives the next runner an accurate picture of available memory.
-
Daniel Hiltgen authored
This cleans up the logging for GPU discovery a bit, and can serve as a foundation to report GPU information in a future UX.
-
Bruce MacDonald authored
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 08 May, 2024 5 commits
-
-
Bruce MacDonald authored
* Add preflight OPTIONS handling and update CORS config
  - Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
  - Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
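The early return for preflight requests can be sketched with the standard library's `net/http`. This is an illustration under assumptions: the real `allowedHostsMiddleware` and its web framework are not shown in this log, and the `preflight` and `statusFor` helpers and the exact header values below are hypothetical.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// preflight wraps next and answers OPTIONS requests early with 204 No Content.
// The allowed methods and headers are illustrative, not the project's config.
func preflight(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodOptions {
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.Header().Set("Access-Control-Allow-Headers", "Authorization, Content-Type, User-Agent")
			w.WriteHeader(http.StatusNoContent)
			return // skip the rest of the handler chain
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request with the given method through the wrapped
// handler and returns the response status code.
func statusFor(method string) int {
	h := preflight(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/", nil))
	return rec.Code
}
```

Here `statusFor("OPTIONS")` yields 204 without reaching the inner handler, while `statusFor("GET")` passes through and yields 200.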
-
Michael Yang authored
-
Bruce MacDonald authored
-
Michael Yang authored
-
Bruce MacDonald authored
-
- 07 May, 2024 1 commit
-
-
Michael Yang authored
-
- 06 May, 2024 12 commits
-
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
The model processing was recently changed to be deferred, but this test scenario hadn't been adjusted for that change in behavior.
-
Jeffrey Morgan authored
-
- 05 May, 2024 4 commits
-
-
Daniel Hiltgen authored
This moves all the env var reading into one central module and logs the loaded config once at startup, which should help when troubleshooting user server logs.
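One way to centralize environment reading and log the result once is sketched below. The `Config` fields, the `DEMO_*` variable names, and the default value are all hypothetical and chosen only for illustration; they are not the project's actual settings.

```go
package main

import (
	"log"
	"os"
	"strconv"
)

// Config holds every setting read from the environment in one place.
// Field names and DEMO_* variable names are hypothetical.
type Config struct {
	Debug    bool
	MaxQueue int
}

// loadConfig reads all settings through getenv, falls back to defaults on
// missing or invalid values, and logs the resulting config once.
func loadConfig(getenv func(string) string) Config {
	cfg := Config{MaxQueue: 50} // illustrative default
	if v, err := strconv.ParseBool(getenv("DEMO_DEBUG")); err == nil {
		cfg.Debug = v
	}
	if n, err := strconv.Atoi(getenv("DEMO_MAX_QUEUE")); err == nil && n > 0 {
		cfg.MaxQueue = n
	}
	log.Printf("loaded config: %+v", cfg)
	return cfg
}

// loadFromEnv is the entry point a server would call once at startup.
func loadFromEnv() Config {
	return loadConfig(os.Getenv)
}
```

Passing `getenv` as a parameter keeps the parsing testable without touching the real process environment.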
-
Jeffrey Morgan authored
-
Patrick Devine authored
-
Daniel Hiltgen authored
This also bumps up the default to be 50 queued requests instead of 10.
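A bounded request queue of this kind can be sketched with a buffered channel. The mechanism shown is an assumption, since this log does not say how the queue is implemented; the type and function names are hypothetical.

```go
package main

import "errors"

var errQueueFull = errors.New("server busy: too many queued requests")

// requestQueue holds pending request IDs up to a fixed capacity.
type requestQueue struct {
	pending chan string
}

// newRequestQueue creates a queue that accepts at most max pending requests.
func newRequestQueue(max int) *requestQueue {
	return &requestQueue{pending: make(chan string, max)}
}

// enqueue adds id if there is room and returns errQueueFull otherwise.
// It never blocks the caller.
func (q *requestQueue) enqueue(id string) error {
	select {
	case q.pending <- id:
		return nil
	default:
		return errQueueFull
	}
}
```

With the new default, the queue would be created as `newRequestQueue(50)`, and the 51st pending request would be rejected rather than left waiting.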
-
- 03 May, 2024 1 commit
-
-
Daniel Hiltgen authored
This gives us more headroom on the scheduler tests to tamp down some flakes.
-
- 01 May, 2024 6 commits
-
-
Michael Yang authored
-
Mark Ward authored
Log when waiting for the process to stop, to help debug cases where other tasks execute during this wait. When the expire timer fires, clear the timer reference because it will not be reused. Close will clean up expireTimer if the calling code has not already done so.
-
Mark Ward authored
Fix runners expiring during active use. Clear the expire timer while the runner is in use, and let finish assign a new expire timer so that the runner expires after a period of no use.
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 30 Apr, 2024 1 commit
-
-
Bruce MacDonald authored
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
-
- 29 Apr, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 28 Apr, 2024 1 commit
-
-
Daniel Hiltgen authored
Prior refactoring passes accidentally removed the logic to bypass VRAM checks for CPU loads. This adds that back, along with test coverage. This also moves the unit test's access to the loaded map behind the mutex; the unguarded access was likely the cause of various flakes in the tests.
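Both points, the CPU bypass and the mutex-guarded map, can be sketched together. The `scheduler` type and its methods are hypothetical and much simpler than a real scheduler; the sketch assumes a CPU-only load reserves no VRAM.

```go
package main

import "sync"

// scheduler tracks loaded models; all names here are hypothetical.
type scheduler struct {
	mu     sync.Mutex
	loaded map[string]uint64 // model name -> bytes of VRAM reserved
}

func newScheduler() *scheduler {
	return &scheduler{loaded: make(map[string]uint64)}
}

// tryLoad records the model as loaded if it fits. CPU-only loads skip the
// VRAM check, since in this sketch they reserve no GPU memory.
func (s *scheduler) tryLoad(name string, need, freeVRAM uint64, cpuOnly bool) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cpuOnly {
		s.loaded[name] = 0
		return true
	}
	if need > freeVRAM {
		return false
	}
	s.loaded[name] = need
	return true
}

// loadedCount reads the map under the same mutex, as test code should.
func (s *scheduler) loadedCount() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.loaded)
}
```

A test that reads `s.loaded` directly while other goroutines call `tryLoad` is a data race; going through `loadedCount` avoids it.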
-
- 26 Apr, 2024 1 commit
-
-
Jeffrey Morgan authored
-