- 10 May, 2024 7 commits
-
Michael Yang authored
add phi2 mem
-
Michael Yang authored
-
Jeffrey Morgan authored
* don't clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
-
Daniel Hiltgen authored
Bump VRAM buffer back up
-
Daniel Hiltgen authored
We're seeing OOMs under stress scenarios, so this should help stabilize allocations under heavy concurrency.
-
Michael Yang authored
-
Michael Yang authored
-
- 09 May, 2024 27 commits
-
Bruce MacDonald authored
-
Michael Yang authored
fix typo
-
Jeffrey Morgan authored
-
Michael Yang authored
-
Michael Yang authored
only forward some env vars
-
Michael Yang authored
log clean up
-
Daniel Hiltgen authored
Fix race in shutdown logic
-
Daniel Hiltgen authored
Ensure the runners are terminated
-
Zander Lewis authored
-
Michael Yang authored
-
Daniel Hiltgen authored
Wait for GPU free memory reporting to converge
-
Daniel Hiltgen authored
The GPU drivers take a while to update their free memory reporting, so we need to wait until the values converge with what we're expecting before proceeding to start another runner in order to get an accurate picture.
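The convergence wait described in this commit message can be illustrated with a small polling loop. This is only a sketch in Python, not the project's code: `wait_for_free_memory`, `read_free_bytes`, the 5% tolerance, and the timeout values are all hypothetical names and numbers chosen for illustration.

```python
import time
from typing import Callable


def wait_for_free_memory(
    read_free_bytes: Callable[[], int],
    expected_free_bytes: int,
    tolerance: float = 0.05,          # assumed: accept readings within 5% of expected
    timeout_s: float = 5.0,           # assumed: give up after this many seconds
    poll_interval_s: float = 0.25,    # assumed: delay between driver queries
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the reported free memory until it is close to the expected value.

    Returns True once the reported value has converged, or False if the
    timeout expires first. `read_free_bytes` stands in for whatever call
    queries the GPU driver (hypothetical here).
    """
    threshold = expected_free_bytes * (1.0 - tolerance)
    deadline = clock() + timeout_s
    while True:
        if read_free_bytes() >= threshold:
            return True       # reporting has caught up; safe to start another runner
        if clock() >= deadline:
            return False      # never converged; caller decides how to proceed
        sleep(poll_interval_s)
```

A caller would invoke this after unloading a runner and before scheduling the next one, treating a `False` result as a signal that the reported values may still be stale.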
-
Michael Yang authored
-
Daniel Hiltgen authored
Record more GPU information
-
Daniel Hiltgen authored
This cleans up the logging for GPU discovery a bit, and can serve as a foundation to report GPU information in a future UX.
-
Daniel Hiltgen authored
Harden subprocess reaping
-
Bruce MacDonald authored
-
Michael Yang authored
routes: skip invalid filepaths
-
Michael Yang authored
-
Daniel Hiltgen authored
-
tusharhero authored
-
J S authored
-
Daniel Hiltgen authored
Document container usage and a workaround for NVIDIA errors
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
Carlos Gamez authored
Updated sample code as per warning notification from the package maintainers
-
jmorganca authored
-
- 08 May, 2024 6 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Add GPU usage
-
Daniel Hiltgen authored
Detect noexec and report a better error
-
Daniel Hiltgen authored
This records more GPU usage information for eventual UX inclusion.
-
Bruce MacDonald authored
* Add preflight OPTIONS handling and update CORS config
  - Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
  - Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
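The preflight behaviour this commit message describes (answer OPTIONS early with 204 and advertise the allowed headers and methods) can be sketched as a pure function. This is an illustration in Python, not the contents of routes.go: `handle_preflight`, the method list, and the way origins are matched are assumptions, and only the three header names and the 204 status come from the message itself.

```python
from typing import Dict, Iterable, Optional, Tuple

# Header names taken from the commit message; the method list is an assumption.
ALLOWED_HEADERS = ("Authorization", "Content-Type", "User-Agent")
ALLOWED_METHODS = ("GET", "POST", "OPTIONS")


def handle_preflight(
    method: str,
    origin: str,
    allowed_origins: Iterable[str],
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a preflight request, or None otherwise.

    None means "not a preflight, continue to the normal handler".
    """
    if method.upper() != "OPTIONS":
        return None

    headers: Dict[str, str] = {}
    # Only advertise CORS permissions to origins that were explicitly allowed
    # (exact string match here; a real server may support patterns).
    if origin in set(allowed_origins):
        headers = {
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Methods": ", ".join(ALLOWED_METHODS),
            "Access-Control-Allow-Headers": ", ".join(ALLOWED_HEADERS),
        }
    # 204 No Content: the preflight is answered without running the route.
    return 204, headers
```

Returning early keeps preflight requests from reaching the route handlers at all, which is the optimization the message refers to.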
-
Michael Yang authored
routes: fix show llava models