- 14 May, 2024 11 commits
-
-
Patrick Devine authored
-
Patrick Devine authored
-
Michael Yang authored
count memory up to NumGPU if set by user
-
Patrick Devine authored
-
Ryo Machida authored
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
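For context on why an empty model list could serialize as `null`: Go's `encoding/json` encodes a nil slice as `null` and a non-nil empty slice as `[]`. The sketch below only illustrates that behavior. `listResponse` and `encode` are hypothetical names for this example, not the types or functions in `server/routes.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse is a hypothetical stand-in for a /api/tags style response.
type listResponse struct {
	Models []string `json:"models"`
}

// encode marshals a response and returns the JSON text.
func encode(r listResponse) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// A nil slice (the zero value) is encoded as JSON null.
	fmt.Println(encode(listResponse{}))

	// An initialized, empty slice is encoded as an empty JSON array.
	fmt.Println(encode(listResponse{Models: []string{}}))
}
```

Initializing the slice before encoding is one common way to give clients an array in every case, so they can iterate without a null check.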
-
Daniel Hiltgen authored
Remove VRAM convergence check for Windows
-
Daniel Hiltgen authored
The APIs we query are optimistic on free space, and Windows pages VRAM, so we don't have to wait to see reported usage recover on unload.
-
Patrick Devine authored
-
Josh authored
Update `LlamaScript` to point to new link from Legacy link.
-
Patrick Devine authored
-
Patrick Devine authored
-
- 13 May, 2024 6 commits
-
-
Josh authored
removed inconsistent punctuation
-
Josh Yan authored
-
Michael Yang authored
-
Michael Yang authored
-
Josh Yan authored
-
睡觉型学渣 authored
* Correct typos. * Correct typos.
-
- 12 May, 2024 4 commits
-
-
Zander Lewis authored
Still used Legacy link.
-
Michael Yang authored
use post token
-
Michael Yang authored
-
- 11 May, 2024 6 commits
-
-
Jeffrey Morgan authored
-
todashuta authored
-
Michael Yang authored
-
Daniel Hiltgen authored
Fix envconfig unit test
-
Patrick Devine authored
-
- 10 May, 2024 13 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Fall back to CPU runner with zero layers
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Integration fixes
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Always use the sorted list of GPUs
-
Daniel Hiltgen authored
Make sure the first GPU has the most free space
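A minimal sketch of the ordering these two commits describe, assuming the GPU list is sorted by free memory in descending order so that index 0 holds the device with the most free space. `gpuInfo`, its fields, and `sortByFreeMemory` are hypothetical names for illustration, not the project's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// gpuInfo is a hypothetical description of one GPU.
type gpuInfo struct {
	ID         string
	FreeMemory uint64 // bytes
}

// sortByFreeMemory returns a copy of gpus ordered from most to least
// free memory, leaving the caller's slice untouched.
func sortByFreeMemory(gpus []gpuInfo) []gpuInfo {
	sorted := make([]gpuInfo, len(gpus))
	copy(sorted, gpus)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].FreeMemory > sorted[j].FreeMemory
	})
	return sorted
}

func main() {
	gpus := []gpuInfo{
		{ID: "gpu-a", FreeMemory: 4 << 30},
		{ID: "gpu-b", FreeMemory: 10 << 30},
		{ID: "gpu-c", FreeMemory: 6 << 30},
	}
	for _, g := range sortByFreeMemory(gpus) {
		fmt.Println(g.ID, g.FreeMemory)
	}
}
```

Returning a sorted copy and using that copy everywhere afterwards avoids the case where one code path sees the sorted order and another sees the original order.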
-
Jeffrey Morgan authored
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
-
Michael Yang authored
add phi2 mem
-
Michael Yang authored
-
Jeffrey Morgan authored
* don't clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
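One possible reading of "minimum 4 context" is a lower bound applied to the requested context size. The sketch below assumes that reading. `effectiveNumCtx` and `minCtx` are hypothetical names, and where the actual code applies the bound is not stated in the commit message.

```go
package main

import "fmt"

// minCtx is the assumed lower bound on the context size.
const minCtx = 4

// effectiveNumCtx raises requests below minCtx to minCtx and leaves
// larger requests unchanged (no upper clamp is applied here).
func effectiveNumCtx(requested int) int {
	if requested < minCtx {
		return minCtx
	}
	return requested
}

func main() {
	for _, n := range []int{0, 2, 4, 2048} {
		fmt.Println(n, "->", effectiveNumCtx(n))
	}
}
```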
-
Daniel Hiltgen authored
Bump VRAM buffer back up
-
Daniel Hiltgen authored
Under stress scenarios we're seeing OOMs, so this should help stabilize allocations under heavy concurrency.
-