- 14 May, 2024 5 commits
Daniel Hiltgen authored
The APIs we query are optimistic about free space, and Windows pages VRAM, so we don't have to wait for reported usage to recover on unload.
Patrick Devine authored
Josh authored
Update the `LlamaScript` entry to point to the new link instead of the legacy link.
Patrick Devine authored
Patrick Devine authored
- 13 May, 2024 4 commits
- 12 May, 2024 4 commits
Zander Lewis authored
It still used the legacy link.
Michael Yang authored
use post token
Michael Yang authored
- 11 May, 2024 6 commits
Jeffrey Morgan authored
todashuta authored
Michael Yang authored
Daniel Hiltgen authored
Fix envconfig unit test
Patrick Devine authored
- 10 May, 2024 15 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Fall back to CPU runner with zero layers
Daniel Hiltgen authored
Daniel Hiltgen authored
Integration fixes
Daniel Hiltgen authored
Daniel Hiltgen authored
Always use the sorted list of GPUs
Daniel Hiltgen authored
Make sure the first GPU has the most free space
Jeffrey Morgan authored
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>
Michael Yang authored
add phi2 mem
Michael Yang authored
Jeffrey Morgan authored
* don't clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
Daniel Hiltgen authored
Bump VRAM buffer back up
Daniel Hiltgen authored
Under stress scenarios we're seeing OOMs, so this should help stabilize allocations under heavy concurrency.
Michael Yang authored
Michael Yang authored
- 09 May, 2024 6 commits
Bruce MacDonald authored
Michael Yang authored
fix typo
Jeffrey Morgan authored
Michael Yang authored
Michael Yang authored
only forward some env vars
Michael Yang authored
log cleanup