- 29 May, 2024 1 commit
-
-
Michael Yang authored
-
- 28 May, 2024 2 commits
-
-
Daniel Hiltgen authored
On some systems, 1 minute isn't sufficient to finish the load after it hits 100%. This creates 2 distinct timers, although they're both set to the same value for now so we can refine the timeouts further.
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
- 25 May, 2024 1 commit
-
-
Daniel Hiltgen authored
If the client closes the connection before we finish loading the model, we abort, so let's make the log message clearer about why, to help users understand this failure mode.
-
- 24 May, 2024 4 commits
-
-
Michael Yang authored
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
-
Michael Yang authored
-
Patrick Devine authored
-
Wang,Zhe authored
-
- 23 May, 2024 4 commits
-
-
Michael Yang authored
-
Daniel Hiltgen authored
This doesn't expose a UX yet, but wires up the initial server portion of progress reporting during load.
-
Bruce MacDonald authored
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
-
Jeffrey Morgan authored
* put flash attention behind flag for now
* add test
* remove print
* up timeout for scheduler tests
-
- 21 May, 2024 1 commit
-
-
Michael Yang authored
-
- 20 May, 2024 6 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
jmorganca authored
-
Josh Yan authored
-
Sam authored
* feat: enable flash attention if supported
* feat: enable flash attention if supported
* feat: enable flash attention if supported
* feat: add flash_attn support
-
- 16 May, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 15 May, 2024 3 commits
-
-
Daniel Hiltgen authored
Windows already implements these; carry them over to Linux.
-
Patrick Devine authored
-
Daniel Hiltgen authored
Only dump env vars we care about in the logs
-
- 14 May, 2024 1 commit
-
-
Patrick Devine authored
-
- 13 May, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 11 May, 2024 1 commit
-
- 10 May, 2024 3 commits
-
-
Daniel Hiltgen authored
-
Michael Yang authored
-
Jeffrey Morgan authored
* don't clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
-
- 09 May, 2024 5 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Bruce MacDonald authored
-
Daniel Hiltgen authored
-
- 08 May, 2024 2 commits
-
-
Daniel Hiltgen authored
This records more GPU usage information for eventual UX inclusion.
-
Michael Yang authored
-
- 07 May, 2024 2 commits
-
-
Daniel Hiltgen authored
This will bubble up a much more informative error message if noexec is preventing us from running the subprocess.
-
Michael Yang authored
-
- 06 May, 2024 1 commit
-
-
Michael Yang authored
-