- 24 May, 2024 1 commit
  - Patrick Devine authored
- 23 May, 2024 4 commits
  - Michael Yang authored
  - Daniel Hiltgen authored
    This doesn't expose a UX yet, but it wires up the initial server portion of progress reporting during load.
  - Bruce MacDonald authored
    Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
  - Jeffrey Morgan authored
    * put flash attention behind a flag for now
    * add test
    * remove print
    * up timeout for scheduler tests
- 21 May, 2024 1 commit
  - Michael Yang authored
- 20 May, 2024 6 commits
  - Michael Yang authored
  - Michael Yang authored
  - Patrick Devine authored
  - jmorganca authored
  - Josh Yan authored
  - Sam authored
    * feat: enable flash attention if supported
    * feat: add flash_attn support
- 16 May, 2024 1 commit
  - Jeffrey Morgan authored
- 15 May, 2024 3 commits
  - Daniel Hiltgen authored
    Windows already implements these; carry them over to Linux.
  - Patrick Devine authored
  - Daniel Hiltgen authored
    Only dump the env vars we care about in the logs.
- 14 May, 2024 1 commit
  - Patrick Devine authored
- 13 May, 2024 2 commits
  - Michael Yang authored
  - Michael Yang authored
- 11 May, 2024 1 commit
- 10 May, 2024 3 commits
  - Daniel Hiltgen authored
  - Michael Yang authored
  - Jeffrey Morgan authored
    * don't clamp ctx size in `PredictServerFit`
    * minimum 4 context
    * remove context warning
- 09 May, 2024 5 commits
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
  - Bruce MacDonald authored
  - Daniel Hiltgen authored
- 08 May, 2024 2 commits
  - Daniel Hiltgen authored
    This records more GPU usage information for eventual UX inclusion.
  - Michael Yang authored
- 07 May, 2024 2 commits
  - Daniel Hiltgen authored
    This will bubble up a much more informative error message if noexec is preventing us from running the subprocess.
  - Michael Yang authored
- 06 May, 2024 5 commits
  - Michael Yang authored
  - Michael Yang authored
    - FROM /path/to/{safetensors,pytorch}
    - FROM /path/to/fp{16,32}.bin
    - FROM model:fp{16,32}
  - Daniel Hiltgen authored
    Trying to live off the land for CUDA libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly.
  - Jeffrey Morgan authored
  - Jeffrey Morgan authored
    * fix llava models not working after first request
    * individual requests only for llava models
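  The FROM forms listed in the commit above can appear in a Modelfile. A minimal sketch, assuming one of those forms; the path is a placeholder, not taken from the source:

  ```
  # Hypothetical Modelfile using one of the FROM forms above.
  # The path is a placeholder for a local fp16 weights file.
  FROM /path/to/fp16.bin
  ```

  Such a file would typically be passed to `ollama create <name> -f Modelfile` to build a local model from the referenced weights.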
- 05 May, 2024 1 commit
  - Daniel Hiltgen authored
    This moves all the env var reading into one central module and logs the loaded config once at startup, which should help in troubleshooting user server logs.
- 04 May, 2024 1 commit
  - Michael Yang authored
- 01 May, 2024 1 commit
  - Mark Ward authored