- 28 Aug, 2024 8 commits
-
-
Michael Yang authored
fix(test): do not clobber models directory
-
Michael Yang authored
fix: validate modelpath
-
Michael Yang authored
-
Patrick Devine authored
-
Michael Yang authored
detect chat template from configs that contain lists
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
- 27 Aug, 2024 11 commits
-
-
Daniel Hiltgen authored
-
Michael Yang authored
-
Patrick Devine authored
-
Patrick Devine authored
-
Daniel Hiltgen authored
-
Sean Khatiri authored
-
Patrick Devine authored
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Jeffrey Morgan authored
-
- 25 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
The numa flag may be having a performance impact on multi-socket systems with GPU loads
-
- 23 Aug, 2024 6 commits
-
-
Daniel Hiltgen authored
The recent cuda variant changes uncovered a bug in ByLibrary which failed to group by common variant for GPU types.
-
Michael Yang authored
update faq
-
Michael Yang authored
-
Patrick Devine authored
-
Daniel Hiltgen authored
During rebasing, the ordering was inverted, which broke the cuda version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
-
Daniel Hiltgen authored
The define changed recently, and this slipped through the cracks with the old name.
-
- 22 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
* Fix embeddings memory corruption: The patch was leading to a buffer overrun corruption. Once it was removed, though, parallelism in server.cpp led to hitting an assert due to slot/seq IDs being >= token count. To work around this, only use slot 0 for embeddings.
* Fix embed integration test assumption: The token eval count has changed with recent llama.cpp bumps (0.3.5+).
-
- 21 Aug, 2024 8 commits
-
-
Michael Yang authored
convert: update llama conversion for llama3.1
-
Michael Yang authored
-
Michael Yang authored
convert gemma2
-
Michael Yang authored
convert bert model from safetensors
-
Michael Yang authored
fix: chmod new layer to 0o644 when creating it
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 20 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
We're over budget for github's maximum release artifact size with rocm + 2 cuda versions. This splits rocm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
-
- 19 Aug, 2024 4 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Fix overlapping artifact name on CI
-
Daniel Hiltgen authored
-