- 02 Aug, 2024 1 commit
-
-
Cyrus Leung authored
-
- 31 Jul, 2024 1 commit
-
-
Cyrus Leung authored
-
- 10 Jul, 2024 1 commit
-
-
Benjamin Muskalla authored
-
- 09 Jul, 2024 1 commit
-
-
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 13 Jun, 2024 1 commit
-
-
youkaichao authored
-
- 03 Jun, 2024 2 commits
-
-
Kaiyang Chen authored
-
Cyrus Leung authored
-
- 29 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 22 May, 2024 1 commit
-
-
Michael Goin authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
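The idea in the message above can be illustrated with a minimal pure-Python sketch. It is not the actual kernel from this commit: the function names, the linear position scaling, and the split-halves rotation layout are all assumptions made for illustration. It shows one concatenated cos-sin table with a start offset per scaling factor, so a single pass can rotate tokens that belong to different LoRAs.

```python
import math

def build_cache(max_pos, rot_dim, base, scaling_factor):
    """Return one (cos, sin) row per position for one scaling factor."""
    inv_freq = [base ** (-2 * i / rot_dim) for i in range(rot_dim // 2)]
    rows = []
    for pos in range(int(max_pos * scaling_factor)):
        t = pos / scaling_factor  # linear position scaling (assumed)
        rows.append(([math.cos(t * f) for f in inv_freq],
                     [math.sin(t * f) for f in inv_freq]))
    return rows

def build_batched_cache(max_pos, rot_dim, base, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    table, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(table)
        table.extend(build_cache(max_pos, rot_dim, base, factor))
    return table, offsets

def batched_rotary(vectors, positions, token_offsets, table):
    """Rotate each token vector using the table row at offset + position."""
    out = []
    for vec, pos, off in zip(vectors, positions, token_offsets):
        cos, sin = table[off + pos]
        half = len(vec) // 2
        x1, x2 = vec[:half], vec[half:]  # split-halves layout (assumed)
        out.append([a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)] +
                   [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)])
    return out

# Two tokens at the same position, served by LoRAs with different factors.
table, offsets = build_batched_cache(8, 4, 10000.0, [1.0, 4.0])
vectors = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
rotated = batched_rotary(vectors, [3, 3], [offsets[1.0], offsets[4.0]], table)
```

Each token carries only an offset into the shared table, so the loop (or a real kernel) never needs to know which LoRA a token came from.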
-
- 02 May, 2024 2 commits
-
-
SangBin Cho authored
-
SangBin Cho authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
-
- 29 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 25 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 23 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 18 Apr, 2024 1 commit
-
-
SangBin Cho authored
Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
-
- 12 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 25 Mar, 2024 1 commit
-
-
SangBin Cho authored
-
- 11 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 22 Feb, 2024 1 commit
-
-
Massimiliano Pronesti authored
-
- 23 Jan, 2024 1 commit
-
-
Simon Mo authored
-
- 20 Nov, 2023 1 commit
-
-
Simon Mo authored
-
- 31 Oct, 2023 1 commit
-
-
Cade Daniel authored
-
- 02 Oct, 2023 1 commit
-
-
Zhuohan Li authored
-
- 03 Jul, 2023 1 commit
-
-
Zhuohan Li authored
-