- 16 Oct, 2024 1 commit
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
- 11 Oct, 2024 1 commit
youkaichao authored
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
-
- 07 Oct, 2024 1 commit
youkaichao authored
-
- 03 Sep, 2024 1 commit
Woosuk Kwon authored
-
- 30 Aug, 2024 1 commit
afeldman-nm authored
-
- 27 Aug, 2024 1 commit
Megha Agarwal authored
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
-
- 04 Aug, 2024 1 commit
youkaichao authored
-
- 20 Jul, 2024 1 commit
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
- 11 Jul, 2024 1 commit
Robert Shaw authored
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
-
- 02 Jul, 2024 1 commit
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
- 15 Jun, 2024 1 commit
Cyrus Leung authored
-
- 05 Jun, 2024 1 commit
zifeitong authored
-
- 18 May, 2024 1 commit
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches for different scaling factors. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
-
- 26 Apr, 2024 1 commit
SangBin Cho authored
-
- 23 Apr, 2024 1 commit
SangBin Cho authored
-
- 21 Apr, 2024 1 commit
GeauxEric authored
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 16 Apr, 2024 1 commit
Cade Daniel authored
-