- 21 Jan, 2025 1 commit
-
-
Aleksandr Malyshev authored
Signed-off-by:maleksan85 <maleksan@amd.com>
Co-authored-by:maleksan85 <maleksan@amd.com>
-
- 20 Jan, 2025 1 commit
-
-
Yuan Tang authored
Signed-off-by:Yuan Tang <terrytangyuan@gmail.com>
-
- 06 Jan, 2025 1 commit
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 23 Dec, 2024 1 commit
-
-
Rafael Vasquez authored
Signed-off-by:Rafael Vasquez <rafvasq21@gmail.com>
-
- 05 Dec, 2024 1 commit
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 11 Nov, 2024 1 commit
-
-
Robert Shaw authored
Signed-off-by:Nick Hill <nickhill@us.ibm.com>
Signed-off-by:rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by:Nick Hill <nhill@redhat.com>
Co-authored-by:Nick Hill <nickhill@us.ibm.com>
Co-authored-by:Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by:Nick Hill <nhill@redhat.com>
Co-authored-by:Tyler Michael Smith <tyler@neuralmagic.com>
-
- 07 Nov, 2024 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
- 06 Nov, 2024 1 commit
-
-
Aaron Pham authored
Signed-off-by:Aaron Pham <contact@aarnphm.xyz>
-
- 24 Oct, 2024 1 commit
-
-
youkaichao authored
Co-authored-by:Zhuohan Li <zhuohan123@gmail.com>
-
- 16 Oct, 2024 1 commit
-
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
- 11 Oct, 2024 2 commits
-
-
Wallas Henrique authored
Signed-off-by:Wallas Santos <wallashss@ibm.com>
-
youkaichao authored
Co-authored-by:Brendan Wong <bjwpokemon@gmail.com>
-
- 07 Oct, 2024 1 commit
-
-
youkaichao authored
-
- 06 Oct, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
Co-authored-by:Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 01 Oct, 2024 1 commit
-
-
Lily Liu authored
-
- 28 Sep, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
-
- 27 Sep, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
Co-authored-by:Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 25 Sep, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by:Travis Johnson <tsjohnso@us.ibm.com>
-
- 03 Sep, 2024 2 commits
-
-
Alexander Matveev authored
-
Woosuk Kwon authored
-
- 31 Aug, 2024 1 commit
-
-
Robert Shaw authored
-
- 30 Aug, 2024 1 commit
-
-
afeldman-nm authored
-
- 27 Aug, 2024 1 commit
-
-
Megha Agarwal authored
Co-authored-by:Alexander Matveev <alexm@neuralmagic.com>
-
- 21 Aug, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by:Fei <dfdfcai4@gmail.com>
-
- 04 Aug, 2024 1 commit
-
-
youkaichao authored
-
- 20 Jul, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by:Travis Johnson <tsjohnso@us.ibm.com>
-
- 11 Jul, 2024 1 commit
-
-
Robert Shaw authored
Co-authored-by:Zifei Tong <zifeitong@gmail.com>
-
- 02 Jul, 2024 1 commit
-
-
Murali Andoorveedu authored
Signed-off-by:Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 11 Jun, 2024 1 commit
-
-
Nick Hill authored
-
- 05 Jun, 2024 1 commit
-
-
zifeitong authored
-
- 29 May, 2024 1 commit
-
-
Junichi Sato authored
-
- 28 May, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by:Roger Wang <ywang@roblox.com>
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This commit adds a batched rotary embedding kernel and pipes it through. The rotary embedding layer is replaced with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
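The idea described in this commit can be illustrated with a small pure-Python sketch. It is not the kernel added by the commit: all function names here are hypothetical, linear position scaling and the rotate-half layout are assumptions, and the real implementation runs on the GPU. The sketch only shows how concatenating one cos-sin cache per scaling factor, plus a per-token offset, lets a single pass handle tokens that use different scaling factors.

```python
import math


def build_cos_sin_cache(rotary_dim, max_positions, scaling_factor=1.0, base=10000.0):
    """Return one (cos, sin) row per position (assumes linear position scaling)."""
    half = rotary_dim // 2
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(half)]
    cache = []
    for pos in range(max_positions):
        t = pos / scaling_factor
        cache.append(([math.cos(t * f) for f in inv_freq],
                      [math.sin(t * f) for f in inv_freq]))
    return cache


def build_batched_cache(rotary_dim, max_positions, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        cache.extend(build_cos_sin_cache(
            rotary_dim, int(max_positions * factor), scaling_factor=factor))
    return cache, offsets


def rotate(vec, cos, sin):
    """Apply a rotate-half rotation to one vector (layout is an assumption)."""
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return first + second


def batched_rotary_embedding(vectors, positions, token_offsets, cache):
    """Rotate every token in one pass, whichever scaling factor it uses."""
    out = []
    for vec, pos, off in zip(vectors, positions, token_offsets):
        cos, sin = cache[off + pos]  # the offset selects the right sub-cache
        out.append(rotate(vec, cos, sin))
    return out


# Two tokens at the same position, served with different scaling factors.
cache, offsets = build_batched_cache(rotary_dim=4, max_positions=8,
                                     scaling_factors=[1.0, 2.0])
vectors = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
positions = [2, 2]
token_offsets = [offsets[1.0], offsets[2.0]]
rotated = batched_rotary_embedding(vectors, positions, token_offsets, cache)
```

Because each token carries its own offset into the concatenated cache, the loop never needs to be split by LoRA or by scaling factor, which is the property the commit message attributes to the batched kernel.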
-
- 03 May, 2024 1 commit
-
-
Cade Daniel authored
-
- 26 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 23 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 21 Apr, 2024 1 commit
-
-
GeauxEric authored
Co-authored-by:Yun Ding <yunding@nvidia.com>
Co-authored-by:Roger Wang <ywang@roblox.com>
-
- 16 Apr, 2024 1 commit
-
-
Cade Daniel authored
-