- 31 Jul, 2024 1 commit
-
-
Cyrus Leung authored
-
- 30 Jul, 2024 1 commit
-
-
Nick Hill authored
-
- 24 Jul, 2024 2 commits
-
-
Antoni Baum authored
-
William Lin authored
-
- 23 Jul, 2024 2 commits
-
-
youkaichao authored
-
Cody Yu authored
-
- 22 Jul, 2024 1 commit
-
-
Jiaxin Shan authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 18 Jul, 2024 1 commit
-
-
youkaichao authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 17 Jul, 2024 1 commit
-
-
Cody Yu authored
-
- 09 Jul, 2024 1 commit
-
-
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 04 Jul, 2024 2 commits
-
-
Cyrus Leung authored
-
Lily Liu authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
-
- 03 Jul, 2024 2 commits
-
-
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
Cyrus Leung authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 02 Jul, 2024 3 commits
-
-
Mor Zusman authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 28 Jun, 2024 3 commits
-
-
Lily Liu authored
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>
-
Cody Yu authored
-
Cyrus Leung authored
Co-authored-by: ywang96 <ywang@roblox.com>
-
- 27 Jun, 2024 2 commits
-
-
Cyrus Leung authored
-
Nick Hill authored
Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>
-
- 26 Jun, 2024 1 commit
-
-
Stephanie Wang authored
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
-
- 21 Jun, 2024 2 commits
-
-
rohithkrn authored
-
Joshua Rosenkranz authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 13 Jun, 2024 1 commit
-
-
youkaichao authored
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
-
- 12 Jun, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 11 Jun, 2024 1 commit
-
-
Nick Hill authored
-
- 09 Jun, 2024 1 commit
-
-
youkaichao authored
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074)
-
- 07 Jun, 2024 1 commit
-
-
Antoni Baum authored
-
- 04 Jun, 2024 1 commit
-
-
Toshiki Kataoka authored
-
- 03 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 30 May, 2024 1 commit
-
-
Hyunsung Lee authored
-
- 28 May, 2024 1 commit
-
-
Michał Moskal authored
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
-
- 22 May, 2024 2 commits
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call the rotary embedding kernel once for each LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
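
To illustrate the idea in the commit message above, here is a minimal pure-Python sketch of a batched rotary embedding. It is not the kernel added by this commit: all function names are hypothetical, and linear position scaling and rotation of adjacent element pairs are assumptions made for the example. The per-factor cos-sin caches are concatenated into one table, and each token carries the row offset of its own scaling factor, so a single pass serves tokens from different LoRAs.

```python
import math

def build_cos_sin_cache(head_dim, max_pos, scaling_factor, base=10000.0):
    """Rows of (cos, sin) per position for one scaling factor (hypothetical helper)."""
    half = head_dim // 2
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]
    cache = []
    for pos in range(max_pos):
        scaled = pos / scaling_factor  # assumption: linear position scaling
        cache.append(([math.cos(scaled * f) for f in inv_freq],
                      [math.sin(scaled * f) for f in inv_freq]))
    return cache

def build_batched_cache(head_dim, max_pos, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        # A factor of k extends the usable context to k * max_pos positions.
        cache.extend(build_cos_sin_cache(head_dim, int(max_pos * factor), factor))
    return cache, offsets

def rotate(vec, cos, sin):
    """Rotate adjacent pairs (vec[2i], vec[2i+1]) by the i-th angle."""
    out = []
    for i in range(len(cos)):
        a, b = vec[2 * i], vec[2 * i + 1]
        out.extend([a * cos[i] - b * sin[i], a * sin[i] + b * cos[i]])
    return out

def batched_rotary(vectors, positions, token_offsets, cache):
    """Single pass over all tokens; each token indexes its own cache region."""
    return [rotate(v, *cache[pos + off])
            for v, pos, off in zip(vectors, positions, token_offsets)]

# Two tokens in one batch, served by LoRAs with different scaling factors.
cache, offsets = build_batched_cache(head_dim=4, max_pos=8, scaling_factors=[1.0, 4.0])
q = [1.0, 0.0, 0.5, -0.5]
rotated = batched_rotary([q, q], [1, 4], [offsets[1.0], offsets[4.0]], cache)
```

In this sketch, position 4 under scaling factor 4.0 maps to the same angles as position 1 under factor 1.0, so both tokens in the usage example receive the same rotation even though they read from different regions of the shared cache.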
-
- 16 May, 2024 2 commits
-
-
youkaichao authored
-
youkaichao authored
-