- 02 Jul, 2024 1 commit
-
-
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 01 Jul, 2024 1 commit
-
-
youkaichao authored
-
- 29 Jun, 2024 2 commits
-
-
Woosuk Kwon authored
-
Woosuk Kwon authored
-
- 28 Jun, 2024 6 commits
-
-
Lily Liu authored
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
Co-authored-by: bong-furiosa <bongwon.jang@furiosa.ai>
-
Cody Yu authored
-
Ilya Lavrenov authored
-
Cyrus Leung authored
Co-authored-by: ywang96 <ywang@roblox.com>
-
Isotr0py authored
-
Woosuk Kwon authored
-
- 27 Jun, 2024 2 commits
-
-
Cyrus Leung authored
-
Nick Hill authored
Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>
-
- 26 Jun, 2024 4 commits
-
-
Woosuk Kwon authored
-
Woosuk Kwon authored
-
Woosuk Kwon authored
-
Stephanie Wang authored
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
-
- 25 Jun, 2024 4 commits
-
-
Woosuk Kwon authored
-
Matt Wong authored
-
Woosuk Kwon authored
-
Jie Fu (傅杰) authored
-
- 21 Jun, 2024 2 commits
-
-
rohithkrn authored
-
Joshua Rosenkranz authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
-
- 17 Jun, 2024 1 commit
-
-
Kunshang Ji authored
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 13 Jun, 2024 1 commit
-
-
youkaichao authored
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
-
- 12 Jun, 2024 3 commits
-
-
Isotr0py authored
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
Woosuk Kwon authored
-
- 11 Jun, 2024 1 commit
-
-
Nick Hill authored
-
- 09 Jun, 2024 1 commit
-
-
youkaichao authored
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074)
-
- 07 Jun, 2024 1 commit
-
-
Antoni Baum authored
-
- 04 Jun, 2024 1 commit
-
-
Toshiki Kataoka authored
-
- 03 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 30 May, 2024 1 commit
-
-
Hyunsung Lee authored
-
- 29 May, 2024 1 commit
-
-
youkaichao authored
-
- 28 May, 2024 2 commits
-
-
Robert Shaw authored
-
Michał Moskal authored
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
-
- 22 May, 2024 2 commits
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
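
The idea of one rotary call over several cos-sin caches can be illustrated with a minimal NumPy sketch. This is not the kernel from this commit: the function names (`build_cos_sin_cache`, `stack_caches`, `batched_rotary_embedding`), the linear position scaling, the NeoX-style half split, and the per-token offset layout are all assumptions made for illustration. The sketch stacks one cache per scaling factor into a single table and gives each token a row offset that selects its cache, so a mixed batch needs only one lookup.

```python
import numpy as np


def build_cos_sin_cache(rot_dim, max_pos, scaling_factor=1.0, base=10000.0):
    """Cos/sin table for one scaling factor (assumed: linear position scaling)."""
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2, dtype=np.float64) / rot_dim))
    num_rows = int(max_pos * scaling_factor)
    t = np.arange(num_rows, dtype=np.float64) / scaling_factor
    freqs = np.outer(t, inv_freq)  # (num_rows, rot_dim // 2)
    return np.concatenate([np.cos(freqs), np.sin(freqs)], axis=-1)


def stack_caches(rot_dim, max_pos, scaling_factors):
    """Concatenate per-factor caches; return the table and each factor's start row."""
    caches, offsets, start = [], {}, 0
    for factor in scaling_factors:
        cache = build_cos_sin_cache(rot_dim, max_pos, factor)
        caches.append(cache)
        offsets[factor] = start
        start += cache.shape[0]
    return np.concatenate(caches, axis=0), offsets


def batched_rotary_embedding(x, positions, cache, cache_offsets):
    """Rotate every token in one call.

    x:             (num_tokens, rot_dim) query or key vectors
    positions:     (num_tokens,) position of each token in its sequence
    cache_offsets: (num_tokens,) start row of the cache each token should use
    """
    rows = cache[positions + cache_offsets]  # one gather for the whole batch
    half = x.shape[-1] // 2
    cos, sin = rows[:, :half], rows[:, half:]
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)


# Two hypothetical LoRAs: one at the base context length, one scaled 4x.
rng = np.random.default_rng(0)
rot_dim, max_pos = 8, 16
cache, offsets = stack_caches(rot_dim, max_pos, [1.0, 4.0])
x = rng.standard_normal((4, rot_dim))
positions = np.array([0, 5, 0, 40])
token_factors = [1.0, 1.0, 4.0, 4.0]
cache_offsets = np.array([offsets[f] for f in token_factors])
out = batched_rotary_embedding(x, positions, cache, cache_offsets)
```

Because the offsets are ordinary per-token data, tokens from requests with different scaling factors can share a batch without a separate call per LoRA, which is the property the commit message describes.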
-