- 03 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 25 May, 2024 1 commit
-
-
Eric Xihui Lin authored
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 22 May, 2024 1 commit
-
-
sasha0552 authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This commit adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
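The batching idea described in this commit message can be illustrated with a minimal pure-Python sketch. It is not the kernel from the commit: the function names (`build_cache`, `build_batched_cache`, `batched_rotary`), the linear position scaling, and the NeoX-style pairing of elements are all assumptions made for illustration. The point is only that the caches for several scaling factors can be concatenated, so that each token selects its cache through an offset and one pass handles a mixed batch.

```python
import math

def build_cache(rot_dim, max_pos, base, scaling_factor):
    """Build (cos, sin) rows for one scaling factor, assuming linear scaling."""
    inv_freq = [base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]
    rows = []
    for pos in range(int(max_pos * scaling_factor)):
        t = pos / scaling_factor  # scaled position
        rows.append(([math.cos(t * f) for f in inv_freq],
                     [math.sin(t * f) for f in inv_freq]))
    return rows

def build_batched_cache(rot_dim, max_pos, base, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    cache, offsets = [], {}
    for s in scaling_factors:
        offsets[s] = len(cache)
        cache.extend(build_cache(rot_dim, max_pos, base, s))
    return cache, offsets

def rotate(vec, cos, sin):
    """Rotate one vector, pairing element i with element i + half (assumed layout)."""
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    return ([a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)] +
            [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)])

def batched_rotary(vectors, positions, token_offsets, cache):
    """Apply rotary embedding to a mixed batch in a single pass.

    token_offsets[i] is the start of the cache belonging to token i's
    scaling factor, so no per-factor (per-LoRA) call is needed.
    """
    out = []
    for vec, pos, off in zip(vectors, positions, token_offsets):
        cos, sin = cache[off + pos]
        out.append(rotate(vec, cos, sin))
    return out
```

In this sketch, a token at position 2 with scaling factor 1.0 and a token at position 4 with scaling factor 2.0 receive the same rotation, because both map to the scaled position 2.0, even though they read from different regions of the concatenated cache.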
-
- 09 May, 2024 1 commit
-
-
Hao Zhang authored
Co-authored-by: Dash Desai <1723932+iamontheinet@users.noreply.github.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-
- 02 May, 2024 1 commit
-
-
youkaichao authored
-
- 30 Apr, 2024 1 commit
-
-
fuchen.ljl authored
-
- 27 Apr, 2024 1 commit
-
-
Prashant Gupta authored
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Travis Johnson <tjohnson31415@gmail.com>
-
- 26 Apr, 2024 2 commits
-
-
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
-
Cyrus Leung authored
-
- 25 Apr, 2024 1 commit
-
-
Nick Hill authored
-
- 23 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 16 Apr, 2024 1 commit
-
-
Antoni Baum authored
-
- 12 Apr, 2024 2 commits
-
-
SangBin Cho authored
-
Michael Feil authored
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 11 Apr, 2024 1 commit
-
-
Nick Hill authored
-
- 04 Apr, 2024 1 commit
-
-
Tao He authored
Signed-off-by: Tao He <sighingnow@gmail.com>
-
- 01 Apr, 2024 1 commit
-
-
Nick Hill authored
Some simplifications made for clarity. Also moves detokenization-related functions from tokenizer.py to detokenizer.py.
-
- 30 Mar, 2024 1 commit
-
-
youkaichao authored
-
- 29 Mar, 2024 1 commit
-
-
Roy authored
-
- 27 Mar, 2024 1 commit
-
-
Megha Agarwal authored
-
- 25 Mar, 2024 2 commits
-
-
xwjiang2010 authored
-
SangBin Cho authored
-
- 22 Mar, 2024 1 commit
-
-
Antoni Baum authored
Co-authored-by: MeloYang <meloyang05@gmail.com>
-
- 21 Mar, 2024 2 commits
-
-
Woosuk Kwon authored
Co-authored-by: Roy <jasonailu87@gmail.com>
Co-authored-by: Roger Meier <r.meier@siemens.com>
-
Lalit Pradhan authored
-
- 20 Mar, 2024 1 commit
-
-
Nick Hill authored
-
- 15 Mar, 2024 1 commit
-
-
Antoni Baum authored
-
- 11 Mar, 2024 2 commits
-
-
Zhuohan Li authored
-
Roy authored
-
- 29 Feb, 2024 1 commit
-
-
Seonghyeon authored
-
- 27 Feb, 2024 1 commit
-
-
Roy authored
-
- 19 Feb, 2024 1 commit
-
-
Isotr0py authored
-
- 18 Feb, 2024 1 commit
-
-
Mark Mozolewski authored
-
- 14 Feb, 2024 1 commit
-
-
Roy authored
-
- 13 Feb, 2024 3 commits
-
-
Philipp Moritz authored
Co-authored-by: Roy <jasonailu87@gmail.com>
-
Philipp Moritz authored
This reverts commit 5c976a7e.
-
Roy authored
-
- 23 Jan, 2024 1 commit
-
-
Antoni Baum authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
-
- 17 Dec, 2023 1 commit
-
-
Woosuk Kwon authored
-