- 22 May, 2024 2 commits
- 21 May, 2024 1 commit
-
-
Kante Yin authored
Signed-off-by: kerthcet <kerthcet@gmail.com>
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call the rotary embedding kernel once for each LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
-
- 15 May, 2024 2 commits
-
-
zifeitong authored
-
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attn metadata for prefill/decode into a single class and allows slicing it when running the attn backend. Finally, it refactors subquery_start_loc, which was not refactored in the previous PR.
-
- 14 May, 2024 1 commit
-
-
Nick Hill authored
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
-
- 13 May, 2024 1 commit
-
-
Sanger Steel authored
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208)
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 08 May, 2024 1 commit
-
-
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
-
- 04 May, 2024 1 commit
-
-
DearPlanet authored
-
- 03 May, 2024 2 commits
-
-
Michael Goin authored
-
SangBin Cho authored
-
- 01 May, 2024 1 commit
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 27 Apr, 2024 1 commit
-
-
Austin Veselka authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 23 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 21 Apr, 2024 1 commit
-
-
GeauxEric authored
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 20 Apr, 2024 2 commits
-
-
Noam Gat authored
-
Harry Mellor authored
Co-authored-by: Harry Mellor <hmellor@oxts.com>
-
- 18 Apr, 2024 1 commit
-
-
Michael Goin authored
-
- 16 Apr, 2024 2 commits
-
-
Antoni Baum authored
-
Noam Gat authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
-
- 14 Apr, 2024 1 commit
-
-
Sanger Steel authored
-
- 11 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 09 Apr, 2024 1 commit
-
-
Cade Daniel authored
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
-
- 03 Apr, 2024 2 commits
-
-
Adrian Abeyta authored
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Cade Daniel authored
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
-
- 02 Apr, 2024 1 commit
-
-
bigPYJ1151 authored
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
-
- 01 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 28 Mar, 2024 2 commits
-
-
SangBin Cho authored
-
Cade Daniel authored
-
- 25 Mar, 2024 3 commits
-
-
xwjiang2010 authored
-
SangBin Cho authored
-
TianYu GUO authored
-
- 22 Mar, 2024 1 commit
-
-
Thomas Parnell authored
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
-
- 20 Mar, 2024 1 commit
-
-
SangBin Cho authored
-
- 15 Mar, 2024 1 commit
-
-
Antoni Baum authored
-
- 04 Mar, 2024 2 commits
-
-
Antoni Baum authored
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
-
Philipp Moritz authored
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-