- 25 May, 2024 3 commits
-
Lily Liu authored
-
youkaichao authored
-
Eric Xihui Lin authored
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 24 May, 2024 2 commits
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
Robert Shaw authored
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-
- 23 May, 2024 4 commits
-
Elisei Smirnov authored
Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
-
Dipika Sikka authored
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
-
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
Alexander Matveev authored
-
- 22 May, 2024 7 commits
-
Cody Yu authored
-
Philipp Moritz authored
-
Nick Hill authored
-
raywanb authored
-
Cody Yu authored
The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).
-
SangBin Cho authored
-
sasha0552 authored
-
- 21 May, 2024 5 commits
-
Isotr0py authored
-
Kante Yin authored
Signed-off-by: kerthcet <kerthcet@gmail.com>
-
Isotr0py authored
-
HUANG Fei authored
-
Antoni Baum authored
-
- 20 May, 2024 5 commits
-
Aurick Qiao authored
-
Mor Zusman authored
Allow the dummy load format for FP8, since torch.uniform_ doesn't support FP8 at the moment.
Co-authored-by: Mor Zusman <morz@ai21.com>
-
Wenwei Zhang authored
-
Cyrus Leung authored
-
Woosuk Kwon authored
-
- 19 May, 2024 2 commits
-
Alexander Matveev authored
-
Cyrus Leung authored
-
- 18 May, 2024 2 commits
-
SangBin Cho authored
Currently we need to call the rotary embedding kernel separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. This adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
-
alexeykondrat authored
-
- 17 May, 2024 4 commits
-
eigenLiu authored
-
Jinzhen Lin authored
-
Alexei-V-Ivanov-AMD authored
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797)
-
bofeng huang authored
-
- 16 May, 2024 6 commits
-
Woosuk Kwon authored
-
Tyler Michael Smith authored
-
youkaichao authored
-
youkaichao authored
-
Hongxia Yang authored
-
Alexander Matveev authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-