- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
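A minimal pure-Python sketch of the idea described in this commit message, not the kernel from the PR. It assumes linear position scaling and a NeoX-style half rotation; the names `build_cos_sin_cache`, `build_batched_cache`, `batched_rotary`, and `cache_offsets` are hypothetical and chosen only for illustration. The per-scaling-factor caches are concatenated into one table, and each token carries an offset that selects its cache, so a single pass can handle tokens from different LoRAs.

```python
import math

def build_cos_sin_cache(rot_dim, max_pos, scaling_factor=1.0, base=10000.0):
    """One (cos, sin) row per position, with linearly scaled positions (assumption)."""
    half = rot_dim // 2
    inv_freq = [base ** (-2.0 * i / rot_dim) for i in range(half)]
    rows = []
    for pos in range(int(max_pos * scaling_factor)):
        t = pos / scaling_factor
        rows.append(([math.cos(t * f) for f in inv_freq],
                     [math.sin(t * f) for f in inv_freq]))
    return rows

def build_batched_cache(rot_dim, max_pos, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    cache, offsets = [], {}
    for s in scaling_factors:
        offsets[s] = len(cache)
        cache.extend(build_cos_sin_cache(rot_dim, max_pos, scaling_factor=s))
    return cache, offsets

def batched_rotary(x, positions, cache_offsets, cache):
    """Rotate every token in one pass; each token looks up its own cache slice."""
    out = []
    for vec, pos, off in zip(x, positions, cache_offsets):
        cos, sin = cache[off + pos]
        half = len(vec) // 2
        x1, x2 = vec[:half], vec[half:]
        first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        out.append(first + second)
    return out

# Three tokens in one batch: two use scaling factor 1.0, one uses 2.0.
cache, offsets = build_batched_cache(rot_dim=4, max_pos=8, scaling_factors=[1.0, 2.0])
x = [[1.0, 2.0, 3.0, 4.0]] * 3
out = batched_rotary(x, [0, 1, 2], [offsets[1.0], offsets[1.0], offsets[2.0]], cache)
```

In this sketch, position 2 under scaling factor 2.0 is rotated the same way as position 1 under scaling factor 1.0, which is the effect of linear scaling.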
-
- 15 May, 2024 2 commits
-
-
zifeitong authored
-
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attn metadata for prefill/decode into a single class and allows slicing it when running the attn backend. It also refactors subquery_start_loc, which was not refactored in the previous PR.
-
- 14 May, 2024 1 commit
-
-
Nick Hill authored
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
-
- 13 May, 2024 2 commits
-
-
Sanger Steel authored
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208)
-
SangBin Cho authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 08 May, 2024 1 commit
-
-
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
-
- 06 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 04 May, 2024 2 commits
-
-
DearPlanet authored
-
Cody Yu authored
-
- 03 May, 2024 4 commits
-
-
Cade Daniel authored
-
Michael Goin authored
-
SangBin Cho authored
-
DefTruth authored
-
- 02 May, 2024 1 commit
-
-
youkaichao authored
-
- 01 May, 2024 4 commits
-
-
Roy authored
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
Robert Shaw authored
-
harrywu authored
-
- 30 Apr, 2024 1 commit
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 28 Apr, 2024 2 commits
-
-
Ronen Schaffer authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
DefTruth authored
-
- 27 Apr, 2024 4 commits
-
-
Nick Hill authored
Co-authored-by: DefTruth <31974251+deftruth@users.noreply.github.com>
-
Roy authored
-
Austin Veselka authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
Roy authored
-
- 26 Apr, 2024 2 commits
-
-
SangBin Cho authored
-
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
-
- 25 Apr, 2024 2 commits
- 23 Apr, 2024 2 commits
-
-
Cade Daniel authored
-
SangBin Cho authored
-
- 22 Apr, 2024 1 commit
-
-
Tao He authored
Signed-off-by: Tao He <sighingnow@gmail.com>
-
- 21 Apr, 2024 1 commit
-
-
GeauxEric authored
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 20 Apr, 2024 2 commits
-
-
Noam Gat authored
-
Harry Mellor authored
Co-authored-by: Harry Mellor <hmellor@oxts.com>
-
- 19 Apr, 2024 2 commits
-
-
Ronen Schaffer authored
-
Simon Mo authored
-