- 07 Jun, 2024 1 commit
  - limingshu authored
- 03 Jun, 2024 1 commit
  - Kaiyang Chen authored
- 21 May, 2024 1 commit
  - Antoni Baum authored
- 18 May, 2024 1 commit
  - SangBin Cho authored
    Currently the rotary embedding kernel has to be called separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. This adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
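The batching idea in the 18 May commit message can be illustrated with a minimal pure-Python sketch. It assumes linear position-interpolation scaling, NeoX-style rotation (first half of the vector paired with the second half), and a rotary dimension equal to the vector length; the names `build_cos_sin_cache`, `apply_rotary`, `batched_rotary`, `scaling_factors` and `offsets` are hypothetical and are not vLLM's kernel or API. One cos-sin table per scaling factor is concatenated into a single cache, and each token looks up its row at the start offset of its LoRA's segment, so tokens from LoRAs with different context lengths can be rotated in one call.

```python
import math
from typing import List, Tuple

# Hypothetical helper names; a sketch, not vLLM's kernel or API.


def build_cos_sin_cache(
    rot_dim: int,
    max_position: int,
    scaling_factors: List[float],
    base: float = 10000.0,
) -> Tuple[List[List[float]], List[int]]:
    """Concatenate one cos/sin table per scaling factor.

    Returns the concatenated rows and the start offset of each factor's segment.
    Each row holds rot_dim // 2 cosines followed by rot_dim // 2 sines.
    """
    inv_freq = [1.0 / (base ** (2 * i / rot_dim)) for i in range(rot_dim // 2)]
    cache: List[List[float]] = []
    offsets: List[int] = []
    for factor in scaling_factors:
        offsets.append(len(cache))
        # Assumed linear scaling: the table covers max_position * factor
        # positions, and each position is divided by the factor.
        for pos in range(int(max_position * factor)):
            t = pos / factor
            angles = [t * f for f in inv_freq]
            cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache, offsets


def apply_rotary(x: List[float], row: List[float]) -> List[float]:
    """NeoX-style rotation: element i of the first half pairs with element i of the second half."""
    half = len(x) // 2
    cos, sin = row[:half], row[half:]
    x1, x2 = x[:half], x[half:]
    first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return first + second


def batched_rotary(
    positions: List[int],
    vectors: List[List[float]],
    scaling_index: List[int],
    cache: List[List[float]],
    offsets: List[int],
) -> List[List[float]]:
    """Rotate tokens from several LoRAs in one pass.

    scaling_index[i] selects the cache segment (scaling factor) of token i's LoRA.
    """
    return [
        apply_rotary(v, cache[offsets[k] + p])
        for p, v, k in zip(positions, vectors, scaling_index)
    ]


# Usage: an unscaled LoRA and a 4x long-context LoRA share one cache.
cache, offsets = build_cos_sin_cache(rot_dim=4, max_position=8, scaling_factors=[1.0, 4.0])
# Token 0 belongs to the unscaled LoRA; token 1 to the 4x LoRA at position 20.
out = batched_rotary([3, 20], [[1.0, 0.0, 0.0, 1.0]] * 2, [0, 1], cache, offsets)
```

Because the segment offset is resolved per token, position 20 is valid for the scaled LoRA even though it exceeds the unscaled table's 8 positions, and no per-LoRA call is needed.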
- 13 May, 2024 1 commit
  - SangBin Cho authored
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
- 11 May, 2024 1 commit
  - Chang Su authored
- 08 May, 2024 1 commit
  - youkaichao authored
- 07 May, 2024 2 commits
  - youkaichao authored
  - youkaichao authored
- 04 May, 2024 1 commit
  - Cody Yu authored
- 02 May, 2024 2 commits
  - SangBin Cho authored
  - SangBin Cho authored
    [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
- 28 Apr, 2024 1 commit
  - Ronen Schaffer authored
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
- 26 Apr, 2024 2 commits
  - SangBin Cho authored
  - SangBin Cho authored
    Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
- 23 Apr, 2024 1 commit
  - SangBin Cho authored
- 22 Apr, 2024 1 commit
  - SangBin Cho authored
- 16 Apr, 2024 1 commit
  - Cade Daniel authored
- 12 Apr, 2024 2 commits
  - SangBin Cho authored
  - Zhuohan Li authored
- 11 Apr, 2024 1 commit
  - SangBin Cho authored
- 05 Apr, 2024 1 commit
  - SangBin Cho authored
- 03 Apr, 2024 1 commit
  - SangBin Cho authored
- 01 Apr, 2024 1 commit
  - Cade Daniel authored
- 28 Mar, 2024 2 commits
  - SangBin Cho authored
  - Cade Daniel authored
- 25 Mar, 2024 2 commits
  - xwjiang2010 authored
  - SangBin Cho authored
- 22 Mar, 2024 1 commit
  - Thomas Parnell authored
    Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
- 20 Mar, 2024 1 commit
  - SangBin Cho authored
- 15 Mar, 2024 1 commit
  - Tao He authored
    Signed-off-by: Tao He <sighingnow@gmail.com>
    Co-authored-by: simon-mo <simon.mo@hey.com>
- 11 Mar, 2024 1 commit
  - Zhuohan Li authored
- 05 Mar, 2024 1 commit
  - Nick Hill authored
- 02 Mar, 2024 1 commit
  - Sage Moore authored
    Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
    Co-authored-by: Michael Goin <michael@neuralmagic.com>
- 22 Feb, 2024 1 commit
  - Massimiliano Pronesti authored
- 21 Feb, 2024 2 commits
  - Nick Hill authored
  - Antoni Baum authored
- 23 Jan, 2024 1 commit
  - Antoni Baum authored
    Co-authored-by: Chen Shen <scv119@gmail.com>
    Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    Co-authored-by: Avnish Narayan <avnish@anyscale.com>
- 21 Jan, 2024 1 commit
  - Nick Hill authored
- 18 Jan, 2024 1 commit
  - ljss authored