- 01 Jun, 2024 1 commit
-
-
Zhuohan Li authored
-
- 29 May, 2024 1 commit
-
-
afeldman-nm authored
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
-
- 28 May, 2024 1 commit
-
-
Michał Moskal authored
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
-
- 24 May, 2024 1 commit
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 21 May, 2024 1 commit
-
-
Antoni Baum authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches (one per scaling factor). Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
-
- 13 May, 2024 1 commit
-
-
SangBin Cho authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 08 May, 2024 1 commit
-
-
youkaichao authored
-
- 07 May, 2024 2 commits
-
-
youkaichao authored
-
youkaichao authored
-
- 04 May, 2024 1 commit
-
-
Cody Yu authored
-
- 02 May, 2024 3 commits
-
-
SangBin Cho authored
-
SangBin Cho authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
-
SangBin Cho authored
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
-
- 01 May, 2024 2 commits
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Sage Moore <sagemoore@utexas.edu>
-
Pastel! authored
-
- 28 Apr, 2024 1 commit
-
-
Ronen Schaffer authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 27 Apr, 2024 1 commit
-
-
Caio Mendes authored
-
- 26 Apr, 2024 2 commits
-
-
SangBin Cho authored
-
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
-
- 23 Apr, 2024 2 commits
-
-
SangBin Cho authored
-
SangBin Cho authored
-
- 22 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 16 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 15 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 12 Apr, 2024 3 commits
-
-
SangBin Cho authored
-
Zhuohan Li authored
-
Michael Feil authored
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 11 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 07 Apr, 2024 1 commit
-
-
youkaichao authored
-
- 05 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 03 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 02 Apr, 2024 1 commit
-
-
Michael Goin authored
-
- 01 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 28 Mar, 2024 3 commits
-
-
Simon Mo authored
-
SangBin Cho authored
-
Cade Daniel authored
-
- 25 Mar, 2024 2 commits
-
-
xwjiang2010 authored
-
SangBin Cho authored
-