- 01 Aug, 2024 1 commit
  - youkaichao authored
- 30 Jul, 2024 2 commits
  - youkaichao authored
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  - Nick Hill authored
- 19 Jul, 2024 1 commit
  - Antoni Baum authored
- 16 Jul, 2024 1 commit
  - Mor Zusman authored
    Co-authored-by: Mor Zusman <morz@ai21.com>
- 09 Jul, 2024 1 commit
  - Swapnil Parekh authored
    Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
    Co-authored-by: Joe G <joseph.granados@h2o.ai>
    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 02 Jul, 2024 3 commits
  - Mor Zusman authored
    Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
    Co-authored-by: Erez Schwartz <erezs@ai21.com>
    Co-authored-by: Mor Zusman <morz@ai21.com>
    Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
    Co-authored-by: Tomer Asida <tomera@ai21.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
    Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  - Murali Andoorveedu authored
    Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  - Alexander Matveev authored
- 27 Jun, 2024 1 commit
  - youkaichao authored
- 15 Jun, 2024 2 commits
  - Cyrus Leung authored
  - leiwen83 authored
    Signed-off-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
- 12 Jun, 2024 1 commit
  - Michael Goin authored
- 09 Jun, 2024 1 commit
  - Bla_ckB authored
- 07 Jun, 2024 1 commit
  - limingshu authored
- 03 Jun, 2024 1 commit
  - Kaiyang Chen authored
- 01 Jun, 2024 1 commit
  - Zhuohan Li authored
- 29 May, 2024 1 commit
  - afeldman-nm authored
    [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
- 28 May, 2024 1 commit
  - Michał Moskal authored
    Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
- 24 May, 2024 1 commit
  - leiwen83 authored
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
- 21 May, 2024 1 commit
  - Antoni Baum authored
- 18 May, 2024 1 commit
  - SangBin Cho authored
    Currently, the rotary embedding kernel has to be called separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. This adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches (one per scaling factor). Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
- 13 May, 2024 1 commit
  - SangBin Cho authored
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
- 11 May, 2024 1 commit
  - Chang Su authored
- 08 May, 2024 1 commit
  - youkaichao authored
- 07 May, 2024 2 commits
  - youkaichao authored
  - youkaichao authored
- 04 May, 2024 1 commit
  - Cody Yu authored
- 02 May, 2024 3 commits
  - SangBin Cho authored
  - SangBin Cho authored
    Co-authored-by: Cade Daniel <edacih@gmail.com>
  - SangBin Cho authored
    [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
- 01 May, 2024 2 commits
  - leiwen83 authored
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Sage Moore <sagemoore@utexas.edu>
  - Pastel! authored
- 28 Apr, 2024 1 commit
  - Ronen Schaffer authored
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
- 27 Apr, 2024 1 commit
  - Caio Mendes authored
- 26 Apr, 2024 2 commits
  - SangBin Cho authored
  - SangBin Cho authored
    Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
- 23 Apr, 2024 2 commits
  - SangBin Cho authored
  - SangBin Cho authored
- 22 Apr, 2024 1 commit
  - SangBin Cho authored