- 19 Aug, 2024 1 commit
SangBin Cho authored
- 16 Aug, 2024 1 commit
Mahesh Keralapura authored
[Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)
- 14 Aug, 2024 1 commit
William Lin authored
- 09 Aug, 2024 2 commits
Mahesh Keralapura authored
Alexander Matveev authored
- 08 Aug, 2024 1 commit
Rui Qiao authored
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
- 06 Aug, 2024 1 commit
afeldman-nm authored
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
- 01 Aug, 2024 1 commit
youkaichao authored
- 30 Jul, 2024 2 commits
youkaichao authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Nick Hill authored
- 16 Jul, 2024 1 commit
Mor Zusman authored
Co-authored-by: Mor Zusman <morz@ai21.com>
- 09 Jul, 2024 1 commit
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 02 Jul, 2024 2 commits
Mor Zusman authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
- 12 Jun, 2024 1 commit
Michael Goin authored
- 09 Jun, 2024 1 commit
Bla_ckB authored
- 07 Jun, 2024 1 commit
limingshu authored
- 03 Jun, 2024 1 commit
Kaiyang Chen authored
- 21 May, 2024 1 commit
Antoni Baum authored
- 18 May, 2024 1 commit
SangBin Cho authored
Currently the rotary embedding kernel must be called separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. This adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
- 13 May, 2024 1 commit
SangBin Cho authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
- 11 May, 2024 1 commit
Chang Su authored
- 08 May, 2024 1 commit
youkaichao authored
- 07 May, 2024 2 commits
youkaichao authored
youkaichao authored
- 04 May, 2024 1 commit
Cody Yu authored
- 02 May, 2024 2 commits
SangBin Cho authored
SangBin Cho authored
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
- 28 Apr, 2024 1 commit
Ronen Schaffer authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
- 26 Apr, 2024 2 commits
SangBin Cho authored
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
- 23 Apr, 2024 1 commit
SangBin Cho authored
- 22 Apr, 2024 1 commit
SangBin Cho authored
- 16 Apr, 2024 1 commit
Cade Daniel authored
- 12 Apr, 2024 2 commits
SangBin Cho authored
Zhuohan Li authored
- 11 Apr, 2024 1 commit
SangBin Cho authored
- 05 Apr, 2024 1 commit
SangBin Cho authored
- 03 Apr, 2024 1 commit
SangBin Cho authored
- 01 Apr, 2024 1 commit
Cade Daniel authored