- 19 Aug, 2024 1 commit
-
-
SangBin Cho authored
-
- 16 Aug, 2024 1 commit
-
-
Mahesh Keralapura authored
[Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)
-
- 14 Aug, 2024 1 commit
-
-
William Lin authored
-
- 09 Aug, 2024 4 commits
-
-
Cade Daniel authored
-
Mahesh Keralapura authored
-
Alexander Matveev authored
-
Alexander Matveev authored
-
- 08 Aug, 2024 2 commits
-
-
Zach Zheng authored
-
Rui Qiao authored
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
-
- 06 Aug, 2024 2 commits
-
-
afeldman-nm authored
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
xiaobochen123 authored
-
- 02 Aug, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 01 Aug, 2024 1 commit
-
-
youkaichao authored
-
- 30 Jul, 2024 2 commits
-
-
youkaichao authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Nick Hill authored
-
- 19 Jul, 2024 1 commit
-
-
Antoni Baum authored
-
- 16 Jul, 2024 1 commit
-
-
Mor Zusman authored
Co-authored-by: Mor Zusman <morz@ai21.com>
-
- 09 Jul, 2024 1 commit
-
-
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 02 Jul, 2024 3 commits
-
-
Mor Zusman authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
Alexander Matveev authored
-
- 27 Jun, 2024 1 commit
-
-
youkaichao authored
-
- 15 Jun, 2024 2 commits
-
-
Cyrus Leung authored
-
leiwen83 authored
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 12 Jun, 2024 1 commit
-
-
Michael Goin authored
-
- 09 Jun, 2024 1 commit
-
-
Bla_ckB authored
-
- 07 Jun, 2024 1 commit
-
-
limingshu authored
-
- 03 Jun, 2024 1 commit
-
-
Kaiyang Chen authored
-
- 01 Jun, 2024 1 commit
-
-
Zhuohan Li authored
-
- 29 May, 2024 1 commit
-
-
afeldman-nm authored
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
-
- 28 May, 2024 1 commit
-
-
Michał Moskal authored
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
-
- 24 May, 2024 1 commit
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 21 May, 2024 1 commit
-
-
Antoni Baum authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
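The idea in the commit message above can be illustrated with a small pure-Python sketch. This is not the vLLM kernel: the function names (`build_cos_sin_cache`, `build_combined_cache`, `batched_rotary`), the linear position scaling, and the neox-style pairing are all assumptions chosen for illustration. It shows one way a single pass over a mixed batch might pick a different cos-sin cache per token by adding a per-token row offset into one concatenated table.

```python
import math

def build_cos_sin_cache(rot_dim, max_pos, base=10000.0, scaling_factor=1.0):
    """Return one (cos, sin) row per position, assuming linear position scaling."""
    inv_freq = [base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]
    cache = []
    for pos in range(max_pos):
        angles = [(pos / scaling_factor) * f for f in inv_freq]
        cache.append(([math.cos(a) for a in angles],
                      [math.sin(a) for a in angles]))
    return cache

def build_combined_cache(rot_dim, max_pos, scaling_factors):
    """Concatenate the per-factor caches; record the first row of each factor."""
    table, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(table)
        table.extend(build_cos_sin_cache(rot_dim, int(max_pos * factor),
                                         scaling_factor=factor))
    return table, offsets

def rotate(vec, cos, sin):
    """Rotate element i together with element i + half (neox-style pairing)."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * cos[i] - x2 * sin[i]
        out[i + half] = x2 * cos[i] + x1 * sin[i]
    return out

def batched_rotary(vectors, positions, cache_offsets, table):
    """Single pass over the batch: each token indexes its own cache row."""
    return [rotate(v, *table[off + pos])
            for v, pos, off in zip(vectors, positions, cache_offsets)]

# Hypothetical usage: two tokens that belong to different scaling factors.
table, offsets = build_combined_cache(rot_dim=4, max_pos=8,
                                      scaling_factors=[1.0, 2.0])
query = [1.0, 2.0, 3.0, 4.0]
rotated = batched_rotary([query, query], [2, 4],
                         [offsets[1.0], offsets[2.0]], table)
```

With a concatenated table, tokens from adapters with different scaling factors can share one call instead of one call per adapter; the cost is the extra memory for the additional cache rows.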
-
- 13 May, 2024 1 commit
-
-
SangBin Cho authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 08 May, 2024 1 commit
-
-
youkaichao authored
-
- 07 May, 2024 2 commits
-
-
youkaichao authored
-
youkaichao authored
-
- 04 May, 2024 1 commit
-
-
Cody Yu authored
-