- 14 Jan, 2025 1 commit
-
-
Jee Jee Li authored
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
-
- 02 Jan, 2025 1 commit
-
-
bjmsong authored
Signed-off-by: bjmsong <bjmsong@126.com>
Co-authored-by: bjmsong <bjmsong@126.com>
-
- 07 Dec, 2024 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 22 Nov, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 18 Nov, 2024 1 commit
-
-
Isotr0py authored
Signed-off-by: Isotr0py <2037008807@qq.com>
-
- 17 Nov, 2024 1 commit
-
-
Roger Wang authored
Signed-off-by: Roger Wang <ywang@roblox.com>
-
- 11 Nov, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 09 Nov, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 06 Nov, 2024 2 commits
-
-
Joe Runde authored
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
-
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
-
- 28 Oct, 2024 1 commit
-
-
wangshuai09 authored
Signed-off-by: wangshuai09 <391746016@qq.com>
-
- 24 Oct, 2024 1 commit
-
-
Yongzao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
-
- 04 Oct, 2024 1 commit
-
-
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 03 Oct, 2024 1 commit
-
-
Shawn Tan authored
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 26 Sep, 2024 1 commit
-
-
Roger Wang authored
-
- 17 Sep, 2024 1 commit
-
-
Joe Runde authored
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
-
- 02 Sep, 2024 1 commit
-
-
Shawn Tan authored
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 30 Aug, 2024 1 commit
-
-
afeldman-nm authored
-
- 13 Aug, 2024 1 commit
-
-
Cyrus Leung authored
-
- 05 Aug, 2024 1 commit
-
-
Isotr0py authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 01 Aug, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
- 31 Jul, 2024 1 commit
-
-
Alphi authored
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
- 25 Jul, 2024 1 commit
-
-
Alphi authored
-
- 23 Jul, 2024 1 commit
-
-
Michael Goin authored
-
- 19 Jul, 2024 1 commit
-
-
Robert Shaw authored
-
- 18 Jul, 2024 1 commit
-
-
Michael Goin authored
-
- 17 Jul, 2024 1 commit
-
-
Wushi Dong authored
original title: [Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization
-
- 16 Jul, 2024 1 commit
-
-
Michael Goin authored
-
- 15 Jul, 2024 1 commit
-
-
youkaichao authored
-
- 02 Jul, 2024 2 commits
-
-
Qubitium-ModelCloud authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
-
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
- 27 Jun, 2024 2 commits
-
-
Cyrus Leung authored
-
Cyrus Leung authored
-
- 01 Jun, 2024 1 commit
-
-
chenqianfzh authored
-
- 27 May, 2024 1 commit
-
-
Zhuohan Li authored
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 23 May, 2024 1 commit
-
-
Dipika Sikka authored
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 22 May, 2024 2 commits
-
-
Philipp Moritz authored
-
Cody Yu authored
The second PR for #4532. This PR adds support for loading FP8 kv-cache scaling factors from an FP8 checkpoint (one that carries a .kv_scale parameter).
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
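The idea of one cos-sin cache per scaling factor can be illustrated with a small pure-Python sketch. This is not the kernel from the commit: every function name here is hypothetical, and linear position scaling plus a Neox-style half-split rotation are assumptions made only to keep the example concrete. The caches are concatenated into one table, and each token picks its row as `offset + position`, so tokens belonging to different LoRAs can be rotated in a single pass.

```python
import math


def build_cos_sin_cache(rot_dim, max_pos, base=10000.0, scaling_factor=1.0):
    """Build one (cos, sin) row per position, assuming linear position scaling."""
    inv_freq = [1.0 / (base ** (2 * i / rot_dim)) for i in range(rot_dim // 2)]
    cache = []
    for pos in range(int(max_pos * scaling_factor)):
        angles = [(pos / scaling_factor) * f for f in inv_freq]
        cache.append(([math.cos(a) for a in angles],
                      [math.sin(a) for a in angles]))
    return cache


def build_batched_cache(rot_dim, max_pos, scaling_factors):
    """Concatenate the per-factor caches; record where each one starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        cache.extend(build_cos_sin_cache(rot_dim, max_pos,
                                         scaling_factor=factor))
    return cache, offsets


def rotate(vec, cos, sin):
    """Rotate the pairs (vec[i], vec[i + half]) by the cached angles."""
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return first + second


def batched_rotary(vectors, positions, token_offsets, cache):
    """Rotate every token in one pass; each token selects its own cache."""
    out = []
    for vec, pos, off in zip(vectors, positions, token_offsets):
        cos, sin = cache[off + pos]
        out.append(rotate(vec, cos, sin))
    return out


# Two tokens in one batch, each using a different scaling factor.
cache, offsets = build_batched_cache(rot_dim=4, max_pos=8,
                                     scaling_factors=[1.0, 2.0])
rotated = batched_rotary(
    vectors=[[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
    positions=[1, 2],
    token_offsets=[offsets[1.0], offsets[2.0]],
    cache=cache,
)
```

Under these assumptions, position 2 with scaling factor 2.0 maps to the same angles as position 1 with factor 1.0, so the two tokens above end up with identical rotated vectors even though they read from different regions of the shared table.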
-
- 13 May, 2024 1 commit
-
-
Woosuk Kwon authored
-