- 01 Feb, 2025 1 commit
-
-
Lucas Wilkinson authored
This PR implements the Deepseek V3 support by performing matrix absorption on the fp8 weights
---------
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
-
- 31 Jan, 2025 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
-
- 23 Jan, 2025 2 commits
-
-
Gregory Shtrasberg authored
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
-
- 22 Jan, 2025 2 commits
- 15 Jan, 2025 1 commit
-
-
Rui Qiao authored
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
-
- 30 Dec, 2024 1 commit
-
-
zhuwenwen authored
-
- 27 Dec, 2024 1 commit
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 17 Dec, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
- 14 Dec, 2024 1 commit
-
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
- 11 Dec, 2024 2 commits
-
-
Rui Qiao authored
Signed-off-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>
-
王敏 authored
-
- 10 Dec, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 07 Dec, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 04 Dec, 2024 1 commit
-
-
Daniele authored
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
-
- 29 Nov, 2024 1 commit
-
-
zhuwenwen authored
-
- 27 Nov, 2024 1 commit
-
-
zhuwenwen authored
Add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from a local path instead of the Hugging Face Hub
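A minimal sketch of how a test helper might honor such variables. This is not the code from the commit: the function name `resolve_model_path`, the precedence between the two variable names, and the `<root>/<model_id>` directory layout are all assumptions made for illustration.

```python
import os


def resolve_model_path(model_id: str, env=None) -> str:
    """Return a local model directory if one is configured and present.

    Hypothetical helper: falls back to the unchanged model_id (to be
    fetched from the Hugging Face Hub) when no local copy is found.
    """
    env = os.environ if env is None else env
    # Assumed precedence: the VLLM_-prefixed name wins over the short name.
    root = env.get("VLLM_OPTEST_MODELS_PATH") or env.get("OPTEST_MODELS_PATH")
    if not root:
        return model_id
    # Assumed layout: <root>/<model_id>
    candidate = os.path.join(root, model_id)
    return candidate if os.path.isdir(candidate) else model_id
```

Passing `env` explicitly keeps the helper testable without mutating the process environment.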
-
- 26 Nov, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 21 Nov, 2024 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 19 Nov, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 17 Nov, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 11 Nov, 2024 1 commit
-
-
Robert Shaw authored
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
- 08 Nov, 2024 1 commit
-
-
Luka Govedič authored
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>
-
- 07 Nov, 2024 1 commit
-
-
litianjian authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 30 Oct, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 24 Oct, 2024 1 commit
-
-
zhuwenwen authored
-
- 23 Oct, 2024 1 commit
-
-
Flex Wang authored
[Misc] Add an env var VLLM_LOGGING_PREFIX; if set, it will be prepended to all logging messages (#9590)
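The idea can be illustrated with the standard `logging` module. The sketch below is not vLLM's implementation: the helper name `make_prefixed_formatter` and the `levelname message` layout are assumptions, and only the variable name `VLLM_LOGGING_PREFIX` comes from the commit title.

```python
import logging
import os


def make_prefixed_formatter(env=None) -> logging.Formatter:
    """Build a formatter whose output starts with VLLM_LOGGING_PREFIX, if set.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    prefix = env.get("VLLM_LOGGING_PREFIX", "")
    # Escape '%' so the prefix is emitted literally by %-style formatting.
    safe_prefix = prefix.replace("%", "%%")
    return logging.Formatter(safe_prefix + "%(levelname)s %(message)s")
```

Attaching the returned formatter to a handler (`handler.setFormatter(...)`) applies the prefix to every record that handler emits, which is useful for telling apart interleaved output from several processes.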
-
- 22 Oct, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 19 Oct, 2024 1 commit
-
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-
- 17 Oct, 2024 4 commits
-
-
Luka Govedič authored
-
Kuntai Du authored
Removing the block manager v1. This is the first piece of a prefix-caching-centric design. To get there, the code path needs to be simplified so that only the v2 block manager is used, which performs much better with prefix caching.
-
Lucas Wilkinson authored
-
zhuwenwen authored
-
- 10 Oct, 2024 2 commits
-
-
youkaichao authored
-
sroy745 authored
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
-
- 09 Oct, 2024 1 commit
-
-
youkaichao authored
-
- 07 Oct, 2024 1 commit
-
-
youkaichao authored
-
- 02 Oct, 2024 1 commit
-
-
Sergey Shlyapnikov authored
-
- 27 Sep, 2024 1 commit
-
-
youkaichao authored
-