- 05 Feb, 2025 4 commits
-
-
Aleksandr Malyshev authored
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
-
Aviv Keshet authored
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
-
Lucas Wilkinson authored
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
- 04 Feb, 2025 13 commits
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Sophie du Couédic authored
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
-
Kero Liang authored
Signed-off-by: imkero <kerorek@outlook.com>
-
Michael Greenbaum authored
Signed-off-by: Michael Greenbaum <mgreenbaum@microsoft.com>
Co-authored-by: Michael Greenbaum <mgreenbaum@microsoft.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
-
Jee Jee Li authored
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
-
Hongxia Yang authored
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
-
Kyle Sayers authored
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
- 03 Feb, 2025 15 commits
-
Cody Yu authored
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
-
Cody Yu authored
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
-
kushanam authored
-
Kyle Sayers authored
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
-
Tyler Michael Smith authored
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Arthur authored
# Adds support for `transformers` as a backend
Following https://github.com/huggingface/transformers/pull/35235, a number of models should already be supported, and we are ramping up support for more models. Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the Hub that implements attention the correct way can be natively supported!
- tensor parallel support
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
youkaichao authored
Fixes problems like https://github.com/vllm-project/vllm/pull/12635, https://github.com/vllm-project/vllm/pull/12636, and https://github.com/vllm-project/vllm/pull/12565.
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
Srikanth Srinivas authored
Fix AWQ quant loading of the new R1 model
The new optimized MoE kernels for a large number of experts, `moe_wna16`, use AWQ quant, which requires the attention layers to be in 16-bit. The current merge broke this, and `get_quant_method` must return None for the attention layers for it to work correctly again.
---------
Signed-off-by: Srikanth Srinivas <srikanth@astrum.ai>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Beim <beim2015@outlook.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Ryan N <ryan.nguyen@centml.ai>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Beim <805908499@qq.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Ryan Nguyen <96593302+xpbowler@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: fade_away <1028552010@qq.com>
Co-authored-by: weilong.yu <weilong.yu@shopee.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Eldar Kurtic <eldarkurtic314@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Shawn Du <shawnd200@outlook.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
-
Eldar Kurtic authored
Thanks @kylesayrs for catching this!
-
youkaichao authored
When people use DeepSeek models, they find that they need to resolve a cv2 version conflict (see https://zhuanlan.zhihu.com/p/21064432691). I added the check and made all imports of `cv2` lazy.
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
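The lazy-import pattern described above can be sketched as follows. This is a minimal illustration, not vLLM's code: `lazy_import` and `resize_frame` are hypothetical helpers, and the commit does not say what its version check looks like, so none is shown. The point is that `cv2` is imported only when a function that needs it actually runs, so merely importing the enclosing module never triggers the `cv2` conflict.

```python
import importlib
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Hypothetical helper: import `name` on first use.

    importlib caches the module in sys.modules, so repeated calls
    return the same module object without re-importing it.
    """
    return importlib.import_module(name)


def resize_frame(frame, size):
    """Hypothetical image helper: cv2 is loaded only when this runs."""
    cv2 = lazy_import("cv2")
    return cv2.resize(frame, size)
```

Code paths that never touch images or video therefore never import `cv2`, and an incompatible `cv2` install only surfaces when it is actually needed.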
-
Yang Chen authored
`sgl_moe_align_block_size` is based on: https://github.com/sgl-project/sglang/commit/ded9fcd09a43d5e7d5bb31a2bc3e9fc21bf65d2a
`moe_align_block_size` is based on: https://github.com/sgl-project/sglang/commit/ba5112ff691d791a9e38c6c71f59324a5fcb49d0
Signed-off-by: Yang Chen <yangche@fb.com>
-
Zhuohan Li authored
-
youkaichao authored
As more and more people try DeepSeek models with multi-node inference, the problem in https://github.com/vllm-project/vllm/issues/7815 comes up more frequently. Let's give users a clear message.
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 02 Feb, 2025 6 commits
-
Russell Bryant authored
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to Python source files, as recommended to the project by the Linux Foundation. These headers provide a concise, human- and machine-readable way to communicate license information for each source file. They help avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
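A pre-commit check of the kind described above could be a small script that flags Python files whose first line (after an optional shebang) is not the SPDX identifier. This is a hypothetical sketch: the function names and the exact header text `# SPDX-License-Identifier: Apache-2.0` (chosen to match vLLM's Apache-2.0 license) are assumptions, not the check the commit actually added.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Assumed header text for an Apache-2.0 licensed project.
SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(source: str) -> bool:
    """Return True if the SPDX line is first, or second after a shebang."""
    lines = source.splitlines()[:2]
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]  # skip a shebang such as "#!/usr/bin/env python"
    return bool(lines) and lines[0].strip() == SPDX_HEADER


def files_missing_header(paths: Iterable[str]) -> List[str]:
    """Return the Python files among `paths` that lack the SPDX header."""
    return [
        p for p in paths
        if p.endswith(".py")
        and not has_spdx_header(Path(p).read_text(encoding="utf-8"))
    ]


def main(argv: List[str]) -> int:
    """Hook entry point: pre-commit passes the staged filenames as arguments."""
    missing = files_missing_header(argv)
    for path in missing:
        print(f"Missing SPDX header: {path}", file=sys.stderr)
    return 1 if missing else 0
```

Wired up as a script entry point (for example `sys.exit(main(sys.argv[1:]))`), a nonzero exit status makes pre-commit reject the commit until the header is added.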
-
Kunshang Ji authored
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
-
Shawn Du authored
As mentioned in RFC https://github.com/vllm-project/vllm/issues/12254, this PR combines `allocate_slots` and `append_slots`. There should be no functional change, except that decode now also raises an exception when `num_tokens` is zero (as prefill already does), and the unit test case is changed accordingly.
@comaniac @rickyyx @WoosukKwon @youkaichao @heheda12345 @simon-mo
---------
Signed-off-by: Shawn Du <shawnd200@outlook.com>
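To illustrate the merged API described above, here is a toy block allocator with a single `allocate_slots` entry point that serves both prefill and decode. It is a hypothetical sketch, not vLLM's KV cache manager: the class names, the "return `None` when blocks run out" convention, and the `ValueError` type are assumptions. It shows the one behavioral change the PR mentions: `num_tokens == 0` is rejected no matter which phase the request is in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRequest:
    num_computed_tokens: int = 0  # tokens whose KV entries are already stored
    block_ids: List[int] = field(default_factory=list)


class ToyKVCacheManager:
    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_block_ids = list(range(num_blocks))

    def allocate_slots(self, request: ToyRequest,
                       num_tokens: int) -> Optional[List[int]]:
        """Allocate blocks for `num_tokens` new tokens (prefill or decode).

        Returns the newly allocated block IDs (possibly empty), or None
        if there are not enough free blocks.
        """
        if num_tokens == 0:
            # One rule for both phases once the two functions are merged.
            raise ValueError("num_tokens must be greater than 0")
        total_tokens = request.num_computed_tokens + num_tokens
        blocks_needed = -(-total_tokens // self.block_size)  # ceiling division
        num_new_blocks = max(blocks_needed - len(request.block_ids), 0)
        if num_new_blocks > len(self.free_block_ids):
            return None
        new_ids = [self.free_block_ids.pop() for _ in range(num_new_blocks)]
        request.block_ids.extend(new_ids)
        return new_ids
```

With a block size of 16, a 20-token prefill takes two blocks; a following one-token decode step fits in the second block and returns an empty list, so decode no longer needs a separate append path.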
-
Woosuk Kwon authored
A small optimization to avoid creating a new `ConstantList` every time `request.kv_block_hashes` is used.
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
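The optimization above amounts to building the read-only view once instead of on every access. A minimal sketch, assuming a simplified stand-in for vLLM's `ConstantList` (`ReadOnlyList` and `ToyHashedRequest` are hypothetical names): because the view wraps the same underlying list rather than a copy, it stays current as hashes are appended, so one instance can be reused indefinitely.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ReadOnlyList(Generic[T]):
    """Simplified stand-in for a ConstantList: a read-only view of a list."""

    def __init__(self, items: List[T]) -> None:
        self._items = items  # shared with the owner, not copied

    def __getitem__(self, index: int) -> T:
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def append(self, item: T) -> None:
        raise TypeError("cannot modify a read-only list")


class ToyHashedRequest:
    def __init__(self) -> None:
        self._kv_block_hashes: List[int] = []
        # A property returning ReadOnlyList(self._kv_block_hashes) would
        # allocate a new wrapper on every access; build it once instead.
        self.kv_block_hashes = ReadOnlyList(self._kv_block_hashes)

    def append_kv_block_hash(self, block_hash: int) -> None:
        self._kv_block_hashes.append(block_hash)
```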
-
Russell Bryant authored
I noticed during testing that I was getting a lot of these deprecation warnings about `lora_local_path`:
```
DeprecationWarning: The 'lora_local_path' attribute is deprecated and will be removed in a future version. Please use 'lora_path' instead.
```
The check used to emit this warning was always True, even when the parameter was not actually specified, because the field is always present in `__struct_fields__`. We should check for a non-None value instead.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
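The difference between the two checks can be shown with a plain dataclass standing in for the msgspec `Struct` (an assumption made to keep the example self-contained; `LoRARequestSketch` and `declared_field_check` are illustrative names, not vLLM's). A declared field's name is always in the class's field list, so a membership test on it is always True; testing the value for `None` only fires when a caller actually passes the deprecated argument.

```python
import warnings
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoRARequestSketch:
    lora_name: str
    lora_path: str = ""
    lora_local_path: Optional[str] = None  # deprecated alias of lora_path

    def __post_init__(self) -> None:
        # Fixed check: warn only when a value was actually supplied.
        if self.lora_local_path is not None:
            warnings.warn(
                "The 'lora_local_path' attribute is deprecated and will be "
                "removed in a future version. Please use 'lora_path' instead.",
                DeprecationWarning,
                stacklevel=2,
            )


def declared_field_check(req: LoRARequestSketch) -> bool:
    """The buggy condition: True for every instance, set or not."""
    return "lora_local_path" in {f.name for f in fields(req)}
```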
-
Jinzhen Lin authored
Fixes https://github.com/vllm-project/vllm/issues/12647
The `get_quant_method` of `moe_wna16` always returns the MoE method, the GPTQ-based linear method, or the AWQ-based linear method, even when the target module is an attention layer: https://github.com/vllm-project/vllm/blob/baeded25699f9f4851843306f27f685c4d4ee7c5/vllm/attention/layer.py#L86-L92
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
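The fix can be pictured as type-based dispatch that falls through to `None` for anything that is neither a linear nor an MoE layer, which leaves the attention layer on its unquantized 16-bit path. This is a hypothetical sketch: the placeholder layer classes and the returned method names are illustrative, not vLLM's `moe_wna16` config implementation, and the GPTQ/AWQ choice is reduced to a string parameter.

```python
from typing import Optional


class LinearLayer:
    """Placeholder for a linear layer."""


class FusedMoELayer:
    """Placeholder for a fused MoE layer."""


class AttentionLayer:
    """Placeholder for an attention layer."""


def get_quant_method(layer: object,
                     linear_quant_method: str = "awq") -> Optional[str]:
    """Pick a quantization method name for `layer`, or None to leave it as is."""
    if isinstance(layer, FusedMoELayer):
        return "moe_wna16"
    if isinstance(layer, LinearLayer):
        # GPTQ- or AWQ-based linear method, depending on the checkpoint.
        return f"{linear_quant_method}_linear"
    # Attention (and any other layer type) gets None, so it is not handed
    # a linear or MoE method that it cannot use.
    return None
```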
-
- 01 Feb, 2025 2 commits
-
Vicente Herrera authored
The word "evolved" was mistyped.
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
---------
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
-
Michael Goin authored
-