- 05 Feb, 2025 13 commits
-
Akash kaothalkar authored
[Bugfix] Fix `ModuleNotFoundError: No module named 'intel_extension_for_pytorch'` for `--tensor-parallel-size` greater than 1 (#12546)
-
Michael Goin authored
-
Nick Hill authored
-
Harry Mellor authored
-
Michael Goin authored
-
Kyle Sayers authored
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
-
Dipika Sikka authored
-
Isotr0py authored
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Aleksandr Malyshev authored
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
-
Aviv Keshet authored
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
-
Lucas Wilkinson authored
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
- 04 Feb, 2025 13 commits
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Sophie du Couédic authored
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
-
Kero Liang authored
Signed-off-by: imkero <kerorek@outlook.com>
-
Michael Greenbaum authored
Signed-off-by: Michael Greenbaum <mgreenbaum@microsoft.com>
Co-authored-by: Michael Greenbaum <mgreenbaum@microsoft.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
-
Jee Jee Li authored
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
-
Hongxia Yang authored
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
-
Kyle Sayers authored
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
-
Michael Goin authored
Signed-off-by: mgoin <michael@neuralmagic.com>
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
- 03 Feb, 2025 14 commits
-
Cody Yu authored
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
-
Cody Yu authored
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
-
kushanam authored
-
Kyle Sayers authored
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
-
Tyler Michael Smith authored
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Arthur authored
# Adds support for `transformers` as a backend
Following https://github.com/huggingface/transformers/pull/35235, a number of models should already be supported, and we are ramping up support for more. Thanks to @Isotr0py for the TP support and to @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the Hub that implements attention the correct way can be supported natively.
- Tensor parallel support
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
youkaichao authored
Fixes problems like those in https://github.com/vllm-project/vllm/pull/12635, https://github.com/vllm-project/vllm/pull/12636, and https://github.com/vllm-project/vllm/pull/12565.
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
Srikanth Srinivas authored
Fix AWQ quant loading of the new R1 model
The new optimized MoE kernel for a large number of experts, `moe_wn16`, uses AWQ quantization, which requires the attention layers to be in 16-bit. The current merge broke this; `get_quant_method` must return `None` for those layers for loading to work correctly again.
---------
Signed-off-by: Srikanth Srinivas <srikanth@astrum.ai>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Beim <beim2015@outlook.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Ryan N <ryan.nguyen@centml.ai>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Beim <805908499@qq.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Ryan Nguyen <96593302+xpbowler@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: fade_away <1028552010@qq.com>
Co-authored-by: weilong.yu <weilong.yu@shopee.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Eldar Kurtic <eldarkurtic314@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Shawn Du <shawnd200@outlook.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
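The AWQ fix above depends on a dispatch convention in which a quantization config returns no method for layers that should stay in 16-bit. The sketch below illustrates only that general pattern. Every class and function in it (`AttentionLayer`, `FusedMoELayer`, `QuantMethod`, `MoeOnlyQuantConfig`, `resolve_method`) is a hypothetical stand-in and not vLLM's actual API. The `None`-means-unquantized behavior is an assumption modeled on the commit message.

```python
from typing import Optional


class AttentionLayer:
    """Hypothetical stand-in for an attention module."""


class FusedMoELayer:
    """Hypothetical stand-in for a fused MoE module."""


class QuantMethod:
    """Hypothetical quantization method handle."""

    def __init__(self, name: str) -> None:
        self.name = name


class MoeOnlyQuantConfig:
    """Hypothetical config that quantizes only the MoE experts."""

    def get_quant_method(self, layer: object, prefix: str) -> Optional[QuantMethod]:
        if isinstance(layer, FusedMoELayer):
            return QuantMethod("wn16_moe")
        # Returning None signals "no quantization method": the layer keeps 16-bit weights.
        return None


def resolve_method(config: MoeOnlyQuantConfig, layer: object, prefix: str) -> str:
    # Hypothetical caller: fall back to an unquantized path when the config returns None.
    method = config.get_quant_method(layer, prefix)
    return method.name if method is not None else "unquantized"
```

Here, attention layers resolve to `"unquantized"` and MoE layers resolve to `"wn16_moe"`. The bug described in the commit corresponds to a config returning a non-`None` method for attention layers.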
-
Eldar Kurtic authored
Thanks @kylesayrs for catching this!
-
youkaichao authored
When people use DeepSeek models, they find that they need to resolve a `cv2` version conflict; see https://zhuanlan.zhihu.com/p/21064432691. I added the check and made all imports of `cv2` lazy.
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
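Making an import "lazy" means loading the module on first use rather than at module import time. With this approach, code paths that never touch `cv2` avoid triggering a conflicting install. The sketch below shows one common way to do this with `importlib`. The helper names `lazy_import` and `decode_video_frames` are hypothetical and are not the helpers added in this commit.

```python
import functools
import importlib
from types import ModuleType


@functools.lru_cache(maxsize=None)
def lazy_import(module_name: str) -> ModuleType:
    # The module is imported the first time this runs, not when the caller is imported;
    # lru_cache returns the same module object on later calls.
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency {module_name!r} is required for this feature "
            "but could not be imported."
        ) from exc


def decode_video_frames(path: str):
    # Hypothetical caller: cv2 is only loaded when video decoding is actually used.
    cv2 = lazy_import("cv2")
    return cv2.VideoCapture(path)
```

A failed import is not cached by `lru_cache`, so installing the dependency later in the same process and calling again will succeed.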
-
Yang Chen authored
`sgl_moe_align_block_size` is based on: https://github.com/sgl-project/sglang/commit/ded9fcd09a43d5e7d5bb31a2bc3e9fc21bf65d2a
`moe_align_block_size` is based on: https://github.com/sgl-project/sglang/commit/ba5112ff691d791a9e38c6c71f59324a5fcb49d0
Signed-off-by: Yang Chen <yangche@fb.com>
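To make concrete what these kernels compute, the following pure-Python sketch covers the alignment step as it is commonly described for fused MoE. It groups each (token, top-k) slot by its expert and pads each expert's group to a multiple of `block_size`. It then records one expert id per block. This description is an assumption drawn from general knowledge of the op and is not taken from this commit. The name `moe_align_block_size_ref` is hypothetical. The real CUDA kernels write into preallocated tensors and may differ in ordering and padding details.

```python
from typing import List, Tuple


def moe_align_block_size_ref(
    topk_ids: List[List[int]], num_experts: int, block_size: int
) -> Tuple[List[int], List[int], int]:
    """Return (sorted_token_ids, expert_ids, num_tokens_post_padded)."""
    flat = [expert for row in topk_ids for expert in row]  # one entry per (token, k) slot
    pad_value = len(flat)  # out-of-range sentinel marking padding slots

    # Bucket slot indices by expert, keeping original order within each expert.
    buckets: List[List[int]] = [[] for _ in range(num_experts)]
    for slot, expert in enumerate(flat):
        buckets[expert].append(slot)

    sorted_token_ids: List[int] = []
    expert_ids: List[int] = []
    for expert, slots in enumerate(buckets):
        # Round this expert's slot count up to a multiple of block_size.
        padded = -(-len(slots) // block_size) * block_size
        sorted_token_ids.extend(slots + [pad_value] * (padded - len(slots)))
        # Each block of block_size slots is processed by a single expert.
        expert_ids.extend([expert] * (padded // block_size))

    return sorted_token_ids, expert_ids, len(sorted_token_ids)
```

For example, `topk_ids=[[0, 1], [1, 2]]` with three experts and `block_size=2` gives `sorted_token_ids=[0, 4, 1, 2, 3, 4]`, `expert_ids=[0, 1, 2]`, and 6 padded slots. In that output, `4` is the padding sentinel.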
-
Zhuohan Li authored
-