Commits · 3eccb64e7e2a76c34904eaa4e5d2bd6181a20ec9 · OpenDAS / vllm_cscc

30 Jan, 2026 1 commit
- set MOE_NN=0, VLLM_USE_FUSED_RMS_ROPE=0, VLLM_USE_FUSE_SILU_AND_MUL=0 and VLLM_W8A8_BACKEND=1 · 3eccb64e
  zhuwenwen authored Jan 30, 2026
23 Jan, 2026 1 commit
- [Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746) · f46d576c
  Isotr0py authored Jan 22, 2026
```
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit 8ebf271b)
```
16 Jan, 2026 2 commits
- Fix the copy required in custom cudagraph mode; must be used together with dtk. · f1bc9890
  zhuwenwen authored Jan 16, 2026
```
Distinguish between PCIe and hglink custom allreduce usage
vllm: export VLLM_CUSTOM_CACHE=1
dtk: export HIP_KERNEL_EVENT_SYSTENFENCE=1

set VLLM_USE_FUSED_RMS_ROPE=1
add SUPPORT_MOE_MARLIN_W16A16 to use moe marlin on bw
support fa kvcache fp8 (todo: add VLLM_USE_QUERY_QUANT to not use q quant)
update moe_align_block_size
```
- Switch default w8a8 gemm impl to blaslt. · f06d1125
  zhuwenwen authored Jan 16, 2026
```
fix _forward_encoder_attention
remove medusa
set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1
```
11 Jan, 2026 1 commit
- [CPU][BugFix] Disable AOT Compile for CPU (#32037) · 9103ed16
  Fadi Arafeh authored Jan 11, 2026
```
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
```
09 Jan, 2026 3 commits
- [1/N][Attention] Restructure attention: move files (#31916) · 2612ba92
  Matthew Bonanni authored Jan 09, 2026
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
```
- [UX] Add vLLM model inspection view (#29450) · d5ec6c05
  Michael Goin authored Jan 09, 2026
```
Signed-off-by: mgoin <mgoin64@gmail.com>
```
- [ROCm][PD] add moriio kv connector. (#29304) · 4505849b
  inkcherry authored Jan 09, 2026
```
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
```
07 Jan, 2026 2 commits
- [Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213) · cc6dafae
  Kate Cheng authored Jan 07, 2026
```
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
```
- remove USE_FUSED_RMS_QUANT and USE_FUSED_SILU_MUL_QUANT · 60b37c6b
  zhuwenwen authored Jan 07, 2026
05 Jan, 2026 1 commit
- [Bug] Revert torch warning fix (#31585) · af9a7ec2
  Wentao Ye authored Jan 05, 2026
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
19 Dec, 2025 2 commits
- Make engine core client handshake timeout configurable (#27444) · 1ab52135
  Seiji Eicher authored Dec 19, 2025
```
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
```
- fix moe params and run error · 22c6c645
  zhuwenwen authored Dec 19, 2025
18 Dec, 2025 3 commits
- Remove all2all backend envvar (#30363) · 41b6f920
  Elizabeth Thomas authored Dec 18, 2025
```
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- skip static_scaled_fp8_quant and set VLLM_USE_BYTECODE_HOOK=0 · 8d0e36b5
  zhuwenwen authored Dec 18, 2025
- [Metrics] Model FLOPs Utilization estimation (#30738) · a0b782f9
  SungMinCho authored Dec 17, 2025
```
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
```
17 Dec, 2025 2 commits
- [compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809) · 9db1db59
  Zhengxu Chen authored Dec 17, 2025
```
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
```
- Synchronize the modifications from the 12th to the 17th: · b66c8e4b
  zhuwenwen authored Dec 17, 2025
```
Fix the w4a16 conflict in CompressedTensorsLinearMethod
feat(moe): add Marlin W16A16 fused MoE behind VLLM_USE_MARLIN_W16A16_MOE
replace the fp8_mqa_logits and fp8_paged_mqa_logits interfaces in deepgemm with mqa_logits and paged_mqa_logits from lightop
```
16 Dec, 2025 1 commit
- [Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627) · 9fec0e13
  Lucas Wilkinson authored Dec 16, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
```
13 Dec, 2025 1 commit
- fix optional error · b8ef3436
  zhuwenwen authored Dec 13, 2025
12 Dec, 2025 1 commit
- [Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532) · 3e41992f
  Lucas Wilkinson authored Dec 12, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
```
11 Dec, 2025 2 commits
- [Chore] Fix torch precision warning (#30428) · d6464f26
  Wentao Ye authored Dec 10, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
- [Deprecation] Remove deprecated task, seed and MM settings (#30397) · 7e24e5d4
  Cyrus Leung authored Dec 11, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
10 Dec, 2025 1 commit
- [Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) · 9f042ba2
  Jialin Ouyang authored Dec 10, 2025
```
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
```
09 Dec, 2025 3 commits
- [Cleanup] Refactor profiling env vars into a CLI config (#29912) · e858bfe0
  Benjamin Chislett authored Dec 09, 2025
```
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897) · 83319b44
  Wentao Ye authored Dec 09, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
- [moe] Allow disabling DP chunking (#29936) · 9d6235ca
  Ming Yang authored Dec 08, 2025
```
Signed-off-by: Ming Yang <minos.future@gmail.com>
```
04 Dec, 2025 1 commit
- [P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718) · 842aba50
  dtc authored Dec 04, 2025
```
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
```
03 Dec, 2025 4 commits
- [CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930) · 1109f982
  Shengqi Chen authored Dec 04, 2025
```
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
```
- [Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671) · b5407869
  Elizabeth Thomas authored Dec 03, 2025
```
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
```
- [docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452) · f5d3d93c
  Amr Mahdi authored Dec 03, 2025
```
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
```
- add VLLM_USE_OPT_RESHAPE_AND_CACHE, VLLM_USE_FUSE_SILU_AND_MUL and VLLM_USE_TOPK_RENORM for qwen3-30b · 15a55773
  zhuwenwen authored Dec 03, 2025
02 Dec, 2025 3 commits
- [responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413) · 52cb349f
  Andrew Xia authored Dec 02, 2025
```
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
```
- add VLLM_USE_V32_ENCODE to use encoding_dsv32.py · ba7bcccd
  zhuwenwen authored Dec 02, 2025
- [CI] Renovation of nightly wheel build & generation (take 2) (#29838) · 4b612664
  Shengqi Chen authored Dec 02, 2025
```
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
```
01 Dec, 2025 4 commits
- Revert #29787 and #29690 (#29815) · 1336a1ea
  Kevin H. Luu authored Dec 01, 2025
- [CI] Renovation of nightly wheel build & generation (#29690) · 36db0a35
  Shengqi Chen authored Dec 01, 2025
```
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
```
- Make PyTorch profiler gzip and CUDA time dump configurable (#29568) · 1ab8fc81
  Yifei Zhang authored Dec 01, 2025
```
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
```
- [MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141) · f72a817b
  Shu Wang authored Nov 30, 2025
```
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
```
29 Nov, 2025 1 commit
- [Kernel][Quantization] add w4a8 support for marlin kernel (#24722) · 1656ad37
  Jinzhen Lin authored Nov 29, 2025
```
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
```