Commits · 0d6ccf68fa2c439e17d02f26c4044ed5df7f7099 · OpenDAS / vllm_cscc · GitLab

03 Feb, 2026 32 commits
- [P/D] rework mooncake connector and introduce its bootstrap server (#31034) · 0d6ccf68
  dtc authored Feb 04, 2026
```
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
```
- [Bugfix] Fix startup hang for Granite Speech (#33699) · 18e7cbbb
  Cyrus Leung authored Feb 03, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Voxtral models] Skip warm-up to skip confusing error message in warm-up (#33576) · f0d52517
  Patrick von Platen authored Feb 03, 2026
```
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
- [MM] Pass `prefix` parameter to MMEncoderAttention (#33674) · 5c4f2dd6
  Shanshan Shen authored Feb 03, 2026
```
Signed-off-by: shen-shanshan <467638484@qq.com>
```
- [Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. (#33647) · f3d8a346
  wang.yuqi authored Feb 03, 2026
```
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
```
- Feat/add nemotron nano v3 tests (#33345) · 4bc913ae
  shaharmor98 authored Feb 03, 2026
- [Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377) · fbb3cf69
  Kuntai Du authored Feb 03, 2026
```
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
```
- Document NixlConnector backend selection via kv_connector_extra_config (#33552) · 2df2b349
  Krish Gupta authored Feb 03, 2026
```
Signed-off-by: KrxGu <krishom70@gmail.com>
```
- Fix Gemma3n audio encoder for Transformers v5 (#33673) · 2a8d84e6
  Harry Mellor authored Feb 03, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Models] Intern-S1-Pro (#33636) · a3acfa10
  zxy authored Feb 03, 2026
```
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
```
- Fix Gemma3 GGUF for Transformers v5 (#33683) · be8168ff
  Harry Mellor authored Feb 03, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- Fix offline test for Transformers v5 (#33682) · f6af3462
  Harry Mellor authored Feb 03, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Bugfix] fix qwen3-asr response error (#33644) · ceab70c8
  Song Zhixin authored Feb 03, 2026
```
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
```
- [Misc] Update default image format of `encode_base64` (#33656) · 52683ccb
  Cyrus Leung authored Feb 03, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620) · e346e2d0
  Michael Goin authored Feb 03, 2026
```
Signed-off-by: mgoin <mgoin64@gmail.com>
```
- [Refactor] Clean up pooling serial utils (#33665) · 83449a5f
  Cyrus Leung authored Feb 03, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642) · dad2d6a5
  Lucas Hänke de Cansino authored Feb 03, 2026
```
Signed-off-by: l4b4r4b4b4 <lucas.cansino@mail.de>
```
- [CI/Build] Investigate torchrun distributed tests hanging issue (#33650) · 32e84fa1
  Isotr0py authored Feb 03, 2026
```
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
```
- [torch.compile] Document the workaround to standalone_compile failing (#33571) · fd9c83d0
  Richard Zou authored Feb 02, 2026
```
Signed-off-by: Richard Zou <zou3519@gmail.com>
```
- [Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535) · b95cc501
  杨朱 · Kiki authored Feb 03, 2026
```
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
```
- [Minor] Some code simplification in `scheduler.py` (#33597) · 61397891
  Nick Hill authored Feb 02, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
- [Misc] Remove deprecated profiler environment variables (#33536) · ef248ff7
  杨朱 · Kiki authored Feb 03, 2026
```
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
```
- [XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379) · e1060448
  Kunshang Ji authored Feb 03, 2026
```
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
```
- [Bugfix] Interleaved thinking keeps compatibility with reasoning_content (#33635) · bf001da4
  Chauncey authored Feb 03, 2026
```
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Koushik Dutta <koushd@gmail.com>
```
- [CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles (#33553) · a0a984ac
  杨朱 · Kiki authored Feb 03, 2026
```
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
```
- Fix quantized Falcon-H1 model loading issues (#32728) · f1cb9b55
  Shengliang Xu authored Feb 02, 2026
```
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
- [Frontend] Add sampling parameters to Responses API (#32609) · 4c4b6f7a
  Daniel Mescheder authored Feb 03, 2026
```
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
```
- [Bugfix] Fix mm budget setting for Qwen Omni models (#33634) · 10546f92
  Roger Wang authored Feb 02, 2026
```
Signed-off-by: Roger Wang <hey@rogerw.io>
```
- [Feature][CPU Backend]: Optimize ARM vectorization backend (#30329) · e69c990c
  Radu Salavat authored Feb 03, 2026
```
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
```
- [torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624) · 5eac9a1b
  Richard Zou authored Feb 02, 2026
```
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
```
- [CI/Build] add directions for CPU image upload to Docker Hub (#32032) · 1b60b45d
  Nathan Weinberg authored Feb 02, 2026
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Signed-off-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
```
- [BugFix] DPMetadata raises assert error for dense model (#32739) · 4b3803d1
  Dezhan authored Feb 02, 2026
```
Co-authored-by: Dezhan Tu <dztu@meta.com>
```
02 Feb, 2026 8 commits
- [Voxtral Realtime] Introduce global log mel max (#33574) · 5019c59d
  Patrick von Platen authored Feb 02, 2026
```
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```
- fix cutlass_3x_gemm_fp8_blockwise on sm103a (#32224) · 089cd4f0
  Lain authored Feb 02, 2026
```
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
```
- fix memory for online fp8 quantization with streaming weight load (#31914) · 0130223b
  Vasiliy Kuznetsov authored Feb 02, 2026
```
Signed-off-by: vasiliy <vasiliy@fb.com>
```
- [UX] Format attention backend log line (#33570) · 5d1aef30
  Matthew Bonanni authored Feb 02, 2026
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
```
- Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005) · ffe1fc7a
  yugong333 authored Feb 02, 2026
```
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
```
- Update huggingface-hub again (#33567) · 8b7346d5
  Harry Mellor authored Feb 02, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- Remove incorrect tokenizer info test (#33565) · 6141ebe0
  Harry Mellor authored Feb 02, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039) · 199e3cb4
  Yang Liu authored Feb 02, 2026
```
Signed-off-by: Yang <lymailforjob@gmail.com>
```