Commits · d0510f08feaa155c4d99f01667e1b5673652478c · change / sglang · GitLab

18 Jul, 2025 8 commits
- Revert "Fix different device type adjustment in PP" (#8141) · d0510f08
  Sai Enduri authored Jul 18, 2025
- Hicache Storage Layer Prototype (#7704) · 9d33fcfb
  Zhiqiang Xie authored Jul 18, 2025
- [Quantization][w8a8_int8] Fix weight loading issue for w8a8_int8 path with "ignore" layer list in quantization config (#7820) · 7891bac1
  jianan-gu authored Jul 18, 2025
- [CPU][Llama4] Fix Llama4 MoE inputs with "apply_router_weight_on_input" (#7889) · 48c1fa7b
  jianan-gu authored Jul 18, 2025
- load draft model fix (#7506) · 8aa5ae6b
  yilian49 authored Jul 18, 2025
- Feat: Support Granite 3.0 MoE in SGLang (#7959) · 8a323557
  Minglei Zhu authored Jul 17, 2025
- [Fix][Ready]Fix register spilling in cutlass nvfp4 gemm kernel on Blackwell (#8127) · 6e92da8f
  Qi Yuhang authored Jul 18, 2025
- refactor: simply MultimodalTokens logic (#7924) · e1020dc5
  Mick authored Jul 18, 2025
17 Jul, 2025 12 commits
- feat: add production metric for retracted requests due to insufficient kvcache (#7030) · 3586b4ce
  Zhao Chen authored Jul 18, 2025
```
Signed-off-by: Zhao Chen <zhaochen.zju@gmail.com>
```
- [Hunyuan]: Fix Dense Model Support (#8117) · 42960214
  Asher authored Jul 18, 2025
```
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
```
- fix: update HostKVCache init to report correct msg when available memory is not enough (#8102) · 01857fab
  Ziqi Fan authored Jul 17, 2025
- Super tiny fix typo (#8046) · 519ff5c8
  fzyzcjy authored Jul 17, 2025
- [kernel] opt moe align block kernel by block/warp scan algorithm (#7884) · af1cc8fe
  Yuan Luo authored Jul 17, 2025
- Refactor: move all quantization-related code to `srt/layer/quantization` (#7989) · 49b87774
  Cheng Wan authored Jul 17, 2025
- [ci] recover 8-gpu deepep test (#8105) · 02404a1e
  Cheng Wan authored Jul 17, 2025
- [Fix] ensure DeepGEMM is only enabled for FP8_W8A8 models (#8110) · 5c08a36c
  hzh0425 authored Jul 17, 2025
- [ci] disable memory imbalance check for draft worker (#8108) · 9069884b
  Cheng Wan authored Jul 16, 2025
- [ci] limit cmake build nproc (#8100) · 8a7a7770
  Simo Lin authored Jul 16, 2025
- feat: add tp_rank, pp_rank and dp_rank labels for scheduler metrics (#7597) · 795668dc
  Yingchun Lai authored Jul 17, 2025
```
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
```
- refactor: unify names of the feature field of MultimodalDataItem (#8075) · 4395c87a
  Mick authored Jul 17, 2025
16 Jul, 2025 11 commits
- [1/n] chore: decouple quantization implementation from vLLM dependency (#7992) · c28ad199
  Peng Zhang authored Jul 17, 2025
- [Feature] Layer-wise Prefill (#7634) · 570d3343
  Xiaoze Fan authored Jul 17, 2025
```
Signed-off-by: jason-fxz <jason341132@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```
- [misc] update nvshmem and pin deepEP commit hash (#8098) · d9eb5efc
  Simo Lin authored Jul 16, 2025
- fix greenctx stream compability (#8090) · 6dc4af49
  Peng Zhang authored Jul 16, 2025
- Fix CI xeon test with triton 3.3.1 (#8086) · b188a89a
  YanbingJiang authored Jul 16, 2025
- Revert "feat: replace Decord with video_reader-rs" (#8077) · 497efe74
  Mick authored Jul 16, 2025
- Use device_group for all_gather when disabling overlap scheduling (#8001) · 69f453e5
  Qiaolin Yu authored Jul 15, 2025
- Fix different device type adjustment in PP (#7760) · 3bc43c68
  Qiaolin Yu authored Jul 15, 2025
- update transformers to 4.53.2 (#8029) · 7498522f
  Xinyuan Tong authored Jul 15, 2025
```
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
```
- remove kv_a.congigous in DeepseekV2AttentionMLA (#8058) · 194841e3
  strgrb authored Jul 16, 2025
```
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
```
- feat: replace Decord with video_reader-rs (#5163) · ebff5fcb
  kozo authored Jul 16, 2025
```
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
```
15 Jul, 2025 9 commits
- Update amd docker image. (#8045) · f06bd210
  Sai Enduri authored Jul 15, 2025
```
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
```
- H20 tune config for Kimi (#8047) · 14f1f151
  yhang authored Jul 16, 2025
- concurrently load weights of DeepseekV2ForCausalLM (#7943) · 38216cf0
  Albert authored Jul 16, 2025
```
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
```
- fix: resolve arm build issue (#8052) · 4a883795
  Yineng Zhang authored Jul 15, 2025
- Fix the input tools format and history tool_calls in OpenAI API (#6556) · f1f1d1d4
  jiawei authored Jul 15, 2025
- fix: remove redundant rotary embedding cache recomputation in MiniCPM (#8022) · 9120e83d
  Xinyuan Tong authored Jul 15, 2025
- feat: update multimodal data handling in engine entrypoint (#8002) · 6e923dbd
  Xinyuan Tong authored Jul 15, 2025
```
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
```
- [feat]Support fusion kernel for constructing quant input and scale factor for fp8_blockwise_scaled_grouped_mm (#8023) · c268c11c
  Qi Yuhang authored Jul 15, 2025
- Update CODEOWNERS (#8044) · e6d59884
  Chang Su authored Jul 14, 2025