- 23 Mar, 2026 1 commit
-
-
guanyu1 authored
-
- 21 Mar, 2026 1 commit
-
-
yangql authored
Disable padding of sparse_mla num_head to 64/128, and add an environment variable, FP8_USE_MIXED_BATCH, to control the fp8_use_mixed_batch mode. It defaults to false, which selects the separated mode.
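A minimal sketch of how a boolean switch such as FP8_USE_MIXED_BATCH might be read, assuming the usual "1"/"true" spellings count as enabled. Only the variable name and its false default come from the commit message; the helper name `fp8_use_mixed_batch` and the accepted spellings are hypothetical.

```python
import os


def fp8_use_mixed_batch(env=None):
    """Return True for mixed-batch mode, False for the default separated mode.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    The accepted truthy spellings are an assumption, not taken from the commit.
    """
    env = os.environ if env is None else env
    raw = env.get("FP8_USE_MIXED_BATCH", "false")
    return raw.strip().lower() in ("1", "true")


if __name__ == "__main__":
    mode = "mixed batch" if fp8_use_mixed_batch() else "separated"
    print(f"FP8 batch mode: {mode}")
```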
-
- 18 Mar, 2026 2 commits
-
-
laibao authored
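Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to toggle the fused sum+mul+add path. Add a fused path in DeepseekV2MoE that precomputes shared_output and passes iqis and routed_scaling_factor down. Extend the FusedMoE/SharedFusedMoE interfaces and the related custom ops to pass i_q/i_s/shared_output/routed_scaling_factor through uniformly. Adapt the Triton, Marlin W16A16, SlimQuant W4A8, CompressedTensors W8A8 and other implementations so that sum+mul+add can be completed inside the kernel.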
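For orientation, the arithmetic that a fused sum+mul+add would replace can be written as an unfused NumPy reference. This is a sketch under assumptions: it takes the order to be "sum the per-expert outputs, multiply by routed_scaling_factor, add shared_output", and the function name and tensor layout are illustrative rather than taken from the commit.

```python
import numpy as np


def moe_sum_mul_add(expert_out, shared_output, routed_scaling_factor):
    """Unfused reference for sum + mul + add.

    expert_out:    (num_tokens, top_k, hidden), assumed to be already weighted
                   by the router probabilities.
    shared_output: (num_tokens, hidden), output of the shared experts.
    """
    routed = expert_out.sum(axis=1)                         # sum over top-k experts
    return routed * routed_scaling_factor + shared_output   # mul, then add


expert_out = np.ones((2, 4, 3), dtype=np.float32)
shared = np.full((2, 3), 0.5, dtype=np.float32)
result = moe_sum_mul_add(expert_out, shared, 2.5)
```

A fused kernel would produce the same values in one pass instead of materializing the intermediate `routed` tensor.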
-
yangql authored
-
- 17 Mar, 2026 1 commit
-
-
王敏 authored
-
- 16 Mar, 2026 3 commits
- 15 Mar, 2026 1 commit
-
-
fanwl authored
Add the VLLM_V1_USE_FA_UNIFIED_ATTN_2D environment variable (0: Triton attention, 1: FA unified attention).
-
- 12 Mar, 2026 6 commits
- 09 Mar, 2026 2 commits
- 07 Mar, 2026 1 commit
-
-
wanglong3 authored
-
- 06 Mar, 2026 3 commits
- 26 Feb, 2026 1 commit
-
-
laibao authored
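Add the VLLM_V1_USE_REDUCED_TOPK_TOPP_SAMPLER switch and document the scenarios it applies to. Precompute max_top_k/has_any_no_top_k for the V1 GPU input batch; when the conditions are met, the native sampler takes the reduced fast path, and it falls back automatically on errors.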
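The idea behind a reduced top-k/top-p path can be sketched in NumPy: if every request in the batch has a top-k limit, only the `max_top_k` largest logits per row can survive, so the filter can run on that small candidate set instead of sorting the whole vocabulary. This is an illustrative sketch, not the commit's implementation; the function names, the encoding of "no top-k" as `top_k >= vocab_size`, and the exact fallback condition are assumptions.

```python
import numpy as np


def _keep_count(desc_logits, k, p):
    """How many leading entries (descending order) survive top-k, then top-p."""
    head = desc_logits[:k]
    probs = np.exp(head - head[0])
    probs /= probs.sum()
    cum = np.cumsum(probs)
    # Smallest prefix whose cumulative probability reaches p.
    n = int(np.searchsorted(cum, p, side="left")) + 1
    return min(n, len(head))


def full_path(logits, top_k, top_p):
    """Reference path: sort the whole vocabulary for every row."""
    out = np.full_like(logits, -np.inf)
    for r in range(logits.shape[0]):
        order = np.argsort(-logits[r], kind="stable")
        keep = order[:_keep_count(logits[r, order], top_k[r], top_p[r])]
        out[r, keep] = logits[r, keep]
    return out


def reduced_path(logits, top_k, top_p):
    """Fast path: select max_top_k candidates per row, then sort only those."""
    max_top_k = int(top_k.max())
    cand = np.argpartition(-logits, max_top_k - 1, axis=1)[:, :max_top_k]
    out = np.full_like(logits, -np.inf)
    for r in range(logits.shape[0]):
        c = cand[r]
        order = c[np.argsort(-logits[r, c], kind="stable")]
        keep = order[:_keep_count(logits[r, order], top_k[r], top_p[r])]
        out[r, keep] = logits[r, keep]
    return out


def filter_logits(logits, top_k, top_p):
    """Use the reduced path when every row has a top-k limit; otherwise fall back."""
    has_any_no_top_k = bool((top_k >= logits.shape[1]).any())
    if has_any_no_top_k:
        return full_path(logits, top_k, top_p)
    try:
        return reduced_path(logits, top_k, top_p)
    except Exception:
        return full_path(logits, top_k, top_p)  # automatic fallback on error
```

With distinct logits, both paths keep exactly the same tokens, so the fast path changes cost rather than results.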
-
- 24 Feb, 2026 2 commits
-
-
laibao authored
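Add the router_capture tool, which filters MoE router logits by num_tokens/rank and writes them to disk. Hook the capture call into Qwen3MoeSparseMoeBlock and skip it automatically under torch.compile. Add the VLLM_MOE_ROUTER_CAPTURE* environment variables.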
-
laibao authored
Add the environment variable `VLLM_V1_FAST_TOKEN_ID_COPY` (off by default). Cache the prompt token ids as an int32 numpy array in `CachedRequestState`. When enabled, copy prompt/output token ids in `InputBatch` with `np.copyto`.
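The copy pattern described here can be illustrated with a small, self-contained sketch. `CachedPromptTokens` and `fill_row` are hypothetical stand-ins for the real `CachedRequestState` and `InputBatch` code, which this example does not reproduce; only the use of an int32 numpy cache and `np.copyto` comes from the commit message.

```python
import numpy as np


class CachedPromptTokens:
    """Hypothetical stand-in: converts the prompt to int32 once and caches it."""

    def __init__(self, prompt_token_ids):
        self.prompt_np = np.asarray(prompt_token_ids, dtype=np.int32)


def fill_row(token_ids_cpu, row, cached, output_token_ids):
    """Copy prompt and output token ids into one row of a preallocated buffer."""
    n_prompt = cached.prompt_np.shape[0]
    n_out = len(output_token_ids)
    # np.copyto writes into the destination view without a per-element Python loop.
    np.copyto(token_ids_cpu[row, :n_prompt], cached.prompt_np)
    np.copyto(
        token_ids_cpu[row, n_prompt:n_prompt + n_out],
        np.asarray(output_token_ids, dtype=np.int32),
    )
    return n_prompt + n_out


buf = np.zeros((2, 8), dtype=np.int32)
n_tokens = fill_row(buf, 1, CachedPromptTokens([1, 2, 3]), [7, 8])
```

Caching the int32 array avoids converting the same Python list on every scheduling step, which is presumably where the saving comes from.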
-
- 11 Feb, 2026 2 commits
- 10 Feb, 2026 1 commit
-
-
lixh authored
-
- 09 Feb, 2026 2 commits
- 06 Feb, 2026 4 commits
- 05 Feb, 2026 1 commit
-
-
zhuwenwen authored
-
- 04 Feb, 2026 3 commits
- 28 Jan, 2026 1 commit
-
-
Roger Wang authored
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f988)
-
- 27 Jan, 2026 1 commit
-
-
Roger Wang authored
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 26 Jan, 2026 1 commit
-
-
dolpm authored
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
-