- 14 Apr, 2026 3 commits
laibao authored
laibao authored
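fix: - Fix the logic that identifies optional scalar parameters when Step3p5 MTP loads a checkpoint: the q/k/v zero_point parameters are now part of the optional parameter set, so parameter validation and loading stay consistent. revert: - Revert the EAGLE logic that forced the MTP shared_head.head to reuse the target lm_head, which conflicted with the current Step3p5 MTP weight structure. Purpose: - Reduce compatibility problems for Step3p5 MTP during weight loading and the abnormal behavior caused by inconsistent lm_head sharing paths, making later debugging and collaboration easier.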
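The optional-parameter idea in this fix can be illustrated with a minimal sketch. It is not the code from the commit: the function name, the suffix spellings (`q_zero_point` and so on), and the parameter names in the usage note are all hypothetical, chosen only to show how an optional set keeps a validation step from rejecting a checkpoint that legitimately omits those scalars.

```python
# Hypothetical suffixes for optional scalar parameters; the real names in
# the Step3p5 MTP loader may differ.
OPTIONAL_PARAM_SUFFIXES = ("q_zero_point", "k_zero_point", "v_zero_point")


def find_missing_required(model_param_names, loaded_names,
                          optional_suffixes=OPTIONAL_PARAM_SUFFIXES):
    """Return model parameters absent from the checkpoint, ignoring optional ones."""
    missing = set(model_param_names) - set(loaded_names)
    # str.endswith accepts a tuple, so one call covers every optional suffix.
    return sorted(name for name in missing if not name.endswith(optional_suffixes))
```

With this check, a checkpoint lacking `layers.0.attn.q_zero_point` passes validation, while one lacking a weight such as `layers.0.mlp.weight` is still reported.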
- 10 Apr, 2026 1 commit
xuxz authored
- 03 Apr, 2026 2 commits
- 02 Apr, 2026 1 commit
xuxz authored
- 01 Apr, 2026 1 commit
王敏 authored
- 26 Mar, 2026 2 commits
- 24 Mar, 2026 2 commits
- 23 Mar, 2026 1 commit
guanyu1 authored
- 21 Mar, 2026 4 commits
- 19 Mar, 2026 1 commit
王敏 authored
- 18 Mar, 2026 1 commit
yangql authored
- 17 Mar, 2026 2 commits
- 16 Mar, 2026 3 commits
- 15 Mar, 2026 1 commit
fanwl authored
- Add the VLLM_V1_USE_FA_UNIFIED_ATTN_2D environment variable - 0: Triton attention, 1: FA unified attention
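A minimal sketch of how such a switch might be read. Only the variable name and the meaning of `0` and `1` come from the commit message; the helper name, the returned labels, and the default of `0` when the variable is unset are assumptions made for illustration.

```python
import os


def select_attention_backend(environ=None):
    """Map VLLM_V1_USE_FA_UNIFIED_ATTN_2D to a backend label.

    The labels and the unset default ("0") are assumptions for this sketch.
    """
    env = os.environ if environ is None else environ
    raw = env.get("VLLM_V1_USE_FA_UNIFIED_ATTN_2D", "0")
    # "1" selects FA unified attention; anything else keeps Triton attention.
    return "fa_unified" if raw.strip() == "1" else "triton"
```

Passing a plain dict as `environ` makes the helper easy to exercise without touching the process environment.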
- 12 Mar, 2026 5 commits
- 11 Mar, 2026 2 commits
- 09 Mar, 2026 1 commit
yangql authored
- 04 Mar, 2026 2 commits
- 03 Mar, 2026 1 commit
zhuwenwen authored
- 02 Mar, 2026 1 commit
王敏 authored
- 26 Feb, 2026 1 commit
laibao authored
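Add the VLLM_V1_USE_REDUCED_TOPK_TOPP_SAMPLER switch and document the scenarios where it applies. max_top_k/has_any_no_top_k are precomputed in the V1 GPU input batch; the native sampler takes the reduced fast path when the conditions are met and falls back automatically on errors.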
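The reduced path can be sketched in NumPy under stated assumptions: this is not the sampler from the commit, all function names are hypothetical, and a `top_k` that is `None`, non-positive, or at least the vocabulary size is treated as "no top-k". The idea shown is that when every request in the batch has a top-k, only the `max_top_k` largest logits per row need sorting, because top-p is applied after top-k masking. The sketch models only the condition check; it does not model the error fallback.

```python
import numpy as np


def summarize_top_k(top_ks, vocab_size):
    """Precompute (max_top_k, has_any_no_top_k) for a batch of requests."""
    max_top_k, has_any_no_top_k = 0, False
    for k in top_ks:
        if k is None or k <= 0 or k >= vocab_size:
            has_any_no_top_k = True  # this request needs the full vocabulary
        else:
            max_top_k = max(max_top_k, k)
    return max_top_k, has_any_no_top_k


def top_k_top_p_probs(logits, top_ks, top_ps, allow_reduced=True):
    """Return (filtered probabilities, whether the reduced path was used)."""
    batch, vocab = logits.shape
    max_top_k, has_any_no_top_k = summarize_top_k(top_ks, vocab)
    reduced = allow_reduced and not has_any_no_top_k and 0 < max_top_k < vocab
    width = max_top_k if reduced else vocab
    if reduced:
        # Select the `width` largest logits per row without a full sort.
        cand = np.argpartition(-logits, width - 1, axis=1)[:, :width]
    else:
        cand = np.broadcast_to(np.arange(vocab), (batch, vocab))
    cand_logits = np.take_along_axis(logits, cand, axis=1)
    order = np.argsort(-cand_logits, axis=1, kind="stable")
    idx = np.take_along_axis(cand, order, axis=1)  # token ids, descending logit
    vals = np.take_along_axis(cand_logits, order, axis=1).astype(np.float64)

    probs = np.zeros((batch, vocab), dtype=np.float64)
    for row in range(batch):
        k = top_ks[row]
        k = width if (k is None or k <= 0 or k >= vocab) else k
        p = np.exp(vals[row, :k] - vals[row, 0])  # softmax over top-k survivors
        p /= p.sum()
        mass_before = np.cumsum(p) - p
        p = np.where(mass_before < top_ps[row], p, 0.0)  # top-p cut-off
        probs[row, idx[row, :k]] = p / p.sum()
    return probs, reduced
```

Calling the function with `allow_reduced=False` forces the full-vocabulary path, which gives a reference to compare the reduced result against.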
- 24 Feb, 2026 1 commit
laibao authored
- Add the environment variable `VLLM_V1_FAST_TOKEN_ID_COPY` (disabled by default) - Cache the prompt token ids as an int32 numpy array in `CachedRequestState` - When enabled, use `np.copyto` in `InputBatch` to copy prompt/output token ids
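A minimal sketch of the caching and copy pattern described above. The class here is a simplified stand-in that only borrows the name `CachedRequestState`; the attribute `prompt_token_ids_np`, the helper `write_token_ids`, and the buffer layout are assumptions, not the actual vLLM structures.

```python
import numpy as np


class CachedRequestState:
    """Simplified stand-in: keeps the prompt ids as a list and as int32 numpy."""

    def __init__(self, prompt_token_ids, output_token_ids=None):
        self.prompt_token_ids = list(prompt_token_ids)
        self.output_token_ids = list(output_token_ids or [])
        # Converted once, so later batch updates avoid list-to-array conversion.
        self.prompt_token_ids_np = np.asarray(self.prompt_token_ids, dtype=np.int32)


def write_token_ids(token_ids_cpu, row, req, fast_copy):
    """Copy prompt and output token ids into one row of the batch buffer."""
    n_prompt = len(req.prompt_token_ids)
    n_total = n_prompt + len(req.output_token_ids)
    if fast_copy:
        # np.copyto writes straight into the destination slice (a view).
        np.copyto(token_ids_cpu[row, :n_prompt], req.prompt_token_ids_np)
        if req.output_token_ids:
            outputs = np.asarray(req.output_token_ids, dtype=np.int32)
            np.copyto(token_ids_cpu[row, n_prompt:n_total], outputs)
    else:
        token_ids_cpu[row, :n_prompt] = req.prompt_token_ids
        if req.output_token_ids:
            token_ids_cpu[row, n_prompt:n_total] = req.output_token_ids
    return n_total
```

Both branches leave the buffer in the same state; the fast branch only avoids converting the prompt list on every copy.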
- 16 Feb, 2026 1 commit
Rayyyyy authored