- 28 Jan, 2026 1 commit
laibao authored
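- Defer re-adding missing requests to a single later pass, avoiding duplicate writes within the same round
- On preemption recovery, overwrite the row (add_row); during normal running, append incrementally (append_row)
- Keep the append semantics of normal requests unchanged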
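The three points above can be illustrated with a minimal pure-Python sketch. `ToyBlockTable`, `apply_round`, and their parameters are hypothetical stand-ins, not the code changed in this commit. Only the names `add_row` (overwrite) and `append_row` (incremental append) come from the commit message, and their behavior here is an assumption based on that description.

```python
class ToyBlockTable:
    """Hypothetical per-request block table (one row per request slot)."""

    def __init__(self, num_rows: int, max_blocks: int) -> None:
        self.rows = [[0] * max_blocks for _ in range(num_rows)]
        self.num_blocks = [0] * num_rows

    def append_row(self, block_ids: list[int], row: int) -> None:
        # Incremental append: write after the blocks already stored in the row.
        start = self.num_blocks[row]
        end = start + len(block_ids)
        if end > len(self.rows[row]):
            raise ValueError("row capacity exceeded")
        self.rows[row][start:end] = block_ids
        self.num_blocks[row] = end

    def add_row(self, block_ids: list[int], row: int) -> None:
        # Overwrite: reset the row length, then write the full block list.
        self.num_blocks[row] = 0
        self.append_row(block_ids, row)

    def blocks(self, row: int) -> list[int]:
        return self.rows[row][: self.num_blocks[row]]


def apply_round(table, req_to_row, free_rows, updates, resumed):
    """Apply one round of block-id updates; return the deferred request ids.

    Hypothetical: `updates` maps req_id -> block ids for this round (the full
    list for resumed or newly added requests, only new blocks otherwise).
    """
    deferred = []
    for req_id, block_ids in updates.items():
        row = req_to_row.get(req_id)
        if row is None:
            # Missing from the batch: do not write now; re-add after the loop.
            deferred.append(req_id)
        elif req_id in resumed:
            table.add_row(block_ids, row)  # preemption recovery: overwrite
        else:
            table.append_row(block_ids, row)  # normal step: incremental append
    # Re-add all deferred requests in one pass, so each row is written once.
    for req_id in deferred:
        row = free_rows.pop(0)
        req_to_row[req_id] = row
        table.add_row(updates[req_id], row)
    return deferred
```

With request `a` running normally, `b` resumed after preemption, and `c` missing from the batch, one round grows `a` by one block, replaces `b`'s blocks, and writes `c` into a free row exactly once, after the main loop.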
- 26 Jan, 2026 1 commit
zhuwenwen authored
- 23 Jan, 2026 1 commit
zhuwenwen authored
- 17 Jan, 2026 3 commits
- 16 Jan, 2026 1 commit
zhuwenwen authored
Add VLLM_USE_FUSED_CACHE_QUANT_BMM_MLA to enable the fused rmsnorm + contiguous + rope (for dpsk-v3) + concat_and_cache_mla + q quant path, and to control bmm (TODO) + cat + mla (fp8)
- 14 Jan, 2026 2 commits
- 13 Jan, 2026 3 commits
- 12 Jan, 2026 2 commits
- 09 Jan, 2026 1 commit
jujl1 authored
- 07 Jan, 2026 1 commit
laibao authored
- repeat_counts and other CPU metadata crash in repeat_interleave/.to() when they are numpy arrays or other array-likes
- Convert them to a CPU torch.Tensor first, then expand and copy the result to the GPU
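The conversion described above can be sketched as follows, assuming PyTorch and NumPy are installed. The helper `expand_metadata` and its arguments are hypothetical, not the function touched by the commit. It normalizes both inputs with `torch.as_tensor` on the CPU, expands with `torch.repeat_interleave`, and only then moves the result to the target device (GPU if one is available).

```python
import numpy as np
import torch


def expand_metadata(values, repeat_counts, device=None):
    """Hypothetical helper: repeat values[i] repeat_counts[i] times, then move to device.

    Both arguments may be lists, numpy arrays, or tensors.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Normalize to CPU tensors first: numpy arrays have no .to() method, and
    # repeat_interleave expects an int or a Tensor for `repeats`.
    values_t = torch.as_tensor(values, device="cpu")
    counts_t = torch.as_tensor(repeat_counts, dtype=torch.long, device="cpu")
    # Expand on the CPU, then copy the expanded tensor to the target device.
    expanded = torch.repeat_interleave(values_t, counts_t)
    return expanded.to(device)


out = expand_metadata(np.array([10, 20, 30]), np.array([1, 0, 2]))
print(out.cpu().tolist())  # [10, 30, 30]
```

Expanding on the CPU before the copy keeps a single host-to-device transfer per call, and the same helper accepts lists, numpy arrays, or tensors.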
- 06 Jan, 2026 1 commit
jujl1 authored
- 04 Jan, 2026 2 commits
- 31 Dec, 2025 3 commits
- 29 Dec, 2025 1 commit
yangql authored
- 26 Dec, 2025 1 commit
zhuwenwen authored
- 25 Dec, 2025 2 commits
- 24 Dec, 2025 3 commits
- 23 Dec, 2025 2 commits
- 22 Dec, 2025 2 commits
- 21 Dec, 2025 1 commit
yangql authored
- 19 Dec, 2025 2 commits
- 18 Dec, 2025 4 commits