"tests/models/decoder_only/language/test_aqlm.py" did not exist on "2b7949c1c2e34de41d9cfc84dd0e377cc6bd58c2"
- 24 Nov, 2025 1 commit
-
-
jujl1 authored
-
- 15 Nov, 2025 1 commit
-
-
王敏 authored
-
- 13 Nov, 2025 6 commits
-
-
zhuwenwen authored
restore the default settings of disable_cascade_attn; add VLLM_USE_OPT_ZEROS to replace triton_ (torch.zeros); set default_max_num_batched_tokens = 10240; update qwen3_moe of layernorm
-
zhuwenwen authored
fix the OOM issue that occurs when marlin is enabled with w8a8 at pp16
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
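Introduce the environment variable `VLLM_SCHED_ENABLE_MINIMAL_INJECTION` to control minimal injection in pipeline-parallel scheduling. Adjust the Scheduler logic to use the new minimal-injection feature. Update the scheduling logic to use output placeholders so that decoding avoids 0-token stalls. Enhance the Scheduler to manage minimal-progress injection according to the batch queue state.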
-
王敏 authored
-
- 10 Nov, 2025 1 commit
-
-
王敏 authored
-
- 08 Nov, 2025 1 commit
-
-
王敏 authored
-
- 07 Nov, 2025 4 commits
- 06 Nov, 2025 2 commits
- 04 Nov, 2025 1 commit
-
-
zhuwenwen authored
-
- 03 Nov, 2025 3 commits
- 01 Nov, 2025 1 commit
-
-
王敏 authored
-
- 31 Oct, 2025 1 commit
-
-
zhuwenwen authored
-
- 29 Oct, 2025 3 commits
- 28 Oct, 2025 1 commit
-
-
maxiao1 authored
-
- 27 Oct, 2025 1 commit
-
-
王敏 authored
-
- 24 Oct, 2025 1 commit
-
-
zhuwenwen authored
support prefix cache on kme; fix the error in test_moe caused by moe align not supporting 511 and 211; multi-modal switching to torch implementation on z100l&k100
-
- 20 Oct, 2025 1 commit
-
-
maxiao1 authored
-
- 17 Oct, 2025 3 commits
- 15 Oct, 2025 7 commits
- 14 Oct, 2025 1 commit
-
-
zhuwenwen authored
-