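[feat] Add the VLLM_SPEC_DECODE_EAGER environment variable, which selects whether the draft model is forced to run in eager mode; this brings a significant improvement for ds3 MTP on Hygon CPUs. See merge request dcutoolkit/deeplearing/vllm!91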
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models.
V0.7.2 dev custom See merge request dcutoolkit/deeplearing/vllm!90
[fix] Fix errors in DeepSeek and other models caused by the fused_moe interface in fused_moe.py not initializing moe_ep_size. See merge request dcutoolkit/deeplearing/vllm!89
V0.7.2 dev yangql See merge request dcutoolkit/deeplearing/vllm!85
V0.7.2 dev: block-int8 quantization support for DeepSeek V3/R1 See merge request dcutoolkit/deeplearing/vllm!83