    Synchronize the modifications from the 12th to the 17th: · b66c8e4b
    zhuwenwen authored
    Fix the w4a16 conflict in CompressedTensorsLinearMethod
    feat(moe): add Marlin W16A16 fused MoE behind VLLM_USE_MARLIN_W16A16_MOE
    replace the fp8_mqa_logits and fp8_paged_mqa_logits interfaces in deepgemm with mqa_logits and paged_mqa_logits from lightop
deepseek_v2.py 69 KB