    support fa kvcache fp8, add VLLM_USE_QUERY_QUANT to disable q quant (todo) · 0dfb30d5
    zhuwenwen authored
    [opt] optimize epsp code, add epsp with zero overhead
    update: VLLM_USE_FUSED_RMS_ROPE=0 by default; for qwen3, VLLM_USE_FUSED_RMS_ROPE=1 by default
    feat(moe/marlin): auto-detect and pre-pack Marlin W16A16 MoE (remove the manual switch)
    perf(qwen3): fuse q/k RMSNorm + RoPE
    fused_moe_fp8: integrate with lmslim
deepseek_v2.py 76.1 KB