    Support FP8 KV cache for FlashAttention (fa); add VLLM_USE_QUERY_QUANT to disable query (q) quantization (TODO) · 0dfb30d5
    zhuwenwen authored
    [opt] Optimize the EPSP code; add EPSP with zero overhead
    Update VLLM_USE_FUSED_RMS_ROPE to default to 0; for Qwen3, VLLM_USE_FUSED_RMS_ROPE defaults to 1
    feat(moe/marlin): auto-detect and pre-pack Marlin W16A16 MoE (removes the manual switch)
    perf(qwen3): fuse q/k RMSNorm + RoPE
    Connect fused_moe_fp8 to lmslim
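The commit message gives VLLM_USE_FUSED_RMS_ROPE a model-dependent default (0 in general, 1 for Qwen3). The sketch below shows one way such a default could be resolved. The helper name `use_fused_rms_rope`, the `model_type` argument, and the `startswith("qwen3")` check are hypothetical illustrations, not the code in `__init__.py`; it also assumes an explicitly set environment value overrides the default.

```python
import os
from typing import Mapping, Optional


def use_fused_rms_rope(model_type: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether the fused q/k RMSNorm + RoPE path is on.

    Assumptions (not taken from the actual source):
    - default is "0" for all models except Qwen3, where it is "1";
    - Qwen3 is detected by a model_type string starting with "qwen3";
    - an explicit VLLM_USE_FUSED_RMS_ROPE value always wins over the default.
    """
    env = os.environ if environ is None else environ
    default = "1" if model_type.startswith("qwen3") else "0"
    return env.get("VLLM_USE_FUSED_RMS_ROPE", default) == "1"


if __name__ == "__main__":
    print(use_fused_rms_rope("llama", {}))  # default off
    print(use_fused_rms_rope("qwen3", {}))  # default on for Qwen3
    print(use_fused_rms_rope("qwen3", {"VLLM_USE_FUSED_RMS_ROPE": "0"}))  # explicit override
```

Passing `environ` explicitly keeps the function testable without mutating the real process environment.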
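The `perf(qwen3)` entry fuses the per-head q/k RMSNorm with RoPE into one step. For reference, the unfused computation such a kernel would have to reproduce might look like the pure-Python sketch below. It assumes standard RMSNorm (`x / sqrt(mean(x^2) + eps) * weight`) applied per attention head and rotate-half (NeoX-style) RoPE with base 10000; the function names are hypothetical and the real fused kernel's layout and parameters are not shown in this log.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Divide by the root-mean-square of the head vector, then scale per channel.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def rope(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    # Rotate-half RoPE (assumed NeoX style): element i is paired with i + d/2
    # and the pair is rotated by angle pos * base^(-2i/d).
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def qk_norm_rope(q_heads, k_heads, q_weight, k_weight, pos: int, eps: float = 1e-6):
    # Unfused reference: normalize each q/k head, then rotate it for position pos.
    q = [rope(rms_norm(h, q_weight, eps), pos) for h in q_heads]
    k = [rope(rms_norm(h, k_weight, eps), pos) for h in k_heads]
    return q, k
```

Because RoPE is a rotation, it leaves each head's norm unchanged, and at position 0 it is the identity, which makes the output easy to check against a fused implementation.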
__init__.py 107 KB