• perf(qwen3): fuse q/k RMSNorm + RoPE · 7cd7bf8a
    laibao authored
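    Add a VLLM_USE_FUSED_RMS_ROPE branch that takes the fused path
    Register torch.ops.vllm.rms_rotary_embedding_fuse (via direct_register_custom_op)
    Convert cos_sin_cache to the target device/dtype automatically and cache the result, avoiding a repeated copy on every call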
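
The flag gating and the cos_sin_cache caching described in the message could look roughly like the sketch below. It is an illustration only, not the code in this commit: `use_fused_rms_rope`, `ConvertedCache`, and the injected `convert` callable are hypothetical names, and reading the flag as the string "1" is an assumption. In a real PyTorch setting `convert` would most likely wrap `tensor.to(device=..., dtype=...)`; it is injected here so the sketch runs without torch.

```python
import os


def use_fused_rms_rope() -> bool:
    # Assumption: the flag comes from the environment and "1" enables it.
    return os.environ.get("VLLM_USE_FUSED_RMS_ROPE", "0") == "1"


class ConvertedCache:
    """Keeps one converted copy of a source table per (device, dtype).

    Hypothetical helper: the first request for a given (device, dtype)
    pays for the conversion, later requests reuse the stored copy.
    """

    def __init__(self, source, convert):
        self._source = source
        # With torch this could be: lambda t, dev, dt: t.to(device=dev, dtype=dt)
        self._convert = convert
        self._converted = {}

    def get(self, device, dtype):
        key = (device, dtype)
        if key not in self._converted:
            # Cache miss: convert once and remember the result.
            self._converted[key] = self._convert(self._source, device, dtype)
        return self._converted[key]


if __name__ == "__main__":
    conversions = []

    def fake_convert(src, device, dtype):
        # Stand-in for a device/dtype copy; records each real conversion.
        conversions.append((device, dtype))
        return (tuple(src), device, dtype)

    cache = ConvertedCache([1.0, 0.0], fake_convert)
    first = cache.get("cuda:0", "bfloat16")
    second = cache.get("cuda:0", "bfloat16")
    assert first is second          # second call reuses the cached copy
    assert len(conversions) == 1    # only one conversion happened
    print("fused path enabled:", use_fused_rms_rope())
```

Keying the cache on both device and dtype means a layer that is later called with a different dtype still gets a correct copy, at the cost of holding one extra table per distinct pair.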
qwen3.py 15.5 KB