Integrate the mla_cat operator. It takes effect only with nmz and kvcache-fp8. It is disabled by default; to enable it, run export VLLM_USE_CAT_MLA=1.

See merge request dcutoolkit/deeplearing/vllm!513