Does the DCU version of LMDeploy support the following quantization features?
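We would like to start from a baseline version and keep optimizing from there. Could you tell us how much the DCU version currently supports, and which version you plan to support next? In particular, does it support the quantization methods listed on the LMDeploy GitHub page, quoted here: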
Effective Quantization: LMDeploy supports weight-only and k/v quantization, and the 4-bit inference performance is 2.4x higher than FP16. The quantization quality has been confirmed via OpenCompass evaluation.
Excellent Compatibility: LMDeploy supports KV Cache Quant, AWQ and Automatic Prefix Caching to be used simultaneously.