Unverified Commit 683c3fe9 authored by Zhihao Lin, committed by GitHub

[Doc] Fix quantization docs link (#367)

parent d065f3e4
@@ -219,11 +219,11 @@ pip install deepspeed

 LMDeploy uses the [AWQ](https://arxiv.org/abs/2306.00978) algorithm for model weight quantization.

-[Click here](./docs/zh_cn/w4a16.md) to view the test results for weight int4 usage.
+[Click here](./docs/en/w4a16.md) to view the test results for weight int4 usage.

 #### KV Cache INT8 Quantization

-[Click here](./docs/zh_cn/kv_int8.md) to view the usage method, implementation formula, and test results for kv int8.
+[Click here](./docs/en/kv_int8.md) to view the usage method, implementation formula, and test results for kv int8.

 > **Warning**<br />
 > Runtime Tensor Parallel for a quantized model is not available. Please set `--tp` on `deploy` to enable static TP.