Update AWQ-related README

1884fd60 · gaoqiong · 8efb9210 · 1884fd60
Commit 1884fd60 authored May 29, 2024 by gaoqiong

Showing with 14 additions and 1 deletion

README.md +14 -1

--- a/README.md
+++ b/README.md
@@ -99,7 +99,7 @@ cd dist && pip3 install lmdeploy*
 # <model_format> target path for saving the output (default: ./workspace)
 # <tp> number of GPUs used for tensor parallelism; should be 2^n
-lmdeploy convert --model_name ${model_name} --model_path ${model_path} --model_format ${model_format} --tokenizer_path ${tokenizer_path} --dst_path ${dst_path} --tp ${tp}
+lmdeploy convert ${model_name} ${model_path} --model_format ${model_format} --tokenizer_path ${tokenizer_path} --dst_path ${dst_path} --tp ${tp}
 ```
 ### Run
 #### Running from the bash command line
@@ -148,6 +148,19 @@ For detailed api-server usage, see the docs [here](docs/zh_cn/restful_api.md)
 For deploying codellama models, see [codellama](docs/zh_cn/supported_models/codellama.md)
+### AWQ quantized inference
+This version supports quantized inference. The steps are as follows:
+```bash
+# step 1: convert the model
+lmdeploy convert ${model_name} ${model_path} --model_format awq --group-size ${group_size} --tp ${tp}
+# step 2: run the model
+lmdeploy chat turbomind ./workspace --tp ${tp}
+```
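+Notes:
+1. This version currently supports quantized inference only on a single GPU (tp=1).
+2. Quantized inference in this version requires first converting the model to turbomind format with `convert` and then running inference; direct quantized inference on HF models is not yet supported.
+3. This version does not yet support quantizing a model with a dataset; a quantized model must be obtained elsewhere.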
 ## result
 ![qwen inference](docs/dcu/interlm.gif)
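
The AWQ hunk above lists two commands and a tp=1 restriction. As a rough illustration of how they fit together, here is a minimal dry-run sketch. It only prints the two commands from the README and does not call lmdeploy. The helper name `build_awq_commands` and its argument order are assumptions made for this example; only the flags that appear in the diff are used.

```bash
#!/usr/bin/env bash
# Sketch only: prints the two lmdeploy commands from the README hunk instead of
# running them. build_awq_commands is a hypothetical helper, not part of lmdeploy.
build_awq_commands() {
    local model_name="$1" model_path="$2" group_size="$3" tp="${4:-1}"

    # The notes in the hunk say only tp=1 (single GPU) is supported for now.
    if [ "$tp" != "1" ]; then
        echo "error: AWQ inference currently requires tp=1, got tp=$tp" >&2
        return 1
    fi

    # Step 1: convert the model to turbomind format (written to ./workspace by default).
    echo "lmdeploy convert $model_name $model_path --model_format awq --group-size $group_size --tp $tp"
    # Step 2: chat with the converted model.
    echo "lmdeploy chat turbomind ./workspace --tp $tp"
}
```

For example, `build_awq_commands my_model /path/to/model 128` prints the convert and chat commands with `--tp 1`; the model name, path, and group size here are placeholders, not values taken from the commit.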