OpenDAS / Lmdeploy / Commits / 1884fd60

Commit 1884fd60 authored May 29, 2024 by gaoqiong
Update the AWQ-related README
parent 8efb9210
Showing 1 changed file with 14 additions and 1 deletion (README.md: +14, -1)
README.md
@@ -99,7 +99,7 @@ cd dist && pip3 install lmdeploy*
# <model_format> destination path for the saved output (default: ./workspace)
# <tp> number of GPUs used for tensor parallelism; should be 2^n
-lmdeploy convert --model_name ${model_name} --model_path ${model_path} --model_format ${model_format} --tokenizer_path ${tokenizer_path} --dst_path ${dst_path} --tp ${tp}
+lmdeploy convert ${model_name} ${model_path} --model_format ${model_format} --tokenizer_path ${tokenizer_path} --dst_path ${dst_path} --tp ${tp}
```
### Run
#### Run from the bash command line
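For illustration only, the new positional form of `lmdeploy convert` shown in the hunk above might be invoked as follows; the model name, paths, format, and tp value are hypothetical placeholders rather than values taken from this commit:

```bash
# Hypothetical invocation of the positional-argument form of `lmdeploy convert`;
# the model name, paths, format, and tp value are placeholders for illustration.
lmdeploy convert llama2 /models/llama2-7b-chat \
    --model_format hf \
    --tokenizer_path /models/llama2-7b-chat/tokenizer.model \
    --dst_path ./workspace \
    --tp 1
```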
@@ -148,6 +148,19 @@ Detailed usage of the api-server is documented [here](docs/zh_cn/restful_api.md)
Deployment of the codellama model is covered in [codellama](docs/zh_cn/supported_models/codellama.md)
### AWQ quantized inference
This version supports quantized inference; the steps are as follows:
```bash
# Step 1: convert the model
lmdeploy convert ${model_name} ${model_path} --model_format awq --group-size ${group_size} --tp ${tp}
# Step 2: run the model
lmdeploy chat turbomind ./workspace --tp ${tp}
```
Notes:
1. This version currently supports quantized inference only with tp=1 (single GPU);
2. Quantized inference currently requires first converting the model to the turbomind format with `lmdeploy convert` and then running inference; direct quantized inference on an HF-format model is not yet supported;
3. This version does not yet support quantizing a model with a calibration dataset; the quantized model must be obtained elsewhere.
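As a concrete illustration of the two steps above (keeping tp=1 per note 1), a run might look like the following; the model name, path, and group size are hypothetical placeholders, not values from this commit:

```bash
# Hypothetical AWQ workflow: convert a pre-quantized AWQ checkpoint to the
# turbomind format, then run interactive chat on it; the model name, path,
# and group size below are placeholders for illustration only.
lmdeploy convert llama2 /models/llama2-7b-awq --model_format awq --group-size 128 --tp 1
lmdeploy chat turbomind ./workspace --tp 1
```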
## result