Update README.md

Commit 9f496f97 authored Nov 22, 2023 by xiabo

Showing with 109 additions and 34 deletions

--- a/README.md
+++ b/README.md
@@ -13,6 +13,17 @@ LMDeploy 由 [MMDeploy](https://github.com/open-mmlab/mmdeploy) 和 [MMRazor](ht
 persistent batch inference: further improves model execution efficiency.
 Official LMDeploy GitHub repository: [https://github.com/InternLM/lmdeploy](https://github.com/InternLM/lmdeploy)
+## Supported models
+|    Model     | Model parallel | FP16 | KV INT8 |
+| :----------: | :------------: | :--: | :-----: |
+|    Llama     |      Yes       | Yes  |   Yes   |
+|    Llama2    |      Yes       | Yes  |   Yes   |
+| InternLM-7B  |      Yes       | Yes  |   Yes   |
+| InternLM-20B |      Yes       | Yes  |   Yes   |
+|   QWen-7B    |      Yes       | Yes  |   Yes   |
+|   QWen-14B   |      Yes       | Yes  |   Yes   |
+| Baichuan-7B  |      Yes       | Yes  |   Yes   |
+| Baichuan2-7B |      Yes       | Yes  |   No    |
 ## Installation
@@ -21,22 +32,25 @@ LMdeploy官方github地址:[https://github.com/InternLM/lmdeploy](https://github
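 #### Preparing the build environment
 There are two ways to prepare the environment:
-1. Use the SourceFind (光源) PyTorch base image. Download it from [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch) and pick the image version that matches your PyTorch, Python, DTK, and OS versions.
+1. Use the SourceFind (光源) PyTorch base image. Download it from [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch) and pick the image version that matches your PyTorch, Python, DTK, and OS versions. (If installation is slow, add a mirror: `pip3 install xxx -i https://pypi.tuna.tsinghua.edu.cn/simple/`)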
 ```shell
-pip install -r requirements.txt
+pip3 install -r requirements.txt
-pip install transformers==4.33.2
+pip3 install transformers==4.33.2
-pip install urllib3==1.24
+pip3 install urllib3==1.24
+pip3 install wheel
 yum install rapidjson
 ```
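 2. Use an existing Python environment. Install PyTorch from the wheel directory [https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.04](https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.04), choosing the whl package that matches your Python and DTK versions. Install with the following commands: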
 ```shell
-pip install torch*  # the downloaded torch whl package
+pip3 install torch*  # the downloaded torch whl package
-pip install -r requirements.txt
+pip3 install -r requirements.txt
-pip install transformers==4.33.2
+pip3 install transformers==4.33.2
-pip install urllib3==1.24
+pip3 install urllib3==1.24
+pip3 install wheel
 yum install rapidjson
 ```
+Note: GCC >= 9.0 is required.
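A quick pre-build check for the GCC requirement noted above can be sketched as follows. `version_at_least_major` is a hypothetical helper name, not part of LMDeploy. It compares only the major version number, on the assumption that `gcc -dumpversion` prints a string such as `9` or `9.3.0`.

```shell
# Hypothetical helper (not part of LMDeploy): succeeds when the major
# component of a version string is at least the given minimum.
version_at_least_major() {
    version="$1"
    min_major="$2"
    # Keep everything before the first dot, e.g. "9.3.0" -> "9".
    major="${version%%.*}"
    # Reject empty or non-numeric input instead of guessing.
    case "$major" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$major" -ge "$min_major" ]
}

# Usage: report whether the default gcc meets the GCC >= 9 requirement.
if command -v gcc >/dev/null 2>&1; then
    if version_at_least_major "$(gcc -dumpversion)" 9; then
        echo "gcc version is sufficient"
    else
        echo "gcc is older than 9; please upgrade before building"
    fi
else
    echo "gcc not found on PATH"
fi
```

Taking the version string as an argument keeps the comparison separate from the call to `gcc`, so the check can be exercised without a compiler installed.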
 #### Building and installing from source
 - Download the code
@@ -60,58 +74,119 @@ cd dist && pip3 install lmdeploy*
 ```
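 ## Model serving
-### Deploying the [LLaMA-2](https://github.com/facebookresearch/llama) service
+### Deploying the [LLaMA](https://huggingface.co/huggyllama) service
+Download the llama model from [here](https://huggingface.co/huggyllama), then deploy the service with the commands below:
+Using the 7B model as an example: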
+```
+1. Convert the model
+# <model_name>: name of the model ('llama', 'internlm', 'vicuna', 'internlm-chat-7b', 'internlm-chat', 'internlm-chat-7b-8k', 'internlm-chat-20b', 'internlm-20b', 'baichuan-7b', 'baichuan2-7b', 'llama2', 'qwen-7b', 'qwen-14b')
+# <model_path>: path to the model
+# <model_format>: format of the model ('llama', 'hf', 'qwen')
+# <tokenizer_path>: path to the tokenizer model (default None, in which case qwen.tiktoken is looked up in model_path)
+# <dst_path>: destination path for the converted output (default ./workspace)
+# <tp>: number of GPUs used for tensor parallelism; should be 2^n
+lmdeploy convert --model_name llama --model_path /path/to/model --model_format llama --tokenizer_path None --dst_path ./workspace_llama --tp 1
+2. Run
+# Run in the bash terminal
+lmdeploy chat turbomind --model_path ./workspace_llama --tp 1     # after typing a question, press Enter twice to run inference
+# Run in the server (web) interface:
+Run in bash:
+# <model_path_or_server>: path of the deployed model, a tritonserver URL, or a restful api URL. A model path runs the service directly with gradio; a URL is served through tritonserver by default. If the URL is a restful api, also enable the "restful_api" flag.
+# <server_name>: IP address of the gradio server
+# <server_port>: port of the gradio server
+# <batch_size>: batch size used when running Turbomind directly (default 32)
+# <tp>: number of GPUs used for tensor parallelism; should be 2^n (keep it the same as during model conversion)
+# <restful_api>: flag describing model_path_or_server (default False)
+lmdeploy serve gradio --model_path_or_server ./workspace_llama --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False
+Open {ip}:{port} in a web browser to start chatting
+```
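The `<tp>` comments above say the GPU count for tensor parallelism should be a power of two. The sketch below is one way to validate that before running the conversion. `is_power_of_two` is a hypothetical helper, not part of LMDeploy, and the final command is only echoed as a dry run, with its flags copied from the example above.

```shell
# Hypothetical helper (not part of LMDeploy): succeeds when the argument
# is a positive integer with exactly one bit set (1, 2, 4, 8, ...).
is_power_of_two() {
    n="$1"
    # Reject empty or non-numeric input.
    case "$n" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

# Dry run: print the convert command only when tp is valid.
tp=2
if is_power_of_two "$tp"; then
    echo lmdeploy convert --model_name llama --model_path /path/to/model \
        --model_format llama --tokenizer_path None \
        --dst_path ./workspace_llama --tp "$tp"
else
    echo "tp must be a power of two, got: $tp" >&2
fi
```

Remove the leading `echo` to actually run the conversion, and use the same `tp` value later with `lmdeploy chat` and `lmdeploy serve gradio`, as the comments above require.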
+### Deploying the [llama2](https://huggingface.co/meta-llama) service
 Download the llama2 model from [here](https://huggingface.co/meta-llama), then deploy the service with the commands below:
 Using the 7B model as an example:
 ```
 1. Convert the model
-python3 -m lmdeploy.serve.turbomind.deploy llama2 path/to/chinese-llama2-7b-hf hf path/to/chinese-llama2-7b-hf/tokenizer.model ./workspace_llama
+lmdeploy convert --model_name llama2 --model_path /path/to/model --model_format hf --tokenizer_path None --dst_path ./workspace_llama2 --tp 1
 2. Run
- Run on the command line:
+# Run in the bash terminal
-python3 -m lmdeploy.turbomind.chat ./workspace_llama
+lmdeploy chat turbomind --model_path ./workspace_llama2 --tp 1
- Run in the server (web) interface:
+# Run in the server (web) interface:
-python3 -m lmdeploy.serve.gradio.app ./workspace_llama 10.6.10.67
+Run in bash:
-Open 10.6.10.67:6006 in a web browser
+lmdeploy serve gradio --model_path_or_server ./workspace_llama2 --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False
+Open {ip}:{port} in a web browser to start chatting
 ```
 ### Deploying the [internlm](https://huggingface.co/internlm/) service
-Download the llama2 model from [here](https://huggingface.co/internlm), then deploy the service with the commands below:
+Download the internlm model from [here](https://huggingface.co/internlm), then deploy the service with the commands below:
 Using the 7B model as an example:
 ```
 1. Convert the model
-python3 -m lmdeploy.serve.turbomind.deploy path/to/internlm-chat-7b internlm-chat-7b hf None ./workspace_intern
+lmdeploy convert --model_name model_name --model_path /path/to/model --model_format hf --tokenizer_path None --dst_path ./workspace_intern --tp 1  # set model_name to internlm-chat or internlm, depending on the model type
 2. Run
- Run on the command line:
+# Run in the bash terminal
-python3 -m lmdeploy.turbomind.chat ./workspace_intern
+lmdeploy chat turbomind --model_path ./workspace_intern --tp 1
- Run in the server (web) interface:
+# Run in the server (web) interface:
-python3 -m lmdeploy.serve.gradio.app ./workspace_intern 10.6.10.67
+Run in bash:
-Open 10.6.10.67:6006 in a web browser
+lmdeploy serve gradio --model_path_or_server ./workspace_intern --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False
+Open {ip}:{port} in a web browser to start chatting
 ```
 ### Deploying the [baichuan](https://huggingface.co/baichuan-inc) service
 Download the baichuan model from [here](https://huggingface.co/baichuan-inc), then deploy the service with the commands below:
 Using the 7B model as an example:
 ```
 1. Convert the model
-python3 -m lmdeploy.serve.turbomind.deploy baichuan2-7b-chat baichuan2-7b-chat hf baichuan2-7b-chat/tokenizer.model ./workspace_baichuan
+lmdeploy convert --model_name baichuan-7b --model_path /path/to/model --model_format hf --tokenizer_path None --dst_path ./workspace_baichuan --tp 1
 2. Run
- Run on the command line:
+# Run in the bash terminal
-python3 -m lmdeploy.turbomind.chat ./workspace_baichuan
+lmdeploy chat turbomind --model_path ./workspace_baichuan --tp 1
- Run in the server (web) interface:
+# Run in the server (web) interface:
-python3 -m lmdeploy.serve.gradio.app ./workspace_baichuan 10.6.10.67
+Run in bash:
-Open 10.6.10.67:6006 in a web browser
+lmdeploy serve gradio --model_path_or_server ./workspace_baichuan --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False
+Open {ip}:{port} in a web browser to start chatting
+```
+### Deploying the [baichuan2](https://huggingface.co/baichuan-inc) service
+Download the baichuan2 model from [here](https://huggingface.co/baichuan-inc), then deploy the service with the commands below:
+Using the 7B model as an example:
+```
+1. Convert the model
+lmdeploy convert --model_name baichuan2-7b --model_path /path/to/model --model_format hf --tokenizer_path None --dst_path ./workspace_baichuan2 --tp 1
+2. Run
+# Run in the bash terminal
+lmdeploy chat turbomind --model_path ./workspace_baichuan2 --tp 1
+# Run in the server (web) interface:
+Run in bash:
+lmdeploy serve gradio --model_path_or_server ./workspace_baichuan2 --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False
+Open {ip}:{port} in a web browser to start chatting
 ```
 ### Deploying the [qwen](https://huggingface.co/Qwen) service
 Download the qwen model from [here](https://huggingface.co/Qwen), then deploy the service with the commands below:
 Using the 7B model as an example:
 ```
 1. Convert the model
-python3 -m lmdeploy.serve.turbomind.deploy qwen-7b qwen-7b-chat qwen qwen-7b-chat/tokenizer.model ./workspace_qwen
+lmdeploy convert --model_name qwen-7b --model_path /path/to/model --model_format qwen --tokenizer_path None --dst_path ./workspace_qwen --tp 1
 2. Run
- Run on the command line:
+# Run in the bash terminal
-python3 -m lmdeploy.turbomind.chat ./workspace_qwen
+lmdeploy chat turbomind --model_path ./workspace_qwen --tp 1
- Run in the server (web) interface:
+# Run in the server (web) interface:
-python3 -m lmdeploy.serve.gradio.app ./workspace_qwen 10.6.10.67
+Run in bash:
-Open 10.6.10.67:6006 in a web browser
+lmdeploy serve gradio --model_path_or_server ./workspace_qwen --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False
+Open {ip}:{port} in a web browser to start chatting
 ```
+## Result
+![qwen inference](docs/dcu/qwen推理.gif)
 ### For details, see the [docs](./docs/zh_cn/serving.md)
 ## Querying the version
 - `python -c "import lmdeploy; print(lmdeploy.__version__)"` prints the version of this package, which stays in sync with the official release, for example 0.0.6.
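The version query can be wrapped so that a missing installation produces a short message instead of a Python traceback. This is a minimal sketch: `print_lmdeploy_version` is a hypothetical helper name, and it assumes the interpreter is available as `python3`.

```shell
# Hypothetical helper (not part of LMDeploy): print the installed
# lmdeploy version, or a notice when the import fails.
print_lmdeploy_version() {
    python3 -c "import lmdeploy; print(lmdeploy.__version__)" 2>/dev/null \
        || echo "lmdeploy is not importable in this Python environment"
}

print_lmdeploy_version
```

The explicit `print` matters because `python -c` does not echo the value of a bare expression the way the interactive interpreter does.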