Unverified commit 06125966 authored by RunningLeon, committed by GitHub

Add extra_requires to reduce dependencies (#580)

* update reqs

* update docs

* resolve comments

* upgrade pydantic

* fix rebase

* update doc

* update

* update

* update readme

* update

* add flash-attn
parent 7b20cfdf
@@ -103,6 +103,14 @@ Install lmdeploy with pip (Python 3.8+) or [from source](./docs/en/build.md)
pip install lmdeploy
```
> **Note**<br />
> `pip install lmdeploy` installs only the packages required at runtime. To run code from modules such as `lmdeploy.lite` and `lmdeploy.serve`, install the corresponding extra dependencies. For instance, `pip install lmdeploy[lite]` installs the extra dependencies for the `lmdeploy.lite` module (a way to inspect these extras is sketched after this note).
>
> - `all`: Install lmdeploy with all dependencies in `requirements.txt`
> - `lite`: Install lmdeploy with the extra dependencies in `requirements/lite.txt`
> - `serve`: Install lmdeploy with the extra dependencies in `requirements/serve.txt`
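As a quick sanity check, Python's standard library can list the extras an installed distribution declares; a minimal sketch, assuming lmdeploy is already installed and that the extras show up as `Provides-Extra` entries in its metadata:

```python
# Sketch: list the optional extras an installed distribution declares, and
# the requirements gated behind each one (standard library, Python 3.8+).
from importlib.metadata import metadata, requires

dist = 'lmdeploy'  # assumes lmdeploy is already installed
print(metadata(dist).get_all('Provides-Extra'))  # expected: ['all', 'lite', 'serve']
for req in requires(dist) or []:
    if 'extra ==' in req:
        print(req)  # e.g. gradio<4.0.0; extra == "serve"
```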
### Deploy InternLM
#### Get InternLM model
@@ -140,6 +148,9 @@ lmdeploy chat turbomind ./workspace
#### Serving with gradio
```shell
# install lmdeploy with extra dependencies
pip install lmdeploy[serve]
lmdeploy serve gradio ./workspace
```
@@ -150,6 +161,9 @@ lmdeploy serve gradio ./workspace
Launch the inference server with:
```shell
# install lmdeploy with extra dependencies
pip install lmdeploy[serve]
lmdeploy serve api_server ./workspace --instance_num 32 --tp 1
```
@@ -182,6 +196,7 @@ bash workspace/service_docker_up.sh
Then, you can communicate with the inference server from the command line:
```shell
python3 -m pip install tritonclient[grpc]
lmdeploy serve triton_client {server_ip_address}:33337
```
......
@@ -104,6 +104,13 @@ TurboMind's output token throughput exceeds 2000 token/s; overall, compared with DeepSpeed
pip install lmdeploy
```
> **Note**<br />
> By default, `pip install lmdeploy` installs only the runtime dependencies. To use lmdeploy's lite and serve features, install the extra dependencies. For example, `pip install lmdeploy[lite]` additionally installs the dependencies of the `lmdeploy.lite` module.
>
> - `all`: Install all of lmdeploy's dependencies, listed in `requirements.txt`
> - `lite`: Additionally install the dependencies of the `lmdeploy.lite` module, listed in `requirements/lite.txt`
> - `serve`: Additionally install the dependencies of the `lmdeploy.serve` module, listed in `requirements/serve.txt`
### Deploy InternLM
#### Get the InternLM model
@@ -140,6 +147,9 @@ lmdeploy chat turbomind ./workspace
#### Launch the gradio server
```shell
# install lmdeploy with extra dependencies
pip install lmdeploy[serve]
lmdeploy serve gradio ./workspace
```
@@ -150,6 +160,9 @@ lmdeploy serve gradio ./workspace
Launch the inference server with the following command:
```shell
# install lmdeploy with extra dependencies
pip install lmdeploy[serve]
lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port ${server_port} --instance_num 32 --tp 1
```
@@ -182,6 +195,7 @@ bash workspace/service_docker_up.sh
You can chat with the inference server from the command line:
```shell
python3 -m pip install tritonclient[grpc]
lmdeploy serve triton_client {server_ip_address}:33337
```
......
@@ -17,7 +17,7 @@ It may have been caused by the following reasons.
1. You haven't installed lmdeploy's precompiled package. `_turbomind` is the pybind package of the C++ turbomind, which involves compilation. It is recommended that you install the precompiled one.
```shell
-pip install lmdeploy
+pip install lmdeploy[all]
```
2. If you have installed it and still encounter this issue, it is probably because you are executing turbomind-related commands in the root directory of the lmdeploy source code. Switching to another directory will fix it.
@@ -26,7 +26,7 @@ pip install lmdeploy
### libnccl.so.2 not found
-Make sure you have installed lmdeploy (>=v0.0.5) through `pip install lmdeploy`.
+Make sure you have installed lmdeploy (>=v0.0.5) through `pip install lmdeploy[all]`.
If the issue still exists after installing lmdeploy, add the path of `libnccl.so.2` to the `LD_LIBRARY_PATH` environment variable.
......
@@ -26,7 +26,7 @@ Based on the above table, download the model that meets your requirements. Execu
```shell
# install lmdeploy
-python3 -m pip install lmdeploy
+python3 -m pip install lmdeploy[all]
# convert weight layout
lmdeploy convert codellama /the/path/of/codellama/model
......
@@ -5,7 +5,7 @@ LMDeploy supports LLM model inference of 4-bit weight, with the minimum requirem
Before proceeding with the inference, please ensure that lmdeploy is installed.
```shell
-pip install lmdeploy
+pip install lmdeploy[all]
```
## 4-bit LLM model Inference
......
@@ -17,7 +17,7 @@ pip install --upgrade mmengine
1. You haven't installed lmdeploy's precompiled package. `_turbomind` is the pybind part of the turbomind C++ code, which involves compilation. It is recommended that you install the precompiled package directly.
```
-pip install lmdeploy
+pip install lmdeploy[all]
```
2. If it is already installed and this problem still occurs, check the working directory: do not run the packages under `python -m lmdeploy.turbomind.*` from the root directory of the lmdeploy source code; switch to another directory first.
@@ -26,7 +26,7 @@ pip install lmdeploy
### libnccl.so.2 not found
-Make sure you have installed lmdeploy (>=v0.0.5) through `pip install lmdeploy`.
+Make sure you have installed lmdeploy (>=v0.0.5) through `pip install lmdeploy[all]`.
If the problem persists after installation, add the path of `libnccl.so.2` to the `LD_LIBRARY_PATH` environment variable.
......
@@ -26,7 +26,7 @@
```shell
# install lmdeploy
-python3 -m pip install lmdeploy
+python3 -m pip install lmdeploy[all]
# convert the model format
lmdeploy convert codellama /path/of/codellama/model
......
@@ -5,7 +5,7 @@ LMDeploy supports 4-bit weight model inference; **the minimum requirement for NVIDIA GPUs
Before inference, please make sure lmdeploy is installed:
```shell
-pip install lmdeploy
+pip install lmdeploy[all]
```
## 4-bit weight model inference
......
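# requirements.txt, old contents (the file names in this section are inferred; the diff viewer dropped them)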
accelerate
datasets
fastapi
fire
gradio<4.0.0
mmengine
numpy
pybind11
safetensors
sentencepiece
setuptools
shortuuid
tiktoken
torch
transformers>=4.33.0
tritonclient[all]
uvicorn
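# requirements.txt, new contents: the flat list is replaced by includes of the split files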
-r requirements/build.txt
-r requirements/runtime.txt
-r requirements/lite.txt
-r requirements/serve.txt
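# requirements/build.txt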
pybind11
setuptools
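# requirements/lite.txt and requirements/runtime.txt (the boundary between the two files is not visible in this diff)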
accelerate
datasets
flash-attn
fire
mmengine
numpy
safetensors
sentencepiece
tiktoken
torch
transformers>=4.33.0
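# requirements/serve.txt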
fastapi
gradio<4.0.0
pydantic>2.0.0
shortuuid
uvicorn
@@ -134,7 +134,14 @@ if __name__ == '__main__':
'lmdeploy': lmdeploy_package_data,
},
include_package_data=True,
-install_requires=parse_requirements('requirements.txt'),
+setup_requires=parse_requirements('requirements/build.txt'),
+tests_require=parse_requirements('requirements/test.txt'),
+install_requires=parse_requirements('requirements/runtime.txt'),
+extras_require={
+    'all': parse_requirements('requirements.txt'),
+    'lite': parse_requirements('requirements/lite.txt'),
+    'serve': parse_requirements('requirements/serve.txt')
+},
has_ext_modules=check_ext_modules,
classifiers=[
'Programming Language :: Python :: 3.8',
......
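For context, the `parse_requirements` helper used above is defined earlier in `setup.py` and is not shown in this diff. A minimal sketch of what such a helper typically does, assuming a plain line-oriented format with `#` comments and nested `-r` includes (not lmdeploy's actual implementation):

```python
# Minimal sketch of a parse_requirements helper (assumed behavior): read one
# requirements file, skip comments and blank lines, and follow `-r` includes.
import os


def parse_requirements(fname='requirements.txt'):
    """Collect requirement strings from `fname`, following `-r` includes."""
    reqs = []
    base = os.path.dirname(fname)
    with open(fname, encoding='utf-8') as f:
        for line in f:
            line = line.split('#', 1)[0].strip()  # strip comments and whitespace
            if not line:
                continue
            if line.startswith('-r '):
                # recurse into the included requirements file
                reqs.extend(parse_requirements(os.path.join(base, line[3:].strip())))
            else:
                reqs.append(line)
    return reqs
```

With requirements.txt reduced to four `-r` lines, `extras_require['all']` expands to the union of the build, runtime, lite, and serve requirements, while a plain `pip install lmdeploy` pulls in only `requirements/runtime.txt`.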