Commit c02e281f authored by AllentDan, committed by GitHub

[Doc] Update restful api doc (#662)

* update restful_api.md

* add a hint

* repeat 3 times
parent 0fcc3034
...@@ -2,11 +2,16 @@ ...@@ -2,11 +2,16 @@
### Launch Service ### Launch Service
The user can open the http url printed by the following command in a browser.
- **Please check the http url for the detailed api usage!!!**
- **Please check the http url for the detailed api usage!!!**
- **Please check the http url for the detailed api usage!!!**
```shell ```shell
lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port ${server_port} --instance_num 32 --tp 1 lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port ${server_port} --instance_num 32 --tp 1
``` ```
Then, the user can open the swagger UI: `http://{server_ip}:{server_port}` for the detailed api usage.
We provide four restful APIs in total. Three of them are in OpenAI format. We provide four restful APIs in total. Three of them are in OpenAI format.
- /v1/chat/completions - /v1/chat/completions
...@@ -145,8 +150,8 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port ...@@ -145,8 +150,8 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
### FAQ ### FAQ
1. When user got `"finish_reason":"length"` which means the session is too long to be continued. 1. When user got `"finish_reason":"length"`, it means the session is too long to be continued. The session length can be
Please add `"renew_session": true` into the next request. modified by passing `--session_len` to api_server.
2. When OOM appears on the server side, please reduce `instance_num` when launching the service. 2. When OOM appears on the server side, please reduce `instance_num` when launching the service.
......
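The first FAQ item can be checked on the client side. The following is a minimal sketch, assuming the response follows the OpenAI-style layout in which each entry of `choices` carries a `finish_reason` field; the diff itself only shows the `"finish_reason":"length"` fragment, so the surrounding structure and the helper name `hit_length_limit` are assumptions made for illustration.

```python
from typing import Any, Dict


def hit_length_limit(response: Dict[str, Any]) -> bool:
    """Return True if any choice stopped because the session ran out of length.

    Hypothetical helper: assumes an OpenAI-style body where each item in
    `choices` has a `finish_reason` key.
    """
    choices = response.get('choices', [])
    return any(choice.get('finish_reason') == 'length' for choice in choices)


# Hand-written sample body (not captured from a real server).
sample = {
    'choices': [
        {'index': 0,
         'message': {'role': 'assistant', 'content': '...'},
         'finish_reason': 'length'},
    ]
}

if hit_length_limit(sample):
    print('Session too long; consider relaunching api_server '
          'with a larger --session_len.')
```

When the check fires, the remedy described in the FAQ is a server-side one: relaunch `api_server` with a larger `--session_len`.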
...@@ -2,13 +2,16 @@ ...@@ -2,13 +2,16 @@
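### Launch Service ### Launch Service
Run the script The user can copy the http url printed by the following command into a browser to view all the APIs and their usage in detail.
Be sure to check `http://{server_ip}:{server_port}`!!!
Be sure to check `http://{server_ip}:{server_port}`!!!
Be sure to check `http://{server_ip}:{server_port}`!!!
Important things are said three times.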
```shell ```shell
lmdeploy serve api_server ./workspace 0.0.0.0 --server_port ${server_port} --instance_num 32 --tp 1 lmdeploy serve api_server ./workspace 0.0.0.0 --server_port ${server_port} --instance_num 32 --tp 1
``` ```
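Then the user can open the swagger UI: `http://{server_ip}:{server_port}` to view all the APIs and their usage in detail.
We provide four restful APIs in total, three of which follow the OpenAI format. We provide four restful APIs in total, three of which follow the OpenAI format.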
- /v1/chat/completions - /v1/chat/completions
...@@ -142,8 +145,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port ...@@ -142,8 +145,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
### FAQ ### FAQ
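1. When the result ends with `"finish_reason":"length"`, it means the session length has exceeded the maximum. 1. When the result ends with `"finish_reason":"length"`, it means the session length has exceeded the maximum. To adjust the maximum session length supported, set the `--session_len` parameter when launching `api_server`.
Please add `"renew_session": true` to the next request.
2. When the server runs out of GPU memory (OOM), reduce `instance_num` when launching the service. 2. When the server runs out of GPU memory (OOM), reduce `instance_num` when launching the service.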
......
...@@ -510,6 +510,9 @@ def main(model_path: str, ...@@ -510,6 +510,9 @@ def main(model_path: str,
instance_num=instance_num, instance_num=instance_num,
tp=tp, tp=tp,
**kwargs) **kwargs)
for i in range(3):
print(f'HINT: Please open \033[93m\033[1mhttp://{server_name}:'
f'{server_port}\033[0m in a browser for detailed api usage!!!')
uvicorn.run(app=app, host=server_name, port=server_port, log_level='info') uvicorn.run(app=app, host=server_name, port=server_port, log_level='info')
......
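The hint added in this hunk relies on ANSI escape sequences to highlight the URL. Below is a self-contained sketch of the same formatting, separated from the server startup so it can be run on its own. The helper names `format_hint` and `print_hint` and the port `23333` are hypothetical and do not appear in the commit.

```python
# ANSI SGR escape sequences used by the hint in the hunk above.
YELLOW = '\033[93m'  # bright yellow foreground
BOLD = '\033[1m'     # bold / increased intensity
RESET = '\033[0m'    # clear all attributes


def format_hint(server_name: str, server_port: int) -> str:
    """Build one highlighted hint line (hypothetical helper)."""
    url = f'http://{server_name}:{server_port}'
    return (f'HINT: Please open {YELLOW}{BOLD}{url}{RESET} '
            'in a browser for detailed api usage!!!')


def print_hint(server_name: str, server_port: int, repeat: int = 3) -> None:
    """Print the hint `repeat` times, mirroring the loop in the commit."""
    for _ in range(repeat):
        print(format_hint(server_name, server_port))


if __name__ == '__main__':
    # The port value is an arbitrary example, not taken from the commit.
    print_hint('0.0.0.0', 23333)
```

Terminals that do not interpret ANSI sequences will show the escape codes literally, which may be worth keeping in mind if the output is redirected to a log file.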