Unverified Commit eaccbc0a authored by AllentDan's avatar AllentDan Committed by GitHub

Update FAQ for restful api (#319)

* update FAQ for restful api

* refine
parent 452822a4
@@ -147,3 +147,10 @@ python -m lmdeploy.serve.gradio.app restful_api_url server_ip --restful_api True
2. When an OOM occurs on the server side, reduce the `instance_num` used when launching the service.
3. When a request with the same `instance_id` to `generate` gets an empty return value and a negative `tokens`, set `sequence_start=false` for the second question and for all subsequent ones.
4. Requests were previously being handled sequentially rather than concurrently. To resolve this issue,
   - provide unique `instance_id` values when calling the `generate` API, or else your requests may be associated with client IP addresses
   - additionally, setting `stream=true` enables processing multiple requests simultaneously
5. Both the `generate` API and `v1/chat/completions` support multiple rounds of conversation, where the input `prompt` or `messages` consists of either a single string or an entire chat history. These inputs are interpreted in multi-turn dialogue mode. However, if you want to turn this mode off and manage the chat history on the client, pass the parameter `sequence_end: true` when using the `generate` function, or specify `renew_session: true` when using `v1/chat/completions`.
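The session rules above (unique `instance_id`, `sequence_start` on the first turn only, `sequence_end` to close, `stream=true` for concurrency) can be sketched as a payload builder. This is a hypothetical helper, not part of lmdeploy; the field names come from this FAQ, so verify them against the server's actual schema before use:

```python
# Hypothetical sketch of the `generate` request body described in this FAQ.
# Field names (prompt, instance_id, sequence_start, sequence_end, stream)
# are taken from the notes above -- check the real server schema.

def make_generate_payload(prompt, instance_id,
                          first_turn=False, last_turn=False):
    """Build one JSON body for the `generate` endpoint of a session."""
    return {
        "prompt": prompt,
        "instance_id": instance_id,    # unique per client session, or the
                                       # server falls back to the client IP
        "sequence_start": first_turn,  # true only on the first question
        "sequence_end": last_turn,     # true to close the session
        "stream": True,                # lets other requests be served too
    }

# A two-turn session: open it on the first turn, close it on the last.
turn1 = make_generate_payload("Hi", instance_id=42, first_turn=True)
turn2 = make_generate_payload("And then?", instance_id=42, last_turn=True)
```

Sending `turn2` with `sequence_start=false` (the default here) is exactly what item 3 above asks for; omitting it is what produces the empty return value and negative `tokens`.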
@@ -149,3 +149,12 @@ python -m lmdeploy.serve.gradio.app restful_api_url server_ip --restful_api True
2. When the server runs out of GPU memory (OOM), reduce the `instance_num` used when launching the service.
3. When a request with the same `instance_id` to `generate` returns an empty string and a negative `tokens`, the second question was likely sent without `sequence_start=false`.
4. If requests seem to be processed one by one rather than concurrently, set the following:
   - pass a distinct `instance_id` to the `generate` API; otherwise the session id is automatically bound to the requesting client's IP address
   - set `stream=true` so that other requests can be accepted and processed while the model is running a forward pass
5. Both the `generate` API and `v1/chat/completions` support multi-turn conversation. The `messages` or `prompt` parameter can be either a simple string for a single user question or an entire chat history.
   Both APIs enable multi-turn dialogue by default; if you want to turn this off and manage the chat history on the client, pass `sequence_end: true` to `generate`, or pass
   `renew_session: true` to `v1/chat/completions`.
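For the `v1/chat/completions` side of item 5, a client that manages its own history resends the full history each call and asks the server to discard its session state. The helper below is a hypothetical sketch; `messages` and `renew_session` are the only field names this FAQ gives, so confirm the rest against the server's actual schema:

```python
# Hypothetical sketch of a `v1/chat/completions` body for client-managed
# history; `messages` and `renew_session` come from this FAQ.

def make_chat_payload(history, user_msg, manage_history_client_side=False):
    """Build a chat-completions body, appending the new user turn to the
    client-held history."""
    messages = history + [{"role": "user", "content": user_msg}]
    return {
        "messages": messages,
        # True asks the server to drop its own session state, so the
        # client-side history is the only source of context.
        "renew_session": manage_history_client_side,
    }

history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello!"}]
body = make_chat_payload(history, "And then?",
                         manage_history_client_side=True)
```

With `renew_session: true`, the trade-off is that every request must carry the whole conversation, but the client stays free to edit or truncate the history between turns.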