Unverified Commit eaccbc0a authored by AllentDan's avatar AllentDan Committed by GitHub

Update FAQ for restful api (#319)

* update FAQ for restful api

* refine
parent 452822a4
@@ -147,3 +147,10 @@ python -m lmdeploy.serve.gradio.app restful_api_url server_ip --restful_api True
2. When an OOM occurs on the server side, reduce the `instance_num` used when launching the service.
3. When a request with the same `instance_id` to `generate` gets an empty return value and a negative `tokens`, set `sequence_start=false` for the second question and for all subsequent ones.
4. Requests were previously being handled sequentially rather than concurrently. To resolve this issue,
   - provide unique `instance_id` values when calling the `generate` API, or else your requests may be associated with client IP addresses
   - additionally, setting `stream=true` enables processing multiple requests simultaneously
5. Both the `generate` API and `v1/chat/completions` support multiple rounds of conversation, where the input `prompt` or `messages` consists of either a single string or an entire chat history. These inputs are interpreted in multi-turn dialogue mode. However, if you want to turn this mode off and manage the chat history on the client, pass the parameter `sequence_end: true` when using the `generate` function, or specify `renew_session: true` when using `v1/chat/completions`.
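The session rules above (unique `instance_id`, `sequence_start` on the first turn only, `sequence_end` to close, `stream=true` for concurrency) can be sketched as a payload builder. This is a hypothetical helper, not part of lmdeploy; the field names come from this FAQ, so verify them against the server's actual schema before use:

```python
# Hypothetical sketch of the `generate` request body described in this FAQ.
# Field names (prompt, instance_id, sequence_start, sequence_end, stream)
# are taken from the notes above -- check the real server schema.

def make_generate_payload(prompt, instance_id,
                          first_turn=False, last_turn=False):
    """Build one JSON body for the `generate` endpoint of a session."""
    return {
        "prompt": prompt,
        "instance_id": instance_id,    # unique per client session, or the
                                       # server falls back to the client IP
        "sequence_start": first_turn,  # true only on the first question
        "sequence_end": last_turn,     # true to close the session
        "stream": True,                # lets other requests be served too
    }

# A two-turn session: open it on the first turn, close it on the last.
turn1 = make_generate_payload("Hi", instance_id=42, first_turn=True)
turn2 = make_generate_payload("And then?", instance_id=42, last_turn=True)
```

Sending `turn2` with `sequence_start=false` (the default here) is exactly what item 3 above asks for; omitting it is what produces the empty return value and negative `tokens`.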
@@ -149,3 +149,12 @@ python -m lmdeploy.serve.gradio.app restful_api_url server_ip --restful_api True
2. When the server runs out of GPU memory (OOM), reduce the `instance_num` used when launching the service.
3. When a request with the same `instance_id` to `generate` returns an empty string and a negative `tokens`, the second question was likely sent without `sequence_start=false`.
4. If requests seem to be processed one by one rather than concurrently, set the following:
   - pass a distinct `instance_id` to the `generate` API; otherwise the session id is automatically bound to the requesting client's IP address
   - set `stream=true` so that other requests can be accepted and processed while the model is running a forward pass
5. Both the `generate` API and `v1/chat/completions` support multi-turn conversation. The `messages` or `prompt` parameter can be either a simple string for a single user question or an entire chat history.
   Both APIs enable multi-turn dialogue by default; if you want to turn this off and manage the chat history on the client, pass `sequence_end: true` to `generate`, or pass
   `renew_session: true` to `v1/chat/completions`.
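For the `v1/chat/completions` side of item 5, a client that manages its own history resends the full history each call and asks the server to discard its session state. The helper below is a hypothetical sketch; `messages` and `renew_session` are the only field names this FAQ gives, so confirm the rest against the server's actual schema:

```python
# Hypothetical sketch of a `v1/chat/completions` body for client-managed
# history; `messages` and `renew_session` come from this FAQ.

def make_chat_payload(history, user_msg, manage_history_client_side=False):
    """Build a chat-completions body, appending the new user turn to the
    client-held history."""
    messages = history + [{"role": "user", "content": user_msg}]
    return {
        "messages": messages,
        # True asks the server to drop its own session state, so the
        # client-side history is the only source of context.
        "renew_session": manage_history_client_side,
    }

history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello!"}]
body = make_chat_payload(history, "And then?",
                         manage_history_client_side=True)
```

With `renew_session: true`, the trade-off is that every request must carry the whole conversation, but the client stays free to edit or truncate the history between turns.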