# RESTful API

### Launch Service

```shell
lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port ${server_port} --instance_num 32 --tp 1
```

Then, users can open the Swagger UI at `http://{server_ip}:{server_port}` for detailed API usage.

We provide four RESTful APIs in total. Three of them are in OpenAI format:

- /v1/chat/completions
- /v1/models
- /v1/completions

However, we recommend that users try
our own API, `/v1/chat/interactive`, which provides more arguments for customization and delivers comparatively better performance.

**Note:** if you launch multiple concurrent requests, please set a different `session_id` for each one when calling the
`/v1/chat/completions` and `/v1/chat/interactive` APIs. Otherwise, random values will be assigned (a client-side sketch is shown in the Python section below).

### Python

We have integrated the client-side functionalities of these services into the `APIClient` class. Below are some examples demonstrating how to invoke the `api_server` service on the client side.

If you want to use the `/v1/chat/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
messages = [{"role": "user", "content": "Say this is a test!"}]
for item in api_client.chat_completions_v1(model=model_name, messages=messages):
    print(item)
```
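
As the note above mentions, concurrent requests should each use a distinct `session_id`. Below is a minimal sketch of that pattern; it assumes `chat_completions_v1` accepts a `session_id` keyword argument and forwards it in the request body.

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]

# A sketch: give each logical conversation its own session_id so that
# requests do not collide on the server (session_id is assumed to be a
# pass-through keyword of chat_completions_v1).
for session_id, question in [(1, 'Hello!'), (2, 'Say this is a test!')]:
    messages = [{"role": "user", "content": question}]
    for item in api_client.chat_completions_v1(model=model_name,
                                               messages=messages,
                                               session_id=session_id):
        print(item)
```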

If you want to use the `/v1/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
for item in api_client.completions_v1(model=model_name, prompt='hi'):
    print(item)
```

LMDeploy supports maintaining session histories on the server for the `/v1/chat/interactive` API. The feature is
disabled by default.

- In interactive mode, the chat history is kept on the server. For a multi-round conversation, set
  `interactive_mode = True` and use the same `session_id` (which can't be -1, the default value) in each request to `/v1/chat/interactive`.
- In normal mode, no chat history is kept on the server.

Interactive mode is controlled by the `interactive_mode` boolean parameter. The following is an example of normal mode. If you want to experience interactive mode, simply pass in `interactive_mode=True` (see the sketch after this example).

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
for item in api_client.generate(prompt='hi'):
    print(item)
```
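
For a multi-round conversation in interactive mode, reuse the same `session_id` across requests so the server keeps the history. Below is a minimal sketch, assuming `generate` forwards `session_id` and `interactive_mode` to `/v1/chat/interactive`.

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient('http://{server_ip}:{server_port}')

# First round: the server starts keeping history for session 1.
for item in api_client.generate(prompt='hi', session_id=1, interactive_mode=True):
    print(item)

# Second round: reusing the same session_id continues the conversation,
# so only the new prompt needs to be sent.
for item in api_client.generate(prompt='please repeat your last reply',
                                session_id=1, interactive_mode=True):
    print(item)
```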

### Java/Golang/Rust

You may use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
Here is an example:

```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust

$ ls rust/*
rust/Cargo.toml  rust/git_push.sh  rust/README.md

rust/docs:
ChatCompletionRequest.md  EmbeddingsRequest.md  HttpValidationError.md  LocationInner.md  Prompt.md
DefaultApi.md             GenerateRequest.md    Input.md                Messages.md       ValidationError.md

rust/src:
apis  lib.rs  models
```

### cURL

cURL is a tool for observing the output of the APIs from the command line.

List Models:

```bash
curl http://{server_ip}:{server_port}/v1/models
```

Interactive Chat:

```bash
curl http://{server_ip}:{server_port}/v1/chat/interactive \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello! How are you?",
    "session_id": 1,
    "interactive_mode": true
  }'
```

Chat Completions:

```bash
curl http://{server_ip}:{server_port}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
```

Text Completions:

```bash
curl http://{server_ip}:{server_port}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama",
    "prompt": "two steps to build a house:"
  }'
```

### CLI client

There is a client script for the RESTful API server.

```shell
# api_server_url is the URL printed when launching api_server, e.g. http://localhost:23333
lmdeploy serve api_client api_server_url
```

### WebUI

You can also test the RESTful API through the web UI.

```shell
# api_server_url is the URL printed when launching api_server, e.g. http://localhost:23333
# server_name and server_port here are for the gradio UI
# example: lmdeploy serve gradio http://localhost:23333 --server_name localhost --server_port 6006
lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port ${gradio_ui_port}
```

### FAQ

1. If you get `"finish_reason":"length"`, it means the session is too long to be continued.
   Please add `"renew_session": true` to the next request.

2. If an OOM error occurs on the server side, please reduce `instance_num` when launching the service.

3. If a request with the same `session_id` to `/v1/chat/interactive` returns an empty value and a negative `tokens`, please consider setting `interactive_mode=false` to restart the session (see the sketch after this list).

4. The `/v1/chat/interactive` API disables multi-round conversation by default. The input argument `prompt` can be either a single string or an entire chat history.
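
As a concrete illustration of the third item, below is a minimal sketch of restarting a stuck interactive session; it assumes `generate` forwards `session_id` and `interactive_mode` to `/v1/chat/interactive` as in the examples above.

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient('http://{server_ip}:{server_port}')

# A sketch: sending one request with interactive_mode=False is assumed to
# drop the server-side history of session 1, so subsequent interactive
# requests start from a fresh session.
for item in api_client.generate(prompt='hi', session_id=1, interactive_mode=False):
    print(item)
```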