# RESTful API

### Launch Service

Users can open the HTTP URL printed by the following command in a browser.

- **Please check the HTTP URL for the detailed API usage!**

```shell
lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port ${server_port} --instance_num 64 --tp 1
```

We provide several RESTful APIs. Three of them are in OpenAI format:

- /v1/chat/completions
- /v1/models
- /v1/completions

However, we recommend that users try
our own API `/v1/chat/interactive`, which provides more arguments for users to modify. Its performance is comparatively better.

**Note**: if you want to launch multiple requests, please set a different `session_id` for each request to the
`/v1/chat/completions` and `/v1/chat/interactive` APIs. Otherwise, they will be assigned random values.
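
For illustration, here is a minimal sketch of sending two requests with distinct `session_id` values to `/v1/chat/interactive`. It uses the plain `requests` package (not part of LMDeploy) and the request fields shown in the cURL examples later in this document.

```python
# Minimal sketch, not an LMDeploy API: each request carries its own session_id.
import requests

api_server_url = 'http://{server_ip}:{server_port}'  # fill in your server address

for session_id, prompt in [(1, 'Hello!'), (2, 'Hi there!')]:
    resp = requests.post(
        f'{api_server_url}/v1/chat/interactive',
        json={
            'prompt': prompt,
            'session_id': session_id,   # a distinct id per request
            'interactive_mode': False,  # stateless request; see the interactive mode below
        },
    )
    print(resp.text)
```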

### python

We have integrated the client-side functionalities of these services into the `APIClient` class. Below are some examples demonstrating how to invoke the `api_server` service on the client side.

If you want to use the `/v1/chat/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
messages = [{"role": "user", "content": "Say this is a test!"}]
for item in api_client.chat_completions_v1(model=model_name, messages=messages):
    print(item)
```

If you want to use the `/v1/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
for item in api_client.completions_v1(model=model_name, prompt='hi'):
    print(item)
```

LMDeploy supports maintaining session histories on the server for the `/v1/chat/interactive` API. The feature is
disabled by default.

- In interactive mode, the chat history is kept on the server. For a multi-round conversation, set
  `interactive_mode=True` and pass the same `session_id` (it cannot be -1, which is the default value) to `/v1/chat/interactive` on every request.
- In normal mode, no chat history is kept on the server.

The interactive mode can be controlled by the `interactive_mode` boolean parameter. The following is an example of normal mode. If you want to experience the interactive mode, simply pass in `interactive_mode=True`.

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
for item in api_client.generate(prompt='hi'):
    print(item)
```
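
Below is a sketch of a multi-round conversation in interactive mode. It assumes that `generate` forwards `session_id` and `interactive_mode` to `/v1/chat/interactive` as described above; argument names may differ slightly across LMDeploy versions.

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient('http://{server_ip}:{server_port}')
# Keep the same session_id (not -1) across rounds so the server retains the chat history.
for prompt in ['Hi, my name is Alice.', 'Do you remember my name?']:
    for item in api_client.generate(prompt=prompt,
                                    session_id=1,
                                    interactive_mode=True):
        print(item)
```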

### Java/Golang/Rust

You may use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Golang/Rust client.
Here is an example:

```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust

$ ls rust/*
rust/Cargo.toml  rust/git_push.sh  rust/README.md

rust/docs:
ChatCompletionRequest.md  EmbeddingsRequest.md  HttpValidationError.md  LocationInner.md  Prompt.md
DefaultApi.md             GenerateRequest.md    Input.md                Messages.md       ValidationError.md

rust/src:
apis  lib.rs  models
```

### cURL

cURL is a tool for observing the output of the APIs.

List Models:

```bash
curl http://{server_ip}:{server_port}/v1/models
```

Interactive Chat:

```bash
curl http://{server_ip}:{server_port}/v1/chat/interactive \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello! How are you?",
    "session_id": 1,
    "interactive_mode": true
  }'
```

Chat Completions:

```bash
curl http://{server_ip}:{server_port}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
```

Text Completions:

```shell
curl http://{server_ip}:{server_port}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "llama",
  "prompt": "two steps to build a house:"
}'
```

### CLI client

There is a client script for the RESTful API server.

```shell
# api_server_url is the URL printed when launching api_server, e.g. http://localhost:23333
lmdeploy serve api_client api_server_url
```

### webui

You can also test the RESTful APIs through the web UI.

```shell
# api_server_url is the URL printed when launching api_server, e.g. http://localhost:23333
# server_name and server_port here are for the gradio UI
# example: lmdeploy serve gradio http://localhost:23333 --server_name localhost --server_port 6006
lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port ${gradio_ui_port}
```

### FAQ

1. When a user gets `"finish_reason":"length"`, it means the session is too long to be continued. The session length can be
   modified by passing `--session_len` to `api_server`.

2. When OOM appears on the server side, please reduce `instance_num` when launching the service.

3. When a request with the same `session_id` to `/v1/chat/interactive` gets an empty return value and a negative `tokens`, please consider setting `interactive_mode=false` to restart the session.

4. The `/v1/chat/interactive` API disables multi-round conversation by default. The input argument `prompt` can be either a single string or an entire chat history.