# Get Started

LMDeploy offers functionalities such as model quantization, offline batch inference, online serving, etc. Each function can be completed with just a few simple lines of code or commands.

## Installation

Install lmdeploy with pip (Python 3.8+) or [from source](./build.md):

```shell
pip install lmdeploy
```

The default prebuilt package is compiled with CUDA 11.8. If you need CUDA 12+, install lmdeploy with:

```shell
export LMDEPLOY_VERSION=0.2.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl
```
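
To quickly confirm the installation succeeded, you can print the installed version. This is a minimal sketch; it assumes the package exposes a `__version__` attribute, as is conventional for Python packages.

```python
import lmdeploy

# Print the installed LMDeploy version to confirm the package imports correctly
print(lmdeploy.__version__)
```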

## Offline batch inference

```python
import lmdeploy
pipe = lmdeploy.pipeline("internlm/internlm-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
```

For more information on inference pipeline parameters, please refer to [here](./inference/pipeline.md).
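
As a rough sketch of how those parameters are used, the snippet below passes a generation config for sampling and an engine config for the TurboMind backend. The specific fields shown (`cache_max_entry_count`, `top_p`, `temperature`, `max_new_tokens`) are assumptions for illustration; consult the pipeline guide linked above for the authoritative list.

```python
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

# Engine-level options (e.g. the fraction of GPU memory reserved for the KV cache)
backend_config = TurbomindEngineConfig(cache_max_entry_count=0.5)
# Sampling options applied at generation time
gen_config = GenerationConfig(top_p=0.8, temperature=0.7, max_new_tokens=256)

pipe = pipeline("internlm/internlm-chat-7b", backend_config=backend_config)
response = pipe(["Hi, pls intro yourself", "Shanghai is"], gen_config=gen_config)
print(response)
```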

## Serving

LMDeploy offers several serving methods; choose the one that best meets your requirements. A minimal client sketch follows the list below.

- [Serving with openai compatible server](https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html)
- [Serving with docker](https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html#option-2-deploying-with-docker)
- [Serving with gradio](https://lmdeploy.readthedocs.io/en/latest/serving/gradio.html)
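
For example, once an OpenAI-compatible server has been launched (e.g. with `lmdeploy serve api_server internlm/internlm-chat-7b`), it can be queried with the official `openai` Python client. The port and model name below are placeholders chosen for illustration; see the serving guide for the actual options.

```python
from openai import OpenAI

# Assumes the api_server is running locally; 23333 is used here as an example port
client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")

response = client.chat.completions.create(
    model="internlm/internlm-chat-7b",
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
)
print(response.choices[0].message.content)
```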

## Quantization

LMDeploy provides the following quantization methods. Please visit the links below for detailed guides; a sketch of loading an already-quantized model follows the list.

- [4bit weight-only quantization](quantization/w4a16.md)
- [k/v quantization](quantization/kv_int8.md)
- [w8a8 quantization](quantization/w8a8.md)
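
As a rough sketch, a model that has already been quantized to 4-bit AWQ format could be served through the same inference pipeline by pointing the engine at the quantized weights. The local path and the `model_format` field below are assumptions for illustration; follow the linked guides for the exact quantization and deployment workflow.

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Assumed: ./internlm-chat-7b-4bit holds weights produced by the 4-bit quantization guide
backend_config = TurbomindEngineConfig(model_format="awq")
pipe = pipeline("./internlm-chat-7b-4bit", backend_config=backend_config)
print(pipe(["Shanghai is"]))
```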

## Useful Tools

LMDeploy CLI offers the following utilities, helping users try out LLM features conveniently.

### Inference with Command line Interface

```shell
lmdeploy chat turbomind internlm/internlm-chat-7b
```

### Serving with Web UI

LMDeploy uses Gradio to build the online demo.

```shell
# install dependencies
pip install lmdeploy[serve]
# launch gradio server
lmdeploy serve gradio internlm/internlm-chat-7b
```

![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)