"app/vscode:/vscode.git/clone" did not exist on "2ed26f0047ae04f5af07b842906fe66daf7505f6"
## Flashinfer Mode

[flashinfer](https://github.com/flashinfer-ai/flashinfer) is a kernel library for LLM serving.
It can be used in the SGLang runtime to accelerate attention computation.

### Install flashinfer

Note: The compilation can take a very long time.

```bash
git submodule update --init --recursive
pip install 3rdparty/flashinfer/python
```
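
Because the build is slow, it can be worth confirming that the install succeeded before launching a server. A minimal sketch, assuming the installed Python package is importable as `flashinfer` (the variable name `FLASHINFER_STATUS` is only illustrative):

```bash
# Assumption: the package built from 3rdparty/flashinfer/python imports as `flashinfer`.
if python -c "import flashinfer" 2>/dev/null; then
  FLASHINFER_STATUS="installed"
else
  FLASHINFER_STATUS="missing"
fi

# Report the result; "missing" also covers the case where `python` is not on PATH.
echo "flashinfer: ${FLASHINFER_STATUS}"
```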

### Run a Server With Flashinfer Mode

Add the `--model-mode flashinfer` argument when launching a server to enable flashinfer.

Example:

```bash
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --model-mode flashinfer
```
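
Once the server is up, a single request is a quick way to check that it responds. The sketch below assumes the server exposes SGLang's `/generate` HTTP endpoint on the port used above and accepts a JSON body with `text` and `sampling_params` fields. This page does not state either detail, so treat the endpoint and payload as assumptions; the prompt is arbitrary.

```bash
# Assumed values: the port matches the launch example; the endpoint path and
# payload fields are assumptions about the SGLang HTTP API, not stated on this page.
PORT=30000
URL="http://localhost:${PORT}/generate"
PAYLOAD='{"text": "Once upon a time,", "sampling_params": {"max_new_tokens": 16, "temperature": 0}}'

# Send one request; give up after 10 seconds and report if the server is unreachable.
curl --silent --show-error --max-time 10 \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  "$URL" || echo "Could not reach a server at ${URL}"
```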