## Install

```
pip3 install dspy-ai
```
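
To confirm the install worked, here is a quick import check (the `dspy` and `dsp` modules ship with the `dspy-ai` package):

```
python3 -c "import dspy, dsp; print('dspy-ai installed')"
```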

Turn off the cache at https://github.com/stanfordnlp/dspy/blob/34d8420383ec752037aa271825c1d3bf391e1277/dsp/modules/cache_utils.py#L10 by setting:
```
cache_turn_on = False
```
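
If you would rather not edit the file by hand, the sketch below locates the installed copy of `cache_utils.py` and flips the flag in place. It assumes the installed file still contains the default `cache_turn_on = True` assignment:

```
# Locate the installed dsp/modules/cache_utils.py and patch the cache flag.
python3 - <<'EOF'
import pathlib
import dsp.modules.cache_utils as m

path = pathlib.Path(m.__file__)
src = path.read_text()
# Assumes the default line reads `cache_turn_on = True`.
path.write_text(src.replace("cache_turn_on = True", "cache_turn_on = False"))
print("Patched", path)
EOF
```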

## Benchmark SGLang
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```
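
Before running the benchmark, you can optionally check that the server is up. This is a sketch assuming SGLang's `/get_model_info` endpoint is available on the port chosen above:

```
# Optional sanity check: ask the SGLang server for its model info.
curl http://127.0.0.1:30000/get_model_info
```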

```
python3 bench_dspy_intro.py --backend sglang
```


## Benchmark TGI
```
docker run --name tgi --rm -ti --gpus all --network host \
  -v /home/ubuntu/model_weights/Llama-2-7b-chat-hf:/Llama-2-7b-chat-hf \
  ghcr.io/huggingface/text-generation-inference:1.3.0 \
  --model-id /Llama-2-7b-chat-hf --num-shard 1  --trust-remote-code \
  --max-input-length 2048 --max-total-tokens 4096 \
  --port 24000
```
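
Once the container is up, a quick request against TGI's `/generate` endpoint confirms the model is being served (a hedged example; adjust the port if you changed it above):

```
# Optional sanity check against the TGI server started above.
curl http://127.0.0.1:24000/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, my name is", "parameters": {"max_new_tokens": 16}}'
```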

```
python3 bench_dspy_intro.py --backend tgi
```



## Benchmark vLLM
```
python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf --disable-log-requests  --port 21000
```
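
The OpenAI-compatible vLLM server can be smoke-tested with a standard completions request before benchmarking (a sketch assuming the default `/v1/completions` route and the model name used above):

```
# Optional sanity check against the OpenAI-compatible endpoint.
curl http://127.0.0.1:21000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-2-7b-chat-hf", "prompt": "Hello, my name is", "max_tokens": 16}'
```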

```
python3 bench_dspy_intro.py --backend vllm
```