Commit 0b861c48 authored by del-zhenwu, committed by GitHub

[doc] Update benchmark command in w4a16.md (#500)



* [doc] Update benchmark command in w4a16.md

* Update w4a16.md

* Update w4a16.md

add pip install nvidia-ml-py

* [doc] Update w4a16.md

* fix lint error
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* [doc] update model_path & prompt_tokens
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

---------
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>
parent 77a26812
@@ -62,10 +62,14 @@ Memory (GB) comparison results between 4-bit and 16-bit model with context size
  | Llama-2-7B-chat | 15.1 | 6.3 | 16.2 | 7.5 |
  | Llama-2-13B-chat | OOM | 10.3 | OOM | 12.0 |
+ ```
+ pip install nvidia-ml-py
+ ```
  ```shell
  python benchmark/profile_generation.py \
- ./workspace \
- --concurrency 1 --input_seqlen 1 --output_seqlen 512
+ --model-path ./workspace \
+ --concurrency 1 8 --prompt-tokens 1 512 --completion-tokens 2048 512
  ```
  ## 4-bit Weight Quantization
......
@@ -60,10 +60,14 @@ python3 -m lmdeploy.serve.turbomind ./workspace --server_name {ip_addr} ----serv
  | Llama-2-7B-chat | 15.1 | 6.3 | 16.2 | 7.5 |
  | Llama-2-13B-chat | OOM | 10.3 | OOM | 12.0 |
+ ```
+ pip install nvidia-ml-py
+ ```
  ```shell
  python benchmark/profile_generation.py \
- ./workspace \
- --concurrency 1 --input_seqlen 1 --output_seqlen 512
+ --model-path ./workspace \
+ --concurrency 1 8 --prompt-tokens 1 512 --completion-tokens 2048 512
  ```
  ## 4bit 权重量化
......
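The updated command passes several values to `--concurrency`, `--prompt-tokens`, and `--completion-tokens`. The diff does not show how `benchmark/profile_generation.py` reads them, so the following Python sketch is only an illustration under stated assumptions: that each flag accepts a list of integers, that prompt and completion lengths are paired by position, and that every pair is run at every concurrency level. The helper names `build_parser` and `expand_runs` are hypothetical and are not taken from the repository.

```python
import argparse
from itertools import product


def build_parser():
    """Hypothetical stand-in for the benchmark script's CLI (an assumption)."""
    parser = argparse.ArgumentParser(description="sketch of a multi-value benchmark CLI")
    parser.add_argument("--model-path", required=True)
    # nargs="+" lets one flag collect one or more integers, e.g. "--concurrency 1 8".
    parser.add_argument("--concurrency", type=int, nargs="+", default=[1])
    parser.add_argument("--prompt-tokens", type=int, nargs="+", default=[1])
    parser.add_argument("--completion-tokens", type=int, nargs="+", default=[512])
    return parser


def expand_runs(args):
    """Return (concurrency, prompt_tokens, completion_tokens) for each run.

    Assumption: prompt and completion lengths are paired positionally,
    and each pair is benchmarked at every concurrency level.
    """
    if len(args.prompt_tokens) != len(args.completion_tokens):
        raise ValueError("--prompt-tokens and --completion-tokens need the same number of values")
    pairs = list(zip(args.prompt_tokens, args.completion_tokens))
    return [(c, p, g) for c, (p, g) in product(args.concurrency, pairs)]


if __name__ == "__main__":
    # Same arguments as the updated command in the diff.
    argv = [
        "--model-path", "./workspace",
        "--concurrency", "1", "8",
        "--prompt-tokens", "1", "512",
        "--completion-tokens", "2048", "512",
    ]
    for run in expand_runs(build_parser().parse_args(argv)):
        print(run)
```

Under these assumptions the command expands to four runs: concurrency 1 and 8, each with a (1, 2048) and a (512, 512) prompt/completion pair. If the real script combines the values differently, for example as a full cross product of all three lists, the run count would differ.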