Unverified commit 27e12477, authored by AllentDan, committed by GitHub

Update the Hugging Face internlm-chat-7b model URL (#546)

parent 0d2a151e
@@ -109,7 +109,7 @@ pip install lmdeploy
 # Make sure you have git-lfs installed (https://git-lfs.com)
 git lfs install
-git clone https://huggingface.co/internlm/internlm-chat-7b /path/to/internlm-chat-7b
+git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
 # if you want to clone without large files – just their pointers
 # prepend your git clone with the following env var:
...
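For context, the comment in the hunk above is cut off before the variable it refers to. A minimal sketch of a pointer-only clone, assuming the standard `GIT_LFS_SKIP_SMUDGE` variable documented by git-lfs (the destination path is illustrative):

```shell
# Fetch only LFS pointer files instead of the full model weights.
# GIT_LFS_SKIP_SMUDGE=1 tells git-lfs to skip the smudge filter that
# downloads large files during checkout; the clone path is an example.
GIT_LFS_SKIP_SMUDGE=1 git clone \
  https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
```

The weights can be pulled later with `git lfs pull` from inside the clone.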
@@ -110,7 +110,7 @@ pip install lmdeploy
 # Make sure you have git-lfs installed (https://git-lfs.com)
 git lfs install
-git clone https://huggingface.co/internlm/internlm-chat-7b /path/to/internlm-chat-7b
+git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
 # if you want to clone without large files – just their pointers
 # prepend your git clone with the following env var:
...
@@ -69,7 +69,7 @@ python3 -m lmdeploy.turbomind.chat ./workspace
 ## GPU Memory Test
-The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) model.
+The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b-v1_1) model.
 Testing method:
 1. Use `deploy.py` to convert the model, modify the maximum concurrency in the `workspace` configuration; adjust the number of requests in `llama_config.ini`.
@@ -93,7 +93,7 @@ As can be seen, the fp16 version requires 1030MB of GPU memory for each concurre
 ## Accuracy Test
-The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) command model.
+The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b-v1_1) command model.
 Below is the result of PTQ quantization of `kCacheKVInt8` method with only 128 randomly selected data from the c4 dataset. The accuracy was tested using [opencompass](https://github.com/InternLM/opencompass) before and after quantization.
...
@@ -69,7 +69,7 @@ python3 -m lmdeploy.turbomind.chat ./workspace
 ## GPU Memory Test
-The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) model.
+The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b-v1_1) model.
 Testing method:
 1. Convert the model with `deploy.py`, modify the maximum concurrency in the `workspace` configuration, and adjust the number of requests in `llama_config.ini`.
@@ -93,7 +93,7 @@ python3 -m lmdeploy.turbomind.chat ./workspace
 ## Accuracy Test
-The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) instruction model.
+The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b-v1_1) instruction model.
 Below is the PTQ quantization of the `kCacheKVInt8` method using only 128 randomly selected samples from the c4 dataset. Accuracy was tested with [opencompass](https://github.com/InternLM/opencompass) before and after quantization.
...