The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b-v1_1) model.
Testing method:
1. Convert the model with `deploy.py`, set the maximum concurrency in the `workspace` configuration, and adjust the number of requests in `llama_config.ini` (see the sketch after this list).
...
...
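As a rough sketch of the configuration tweak in step 1, the snippet below edits `llama_config.ini` with Python's standard `configparser`. The file path, the section name, and the key name are illustrative assumptions, not a documented schema; check the actual file produced by `deploy.py` for the real layout.

```python
# A minimal sketch of step 1's config edit, assuming llama_config.ini is a
# standard INI file inside the workspace directory. The path, section name
# ("llama"), and key name ("max_batch_size") are illustrative assumptions,
# not LMDeploy's documented schema.
import configparser

CONFIG_PATH = "workspace/llama_config.ini"  # assumed location

config = configparser.ConfigParser()
config.read(CONFIG_PATH)

# Raise the maximum concurrency for the next benchmark run.
config["llama"]["max_batch_size"] = "64"

with open(CONFIG_PATH, "w") as f:
    config.write(f)
```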
As can be seen, the fp16 version requires 1030MB of GPU memory for each concurrent session.
## Accuracy Test
The test object is the [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b-v1_1) instruction-tuned model.
Below are the results of `kCacheKVInt8` PTQ quantization, calibrated with only 128 samples randomly selected from the c4 dataset. Accuracy before and after quantization was tested with [opencompass](https://github.com/InternLM/opencompass).
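For context on the calibration set, the snippet below shows one way to draw 128 random samples from c4 with the Hugging Face `datasets` library. LMDeploy's calibration script performs this step internally, so the dataset variant (`allenai/c4`, `en` config) and the sampling details here are assumptions for illustration.

```python
# A minimal sketch of selecting 128 random calibration samples from c4,
# assuming the Hugging Face `datasets` library. This only illustrates the
# data selection described above; it is not LMDeploy's calibration code.
from datasets import load_dataset

# Stream c4 so the full corpus is never downloaded; "allenai/c4" with the
# "en" config is one public mirror (an assumption about the exact variant).
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Buffer-shuffle the stream and keep the first 128 documents.
calib_texts = [row["text"] for row in c4.shuffle(seed=0, buffer_size=10_000).take(128)]
assert len(calib_texts) == 128
```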