- **Multi-GPU Model Deployment and Quantization**: We provide comprehensive support for model deployment and quantization, and have successfully validated it on models ranging from 7B to 100B parameters.
- **Persistent Batch Inference**: Further optimization of model execution efficiency.
...
@@ -155,16 +155,12 @@ In fp16 mode, kv_cache int8 quantization can be enabled, and a single card can s
First, execute the quantization script; the quantization parameters will be saved in the weight directory generated by `deploy.py`.
```
python3 -m lmdeploy.lite.apis.kv_qparams \
  --model $HF_MODEL \
  --output_dir $DEPLOY_WEIGHT_DIR \
  --symmetry True \
  --offload False \
  --num_tp 1
# --symmetry: whether to use symmetric or asymmetric quantization
# --offload:  whether to offload some modules to CPU to save GPU memory
# --num_tp:   the number of GPUs used for tensor parallelism
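# Background note on --symmetry (a general sketch, not from the original doc):
# symmetric int8 quantization stores a single scale s = max(|x|) / 127 and
# maps x -> round(x / s), so zero stays exactly zero; asymmetric quantization
# also stores a zero-point, with s = (max(x) - min(x)) / 255 and
# z = round(-min(x) / s), mapping x -> round(x / s) + z, which uses the full
# 8-bit range even when the kv_cache values are skewed to one side.
# Example values (hypothetical): HF_MODEL=./llama-7b-hf and
# DEPLOY_WEIGHT_DIR=./workspace/triton_models/weights, i.e. the weight
# directory produced by deploy.py in the previous step.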