@@ -35,7 +35,7 @@ English | [简体中文](README_zh-CN.md)
LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the [MMRazor](https://github.com/open-mmlab/mmrazor) and [MMDeploy](https://github.com/open-mmlab/mmdeploy) teams. It has the following core features:
- **Efficient Inference Engine (TurboMind)**: Based on [FasterTransformer](https://github.com/NVIDIA/FasterTransformer), we have implemented an efficient inference engine, TurboMind, which supports the inference of LLaMA and its variant models on NVIDIA GPUs.
- **Interactive Inference Mode**: By caching the attention k/v during multi-round dialogues, the engine remembers dialogue history and avoids reprocessing historical sessions.
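The k/v caching idea behind interactive inference can be sketched in a few lines of toy Python. This is purely an illustration of the concept, not LMDeploy's actual implementation; `ToyKVCache` and its methods are hypothetical names:

```python
# Toy illustration (not LMDeploy code): interactive inference reuses the
# attention k/v cache across dialogue turns, so each new turn only encodes
# the new tokens instead of re-encoding the whole history.

class ToyKVCache:
    def __init__(self):
        self.keys = []      # one cached key per processed token
        self.values = []    # one cached value per processed token
        self.processed = 0  # number of tokens already encoded

    def step(self, tokens):
        """Encode only tokens not seen before; return how many were processed."""
        new = tokens[self.processed:]
        for t in new:
            # stand-ins for real per-token key/value projections
            self.keys.append(("k", t))
            self.values.append(("v", t))
        self.processed = len(tokens)
        return len(new)


cache = ToyKVCache()
history = ["<user> hi", "<bot> hello"]
cache.step(history)                      # first turn: whole prompt encoded
history += ["<user> how are you?"]
n = cache.step(history)                  # next turn: only the new message
print(n)                                 # 1 new token-like unit processed
```

Without the cache, each turn would re-encode the entire conversation from scratch, so per-turn cost would grow with history length instead of with the size of the new message.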
...
@@ -79,7 +79,7 @@ Weights for the LLaMA models can be obtained by filling out [this form](htt
Run one of the following commands to serve a LLaMA model on an NVIDIA GPU server: