Unverified Commit fa7cbc7a authored by tpoisonooo, committed by GitHub

Update quantization.md (#47)

parent 197b3ee1
@@ -4,7 +4,7 @@
Test procedure:
1. Run `deploy.py` to shard the 100B model across 8 GPUs
2. Run the quantization script to obtain the quantization parameters and place them in the weights directory (a sketch of what these parameters represent follows the list)
- 3. Modify the configuration file so that the [kCacheKVInt8](../src/turbomind/models/llama/llama_utils.h) option takes effect
+ 3. Modify the configuration file so that the [kCacheKVInt8](../../src/turbomind/models/llama/llama_utils.h) option takes effect
4. Run the test datasets and compare accuracy and GPU memory usage against the fp16 version
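
Step 2 produces the quantization parameters that the kCacheKVInt8 path consumes at runtime. The snippet below is a minimal NumPy sketch of what asymmetric int8 quantization of a cached key/value tensor looks like; the function names, the per-tensor granularity, and the asymmetric scheme are illustrative assumptions, not the actual quantization script or the TurboMind kernel.

```python
import numpy as np

def kv_int8_params(kv: np.ndarray):
    """Derive asymmetric int8 quantization parameters (scale, zero_point) for a
    key or value tensor. Per-tensor granularity is an assumption for clarity."""
    kv_min, kv_max = float(kv.min()), float(kv.max())
    scale = (kv_max - kv_min) / 255.0               # spread the value range over 256 int8 levels
    zero_point = int(round(-128 - kv_min / scale))  # shift so kv_min maps near -128
    return scale, zero_point

def quantize(kv: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # fp16 -> int8 halves the per-element cache footprint (2 bytes -> 1 byte)
    return np.clip(np.round(kv / scale) + zero_point, -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # int8 -> float when cached keys/values are read back for attention
    return (q.astype(np.float32) - zero_point) * scale

# Usage sketch: quantize a fake (heads, seq_len, head_dim) key tensor and check the error
k = np.random.randn(32, 128, 128).astype(np.float16)
scale, zp = kv_int8_params(k)
k_q = quantize(k, scale, zp)
print(np.abs(dequantize(k_q, scale, zp) - k).max())
```

Storing the cache as int8 plus a small set of (scale, zero_point) parameters halves the per-element footprint relative to fp16, which is where the savings discussed in the next section come from.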
## GPU Memory Reduction
@@ -58,4 +58,4 @@
| QA | openbookqa_fact | v1-4e92f0 | accuracy | -14.00 |
| QA | nq | v1-d2370e | score | -2.16 |
| QA | triviaqa | v1-ead882 | score | -0.43 |
- | Security | crows_pairs | v1-8fe12f | accuracy | 11.08 |
\ No newline at end of file
+ | Security | crows_pairs | v1-8fe12f | accuracy | 11.08 |