Unverified Commit e641dd86 authored by pppppM, committed by GitHub

[Docs] Update Supported Matrix (#679)

* update supported matrix

* change the default shard size when saving quantized weights

* enable KV Cache INT8 for Baichuan2
parent ab1767cf
--- a/README.md
+++ b/README.md
@@ -66,10 +66,10 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
 | SOLAR | Yes | Yes | Yes | Yes | No |
 | InternLM-7B | Yes | Yes | Yes | Yes | No |
 | InternLM-20B | Yes | Yes | Yes | Yes | No |
-| QWen-7B | Yes | Yes | Yes | No | No |
-| QWen-14B | Yes | Yes | Yes | No | No |
+| QWen-7B | Yes | Yes | Yes | Yes | No |
+| QWen-14B | Yes | Yes | Yes | Yes | No |
 | Baichuan-7B | Yes | Yes | Yes | Yes | No |
-| Baichuan2-7B | Yes | Yes | No | No | No |
+| Baichuan2-7B | Yes | Yes | Yes | Yes | No |
 | Code Llama | Yes | Yes | No | No | No |
 ### Pytorch
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -67,10 +67,10 @@ LMDeploy is developed by [MMDeploy](https://github.com/open-mmlab/mmdeploy) and [MMRazor](ht
 | SOLAR | Yes | Yes | Yes | Yes | No |
 | InternLM-7B | Yes | Yes | Yes | Yes | No |
 | InternLM-20B | Yes | Yes | Yes | Yes | No |
-| QWen-7B | Yes | Yes | Yes | No | No |
-| QWen-14B | Yes | Yes | Yes | No | No |
+| QWen-7B | Yes | Yes | Yes | Yes | No |
+| QWen-14B | Yes | Yes | Yes | Yes | No |
 | Baichuan-7B | Yes | Yes | Yes | Yes | No |
-| Baichuan2-7B | Yes | Yes | No | No | No |
+| Baichuan2-7B | Yes | Yes | Yes | Yes | No |
 | Code Llama | Yes | Yes | No | No | No |
 ### Pytorch
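The KV INT8 bullet in the commit message corresponds to the Baichuan2-7B row flipping to Yes above. For reference, a minimal sketch of LMDeploy's two-step KV Cache INT8 workflow from this period (collect activation statistics, then export quantization parameters); the module paths, function names, and default values below are assumptions drawn from the project's documentation of that era, not confirmed by this diff:

```python
# Hypothetical sketch of the KV Cache INT8 flow; paths and signatures
# are assumptions, not taken from this commit.
from lmdeploy.lite.apis.calibrate import calibrate
from lmdeploy.lite.apis.kv_qparams import main as kv_qparams

# Step 1: run calibration to record key/value activation ranges.
calibrate(
    model='baichuan-inc/Baichuan2-7B-Chat',  # example model id or local path
    calib_dataset='c4',     # calibration corpus
    calib_samples=128,      # number of calibration samples
    calib_seqlen=2048,      # sequence length per sample
    work_dir='./kv_stats',  # statistics are written here
)

# Step 2: turn the recorded statistics into per-layer INT8 scales and
# zero-points, placed next to the TurboMind weights.
kv_qparams(
    work_dir='./kv_stats',
    turbomind_dir='./workspace/triton_models/weights',
    kv_bits=8,      # the "KV INT8" in the commit message
    kv_sym=False,   # asymmetric quantization
    num_tp=1,       # tensor-parallel degree
)
```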
--- a/lmdeploy/lite/apis/auto_awq.py
+++ b/lmdeploy/lite/apis/auto_awq.py
@@ -83,7 +83,7 @@ def auto_awq(model: str,
     smooth_layers(layers, fc2fcs, norm2fcs, act_scales, w_group_size, device)
     quant_weights(model, fcs, w_bits, w_sym, w_group_size, device)
-    model.save_pretrained(work_dir)
+    model.save_pretrained(work_dir, max_shard_size='2GB')
     tokenizer.save_pretrained(work_dir)
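`max_shard_size` is a standard argument of Hugging Face Transformers' `save_pretrained`: a checkpoint larger than the limit is split into numbered shard files plus an index JSON, so the quantized model no longer serializes as one monolithic file. A minimal usage sketch of the `auto_awq` entry point after this change; only the parameter names are visible in the hunk, so the module path and the keyword defaults shown here are assumptions:

```python
# Hypothetical usage sketch; argument names come from the hunk above,
# default values and module path are assumptions.
from lmdeploy.lite.apis.auto_awq import auto_awq

auto_awq(
    model='baichuan-inc/Baichuan2-7B-Chat',  # example HF model id or local path
    work_dir='./baichuan2-7b-4bit',          # quantized weights land here
    w_bits=4,          # weight bit-width passed to quant_weights(...)
    w_sym=False,       # symmetric vs. asymmetric weight quantization
    w_group_size=128,  # per-group quantization granularity
)

# With max_shard_size='2GB', save_pretrained() writes the state dict as
# pytorch_model-00001-of-0000N.bin shards plus pytorch_model.bin.index.json,
# keeping every file under the 2 GB limit.
```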