"tests/python/vscode:/vscode.git/clone" did not exist on "109aed560f9382fc476fb558b1d9f75478d49457"
Commit 77ca8337 authored by Casper

Update install instructions

parent ff556eb0
@@ -4,7 +4,7 @@ AutoAWQ is a package that implements the Activation-aware Weight Quantization (A
 Roadmap:
-- [ ] Publish pip package
+- [x] Publish pip package
 - [ ] Refactor quantization code
 - [ ] Support more models
 - [ ] Optimize the speed of models
@@ -14,7 +14,18 @@ Roadmap:
 Requirements:
 - Compute Capability 8.0 (sm80). Ampere and later architectures are supported.
-Clone this repository and install with pip.
+Install:
+- Use pip to install awq
+```
+pip install awq
+```
+### Build source
+<details>
+<summary>Build AutoAWQ from scratch</summary>
 ```
 git clone https://github.com/casper-hansen/AutoAWQ
@@ -22,6 +33,8 @@ cd AutoAWQ
 pip install -e .
 ```
+</details>
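
Given the Compute Capability 8.0 (sm80) requirement listed above, it can be worth verifying the GPU before installing. The snippet below is a small sketch of one way to do that, assuming PyTorch with CUDA support is already present; it is not part of this commit:

```python
import torch

# AutoAWQ's kernels target Compute Capability >= 8.0 (Ampere and newer).
major, minor = torch.cuda.get_device_capability()
if (major, minor) < (8, 0):
    raise RuntimeError(f"GPU reports sm{major}{minor}; AWQ kernels need sm80 or newer.")
print(f"OK: sm{major}{minor} is supported.")
```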
 ## Supported models
 The detailed support list:
@@ -36,6 +49,7 @@ The detailed support list:
 | OPT | 125m/1.3B/2.7B/6.7B/13B/30B |
 | Bloom | 560m/3B/7B/ |
 | LLaVA-v0 | 13B |
+| GPTJ | 6.7B |
 ## Usage
@@ -44,8 +58,8 @@ Below, you will find examples for how to easily quantize a model and run inferen
 ### Quantization
 ```python
+from awq import AutoAWQForCausalLM
 from transformers import AutoTokenizer
-from awq.models.auto import AutoAWQForCausalLM
 model_path = 'lmsys/vicuna-7b-v1.5'
 quant_path = 'vicuna-7b-v1.5-awq'
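
The hunks above only show the top of the quantization example. For context, a minimal end-to-end sketch of the updated flow is shown below; the `from_pretrained`/`quantize`/`save_quantized` calls and the `quant_config` values (inferred from the `w4_g128` naming used later in this README) are assumptions for illustration, not part of this commit:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'lmsys/vicuna-7b-v1.5'
quant_path = 'vicuna-7b-v1.5-awq'
# Assumed settings: 4-bit weights with group size 128 (matching "w4_g128").
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ quantization, then save the quantized weights and tokenizer.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```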
@@ -68,8 +82,8 @@ tokenizer.save_pretrained(quant_path)
 Run inference on a quantized model from Huggingface:
 ```python
+from awq import AutoAWQForCausalLM
 from transformers import AutoTokenizer
-from awq.models.auto import AutoAWQForCausalLM
 quant_path = "casperhansen/vicuna-7b-v1.5-awq"
 quant_file = "awq_model_w4_g128.pt"
@@ -101,8 +115,11 @@ Benchmark speeds may vary from server to server and that it also depends on your
 | MPT-30B | A6000 | OOM | 31.57 | -- |
 | Falcon-7B | A6000 | 39.44 | 27.34 | 1.44x |
+<details>
-For example, here is the difference between a fast and slow CPU on MPT-7B:
+<summary>Detailed benchmark (CPU vs. GPU)</summary>
+Here is the difference between a fast and slow CPU on MPT-7B:
 RTX 4090 + Intel i9 13900K (2 different VMs):
 - CUDA 12.0, Driver 525.125.06: 134 tokens/s (7.46 ms/token)
@@ -113,6 +130,8 @@ RTX 4090 + AMD EPYC 7-Series (3 different VMs):
 - CUDA 12.2, Driver 535.54.03: 56 tokens/s (17.71 ms/token)
 - CUDA 12.0, Driver 525.125.06: 55 tokens/s (18.15 ms/token)
+</details>
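
For readers comparing the two columns in these benchmarks, per-token latency is just the reciprocal of throughput (ms/token = 1000 / tokens-per-second). A quick sanity check:

```python
# ms/token is the reciprocal of throughput in tokens/s, scaled to milliseconds.
def ms_per_token(tokens_per_second: float) -> float:
    return 1000.0 / tokens_per_second

print(ms_per_token(134))  # ~7.46 ms/token, matching the i9-13900K line above
print(ms_per_token(56))   # ~17.9 ms/token; the table's 17.71 implies ~56.5 tokens/s before rounding
```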
 ## Reference
 If you find AWQ useful or relevant to your research, you can cite their [paper](https://arxiv.org/abs/2306.00978):