<p align="center">
| <a href="https://github.com/casper-hansen/AutoAWQ/issues/32"><b>Roadmap</b></a> | <a href="https://github.com/casper-hansen/AutoAWQ/tree/main/examples"><b>Examples</b></a> | <a href="https://github.com/casper-hansen/AutoAWQ/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22"><b>Issues: Help Wanted</b></a> |
</p>
AutoAWQ is an easy-to-use package for 4-bit quantized models. AutoAWQ implements the Activation-aware Weight Quantization (AWQ) algorithm for quantizing LLMs, speeding up models by 2x while reducing memory requirements by 3x compared to FP16. AutoAWQ builds on and improves upon the [original work](https://github.com/mit-han-lab/llm-awq) from MIT.

Roadmap:
- [x] Publish pip package
- [ ] Refactor quantization code
- [ ] Support more models
- [ ] Optimize the speed of models
*Latest News* 🔥
- [2023/09] Multi-GPU support, bug fixes, and better benchmark scripts available
- [2023/08] PyPi package released and AutoModel class available
## Install
...
...
The detailed support list:

| Models | Sizes |
|--------|-------|
| Falcon | 7B/40B |
| OPT | 125m/1.3B/2.7B/6.7B/13B/30B |
| Bloom | 560m/3B/7B |
| LLaVA-v0 | 13B |
| GPTJ | 6.7B |
## Usage
Below, you will find examples of how to easily quantize a model and run inference.
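As a quick sketch of the quantization flow (the model and output paths below are placeholders, and argument names may differ slightly between versions; see the examples directory for complete scripts):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths: swap in the model you want to quantize and an output directory.
model_path = "facebook/opt-125m"
quant_path = "opt-125m-awq"

# 4-bit AWQ settings: zero-point quantization with a group size of 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4-bit.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

For inference, the saved model can then be reloaded with `AutoAWQForCausalLM.from_quantized(quant_path)` and used much like a regular `transformers` model.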