<p align="center">
| <a href="https://github.com/casper-hansen/AutoAWQ/issues/32"><b>Roadmap</b></a> | <a href="https://github.com/casper-hansen/AutoAWQ/tree/main/examples"><b>Examples</b></a> | <a href="https://github.com/casper-hansen/AutoAWQ/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22"><b>Issues: Help Wanted</b></a> |
</p>
AutoAWQ is an easy-to-use package for 4-bit quantized models. AutoAWQ implements the Activation-aware Weight Quantization (AWQ) algorithm for quantizing LLMs, speeding up models by 2x while reducing memory requirements by 3x compared to FP16. AutoAWQ builds on and improves upon the [original work](https://github.com/mit-han-lab/llm-awq) from MIT.

Roadmap:
- [x] Publish pip package
- [ ] Refactor quantization code
- [ ] Support more models
- [ ] Optimize the speed of models
*Latest News* 🔥
- [2023/09] Multi-GPU support, bug fixes, and better benchmark scripts available
- [2023/08] PyPi package released and AutoModel class available
## Install
...
...
The detailed support list:

| Models | Sizes |
|--------|-------|
| Falcon | 7B/40B |
| OPT | 125m/1.3B/2.7B/6.7B/13B/30B |
| Bloom | 560m/3B/7B |
| LLaVA-v0 | 13B |
| GPTJ | 6.7B |
## Usage
Below, you will find examples of how to easily quantize a model and run inference.
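As a quick sketch of the quantization flow (the model and output paths below are placeholders, and argument names may differ slightly between versions; see the examples directory for complete scripts):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths: swap in the model you want to quantize and an output directory.
model_path = "facebook/opt-125m"
quant_path = "opt-125m-awq"

# 4-bit AWQ settings: zero-point quantization with a group size of 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4-bit.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

For inference, the saved model can then be reloaded with `AutoAWQForCausalLM.from_quantized(quant_path)` and used much like a regular `transformers` model.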