Notice for deprecation of AutoAWQ (#26820)

Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Notice for deprecation of AutoAWQ (#26820)
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
b92ab3de · HDCharles · GitHub · acaa2c0a · b92ab3de
Unverified Commit b92ab3de authored Oct 14, 2025 by HDCharles Committed by GitHub Oct 14, 2025
Show whitespace changes
Inline Side-by-side

Showing with 4 additions and 0 deletions

docs/features/quantization/auto_awq.md docs/features/quantization/auto_awq.md +4 -0

No files found.
--- a/docs/features/quantization/auto_awq.md
+++ b/docs/features/quantization/auto_awq.md
 # AutoAWQ
+> ⚠️ **Warning:**
+    The `AutoAWQ` library is deprecated. This functionality has been adopted by the vLLM project in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/tree/main/examples/awq).
+    For the recommended quantization workflow, please see the AWQ examples in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/tree/main/examples/awq). For more details on the deprecation, refer to the original [AutoAWQ repository](https://github.com/casper-hansen/AutoAWQ).
 To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).
 Quantization reduces the model's precision from BF16/FP16 to INT4 which effectively reduces the total model memory footprint.
 The main benefits are lower latency and memory usage.