[Doc] Remove performance warning for auto_awq.md (#12743)

c53dc466 · Michael Goin · GitHub · 3d09e592 · c53dc466
Unverified Commit c53dc466 authored Feb 05, 2025 by Michael Goin Committed by GitHub Feb 04, 2025

Showing with 0 additions and 6 deletions

docs/source/features/quantization/auto_awq.md +0 -6

--- a/docs/source/features/quantization/auto_awq.md
+++ b/docs/source/features/quantization/auto_awq.md
@@ -2,12 +2,6 @@

 # AutoAWQ

-:::{warning}
-Please note that AWQ support in vLLM is under-optimized at the moment. We would recommend using the unquantized version of the model for better
-accuracy and higher throughput. Currently, you can use AWQ as a way to reduce memory footprint. As of now, it is more suitable for low latency
-inference with small number of concurrent requests. vLLM's AWQ implementation have lower throughput than unquantized version.
-:::
-
 To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).
 Quantizing reduces the model's precision from FP16 to INT4 which effectively reduces the file size by ~70%.
 The main benefits are lower latency and memory usage.
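
Note on the "~70%" figure in the context lines above: going from 16-bit to 4-bit weights alone would give a 75% reduction, so the smaller number is plausibly explained by per-group quantization metadata and by layers left unquantized. The sketch below is a rough back-of-the-envelope estimate, not AutoAWQ's or vLLM's actual accounting. The group size of 128, the FP16 scale, and the 4-bit zero point per group are assumptions for illustration, and the function names are hypothetical.

```python
def effective_bits_per_weight(weight_bits: int = 4,
                              group_size: int = 128,   # assumed group size
                              scale_bits: int = 16,    # assumed FP16 scale per group
                              zero_bits: int = 4) -> float:  # assumed 4-bit zero point per group
    """Average storage per weight, including per-group scale and zero point."""
    metadata_bits_per_weight = (scale_bits + zero_bits) / group_size
    return weight_bits + metadata_bits_per_weight


def size_reduction(baseline_bits: float = 16.0, **kwargs) -> float:
    """Fractional size reduction relative to an FP16 baseline."""
    return 1.0 - effective_bits_per_weight(**kwargs) / baseline_bits


bits = effective_bits_per_weight()
reduction = size_reduction()
print(f"effective bits per weight: {bits:.3f}")
print(f"estimated reduction for quantized layers: {reduction:.1%}")
```

Under these assumptions the quantized layers shrink by a little under 75%. Any tensors kept in FP16 (for example embeddings) would pull the whole-file figure further down, which is consistent with the "~70%" stated in the doc.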