Performance of Quantized Models
==================================

.. attention:: 
    To be updated for Qwen3.

This section reports the benchmark accuracy of the quantized models
(GPTQ and AWQ) in the Qwen2 series, compared against the original BF16
checkpoints. Specifically, we report:

* MMLU (Accuracy)
* C-Eval (Accuracy)
* IFEval (Strict Prompt-Level Accuracy)

The Average column below is the mean of the three scores. All models are
evaluated with greedy decoding, as sketched below.
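
For reference, here is a minimal sketch of how one of these quantized
checkpoints can be loaded and queried with greedy decoding using Hugging
Face Transformers (GPTQ and AWQ checkpoints additionally require the
corresponding quantization backend, e.g. ``auto-gptq`` or ``autoawq``).
The prompt is illustrative only and is not part of the evaluation harness:

.. code-block:: python

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Any quantized variant from the table can be substituted here, e.g.
    # "Qwen/Qwen2-7B-Instruct-GPTQ-Int8" or "Qwen/Qwen2-7B-Instruct-AWQ".
    model_name = "Qwen/Qwen2-7B-Instruct-GPTQ-Int4"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    messages = [{"role": "user", "content": "What is the capital of France?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # do_sample=False makes generate() take the argmax token at every step,
    # i.e. greedy decoding, matching the setting used for these benchmarks.
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                           skip_special_tokens=True))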

+---------------------+--------------+---------+-------+--------+--------+
|                     | Quantization | Average | MMLU  | C-Eval | IFEval |
+=====================+==============+=========+=======+========+========+
| Qwen2-72B-Instruct  | BF16         | 81.3    | 82.3  | 83.8   | 77.6   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int8    | 80.7    | 81.3  | 83.4   | 77.5   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int4    | 81.2    | 80.8  | 83.9   | 78.9   |
+                     +--------------+---------+-------+--------+--------+
|                     | AWQ          | 80.4    | 80.5  | 83.9   | 76.9   |
+---------------------+--------------+---------+-------+--------+--------+
| Qwen2-7B-Instruct   | BF16         | 66.9    | 70.5  | 77.2   | 53.1   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int8    | 66.2    | 69.1  | 76.7   | 52.9   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int4    | 64.1    | 67.8  | 75.2   | 49.4   |
+                     +--------------+---------+-------+--------+--------+
|                     | AWQ          | 64.1    | 67.4  | 73.6   | 51.4   |
+---------------------+--------------+---------+-------+--------+--------+
| Qwen2-1.5B-Instruct | BF16         | 48.4    | 52.4  | 63.8   | 29.0   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int8    | 48.1    | 53.0  | 62.5   | 28.8   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int4    | 45.0    | 50.7  | 57.4   | 27.0   |
+                     +--------------+---------+-------+--------+--------+
|                     | AWQ          | 46.5    | 51.6  | 58.1   | 29.9   |
+---------------------+--------------+---------+-------+--------+--------+
| Qwen2-0.5B-Instruct | BF16         | 34.4    | 37.9  | 45.2   | 20.0   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int8    | 32.6    | 35.6  | 43.9   | 18.1   |
+                     +--------------+---------+-------+--------+--------+
|                     | GPTQ-Int4    | 29.7    | 33.0  | 39.2   | 16.8   |
+                     +--------------+---------+-------+--------+--------+
|                     | AWQ          | 31.1    | 34.4  | 42.1   | 16.7   |
+---------------------+--------------+---------+-------+--------+--------+
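
As a quick sanity check on how the Average column is derived, it is
consistent with the arithmetic mean of the three benchmark scores (up to
rounding of the underlying results). For example, for Qwen2-7B-Instruct
in BF16:

.. code-block:: python

    # Illustrative check of the Average column, using the
    # Qwen2-7B-Instruct BF16 row from the table above.
    scores = {"MMLU": 70.5, "C-Eval": 77.2, "IFEval": 53.1}
    average = sum(scores.values()) / len(scores)
    print(round(average, 1))  # 66.9, matching the table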