Speed Benchmark
=========================

.. attention:: 
    To be updated for Qwen3.

This section reports the speed performance of the bf16 and quantized models
(GPTQ-Int4, GPTQ-Int8, and AWQ) of the Qwen2.5 series. Specifically, we report
the inference speed (tokens/s) and the memory footprint (GB) under different
context lengths.

The evaluation environment for Hugging Face ``transformers`` is:

-  NVIDIA A100 80GB
-  CUDA 12.1
-  PyTorch 2.3.1
-  Flash Attention 2.5.8
-  Transformers 4.46.0
-  AutoGPTQ 0.7.1+cu121 (compiled from source)
-  AutoAWQ 0.2.6


The evaluation environment for vLLM is:

-  NVIDIA A100 80GB
-  CUDA 12.1
-  vLLM 0.6.3
-  PyTorch 2.4.0
-  Flash Attention 2.6.3
-  Transformers 4.46.0


Notes:

- We use a batch size of 1 and as few GPUs as possible for the evaluation.
- We test the speed and memory usage of generating 2048 tokens with
  input lengths of 1, 6144, 14336, 30720, 63488, and 129024 tokens
  (a measurement sketch follows this list).
- For vLLM, the memory usage is not reported because it pre-allocates all
  GPU memory. We use ``gpu_memory_utilization=0.9``, ``max_model_len=32768``,
  and ``enforce_eager=False`` by default.

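For reference, the following is a minimal sketch of how a single data point in
the ``transformers`` tables below could be measured: it loads a model in bf16,
feeds a synthetic prompt of the target input length at batch size 1, generates
2048 tokens, and reports tokens/s and peak GPU memory. The model name, prompt
construction, and generation arguments are illustrative assumptions, not the
exact benchmark harness used for the tables.

.. code-block:: python

    import time

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # any Qwen2.5 checkpoint (assumption)
    INPUT_LENGTH = 6144                   # one of the benchmarked input lengths
    NEW_TOKENS = 2048                     # tokens generated per measurement

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
        device_map="cuda",
    )

    # Synthetic prompt of exactly INPUT_LENGTH tokens, batch size 1.
    input_ids = torch.full(
        (1, INPUT_LENGTH), tokenizer.eos_token_id, dtype=torch.long, device=model.device
    )
    attention_mask = torch.ones_like(input_ids)

    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    output = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=NEW_TOKENS,
        min_new_tokens=NEW_TOKENS,  # force the full 2048-token generation
        do_sample=False,
    )
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    generated = output.shape[1] - INPUT_LENGTH
    print(f"speed: {generated / elapsed:.2f} tokens/s")
    print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")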


-  0.5B (Transformer)

+-------------------------+--------------+--------------+---------+-----------------+----------------+---------------------------+
| Model                   | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                      |
+=========================+==============+==============+=========+=================+================+===========================+
| Qwen2.5-0.5B-Instruct   | 1            | BF16         | 1       | 47.40           | 0.97           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int8    | 1       | 35.17           | 0.64           | auto_gptq==0.6.0+cu1210   |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int4    | 1       | 50.60           | 0.48           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | AWQ          | 1       | 37.09           | 0.68           |                           |
+                         +--------------+--------------+---------+-----------------+----------------+---------------------------+
|                         | 6144         | BF16         | 1       | 47.45           | 1.23           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int8    | 1       | 36.47           | 0.90           | auto_gptq==0.6.0+cu1210   |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int4    | 1       | 48.89           | 0.73           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | AWQ          | 1       | 37.04           | 0.72           |                           |
+                         +--------------+--------------+---------+-----------------+----------------+---------------------------+
|                         | 14336        | BF16         | 1       | 47.11           | 1.60           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int8    | 1       | 35.44           | 1.26           | auto_gptq==0.6.0+cu1210   |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int4    | 1       | 48.26           | 1.10           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | AWQ          | 1       | 37.14           | 1.10           |                           |
+                         +--------------+--------------+---------+-----------------+----------------+---------------------------+
|                         | 30720        | BF16         | 1       | 47.16           | 2.34           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int8    | 1       | 36.25           | 2.01           | auto_gptq==0.6.0+cu1210   |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | GPTQ-Int4    | 1       | 49.22           | 1.85           |                           |
+                         +              +--------------+---------+-----------------+----------------+---------------------------+
|                         |              | AWQ          | 1       | 36.90           | 1.84           |                           |
+-------------------------+--------------+--------------+---------+-----------------+----------------+---------------------------+


-  0.5B (vLLM)

+-------------------------+--------------+--------------+---------+-----------------+
| Model                   | Input Length | Quantization | GPU Num | Speed(tokens/s) |
+=========================+==============+==============+=========+=================+
| Qwen2.5-0.5B-Instruct   | 1            | BF16         | 1       | 311.55          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int8    | 1       | 257.07          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int4    | 1       | 260.93          |
+                         +              +--------------+---------+-----------------+
|                         |              | AWQ          | 1       | 261.95          |
+                         +--------------+--------------+---------+-----------------+
|                         | 6144         | BF16         | 1       | 304.79          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int8    | 1       | 254.10          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int4    | 1       | 257.33          |
+                         +              +--------------+---------+-----------------+
|                         |              | AWQ          | 1       | 259.80          |
+                         +--------------+--------------+---------+-----------------+
|                         | 14336        | BF16         | 1       | 290.28          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int8    | 1       | 243.69          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int4    | 1       | 247.01          |
+                         +              +--------------+---------+-----------------+
|                         |              | AWQ          | 1       | 249.58          |
+                         +--------------+--------------+---------+-----------------+
|                         | 30720        | BF16         | 1       | 264.51          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int8    | 1       | 223.86          |
+                         +              +--------------+---------+-----------------+
|                         |              | GPTQ-Int4    | 1       | 226.50          |
+                         +              +--------------+---------+-----------------+
|                         |              | AWQ          | 1       | 229.84          |
+-------------------------+--------------+--------------+---------+-----------------+



-  1.5B (Transformer)

+--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
+==========================+==============+==============+=========+=================+================+=========================+
| Qwen2.5-1.5B-Instruct    | 1            | BF16         | 1       | 39.68           | 2.95           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 32.62           | 1.82           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 43.33           | 1.18           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 31.70           | 1.51           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 6144         | BF16         | 1       | 40.88           | 3.43           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 31.46           | 2.30           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 43.96           | 1.66           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 32.30           | 1.63           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 14336        | BF16         | 1       | 40.43           | 4.16           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 31.06           | 3.03           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 43.66           | 2.39           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 32.39           | 2.36           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 30720        | BF16         | 1       | 38.59           | 5.62           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 31.04           | 4.49           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 35.68           | 3.85           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 31.95           | 3.82           |                         |
+--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+


-  1.5B (vLLM)

+--------------------------+--------------+--------------+---------+-----------------+
| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) |
+==========================+==============+==============+=========+=================+
| Qwen2.5-1.5B-Instruct    | 1            | BF16         | 1       | 183.33          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 201.67          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 217.03          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 213.74          |
+                          +--------------+--------------+---------+-----------------+
|                          | 6144         | BF16         | 1       | 176.68          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 192.83          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 206.63          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 203.64          |
+                          +--------------+--------------+---------+-----------------+
|                          | 14336        | BF16         | 1       | 168.69          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 183.69          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 195.88          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 192.64          |
+                          +--------------+--------------+---------+-----------------+
|                          | 30720        | BF16         | 1       | 152.04          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 162.82          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 173.57          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 170.20          |
+--------------------------+--------------+--------------+---------+-----------------+



-  3B (Transformer)

+--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
+==========================+==============+==============+=========+=================+================+=========================+
| Qwen2.5-3B-Instruct      | 1            | BF16         | 1       | 30.80           | 5.95           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 25.69           | 3.38           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 35.21           | 2.06           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 25.29           | 2.50           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 6144         | BF16         | 1       | 32.20           | 6.59           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 24.69           | 3.98           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 34.47           | 2.67           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 24.86           | 2.62           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 14336        | BF16         | 1       | 31.72           | 7.47           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 24.70           | 4.89           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 34.36           | 3.58           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 25.19           | 3.54           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 30720        | BF16         | 1       | 25.37           | 9.30           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 21.67           | 6.72           | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 23.60           | 5.41           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 24.56           | 5.37           |                         |
+--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+


-  3B (vLLM)

+--------------------------+--------------+--------------+---------+-----------------+
| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) |
+==========================+==============+==============+=========+=================+
| Qwen2.5-3B-Instruct      | 1            | BF16         | 1       | 127.61          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 150.02          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 168.20          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 165.50          |
+                          +--------------+--------------+---------+-----------------+
|                          | 6144         | BF16         | 1       | 123.15          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 143.09          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 159.85          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 156.38          |
+                          +--------------+--------------+---------+-----------------+
|                          | 14336        | BF16         | 1       | 117.35          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 135.50          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 149.35          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 147.75          |
+                          +--------------+--------------+---------+-----------------+
|                          | 30720        | BF16         | 1       | 105.88          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int8    | 1       | 118.38          |
+                          +              +--------------+---------+-----------------+
|                          |              | GPTQ-Int4    | 1       | 129.28          |
+                          +              +--------------+---------+-----------------+
|                          |              | AWQ          | 1       | 127.19          |
+--------------------------+--------------+--------------+---------+-----------------+



-  7B (Transformer)

+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
+=============================+==============+==============+=========+=================+================+=========================+
| Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 40.38           | 14.38          |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int8    | 1       | 31.55           | 8.42           | auto_gptq==0.6.0+cu1210 |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int4    | 1       | 43.10           | 5.52           |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | AWQ          | 1       | 32.03           | 5.39           |                         |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                             | 6144         | BF16         | 1       | 38.76           | 15.38          |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int8    | 1       | 31.26           | 9.43           | auto_gptq==0.6.0+cu1210 |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int4    | 1       | 38.27           | 6.52           |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | AWQ          | 1       | 32.37           | 6.39           |                         |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                             | 14336        | BF16         | 1       | 29.78           | 16.91          |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int8    | 1       | 26.86           | 10.96          | auto_gptq==0.6.0+cu1210 |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int4    | 1       | 28.70           | 8.05           |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | AWQ          | 1       | 30.23           | 7.92           |                         |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                             | 30720        | BF16         | 1       | 18.83           | 19.97          |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int8    | 1       | 17.59           | 14.01          | auto_gptq==0.6.0+cu1210 |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | GPTQ-Int4    | 1       | 18.45           | 11.11          |                         |
+                             +              +--------------+---------+-----------------+----------------+-------------------------+
|                             |              | AWQ          | 1       | 19.11           | 10.98          |                         |
+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+



-  7B (vLLM)

+-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
+=============================+==============+==============+=========+=================+===========================================+
| Qwen2.5-7B-Instruct         | 1            | BF16         | 1       | 84.28           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 122.01          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 154.05          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 148.10          |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 6144         | BF16         | 1       | 80.70           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 112.38          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 141.98          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 137.64          |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 14336        | BF16         | 1       | 77.69           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 105.25          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 129.35          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 124.91          |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 30720        | BF16         | 1       | 70.33           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 90.71           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 108.30          |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 104.66          |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 63488        | BF16         | 1       | 50.86           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 60.52           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 67.97           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 66.42           | setting-64k                               |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 129024       | BF16         | 1       | 28.94           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 25.97           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 26.37           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 26.57           | vllm==0.6.2, new sample config            |
+-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+

* ``setting-64k``: ``gpu_memory_utilization=0.9``, ``max_model_len=65536``, ``enforce_eager=False``
* ``new sample config``: for vLLM, use the sampling parameters ``SamplingParams(temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1, presence_penalty=0, frequency_penalty=0, max_tokens=out_length)`` (see the sketch below)
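
The two notes above correspond to arguments of vLLM's offline API. The
following is a minimal sketch of where each one goes; the model name, prompt,
and output length are placeholders, and the two settings apply to different
rows of the table, so combining them in one script is only for illustration.

.. code-block:: python

    from vllm import LLM, SamplingParams

    out_length = 2048  # tokens generated per measurement in this benchmark

    # "setting-64k": enlarge the context window for the 63488-token inputs.
    llm = LLM(
        model="Qwen/Qwen2.5-7B-Instruct",
        gpu_memory_utilization=0.9,
        max_model_len=65536,
        enforce_eager=False,
    )

    # "new sample config": sampling parameters used for the 129024-token rows.
    sampling_params = SamplingParams(
        temperature=0.7,
        top_p=0.8,
        top_k=20,
        repetition_penalty=1,
        presence_penalty=0,
        frequency_penalty=0,
        max_tokens=out_length,
    )

    outputs = llm.generate(["..."], sampling_params)
    print(outputs[0].outputs[0].text)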

-  14B (Transformer)

+--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+
| Model                    | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                    |
+==========================+==============+==============+=========+=================+================+=========================+
| Qwen2.5-14B-Instruct     | 1            | BF16         | 1       | 24.74           | 28.08          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 18.84           | 16.11          | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 25.89           | 9.94           |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 19.23           | 9.79           |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 6144         | BF16         | 1       | 20.51           | 29.50          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 17.80           | 17.61          | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 20.06           | 11.36          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 19.21           | 11.22          |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 14336        | BF16         | 1       | 13.92           | 31.95          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 12.66           | 19.98          | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 13.79           | 13.81          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 14.17           | 13.67          |                         |
+                          +--------------+--------------+---------+-----------------+----------------+-------------------------+
|                          | 30720        | BF16         | 1       | 8.20            | 36.85          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int8    | 1       | 7.77            | 24.88          | auto_gptq==0.6.0+cu1210 |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | GPTQ-Int4    | 1       | 8.14            | 18.71          |                         |
+                          +              +--------------+---------+-----------------+----------------+-------------------------+
|                          |              | AWQ          | 1       | 8.31            | 18.57          |                         |
+--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+


-  14B (vLLM)

+-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
+=============================+==============+==============+=========+=================+===========================================+
| Qwen2.5-14B-Instruct        | 1            | BF16         | 1       | 46.30           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 70.40           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 98.02           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 92.66           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 6144         | BF16         | 1       | 43.83           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 64.33           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 86.10           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 83.11           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 14336        | BF16         | 1       | 41.91           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 59.21           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 76.85           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 74.03           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 30720        | BF16         | 1       | 37.18           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 49.23           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 60.91           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 59.01           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 63488        | BF16         | 1       | 26.85           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 32.83           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 37.67           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 36.71           | setting-64k                               |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 129024       | BF16         | 1       | 14.53           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 15.10           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 15.13           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 15.25           | vllm==0.6.2, new sample config            |
+-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+

* ``setting-64k``: ``gpu_memory_utilization=0.9``, ``max_model_len=65536``, ``enforce_eager=False``
* ``new sample config``: for vLLM, use the sampling parameters ``SamplingParams(temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1, presence_penalty=0, frequency_penalty=0, max_tokens=out_length)``



-  32B (Transformer)

+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
+=============================+==============+==============+=========+=================+================+===========================================+
| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 17.54           | 61.58          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 14.52           | 33.56          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 19.20           | 18.94          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 14.60           | 18.67          |                                           |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
|                             | 6144         | BF16         | 1       | 12.49           | 63.72          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 11.61           | 35.86          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 13.42           | 21.09          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 13.81           | 20.81          |                                           |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
|                             | 14336        | BF16         | 1       | 8.95            | 67.31          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 8.53            | 39.28          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 9.48            | 24.67          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 9.71            | 24.39          |                                           |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
|                             | 30720        | BF16         | 1       | 5.59            | 74.47          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 5.42            | 46.45          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 5.79            | 31.84          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 5.85            | 31.56          |                                           |
+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+





-  32B (vLLM)

+-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
+=============================+==============+==============+=========+=================+===========================================+
| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 22.13           | setting1                                  |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 37.57           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 55.83           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 51.92           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 6144         | BF16         | 1       | 21.05           | setting1                                  |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 34.67           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 49.96           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 46.68           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 14336        | BF16         | 1       | 19.91           | setting1                                  |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 31.89           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 44.79           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 41.83           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 30720        | BF16         | 2       | 31.82           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 26.88           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 35.66           |                                           |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 33.75           |                                           |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 63488        | BF16         | 2       | 24.45           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 18.60           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 22.72           | setting-64k                               |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 21.79           | setting-64k                               |
+                             +--------------+--------------+---------+-----------------+-------------------------------------------+
|                             | 129024       | BF16         | 2       | 14.31           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 1       | 9.77            | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 10.39           | vllm==0.6.2, new sample config            |
+                             +              +--------------+---------+-----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 10.34           | vllm==0.6.2, new sample config            |
+-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+

  * For input length 129024, the model needs to be run with the following config: "model_max_length"=131072
  * [Default Setting]=(gpu_memory_utilization=0.9, max_model_len=32768, enforce_eager=False); applies to rows without a note
  * [setting1]=(gpu_memory_utilization=1.0, max_model_len=32768, enforce_eager=True)
  * [setting-64k]=(gpu_memory_utilization=0.9, max_model_len=65536, enforce_eager=False)
  * [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1.0, presence_penalty=0.0, frequency_penalty=0.0, max_tokens=out_length); see the sketch below for how these settings map onto the vLLM API
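
The settings above translate directly into vLLM's offline Python API. The following is a minimal sketch, assuming the public Qwen/Qwen2.5-32B-Instruct checkpoint, a placeholder prompt, and an illustrative output length; it is not the exact benchmark harness.

.. code-block:: python

   from vllm import LLM, SamplingParams

   # Engine arguments corresponding to the settings listed above.
   # Point `model` at the GPTQ/AWQ repositories to reproduce the quantized rows.
   llm = LLM(
       model="Qwen/Qwen2.5-32B-Instruct",
       tensor_parallel_size=1,       # "GPU Num" column
       gpu_memory_utilization=0.9,   # 1.0 under [setting1]
       max_model_len=32768,          # 65536 under [setting-64k]
       enforce_eager=False,          # True under [setting1]
   )

   # [new sample config] sampling parameters; max_tokens stands in for the
   # benchmark's output length ("out_length") and is illustrative here.
   sampling_params = SamplingParams(
       temperature=0.7,
       top_p=0.8,
       top_k=20,
       repetition_penalty=1.0,
       presence_penalty=0.0,
       frequency_penalty=0.0,
       max_tokens=2048,
   )

   outputs = llm.generate(["An example prompt."], sampling_params)
   print(outputs[0].outputs[0].text)

Since the benchmarks use batch size 1, tensor_parallel_size simply mirrors the "GPU Num" column of the table.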



- 72B (Transformer)

+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
+=============================+==============+==============+=========+=================+================+===========================================+
| Qwen2.5-72B-Instruct        | 1            | BF16         | 2       | 8.73            | 136.20         |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 2       | 8.66            | 72.61          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 11.07           | 39.91          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 11.50           | 39.44          |                                           |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
|                             | 6144         | BF16         | 2       | 6.39            | 140.00         |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 2       | 6.39            | 77.81          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 7.56            | 42.50          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 8.17            | 42.13          |                                           |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
|                             | 14336        | BF16         | 3       | 4.25            | 149.14         |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 2       | 4.66            | 82.55          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 1       | 5.27            | 46.86          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 1       | 5.57            | 46.38          |                                           |
+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
|                             | 30720        | BF16         | 3       | 2.94            | 164.79         |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int8    | 2       | 2.94            | 94.75          | auto_gptq==0.6.0+cu1210                   |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | GPTQ-Int4    | 2       | 3.14            | 62.57          |                                           |
+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
|                             |              | AWQ          | 2       | 3.23            | 61.64          |                                           |
+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+




- 72B (vLLM)

+------------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
| Model                        | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
+==============================+==============+==============+=========+=================+===========================================+
| Qwen2.5-72B-Instruct         | 1            | BF16         | 2       | 18.19           | Setting 1                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | BF16         | 4       | 31.37           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int8    | 2       | 31.40           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 1       | 16.47           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 2       | 46.30           | Setting 2                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | AWQ          | 2       | 44.30           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              | 6144         | BF16         | 4       | 29.90           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int8    | 2       | 29.37           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 1       | 13.88           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 2       | 42.50           | Setting 3                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | AWQ          | 2       | 40.67           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              | 14336        | BF16         | 4       | 30.10           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int8    | 2       | 27.20           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 2       | 38.10           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | AWQ          | 2       | 36.63           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              | 30720        | BF16         | 4       | 27.53           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int8    | 2       | 23.32           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 2       | 30.98           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | AWQ          | 2       | 30.02           | Default                                   |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              | 63488        | BF16         | 4       | 20.74           | Setting 4                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int8    | 2       | 16.27           | Setting 4                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 2       | 19.84           | Setting 4                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | AWQ          | 2       | 19.32           | Setting 4                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              | 129024       | BF16         | 4       | 12.68           | Setting 5                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int8    | 4       | 14.11           | Setting 5                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | GPTQ-Int4    | 2       | 10.11           | Setting 5                                 |
+                              +--------------+--------------+---------+-----------------+-------------------------------------------+
|                              |              | AWQ          | 2       | 9.88            | Setting 5                                 |
+------------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+

  * [Default Setting]=(gpu_memory_utilization=0.9, max_model_len=32768, enforce_eager=False); applies to rows marked "Default"
  * [Setting 1]=(gpu_memory_utilization=0.98, max_model_len=4096, enforce_eager=True)
  * [Setting 2]=(gpu_memory_utilization=1.0, max_model_len=4096, enforce_eager=True)
  * [Setting 3]=(gpu_memory_utilization=1.0, max_model_len=8192, enforce_eager=True)
  * [Setting 4]=(gpu_memory_utilization=0.9, max_model_len=65536, enforce_eager=False)
  * [Setting 5]=(gpu_memory_utilization=0.9, max_model_len=131072, enforce_eager=False)
  * These settings correspond to vLLM engine arguments; see the sketch below.
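
For the 72B runs, the GPU count in the table corresponds to vLLM's tensor_parallel_size, and quantized checkpoints are typically picked up from the model's own config without extra flags. Below is a minimal sketch for a GPTQ-Int4 run under [Setting 4], assuming the public Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 checkpoint and an illustrative prompt and output length; it is not the exact benchmark harness.

.. code-block:: python

   from vllm import LLM, SamplingParams

   # [Setting 4]: 64K context on two GPUs. For [Setting 5], raise
   # max_model_len to 131072 and set tensor_parallel_size per the table.
   llm = LLM(
       model="Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
       tensor_parallel_size=2,      # "GPU Num" column
       gpu_memory_utilization=0.9,
       max_model_len=65536,
       enforce_eager=False,
   )

   outputs = llm.generate(
       ["An example prompt."],
       SamplingParams(max_tokens=128),  # output length is illustrative only
   )
   print(outputs[0].outputs[0].text)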