Doc - Update benchmark doc (#465)

**Description** Update benchmark doc of cublaslt, cublas-function and model benchmarks

Doc - Update benchmark doc (#465)
**Description** Update benchmark doc of cublaslt, cublas-function and model benchmarks
c387d9c0 · Yuting Jiang · GitHub · f6a08908 · c387d9c0 · c387d9c0
Unverified Commit c387d9c0 authored Jan 18, 2023 by Yuting Jiang Committed by GitHub Jan 18, 2023
Showing with 23 additions and 24 deletions

docs/user-tutorial/benchmarks/micro-benchmarks.md docs/user-tutorial/benchmarks/micro-benchmarks.md +13 -10

docs/user-tutorial/benchmarks/model-benchmarks.md docs/user-tutorial/benchmarks/model-benchmarks.md +10 -14

No files found.
--- a/docs/user-tutorial/benchmarks/micro-benchmarks.md
+++ b/docs/user-tutorial/benchmarks/micro-benchmarks.md
@@ -67,8 +67,8 @@ Measure the GEMM performance of [`cublasLtMatmul`](https://docs.nvidia.com/cuda/
 #### Metrics

 | Name                                           | Unit           | Description                     |
-|---------------------------------|----------------|---------------------------------|
-| cublaslt-gemm/dtype_m_n_k_flops | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |
+|------------------------------------------------|----------------|---------------------------------|
+| cublaslt-gemm/${dtype}\_${m}\_${n}\_${k}_flops | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |

 ### `cublas-function`

@@ -87,8 +87,10 @@ The supported functions for cuBLAS are as follows:
 #### Metrics

 | Name                                                              | Unit      | Description                                                                                                                                  |
-|----------------------------------------------------------|-----------|-------------------------------------------------------------------|
-| cublas-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cublas function with the parameters. |
+|-------------------------------------------------------------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| cublas-function/name\_${function_name}\_${parameters}_time        | time (us) | The mean time to execute the cublas function with the parameters.                                                                            |
+| cublas-function/name\_${function_name}\_${parameters}_correctness |           | Whether the calculation results of executing the cublas function with the parameters pass the correctness check if enable correctness check. |
+| cublas-function/name\_${function_name}\_${parameters}_error       |           | The error ratio of the calculation results of executing the cublas function with the parameters if enable correctness check.                 |

 ### `cudnn-function`

@@ -104,8 +106,8 @@ The supported functions for cuDNN are as follows:
 #### Metrics

 | Name                                                      | Unit      | Description                                                      |
-|---------------------------------------------------------|-----------|------------------------------------------------------------------|
-| cudnn-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |
+|-----------------------------------------------------------|-----------|------------------------------------------------------------------|
+| cudnn-function/name\_${function_name}\_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |

 ### `tensorrt-inference`

@@ -264,9 +266,10 @@ Support the following traffic patterns:
 | rccl-bw/${operation}_${msg_size}_algbw | bandwidth (GB/s) | RCCL operation algorithm bandwidth with given message size. |
 | rccl-bw/${operation}_${msg_size}_busbw | bandwidth (GB/s) | RCCL operation bus bandwidth with given message size.       |

-If traffic pattern is specified, the metrics pattern will change to `nccl-bw/${operation}_${serial_index)_${parallel_index):${msg_size}_time`
+If mpi mode is enable and traffic pattern is specified, the metrics pattern will change to `nccl-bw/${operation}_${serial_index)_${parallel_index):${msg_size}_time`
 - `serial_index` represents the serial index of the host group in serial.
 - `parallel_index` represents the parallel index of the host list in parallel.
+
 ### `tcp-connectivity`

 #### Introduction

--- a/docs/user-tutorial/benchmarks/model-benchmarks.md
+++ b/docs/user-tutorial/benchmarks/model-benchmarks.md
@@ -30,19 +30,15 @@ including the following categories:
 For inference, supported percentiles include
 50<sup>th</sup>, 90<sup>th</sup>, 95<sup>th</sup>, 99<sup>th</sup>, and 99.9<sup>th</sup>.

+**New: Support fp8_hybrid and fp8_e4m3 precision for BERT models.**
+
 #### Metrics

 | Name                                                                                    | Unit                   | Description                                                                  |
-|---------------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
-| model-benchmarks/pytorch-${model_name}/fp32_train_step_time                     | time (ms)              | The average training step time with single precision.                     |
-| model-benchmarks/pytorch-${model_name}/fp32_train_throughput                    | throughput (samples/s) | The average training throughput with single precision.                    |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_step_time                 | time (ms)              | The average inference step time with single precision.                    |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_throughput                | throughput (samples/s) | The average inference throughput with single precision.                   |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_step_time\_${percentile}  | time (ms)              | The n<sup>th</sup> percentile inference step time with single precision.  |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_train_step_time                     | time (ms)              | The average training step time with half precision.                       |
-| model-benchmarks/pytorch-${model_name}/fp16_train_throughput                    | throughput (samples/s) | The average training throughput with half precision.                      |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_step_time                 | time (ms)              | The average inference step time with half precision.                      |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_throughput                | throughput (samples/s) | The average inference throughput with half precision.                     |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_step_time\_${percentile}  | time (ms)              | The n<sup>th</sup> percentile inference step time with half precision.    |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with half precision.   |
+|-----------------------------------------------------------------------------------------|------------------------|------------------------------------------------------------------------------|
+| model-benchmarks/pytorch-${model_name}/${precision}_train_step_time                     | time (ms)              | The average training step time with fp32/fp16 precision.                     |
+| model-benchmarks/pytorch-${model_name}/${precision}_train_throughput                    | throughput (samples/s) | The average training throughput with fp32/fp16 precision.                    |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time                 | time (ms)              | The average inference step time with fp32/fp16 precision.                    |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput                | throughput (samples/s) | The average inference throughput with fp32/fp16 precision.                   |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time\_${percentile}  | time (ms)              | The n<sup>th</sup> percentile inference step time with fp32/fp16 precision.  |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with fp32/fp16 precision. |