Commit c387d9c0, authored by Yuting Jiang and committed by GitHub

Doc - Update benchmark doc (#465)

**Description**
Update benchmark doc of cublaslt, cublas-function and model benchmarks
parent f6a08908
@@ -67,8 +67,8 @@ Measure the GEMM performance of [`cublasLtMatmul`](https://docs.nvidia.com/cuda/
#### Metrics
| Name | Unit | Description |
-|---------------------------------|----------------|---------------------------------|
-| cublaslt-gemm/dtype_m_n_k_flops | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |
+|------------------------------------------------|----------------|---------------------------------|
+| cublaslt-gemm/${dtype}\_${m}\_${n}\_${k}_flops | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |
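
To make the metric name pattern and its unit concrete, here is a minimal Python sketch. It assumes the conventional count of `2 * m * n * k` floating-point operations per GEMM; the doc does not say how the benchmark itself computes TFLOPS. The function names `gemm_tflops` and `cublaslt_metric_name` are hypothetical helpers, not part of the benchmark.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return TFLOPS for one m x n x k GEMM.

    Assumption: a GEMM costs 2*m*n*k floating-point operations
    (one multiply and one add per inner-product term).
    """
    return 2.0 * m * n * k / seconds / 1e12


def cublaslt_metric_name(dtype: str, m: int, n: int, k: int) -> str:
    """Fill in the documented pattern cublaslt-gemm/${dtype}_${m}_${n}_${k}_flops."""
    return f"cublaslt-gemm/{dtype}_{m}_{n}_{k}_flops"
```

For example, `cublaslt_metric_name("fp16", 4096, 4096, 4096)` yields `cublaslt-gemm/fp16_4096_4096_4096_flops`.
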
### `cublas-function`
@@ -87,8 +87,10 @@ The supported functions for cuBLAS are as follows:
#### Metrics
| Name | Unit | Description |
-|----------------------------------------------------------|-----------|-------------------------------------------------------------------|
-| cublas-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cublas function with the parameters. |
+|-------------------------------------------------------------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| cublas-function/name\_${function_name}\_${parameters}_time | time (us) | The mean time to execute the cublas function with the parameters. |
+| cublas-function/name\_${function_name}\_${parameters}_correctness | | Whether the results of executing the cublas function with the parameters pass the correctness check, if the correctness check is enabled. |
+| cublas-function/name\_${function_name}\_${parameters}_error | | The error ratio of the results of executing the cublas function with the parameters, if the correctness check is enabled. |
### `cudnn-function`
@@ -104,8 +106,8 @@ The supported functions for cuDNN are as follows:
#### Metrics
| Name | Unit | Description |
-|---------------------------------------------------------|-----------|------------------------------------------------------------------|
-| cudnn-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |
+|-----------------------------------------------------------|-----------|------------------------------------------------------------------|
+| cudnn-function/name\_${function_name}\_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |
### `tensorrt-inference`
@@ -264,9 +266,10 @@ Support the following traffic patterns:
| rccl-bw/${operation}_${msg_size}_algbw | bandwidth (GB/s) | RCCL operation algorithm bandwidth with given message size. |
| rccl-bw/${operation}_${msg_size}_busbw | bandwidth (GB/s) | RCCL operation bus bandwidth with given message size. |
-If traffic pattern is specified, the metrics pattern will change to `nccl-bw/${operation}_${serial_index}_${parallel_index}:${msg_size}_time`
+If MPI mode is enabled and a traffic pattern is specified, the metrics pattern will change to `nccl-bw/${operation}_${serial_index}_${parallel_index}:${msg_size}_time`
- `serial_index` represents the serial index of the host group in serial.
- `parallel_index` represents the parallel index of the host list in parallel.
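
The traffic-pattern metric name above can be split back into its fields with a regular expression. The sketch below is illustrative only: it assumes `serial_index`, `parallel_index`, and `msg_size` are non-negative integers, which the doc does not state, and `parse_pattern_metric` is a hypothetical helper name.

```python
import re

# Assumption: both indexes and the message size are plain decimal integers.
_PATTERN_METRIC = re.compile(
    r"^nccl-bw/(?P<operation>.+)_(?P<serial_index>\d+)_(?P<parallel_index>\d+)"
    r":(?P<msg_size>\d+)_time$"
)


def parse_pattern_metric(name: str):
    """Split a traffic-pattern metric name into its fields, or return None if it does not match."""
    match = _PATTERN_METRIC.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so callers can sort or group by them.
    for key in ("serial_index", "parallel_index", "msg_size"):
        fields[key] = int(fields[key])
    return fields
```

Names that do not follow the traffic-pattern form, such as a plain `nccl-bw/allreduce_8388608_time`, return `None`.
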
### `tcp-connectivity`
#### Introduction
@@ -30,19 +30,15 @@ including the following categories:
For inference, supported percentiles include
50<sup>th</sup>, 90<sup>th</sup>, 95<sup>th</sup>, 99<sup>th</sup>, and 99.9<sup>th</sup>.
**New: Support fp8_hybrid and fp8_e4m3 precision for BERT models.**
#### Metrics
| Name | Unit | Description |
-|---------------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
-| model-benchmarks/pytorch-${model_name}/fp32_train_step_time | time (ms) | The average training step time with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_step_time | time (ms) | The average inference step time with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | The average inference throughput with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_step_time\_${percentile} | time (ms) | The n<sup>th</sup> percentile inference step time with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp32_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with single precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_train_step_time | time (ms) | The average training step time with half precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_step_time | time (ms) | The average inference step time with half precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | The average inference throughput with half precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_step_time\_${percentile} | time (ms) | The n<sup>th</sup> percentile inference step time with half precision. |
-| model-benchmarks/pytorch-${model_name}/fp16_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with half precision. |
+|-----------------------------------------------------------------------------------------|------------------------|------------------------------------------------------------------------------|
+| model-benchmarks/pytorch-${model_name}/${precision}_train_step_time | time (ms) | The average training step time with fp32/fp16 precision. |
+| model-benchmarks/pytorch-${model_name}/${precision}_train_throughput | throughput (samples/s) | The average training throughput with fp32/fp16 precision. |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time | time (ms) | The average inference step time with fp32/fp16 precision. |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput | throughput (samples/s) | The average inference throughput with fp32/fp16 precision. |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time\_${percentile} | time (ms) | The n<sup>th</sup> percentile inference step time with fp32/fp16 precision. |
+| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with fp32/fp16 precision. |
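
As a rough illustration of how the `${precision}` and `${percentile}` placeholders expand, the Python sketch below enumerates the metric names for one model. The helper `model_metric_names` is hypothetical, and the percentile labels are passed in by the caller because the exact suffix text used by the benchmark is not shown here.

```python
from itertools import product


def model_metric_names(model_name, precisions, percentiles=()):
    """List metric names following the patterns in the table.

    Assumption: percentile labels are supplied exactly as they should
    appear in the suffix; the benchmark's own label format may differ.
    """
    prefix = f"model-benchmarks/pytorch-{model_name}"
    names = []
    for precision in precisions:
        # Average metrics for training and inference.
        for mode, kind in product(("train", "inference"), ("step_time", "throughput")):
            names.append(f"{prefix}/{precision}_{mode}_{kind}")
        # Percentile metrics are documented for inference only.
        for kind, percentile in product(("step_time", "throughput"), percentiles):
            names.append(f"{prefix}/{precision}_inference_{kind}_{percentile}")
    return names
```

Calling `model_metric_names("bert-large", ["fp32", "fp16"], ["50"])`, with `bert-large` used purely as a placeholder model name, returns twelve names, six per precision.
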