---
id: micro-benchmarks
---

# Micro Benchmarks

## Benchmarking list

### Computation benchmark

### Communication benchmark

### Computation-communication benchmark

### Storage benchmark

## Benchmarking metrics

- Metrics
  - Computation Benchmark
    - GEMM FLOPS
      - GFLOPS
      - TensorCore
      - cuBLAS
      - cuDNN
    - Kernel Launch Time
      - Kernel_Launch_Event_Time
      - Kernel_Launch_Wall_Time
    - Operator Performance
      - MatMul
      - Sharding_MatMul
  - Communication Benchmark
    - Memory
      - H2D_Mem_BW_<GPU ID>
      - D2H_Mem_BW_<GPU ID>
    - Device P2P Bandwidth
      - P2P_BW_Max
      - P2P_BW_Min
      - P2P_BW_Avg
    - RDMA
      - RDMA_Peak
      - RDMA_Avg
    - NCCL
      - NCCL_AllReduce
      - NCCL_AllGather
      - NCCL_broadcast
      - NCCL_reduce
      - NCCL_reduce_scatter
  - Computation-Communication Benchmark
    - Mul_During_NCCL
    - MatMul_During_NCCL
  - Storage Benchmark
    - Disk
      - Seq_Read/Seq_Write
      - Rand_Read/Rand_Write
      - Seq_R/W_Read
      - Seq_R/W_Write
      - Rand_R/W_Read
      - Rand_R/W_Write
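
The list above names the metrics but does not say how they are derived. As a rough illustration only, the sketch below shows one common way a GEMM throughput figure and the max/min/avg bandwidth summaries could be computed from raw timings. The function names (`gemm_gflops`, `bandwidth_gb_per_s`, `summarize`), the decimal units (1 GFLOP = 1e9 FLOPs, 1 GB = 1e9 bytes), and the `2 * m * n * k` operation count for a dense matrix multiply are assumptions made for this example, not definitions taken from the benchmark suite.

```python
def gemm_gflops(m, n, k, seconds):
    """Throughput of an (m x k) @ (k x n) multiply in GFLOPS.

    Assumes the usual dense GEMM count of 2*m*n*k floating-point
    operations (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e9


def bandwidth_gb_per_s(num_bytes, seconds):
    """Transfer bandwidth in GB/s (decimal gigabytes, an assumption)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / seconds / 1e9


def summarize(values):
    """Reduce per-pair or per-run measurements to max, min and average,
    mirroring the *_Max / *_Min / *_Avg naming in the list above."""
    values = list(values)
    if not values:
        raise ValueError("at least one measurement is required")
    return {
        "max": max(values),
        "min": min(values),
        "avg": sum(values) / len(values),
    }
```

For example, `summarize` could be applied to the bandwidths measured for each GPU pair to produce values in the shape of `P2P_BW_Max`, `P2P_BW_Min`, and `P2P_BW_Avg`; whether the actual benchmarks aggregate this way is not stated here.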