---
id: micro-benchmarks
---

# Micro Benchmarks

## Benchmarking list

### Computation benchmark

### Communication benchmark

### Computation-communication benchmark

### Storage benchmark

## Benchmarking metrics

Metrics fall into two groups, Micro Benchmark and Model Benchmark. The Micro Benchmark metrics are listed first:
  • Computation Benchmark
    • Kernel Performance
      • GFLOPS
      • TensorCore
      • cuBLAS
      • cuDNN
    • Kernel Launch Time
      • Kernel_Launch_Event_Time
      • Kernel_Launch_Wall_Time
    • Operator Performance
      • MatMul
      • Sharding_MatMul
    • Memory
      • H2D_Mem_BW_<GPU ID>
      • D2H_Mem_BW_<GPU ID>
  • Communication Benchmark
    • Device P2P Bandwidth
      • P2P_BW_Max
      • P2P_BW_Min
      • P2P_BW_Avg
    • RDMA
      • RDMA_Peak
      • RDMA_Avg
    • NCCL
      • NCCL_AllReduce
      • NCCL_AllGather
      • NCCL_broadcast
      • NCCL_reduce
      • NCCL_reduce_scatter
  • Computation-Communication Benchmark
    • Mul_During_NCCL
    • MatMul_During_NCCL
  • Storage Benchmark
    • Disk
      • Read/Write
      • Rand_Read/Rand_Write
      • R/W_Read
      • R/W_Write
      • Rand_R/W_Read
      • Rand_R/W_Write

Model Benchmark:
  • CNN models
    • ResNet
      • ResNet-50
      • ResNet-101
      • ResNet-152
    • DenseNet
      • DenseNet-169
      • DenseNet-201
    • VGG
      • VGG-11
      • VGG-13
      • VGG-16
      • VGG-19
    • Other CNN models
      • ...
  • BERT models
    • BERT
    • BERT_LARGE
  • LSTM
  • GPT-2
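
Several of the micro-benchmark metrics listed above (GFLOPS, and the Max/Min/Avg variants of P2P bandwidth) are derived values rather than raw measurements. The sketch below shows one plausible way to compute them from timings and per-pair samples. It is an illustration only: the function names `matmul_flop_count`, `gflops`, and `summarize_bandwidth` are hypothetical, and the assumption that the Max/Min/Avg metrics are plain reductions over a set of bandwidth samples is not stated in this document.

```python
from statistics import mean


def matmul_flop_count(m: int, n: int, k: int) -> int:
    """Operation count for an (m x k) @ (k x n) matrix product.

    By the usual convention each of the m*n outputs is counted as
    k multiplies plus k adds, giving 2*m*n*k operations.
    """
    return 2 * m * n * k


def gflops(flop_count: int, elapsed_seconds: float) -> float:
    """Convert an operation count and a measured duration into GFLOPS."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return flop_count / elapsed_seconds / 1e9


def summarize_bandwidth(samples_gb_per_s):
    """Reduce bandwidth samples (assumed GB/s) to max, min, and average.

    Assumption: metrics such as P2P_BW_Max, P2P_BW_Min, and P2P_BW_Avg
    are simple reductions over per-device-pair measurements.
    """
    samples = list(samples_gb_per_s)
    if not samples:
        raise ValueError("at least one sample is required")
    return {"max": max(samples), "min": min(samples), "avg": mean(samples)}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    flops = matmul_flop_count(4096, 4096, 4096)
    print(gflops(flops, elapsed_seconds=0.05))
    print(summarize_bandwidth([20.0, 24.0, 22.0]))
```

Keeping the operation count separate from the timing makes it easy to reuse `gflops` for kernels other than matrix multiplication, as long as the caller supplies the right count.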