1. 13 Sep, 2021 4 commits
    • Bug: Fix bug - fix bug of hipBusBandwidth build (#193) · 7e48ad34
      Yuting Jiang authored
      **Description**
      Fix the hipBusBandwidth build.
      
      **Major Revision**
      - Remove the check for 'hip/samples/1_Utils/hipBusBandwidth/CMakeLists.txt', which the Docker build failed to enter.
      - Add sb_micro_path for rocm_bandwidthTest.
    • Benchmarks: Build Pipeline - Restore rocblas build logic (#197) · ee5c7662
      Yuting Jiang authored
      **Description**
      Restore the rocblas build logic, dropping support for building rocblas in the rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 base image.
      
      **Major Revision**
      - Restore the rocblas build logic and remove the GPU target limit and other resource limits for rocm4.0.
    • Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198) · 7a3a4502
      Yuting Jiang authored
      **Description**
      Add a barrier before 'destroy_process_group' to fix a bug seen on rocm4.x when running bert_models: when one model benchmark runs multiple models, some processes have not yet finished with the previous process group while others fail to initialize the new process group for the next model.
      
      **Major Revision**
      - Add a barrier before 'destroy_process_group'.
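
The ordering described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `teardown_process_group` is hypothetical, and `dist` is assumed to be an object that behaves like `torch.distributed` (it is passed in so the sketch does not depend on PyTorch being installed).

```python
def teardown_process_group(dist):
    """Synchronize all ranks, then destroy the default process group.

    `dist` is assumed to expose `is_initialized`, `barrier` and
    `destroy_process_group`, as `torch.distributed` does.
    Returns True if a group was torn down, False if none was initialized.
    """
    if not dist.is_initialized():
        return False
    # Wait until every rank has finished with the current group, so no rank
    # starts initializing the next model's group while others are still busy.
    dist.barrier()
    dist.destroy_process_group()
    return True
```

With PyTorch available, a call would look like `teardown_process_group(torch.distributed)` at the end of each model run.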
    • Bug - Revise 'docker run' in sb deploy (#195) · 1f9de77f
      Yuting Jiang authored
      **Description**
      
      Revise 'docker run' in sb deploy because the base image runs its entrypoint/cmd under /root.
      
      **Major Revision**
      
      - Define bash as the entrypoint in 'docker run'.
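
As a rough illustration of overriding the image's default entrypoint with bash, the sketch below assembles a `docker run` argument list. The function name, the container name, and the image name are all hypothetical, and the remaining flags are assumptions; the actual command in sb deploy may differ.

```python
def build_docker_run_cmd(image, container_name):
    """Build a `docker run` argument list that starts the container with bash.

    `--entrypoint` replaces whatever ENTRYPOINT the image defines, so the
    container no longer depends on what the base image runs by default.
    """
    return [
        "docker", "run",
        "-itd",                       # interactive, with a TTY, detached
        "--name", container_name,
        "--entrypoint", "/bin/bash",  # override the image's entrypoint
        image,
    ]
```

The list can be passed to `subprocess.run` as is, for example `subprocess.run(build_docker_run_cmd("example/image:tag", "sb-workspace"), check=True)`.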
  2. 09 Sep, 2021 1 commit
  3. 06 Sep, 2021 1 commit
  4. 03 Sep, 2021 1 commit
    • Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode and rename metric (#189) · 60762518
      Yuting Jiang authored
      **Description**
      Revise the nccl/rccl arguments to support MPI mode (MPI cannot run nccl/rccl when multiple operators run in sequence without a barrier), and rename the metric.
      
      **Major Revision**
      - Revise the operators argument to accept a single operator.
      
      **Minor Revision**
      - Rename the metric to remove the benchmark name info.
      - Change the default value of the ngpus argument to 1.
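
The argument change can be pictured with a small `argparse` sketch. The flag name `--operation`, its default, and the list of choices are assumptions made for illustration; only the single-operator constraint and the `ngpus` default of 1 come from the description above.

```python
import argparse

# Illustrative operator names only; the real benchmark may accept a different set.
OPERATIONS = ["allreduce", "allgather", "broadcast", "reduce", "reducescatter"]


def build_parser():
    """Build a parser that accepts exactly one collective operation per run."""
    parser = argparse.ArgumentParser(description="nccl/rccl benchmark arguments (sketch)")
    # One operator per invocation, so an MPI launch never runs several
    # collectives back to back without a barrier between them.
    parser.add_argument("--operation", type=str, default="allreduce", choices=OPERATIONS)
    parser.add_argument("--ngpus", type=int, default=1)
    return parser
```

Running several operators then means launching the benchmark once per operator, which keeps each MPI job limited to a single collective.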
  5. 02 Sep, 2021 6 commits
  6. 01 Sep, 2021 3 commits
  7. 31 Aug, 2021 5 commits
  8. 30 Aug, 2021 6 commits
  9. 27 Aug, 2021 4 commits
  10. 26 Aug, 2021 1 commit
  11. 25 Aug, 2021 1 commit
  12. 23 Aug, 2021 1 commit
  13. 22 Aug, 2021 1 commit
  14. 20 Aug, 2021 2 commits
    • Runner: Add Feature - Generate summarized output files. (#157) · 7595d794
      guoshzhao authored
      **Description**
      Generate summarized output files from all nodes. For each metric, apply the reduce operation specified by `reduce_op`.
      
      **Major Revision**
      - Generate the summarized json file per node. Metric names use the following formats, where `[]` means optional:
        For microbenchmark: `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`
        For modelbenchmark: `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`
      ```
      {
        "kernel-launch/overhead_event:0": 0.00583,
        "kernel-launch/overhead_event:1": 0.00545,
        "kernel-launch/overhead_event:2": 0.00581,
        "kernel-launch/overhead_event:3": 0.00572,
        "kernel-launch/overhead_event:4": 0.00559,
        "kernel-launch/overhead_event:5": 0.00591,
        "kernel-launch/overhead_event:6": 0.00562,
        "kernel-launch/overhead_event:7": 0.00586,
        "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
        "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
        "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
        "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
        "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
        "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
        "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
        "pytorch-sharding-matmul/1/allgather": 10.088025093078613
      }
      ```
      - Generate the summarized jsonl file for all nodes; each line is the result from one node in json format.
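
The key formats and the per-metric reduction can be sketched as below. This is an illustration under stated assumptions, not the runner's implementation: the record layout (`benchmark`, `metric`, `values`, `reduce_op`, and so on), the function names, and the set of reduce operations are all hypothetical. Here a record without a `reduce_op` keeps one value per rank.

```python
import json
from statistics import mean

# Hypothetical reduce operations; the real set of `reduce_op` values may differ.
REDUCERS = {"min": min, "max": max, "sum": sum, "avg": mean}


def metric_key(benchmark, metric, sub_benchmark=None, run_count=None, rank=None):
    """Build `{benchmark}/[{sub_benchmark}/][{run_count}/]{metric}[:rank]`."""
    parts = [benchmark]
    if sub_benchmark is not None:
        parts.append(sub_benchmark)
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric)
    key = "/".join(parts)
    if rank is not None:
        key = f"{key}:{rank}"
    return key


def summarize_node(records):
    """Flatten one node's records into a {metric_key: value} dict."""
    summary = {}
    for rec in records:
        names = {k: rec.get(k) for k in ("sub_benchmark", "run_count")}
        reduce_op = rec.get("reduce_op")
        if reduce_op is None:
            # No reduction requested: keep one entry per rank.
            for rank, value in enumerate(rec["values"]):
                key = metric_key(rec["benchmark"], rec["metric"], rank=rank, **names)
                summary[key] = value
        else:
            key = metric_key(rec["benchmark"], rec["metric"], **names)
            summary[key] = REDUCERS[reduce_op](rec["values"])
    return summary


def to_jsonl(node_summaries):
    """Serialize summaries as JSON Lines: one node per line."""
    return "\n".join(json.dumps(summary) for summary in node_summaries)
```

Writing the string returned by `to_jsonl` to a file gives one line per node, which lets results from many nodes be appended and read back line by line.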
    • Benchmarks: Build Pipeline - Add build logic of hipBusBandwidth in third_party (#151) · a1e5c90d
      Yuting Jiang authored
      **Description**
      Add build logic of hipBusBandwidth in third_party.
      
      **Major Revision**
      - Add build logic of hipBusBandwidth in third_party
  15. 19 Aug, 2021 1 commit
  16. 16 Aug, 2021 2 commits