Unverified commit 2c88db90, authored by Yifan Xiong, committed by GitHub

Release - SuperBench v0.10.0 (#607)

**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - U...
parent 2c2096ed
@@ -18,6 +18,7 @@ jobs:
   docker:
     name: Docker build ${{ matrix.name }}
     runs-on: ${{ matrix.runner }}
+    timeout-minutes: 600
     permissions:
       contents: read
       packages: write
@@ -27,15 +28,23 @@ jobs:
         - name: cuda12.2
           dockerfile: cuda12.2
           tags: superbench/main:cuda12.2
-          runner: ubuntu-latest
+          runner: [self-hosted, rocm-build]
+          build_args: "NUM_MAKE_JOBS=64"
         - name: cuda11.1.1
           dockerfile: cuda11.1.1
           tags: superbench/main:cuda11.1.1,superbench/superbench:latest
           runner: ubuntu-latest
+          build_args: "NUM_MAKE_JOBS=8"
         - name: rocm5.7
           dockerfile: rocm5.7.x
           tags: superbench/main:rocm5.7
           runner: [self-hosted, rocm-build]
+          build_args: "NUM_MAKE_JOBS=64"
+        - name: rocm6.0
+          dockerfile: rocm6.0.x
+          tags: superbench/main:rocm6.0
+          runner: [self-hosted, rocm-build]
+          build_args: "NUM_MAKE_JOBS=64"
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -75,7 +84,7 @@ jobs:
           fi
           DOCKERFILE=dockerfile/${{ matrix.dockerfile }}.dockerfile
-          BUILD_ARGS="NUM_MAKE_JOBS=8"
+          BUILD_ARGS=${{ matrix.build_args }}
           if [[ "${{ matrix.extra_args }}" ]]; then
             BUILD_ARGS="${BUILD_ARGS} ${{ matrix.extra_args }}"
           fi
@@ -87,11 +96,11 @@ jobs:
             CACHE_TO="type=inline,mode=max"
           fi
-          echo ::set-output name=dockerfile::${DOCKERFILE}
-          echo ::set-output name=build_args::${BUILD_ARGS}
-          echo ::set-output name=tags::${TAGS}
-          echo ::set-output name=cache_from::${CACHE_FROM}
-          echo ::set-output name=cache_to::${CACHE_TO}
+          echo "dockerfile=${DOCKERFILE}" >> "$GITHUB_OUTPUT"
+          echo "build_args=${BUILD_ARGS}" >> "$GITHUB_OUTPUT"
+          echo "tags=${TAGS}" >> "$GITHUB_OUTPUT"
+          echo "cache_from=${CACHE_FROM}" >> "$GITHUB_OUTPUT"
+          echo "cache_to=${CACHE_TO}" >> "$GITHUB_OUTPUT"
       - name: Echo build args
         run: echo ${{ steps.metadata.outputs.build_args }}
       - name: Echo image tag
@@ -106,6 +115,9 @@ jobs:
         with:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}
+      - name: Pull cache image
+        run: sudo docker pull ${{ steps.metadata.outputs.tags }}
+        continue-on-error: true
       - name: Login to the GitHub Container Registry
         uses: docker/login-action@v1
         if: ${{ github.event_name == 'release' }}
......
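The workflow above migrates from the deprecated `::set-output` commands to writing `key=value` lines into the file that GitHub Actions exposes via `$GITHUB_OUTPUT`. The pattern can be sketched outside a runner (a temp file stands in for the runner-provided path; the variable values are illustrative):

```shell
# Emulate the GitHub Actions step-output file outside of a runner.
GITHUB_OUTPUT="$(mktemp)"

DOCKERFILE=dockerfile/cuda12.2.dockerfile
BUILD_ARGS="NUM_MAKE_JOBS=64"

# One key=value pair per line, exactly as the workflow step does.
echo "dockerfile=${DOCKERFILE}" >> "$GITHUB_OUTPUT"
echo "build_args=${BUILD_ARGS}" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

Later steps then read these values as `${{ steps.<step-id>.outputs.<key> }}`.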
@@ -24,3 +24,9 @@
 [submodule "third_party/msccl"]
 	path = third_party/msccl
 	url = https://github.com/Azure/msccl
+[submodule "third_party/Megatron/Megatron-LM"]
+	path = third_party/Megatron/Megatron-LM
+	url = https://github.com/NVIDIA/Megatron-LM.git
+[submodule "third_party/Megatron/Megatron-DeepSpeed"]
+	path = third_party/Megatron/Megatron-DeepSpeed
+	url = https://github.com/microsoft/Megatron-DeepSpeed.git
@@ -15,7 +15,7 @@
 __SuperBench__ is a validation and profiling tool for AI infrastructure.
-📢 [v0.9.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.9.0) has been released!
+📢 [v0.10.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.10.0) has been released!
 ## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
......
@@ -7,7 +7,7 @@ FROM nvcr.io/nvidia/pytorch:23.10-py3
 # NVIDIA:
 #   - CUDA: 12.2.2
 #   - cuDNN: 8.9.5
-#   - NCCL: v2.19.3-1
+#   - NCCL: v2.18.3-1
 # Mellanox:
 #   - OFED: 23.07-0.5.1.2
 #   - HPC-X: v2.16
@@ -113,6 +113,13 @@ RUN cd /tmp && \
     mv amd-blis /opt/AMD && \
     rm -rf aocl-blis-linux-aocc-4.0.tar.gz
+# Install NCCL 2.18.3
+RUN cd /tmp && \
+    git clone -b v2.18.3-1 https://github.com/NVIDIA/nccl.git && \
+    cd nccl && \
+    make -j src.build && \
+    make install && \
+    rm -rf /tmp/nccl
 ENV PATH="${PATH}" \
     LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}" \
......
@@ -54,6 +54,8 @@ RUN curl -s -L https://dist.nuget.org/win-x86-commandline/latest/nuget.exe -o "%
 # Run the setup script to install the visual studio components
 RUN "%SB_HOME%\\dockerfile\\directx\\install-components.bat"
+RUN powershell -Command "Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem -Name LongPathsEnabled -Value 1;"
+RUN git config --system core.longpaths true
 # Install Superbench
 RUN python -m pip install setuptools==65.0.0 && \
     python -m pip install --no-cache-dir .[amdworker] && \
......
 <system version="1">
   <cpu numaid="0" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
     <pci busid="ffff:ff:01.0" class="0x060400" link_speed="16 GT/s" link_width="16">
-      <pci busid="0001:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
-      <pci busid="0101:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
-      <pci busid="0002:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
-      <pci busid="0102:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
-    </pci>
-  </cpu>
-  <cpu numaid="1" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
-    <pci busid="ffff:ff:02.0" class="0x060400" link_speed="16 GT/s" link_width="16">
       <pci busid="0003:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
       <pci busid="0103:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
       <pci busid="0004:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
       <pci busid="0104:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
     </pci>
   </cpu>
-  <cpu numaid="2" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
-    <pci busid="ffff:ff:03.0" class="0x060400" link_speed="16 GT/s" link_width="16">
-      <pci busid="000b:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
-      <pci busid="0105:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
-      <pci busid="000c:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
-      <pci busid="0106:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
+  <cpu numaid="1" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
+    <pci busid="ffff:ff:02.0" class="0x060400" link_speed="16 GT/s" link_width="16">
+      <pci busid="0001:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
+      <pci busid="0101:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
+      <pci busid="0002:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
+      <pci busid="0102:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
     </pci>
   </cpu>
-  <cpu numaid="3" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
-    <pci busid="ffff:ff:04.0" class="0x060400" link_speed="16 GT/s" link_width="16">
+  <cpu numaid="2" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
+    <pci busid="ffff:ff:03.0" class="0x060400" link_speed="16 GT/s" link_width="16">
       <pci busid="000d:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
       <pci busid="0107:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
       <pci busid="000e:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
       <pci busid="0108:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
     </pci>
   </cpu>
+  <cpu numaid="3" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
+    <pci busid="ffff:ff:04.0" class="0x060400" link_speed="16 GT/s" link_width="16">
+      <pci busid="000b:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
+      <pci busid="0105:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
+      <pci busid="000c:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
+      <pci busid="0106:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
+    </pci>
+  </cpu>
 </system>
<system version="1">
<cpu numaid="0" affinity="ffffffff,ffff0000,00000000" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
<pci busid="ffff:ff:01.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0001:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0101:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:02.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0002:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0102:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:03.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0003:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0103:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:04.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0008:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0104:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
</cpu>
<cpu numaid="1" affinity="00000000,0000ffff,ffffffff" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
<pci busid="ffff:ff:05.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0009:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0105:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:06.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="000a:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0106:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:07.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="000b:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0107:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:08.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="000c:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0108:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
</cpu>
</system>
\ No newline at end of file
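The topology files above pair devices with NUMA nodes through nested `<pci>` elements; by the standard PCI class codes, `0x0302` is a 3D controller (GPU) and `0x0207` an InfiniBand controller (NIC), while `0x0604` is a PCI bridge. A small sketch that extracts the GPU/NIC layout per NUMA node (a one-node excerpt of the corrected NDv4 file is used as input):

```python
import xml.etree.ElementTree as ET

TOPO = """<system version="1">
  <cpu numaid="0" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
    <pci busid="ffff:ff:01.0" class="0x060400" link_speed="16 GT/s" link_width="16">
      <pci busid="0003:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
      <pci busid="0103:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
    </pci>
  </cpu>
</system>"""

# PCI class codes of interest: 3D controller (GPU) and InfiniBand NIC.
CLASS_NAMES = {"0x030200": "GPU", "0x020700": "NIC"}

def devices_by_numa(xml_text):
    """Map each NUMA node id to its (device kind, bus id) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for cpu in root.iter("cpu"):
        result[cpu.get("numaid")] = [
            (CLASS_NAMES[p.get("class")], p.get("busid"))
            for p in cpu.iter("pci")
            if p.get("class") in CLASS_NAMES  # skip the bridge entries
        ]
    return result

print(devices_by_numa(TOPO))
```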
@@ -17,6 +17,7 @@ RUN apt-get update && \
     apt-get -q install -y --no-install-recommends \
     autoconf \
     automake \
+    bc \
     build-essential \
     curl \
     dmidecode \
@@ -27,6 +28,7 @@ RUN apt-get update && \
     libaio-dev \
     libboost-program-options-dev \
     libcap2 \
+    libcurl4-openssl-dev \
     libnuma-dev \
     libpci-dev \
     libssl-dev \
@@ -38,6 +40,7 @@ RUN apt-get update && \
     openssh-client \
     openssh-server \
     pciutils \
+    python3-mpi4py \
     rsync \
     sudo \
     util-linux \
@@ -46,11 +49,11 @@ RUN apt-get update && \
     && \
     rm -rf /tmp/*
-ARG NUM_MAKE_JOBS=16
+ARG NUM_MAKE_JOBS=
 # Check if CMake is installed and its version
 RUN cmake_version=$(cmake --version 2>/dev/null | grep -oP "(?<=cmake version )(\d+\.\d+)" || echo "0.0") && \
-    required_version="3.26.4" && \
+    required_version="3.24.1" && \
     if [ "$(printf "%s\n" "$required_version" "$cmake_version" | sort -V | head -n 1)" != "$required_version" ]; then \
         echo "existing cmake version is ${cmake_version}" && \
         cd /tmp && \
@@ -100,40 +103,26 @@ RUN if ! command -v ofed_info >/dev/null 2>&1; then \
         rm -rf MLNX_OFED_LINUX-${OFED_VERSION}* ; \
     fi
-# Install UCX
-ENV UCX_VERSION=1.14.1
-RUN if [ -z "$(ls -A /opt/ucx)" ]; then \
-        echo "/opt/ucx is empty. Installing UCX..."; \
-        cd /tmp && \
-        git clone https://github.com/openucx/ucx.git -b v${UCX_VERSION} && \
-        cd ucx && \
-        ./autogen.sh && \
-        mkdir build && \
-        cd build && \
-        ../configure -prefix=$UCX_DIR --with-rocm=/opt/rocm --without-knem && \
-        make -j $(nproc) && make -j $(nproc) install && rm -rf /tmp/ucx-${UCX_VERSION} ; \
-    else \
-        echo "/opt/ucx is not empty. Skipping UCX installation."; \
-    fi
+# Add target file to help determine which device(s) to build for
+ENV ROCM_PATH=/opt/rocm
+RUN bash -c 'echo -e "gfx90a:xnack-\ngfx90a:xnack+\ngfx940\ngfx941\ngfx942\ngfx1030\ngfx1100\ngfx1101\ngfx1102\n" >> ${ROCM_PATH}/bin/target.lst'
 # Install OpenMPI
 ENV OPENMPI_VERSION=4.1.x
+ENV MPI_HOME=/usr/local/mpi
 # Check if Open MPI is installed
-RUN [ -d /usr/local/bin/mpirun ] || { \
-    echo "Open MPI not found. Installing Open MPI..." && \
-    cd /tmp && \
+RUN cd /tmp && \
     git clone --recursive https://github.com/open-mpi/ompi.git -b v${OPENMPI_VERSION} && \
     cd ompi && \
     ./autogen.pl && \
     mkdir build && \
     cd build && \
-    ../configure --prefix=/usr/local --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --enable-prte-prefix-by-default --enable-mca-no-build=btl-uct --with-ucx=/opt/ucx --with-rocm=/opt/rocm && \
+    ../configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --enable-prte-prefix-by-default --with-rocm=/opt/rocm && \
     make -j $(nproc) && \
     make -j $(nproc) install && \
     ldconfig && \
     cd / && \
-    rm -rf /tmp/openmpi-${OPENMPI_VERSION}* ;\
-    }
+    rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
 # Install Intel MLC
 RUN cd /tmp && \
@@ -148,12 +137,18 @@ RUN cd /opt/ && \
     cd rccl && \
     mkdir build && \
     cd build && \
-    CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_PREFIX_PATH=/opt/rocm/ .. && \
+    CXX=/opt/rocm/bin/hipcc cmake -DHIP_COMPILER=clang -DCMAKE_BUILD_TYPE=Release -DCMAKE_VERBOSE_MAKEFILE=1 \
+        -DCMAKE_PREFIX_PATH="${ROCM_PATH}/hsa;${ROCM_PATH}/hip;${ROCM_PATH}/share/rocm/cmake/;${ROCM_PATH}" \
+        .. && \
     make -j${NUM_MAKE_JOBS}
+# Install AMD SMI Python Library
+RUN cd /opt/rocm/share/amd_smi && \
+    python3 -m pip install --user .
-ENV PATH="/opt/superbench/bin:/usr/local/bin/:/opt/rocm/hip/bin/:/opt/rocm/bin/:${PATH}" \
+ENV PATH="/usr/local/mpi/bin:/opt/superbench/bin:/usr/local/bin/:/opt/rocm/hip/bin/:/opt/rocm/bin/:${PATH}" \
     LD_PRELOAD="/opt/rccl/build/librccl.so:$LD_PRELOAD" \
-    LD_LIBRARY_PATH="/opt/ucx/lib:/usr/local/lib/:/opt/rocm/lib:${LD_LIBRARY_PATH}" \
+    LD_LIBRARY_PATH="/usr/local/mpi/lib:/usr/lib/x86_64-linux-gnu/:/usr/local/lib/:/opt/rocm/lib:${LD_LIBRARY_PATH}" \
     SB_HOME=/opt/superbench \
     SB_MICRO_PATH=/opt/superbench \
     ANSIBLE_DEPRECATION_WARNINGS=FALSE \
@@ -163,13 +158,19 @@ RUN echo PATH="$PATH" > /etc/environment && \
     echo LD_LIBRARY_PATH="$LD_LIBRARY_PATH" >> /etc/environment && \
     echo SB_MICRO_PATH="$SB_MICRO_PATH" >> /etc/environment
+RUN apt install rocm-cmake -y && \
+    python3 -m pip install --upgrade pip wheel setuptools==65.7
 WORKDIR ${SB_HOME}
+ADD third_party third_party
+# Apply patch
+RUN cd third_party/perftest && \
+    git apply ../perftest_rocm6.patch
+RUN make RCCL_HOME=/opt/rccl/build/ ROCBLAS_BRANCH=release/rocm-rel-5.7.1.1 HIPBLASLT_BRANCH=release/rocm-rel-5.7 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm
 ADD . .
-RUN apt install rocm-cmake -y && \
-    python3 -m pip install --upgrade pip wheel setuptools==65.7 && \
-    python3 -m pip install .[amdworker] && \
+#ENV USE_HIPBLASLT_DATATYPE=1
+RUN python3 -m pip install .[amdworker] && \
+    CXX=/opt/rocm/bin/hipcc make cppbuild && \
     make postinstall
-RUN make cppbuild
-ADD third_party third_party
-RUN make RCCL_HOME=/opt/rccl/build/ ROCBLAS_BRANCH=release/rocm-rel-5.7.1.1 HIPBLASLT_BRANCH=release-staging/rocm-rel-5.7 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm
ARG BASE_IMAGE=rocm/pytorch:rocm6.0_ubuntu22.04_py3.9_pytorch_2.0.1
FROM ${BASE_IMAGE}
# OS:
# - Ubuntu: 22.04
# - Docker Client: 20.10.8
# ROCm:
# - ROCm: 6.0
# Lib:
# - torch: 2.0.1
# - rccl: 2.18.3+hip6.0 develop:7e1cbb4
# - hipblaslt: release/rocm-rel-6.0
# - openmpi: 4.1.x
# - apex: 1.0.0
# Intel:
# - mlc: v3.10
LABEL maintainer="SuperBench"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get -q install -y --no-install-recommends \
autoconf \
automake \
bc \
build-essential \
curl \
dmidecode \
git \
hipify-clang \
iproute2 \
jq \
libaio-dev \
libboost-program-options-dev \
libcap2 \
libcurl4-openssl-dev \
libnuma-dev \
libpci-dev \
libssl-dev \
libtinfo5 \
libtool \
lshw \
net-tools \
numactl \
openssh-client \
openssh-server \
pciutils \
python3-mpi4py \
rsync \
sudo \
util-linux \
vim \
wget \
&& \
rm -rf /tmp/*
ARG NUM_MAKE_JOBS=64
# Check if CMake is installed and its version
RUN cmake_version=$(cmake --version 2>/dev/null | grep -oP "(?<=cmake version )(\d+\.\d+)" || echo "0.0") && \
required_version="3.24.1" && \
if [ "$(printf "%s\n" "$required_version" "$cmake_version" | sort -V | head -n 1)" != "$required_version" ]; then \
echo "existing cmake version is ${cmake_version}" && \
cd /tmp && \
wget -q https://github.com/Kitware/CMake/releases/download/v${required_version}/cmake-${required_version}.tar.gz && \
tar xzf cmake-${required_version}.tar.gz && \
cd cmake-${required_version} && \
./bootstrap --prefix=/usr --no-system-curl --parallel=16 && \
make -j ${NUM_MAKE_JOBS} && \
make install && \
        rm -rf /tmp/cmake-${required_version}* ; \
    else \
        echo "existing cmake version is greater than or equal to ${required_version}"; \
fi
# Install Docker
ENV DOCKER_VERSION=20.10.8
RUN cd /tmp && \
wget -q https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_VERSION}.tgz -O docker.tgz && \
tar --extract --file docker.tgz --strip-components 1 --directory /usr/local/bin/ && \
rm docker.tgz
# Update system config
RUN mkdir -p /root/.ssh && \
touch /root/.ssh/authorized_keys && \
mkdir -p /var/run/sshd && \
sed -i "s/[# ]*PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config && \
sed -i "s/[# ]*PermitUserEnvironment no/PermitUserEnvironment yes/" /etc/ssh/sshd_config && \
sed -i "s/[# ]*Port.*/Port 22/" /etc/ssh/sshd_config && \
echo "* soft nofile 1048576\n* hard nofile 1048576" >> /etc/security/limits.conf && \
echo "root soft nofile 1048576\nroot hard nofile 1048576" >> /etc/security/limits.conf
# Get Ubuntu version and set as an environment variable
RUN export UBUNTU_VERSION=$(lsb_release -r -s)
RUN echo "Ubuntu version: $UBUNTU_VERSION"
ENV UBUNTU_VERSION=${UBUNTU_VERSION}
# Install OFED
ENV OFED_VERSION=5.9-0.5.6.0
# Check if ofed_info is present and has a version
RUN if ! command -v ofed_info >/dev/null 2>&1; then \
echo "OFED not found. Installing OFED..."; \
cd /tmp && \
wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu${UBUNTU_VERSION}-x86_64.tgz && \
tar xzf MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu${UBUNTU_VERSION}-x86_64.tgz && \
PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu${UBUNTU_VERSION}-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
rm -rf MLNX_OFED_LINUX-${OFED_VERSION}* ; \
fi
# Add target file to help determine which device(s) to build for
ENV ROCM_PATH=/opt/rocm
RUN bash -c 'echo -e "gfx90a:xnack-\ngfx90a:xnack+\ngfx940\ngfx941\ngfx942:sramecc+:xnack-\n" >> ${ROCM_PATH}/bin/target.lst'
# Install OpenMPI
ENV OPENMPI_VERSION=4.1.x
ENV MPI_HOME=/usr/local/mpi
# Check if Open MPI is installed
RUN cd /tmp && \
git clone --recursive https://github.com/open-mpi/ompi.git -b v${OPENMPI_VERSION} && \
cd ompi && \
./autogen.pl && \
mkdir build && \
cd build && \
../configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --enable-prte-prefix-by-default --with-rocm=/opt/rocm && \
make -j $(nproc) && \
make -j $(nproc) install && \
ldconfig && \
cd / && \
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
# Install Intel MLC
RUN cd /tmp && \
wget -q https://downloadmirror.intel.com/763324/mlc_v3.10.tgz -O mlc.tgz && \
tar xzf mlc.tgz Linux/mlc && \
cp ./Linux/mlc /usr/local/bin/ && \
rm -rf ./Linux mlc.tgz
# Install RCCL
RUN cd /opt/ && \
git clone https://github.com/ROCmSoftwarePlatform/rccl.git && \
cd rccl && \
mkdir build && \
cd build && \
CXX=/opt/rocm/bin/hipcc cmake -DHIP_COMPILER=clang -DCMAKE_BUILD_TYPE=Release -DCMAKE_VERBOSE_MAKEFILE=1 \
-DCMAKE_PREFIX_PATH="${ROCM_PATH}/hsa;${ROCM_PATH}/hip;${ROCM_PATH}/share/rocm/cmake/;${ROCM_PATH}" \
.. && \
make -j${NUM_MAKE_JOBS}
ENV PATH="/usr/local/mpi/bin:/opt/superbench/bin:/usr/local/bin/:/opt/rocm/hip/bin/:/opt/rocm/bin/:${PATH}" \
LD_PRELOAD="/opt/rccl/build/librccl.so:$LD_PRELOAD" \
LD_LIBRARY_PATH="/usr/local/mpi/lib:/usr/lib/x86_64-linux-gnu/:/usr/local/lib/:/opt/rocm/lib:${LD_LIBRARY_PATH}" \
SB_HOME=/opt/superbench \
SB_MICRO_PATH=/opt/superbench \
ANSIBLE_DEPRECATION_WARNINGS=FALSE \
ANSIBLE_COLLECTIONS_PATH=/usr/share/ansible/collections
RUN echo PATH="$PATH" > /etc/environment && \
echo LD_LIBRARY_PATH="$LD_LIBRARY_PATH" >> /etc/environment && \
echo SB_MICRO_PATH="$SB_MICRO_PATH" >> /etc/environment
RUN apt install rocm-cmake -y && \
python3 -m pip install --upgrade pip wheel setuptools==65.7
WORKDIR ${SB_HOME}
ADD third_party third_party
# Apply patch
RUN cd third_party/perftest && \
git apply ../perftest_rocm6.patch
RUN make RCCL_HOME=/opt/rccl/build/ ROCBLAS_BRANCH=release/rocm-rel-6.0 HIPBLASLT_BRANCH=release/rocm-rel-6.0 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm
RUN cd third_party/Megatron/Megatron-DeepSpeed && \
git apply ../megatron_deepspeed_rocm6.patch
ADD . .
ENV USE_HIP_DATATYPE=1
ENV USE_HIPBLAS_COMPUTETYPE=1
RUN python3 -m pip install .[amdworker] && \
CXX=/opt/rocm/bin/hipcc make cppbuild && \
make postinstall
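Both ROCm dockerfiles gate the CMake install on a `sort -V` version comparison: if the smaller of the two version strings is not the required version, the installed CMake is older than required. A standalone sketch of that comparison (the `cmake_version` value here is hard-coded for illustration):

```shell
required_version="3.24.1"
cmake_version="3.16.3"  # pretend this came from `cmake --version`

# sort -V orders version strings numerically, so `head -n 1` yields the
# smaller of the two; if that is not the required version, upgrade.
if [ "$(printf "%s\n" "$required_version" "$cmake_version" | sort -V | head -n 1)" != "$required_version" ]; then
    echo "upgrade needed"
else
    echo "cmake is new enough"
fi
```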
@@ -29,7 +29,7 @@ You need to [clone the code](./development.md#set-up) first before building the
 export DOCKER_BUILDKIT=1
 docker buildx build \
   --platform linux/amd64 --cache-to type=inline,mode=max \
-  --tag superbench-dev --file dockerfile/cuda12.1.dockerfile .
+  --tag superbench-dev --file dockerfile/cuda12.2.dockerfile .
 ```
 </TabItem>
......
@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
 :::note Note
 You should checkout corresponding tag to use release version, for example,
-`git clone -b v0.9.0 https://github.com/microsoft/superbenchmark`
+`git clone -b v0.10.0 https://github.com/microsoft/superbenchmark`
 :::
 ```bash
......
@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
 :::note Note
 You should deploy corresponding Docker image to use release version, for example,
-`sb deploy -f local.ini -i superbench/superbench:v0.9.0-cuda12.1`
+`sb deploy -f local.ini -i superbench/superbench:v0.10.0-cuda12.2`
 Note that the version of the git repo determines only the version of the sb CLI, not the sb container; you must specify the container version even if you cloned a release tag.
......
@@ -70,7 +70,7 @@ superbench:
 <TabItem value='example'>
 ```yaml
-version: v0.9
+version: v0.10
 superbench:
   enable: benchmark_1
   monitor:
......
@@ -58,17 +58,18 @@ Large scale matmul operation using `torch.matmul` with one GPU.
 |--------------------------------|-----------|--------------------------------|
 | pytorch-matmul/nosharding_time | time (ms) | Time of pure matmul operation. |
-### `cublaslt-gemm`
+### `cublaslt-gemm` / `hipblaslt-gemm`
 #### Introduction
-Measure the GEMM performance of [`cublasLtMatmul`](https://docs.nvidia.com/cuda/cublas/#cublasltmatmul).
+Measure the GEMM performance of [`cublasLtMatmul`](https://docs.nvidia.com/cuda/cublas/#cublasltmatmul) or [`hipblasLt-bench`](https://github.com/ROCm/hipBLASLt/blob/develop/clients/benchmarks/README.md).
 #### Metrics
-| Name                                                     | Unit           | Description                     |
-|----------------------------------------------------------|----------------|---------------------------------|
-| cublaslt-gemm/${dtype}\_${batch}\_${m}\_${n}\_${k}_flops | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |
+| Name                                                      | Unit           | Description                     |
+|-----------------------------------------------------------|----------------|---------------------------------|
+| cublaslt-gemm/${dtype}\_${batch}\_${m}\_${n}\_${k}_flops  | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |
+| hipblaslt-gemm/${dtype}\_${batch}\_${m}\_${n}\_${k}_flops | FLOPS (TFLOPS) | TFLOPS of measured GEMM kernel. |
 ### `cublas-function`
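The TFLOPS figures these GEMM benchmarks report follow from the standard 2·m·n·k floating-point operation count per matmul; a quick sanity-check helper (the function name is illustrative, not part of SuperBench):

```python
def gemm_tflops(m, n, k, time_ms, batch=1):
    """TFLOPS for a (batched) GEMM: 2*m*n*k FLOPs per matmul."""
    flops = 2.0 * m * n * k * batch
    return flops / (time_ms * 1e-3) / 1e12

# e.g. an 8192x8192x8192 GEMM finishing in 10 ms sustains ~110 TFLOPS
print(round(gemm_tflops(8192, 8192, 8192, 10.0), 2))
```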
...@@ -243,6 +244,7 @@ or [AMD](https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/1_Utils ...@@ -243,6 +244,7 @@ or [AMD](https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/1_Utils
### `gpu-copy-bw` ### `gpu-copy-bw`
Measure the memory copy bandwidth performed by GPU SM/DMA engine, including device-to-host, host-to-device and device-to-device. Measure the memory copy bandwidth performed by GPU SM/DMA engine, including device-to-host, host-to-device and device-to-device.
For measurements of peer-to-peer communication performance between AMD GPUs, GPU memory buffers are allocated in `hipDeviceMallocUncached` (previous `hipDeviceMallocFinegrained`) mode to maximize performance.
#### Metrics
...@@ -283,6 +285,7 @@ Measure the performance of NCCL/RCCL operations under multiple nodes' traffic patterns
performed by [nccl-tests](https://github.com/NVIDIA/nccl-tests/tree/44df0bf010dcc95e840ca0fb7466c67cff3f1f0f)
or [rccl-tests](https://github.com/ROCmSoftwarePlatform/rccl-tests/tree/dc1ad4853d7ec738387d42a75a58a98d7af00c7b).
Currently supports the following operations: allreduce, allgather, broadcast, reduce, reducescatter, alltoall.
Supports both in-place and out-of-place measurements.
Supports the following traffic patterns:
* `all-nodes`, validate the NCCL/RCCL performance across all VM nodes simultaneously.
...
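As context for the `busbw` column these tests report: nccl-tests derives bus bandwidth from algorithm bandwidth with an operation-specific factor; for allreduce that factor is 2·(n−1)/n, per the nccl-tests performance documentation. A small sketch:

```python
def allreduce_busbw(algbw_gbps, nranks):
    """Bus bandwidth for allreduce, per the nccl-tests convention.

    Each rank sends and receives its share of the data twice (reduce-scatter
    plus allgather), giving a 2*(n-1)/n scaling over algorithm bandwidth.
    """
    return algbw_gbps * 2 * (nranks - 1) / nranks

print(allreduce_busbw(10.0, 8))  # 17.5
```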
...@@ -28,26 +28,29 @@ available tags are listed below for all stable versions.
}>
<TabItem value='cuda'>
| Tag                | Description                         |
|--------------------|-------------------------------------|
| v0.10.0-cuda12.2   | SuperBench v0.10.0 with CUDA 12.2   |
| v0.10.0-cuda11.1.1 | SuperBench v0.10.0 with CUDA 11.1.1 |
| v0.9.0-cuda12.1    | SuperBench v0.9.0 with CUDA 12.1    |
| v0.9.0-cuda11.1.1  | SuperBench v0.9.0 with CUDA 11.1.1  |
| v0.8.0-cuda12.1    | SuperBench v0.8.0 with CUDA 12.1    |
| v0.8.0-cuda11.1.1  | SuperBench v0.8.0 with CUDA 11.1.1  |
| v0.7.0-cuda11.8    | SuperBench v0.7.0 with CUDA 11.8    |
| v0.7.0-cuda11.1.1  | SuperBench v0.7.0 with CUDA 11.1.1  |
| v0.6.0-cuda11.1.1  | SuperBench v0.6.0 with CUDA 11.1.1  |
| v0.5.0-cuda11.1.1  | SuperBench v0.5.0 with CUDA 11.1.1  |
| v0.4.0-cuda11.1.1  | SuperBench v0.4.0 with CUDA 11.1.1  |
| v0.3.0-cuda11.1.1  | SuperBench v0.3.0 with CUDA 11.1.1  |
| v0.2.1-cuda11.1.1  | SuperBench v0.2.1 with CUDA 11.1.1  |
| v0.2.0-cuda11.1.1  | SuperBench v0.2.0 with CUDA 11.1.1  |
</TabItem>
<TabItem value='rocm'>
| Tag                           | Description                                      |
|-------------------------------|--------------------------------------------------|
| v0.10.0-rocm5.7 | SuperBench v0.10.0 with ROCm 5.7 |
| v0.9.0-rocm5.1.3              | SuperBench v0.9.0 with ROCm 5.1.3                |
| v0.9.0-rocm5.1.1              | SuperBench v0.9.0 with ROCm 5.1.1                |
| v0.9.0-rocm5.0.1              | SuperBench v0.9.0 with ROCm 5.0.1                |
...
...@@ -65,7 +65,7 @@ superbench:
example:
```yaml
# SuperBench rules
version: v0.10
superbench:
rules:
failure-rule:
...
...@@ -58,7 +58,7 @@ superbench:
```yaml title="Example"
# SuperBench rules
version: v0.10
superbench:
rules:
kernel_launch:
...
...@@ -6,5 +6,5 @@
Provide hardware and software benchmarks for AI systems.
"""
__version__ = '0.10.0'
__author__ = 'Microsoft'
...@@ -94,6 +94,17 @@ def add_parser_arguments(self):
default=0,
help='Number of graph launch iterations. Set to 0 to disable graph mode. Default: 0.',
)
self._parser.add_argument(
'--in_place',
action='store_true',
help='If specified, collect in-place numbers, else collect out-of-place numbers.',
)
self._parser.add_argument(
'--data_type',
type=str,
default='float',
help='Data type used in NCCL operations. Default: float.',
)
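The two new options behave like standard argparse flags: `--in_place` is a boolean switch and `--data_type` takes a string. A self-contained sketch using a standalone parser rather than the benchmark class:

```python
import argparse

# Standalone parser mirroring the two options added above
parser = argparse.ArgumentParser()
parser.add_argument(
    '--in_place',
    action='store_true',
    help='If specified, collect in-place numbers, else collect out-of-place numbers.',
)
parser.add_argument(
    '--data_type',
    type=str,
    default='float',
    help='Data type used in NCCL operations. Default: float.',
)

args = parser.parse_args(['--in_place', '--data_type', 'double'])
# args.in_place -> True, args.data_type -> 'double'
```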
def _preprocess(self):
"""Preprocess/preparation operations before the benchmarking.
...@@ -123,9 +134,10 @@ def _preprocess(self):
return False
command = os.path.join(self._args.bin_dir, self._bin_name)
command += ' -b {} -e {} -f {} -g {} -c {} -n {} -w {} -G {} -d {}'.format(
    self._args.minbytes, self._args.maxbytes, str(self._args.stepfactor), str(self._args.ngpus),
    str(self._args.check), str(self._args.iters), str(self._args.warmup_iters), str(self._args.graph_iters),
    self._args.data_type
)
self._commands.append(command)
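With the new `-d` flag appended, the assembled command line looks like the following. The option values here are hypothetical defaults, and `all_reduce_perf` merely stands in for the resolved binary path:

```python
# Hypothetical argument values; flag letters mirror the nccl-tests perf binaries
opts = {
    'minbytes': '8', 'maxbytes': '8G', 'stepfactor': 2, 'ngpus': 1,
    'check': 0, 'iters': 20, 'warmup_iters': 5, 'graph_iters': 0,
    'data_type': 'float',
}
command = 'all_reduce_perf' + ' -b {} -e {} -f {} -g {} -c {} -n {} -w {} -G {} -d {}'.format(
    opts['minbytes'], opts['maxbytes'], opts['stepfactor'], opts['ngpus'],
    opts['check'], opts['iters'], opts['warmup_iters'], opts['graph_iters'],
    opts['data_type'],
)
print(command)
# all_reduce_perf -b 8 -e 8G -f 2 -g 1 -c 0 -n 20 -w 5 -G 0 -d float
```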
...@@ -171,9 +183,9 @@ def _process_raw_result(self, cmd_idx, raw_output):  # noqa: C901
content = content[out_of_place_index + 1:out_of_bound_index]
# Parse max out of bound bus bw as the result
size_index = -1
time_index = None
busbw_index = None
algbw_index = None
for line in content:
if 'time' in line and 'busbw' in line:
# Get index of selected column
...@@ -181,11 +193,17 @@ def _process_raw_result(self, cmd_idx, raw_output):  # noqa: C901
line = re.sub(r' +', ' ', line).split(' ')
# Get first index of condition in list; if it does not exist, raise exception
size_index = line.index('size')
# Need index from the end because sometimes previous fields (like redop) can be empty
if self._args.in_place:
    time_index = -1 - list(reversed(line)).index('time')
    busbw_index = -1 - list(reversed(line)).index('busbw')
    algbw_index = -1 - list(reversed(line)).index('algbw')
else:
    time_index = line.index('time') - len(line)
    busbw_index = line.index('busbw') - len(line)
    algbw_index = line.index('algbw') - len(line)
break
if size_index != -1 and busbw_index is not None and time_index is not None and algbw_index is not None:
for line in content:
line = line.strip(' ')
line = re.sub(r' +', ' ', line).split(' ')
...
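The index-from-the-end lookup used in the parsing hunk above can be checked against a hypothetical nccl-tests header row. The real header repeats the time/algbw/busbw column group for out-of-place and in-place results; the exact layout below is an assumption for illustration:

```python
# Hypothetical nccl-tests header: out-of-place columns, then in-place columns
header = 'size count type redop root time algbw busbw #wrong time algbw busbw #wrong'
line = header.split()

# Out-of-place columns: first occurrence, converted to a negative index
oop_time = line.index('time') - len(line)
# In-place columns: last occurrence, found by searching the reversed list
ip_time = -1 - list(reversed(line)).index('time')

print(oop_time, ip_time)            # -8 -4
print(line[oop_time], line[ip_time])  # time time
```

Negative indices stay valid even when leading fields such as `redop` are blank in a data row, which is why the parser counts from the end.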
...@@ -493,13 +493,12 @@ def _process_raw_result(self, cmd_idx, raw_output):
try:
output_lines = [x.strip() for x in raw_output.strip().splitlines()]
step_times = []
for output_line in output_lines:
    if output_line.startswith('Latency of step'):
        step_times.append(float(output_line.split(' ms')[0].split()[-1]))
return self._process_numeric_result(
    'step_times', step_times, reduce_type=ReduceType.MAX, cal_percentile=True
)
except BaseException as e:
return self._set_error_code_and_print_error_msg(
...
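The rewritten loop collects every per-step latency instead of a single averaged number, so percentile metrics align with the PyTorch version. A standalone sketch against a made-up log; the exact log format emitted by dist-inference-cpp is an assumption here:

```python
# Hypothetical dist-inference-cpp output lines
raw_output = """
Latency of step 0: 12.34 ms
Latency of step 1: 11.98 ms
Latency of step 2: 12.10 ms
"""

step_times = []
for output_line in [x.strip() for x in raw_output.strip().splitlines()]:
    if output_line.startswith('Latency of step'):
        # Take the token just before ' ms', e.g. '12.34'
        step_times.append(float(output_line.split(' ms')[0].split()[-1]))

print(step_times)  # [12.34, 11.98, 12.1]
```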