"src/torio/git@developer.sourcefind.cn:OpenDAS/torchaudio.git" did not exist on "09887246bb60d3fe5c2032fb8884d6b32ecf5241"
Commit dda6ebe5, authored by Lisa, committed by GitHub (parents: 004710fb, 97b5e7fc)

Merge pull request #62 from LisaDelaney/readme-updates: changelog updates
# TransferBench

TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified
CPU and GPU devices.

Documentation for TransferBench is available at
[https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html](https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html).
## Requirements

* You must have a ROCm stack installed on your system (HIP runtime)
* You must have `libnuma` installed on your system
## Documentation

To build the documentation locally, use the following code:

```shell
cd docs
pip3 install -r .sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
## Building TransferBench

You can build TransferBench using Makefile or CMake.

* Makefile:

  ```shell
  make
  ```

* CMake:

  ```shell
  mkdir build
  cd build
  CXX=/opt/rocm/bin/hipcc cmake ..
  make
  ```

If ROCm is not installed in `/opt/rocm/`, you must set `ROCM_PATH` to the correct location.
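As a minimal sketch of that note, assuming a hypothetical ROCm installation under `/opt/rocm-5.7` (the version in the path is illustrative, not from this README):

```shell
# Hypothetical ROCm prefix; adjust to match your installation
export ROCM_PATH=/opt/rocm-5.7

# Same CMake build as above, but pointing at the non-default ROCm location
mkdir build
cd build
CXX="$ROCM_PATH/bin/hipcc" cmake ..
make
```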
## NVIDIA platform support

You can build TransferBench to run on NVIDIA platforms via HIP or native NVCC.

Use the following code to build with HIP for NVIDIA (note that you must have a HIP-compatible CUDA
version installed, e.g., CUDA 11.5):

```shell
CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make
```

Use the following code to build with native NVCC (builds `TransferBenchCuda`):

```shell
make
```
## Things to note

* Running TransferBench with no arguments displays usage instructions and detected topology
  information
* You can use several preset configurations instead of a configuration file:
  * `p2p`: Peer-to-peer benchmark test
  * `sweep`: Sweep across possible sets of transfers
  * `rsweep`: Random sweep across possible sets of transfers
* When using the same GPU executor in multiple simultaneous transfers, performance may be
  serialized due to the limited number of hardware queues available
  * The maximum number of hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`
  * Alternatively, running in single-stream mode (`USE_SINGLE_STREAM=1`) may avoid this issue
    by launching all transfers on a single stream, rather than on individual streams
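The points above can be sketched as shell invocations. This is a hedged example: the preset names come from this README, but the binary path and the environment-variable values (`8`, `1`) are illustrative assumptions.

```shell
# Run the peer-to-peer preset instead of supplying a configuration file
./TransferBench p2p

# Raise the hardware-queue limit for runs that reuse the same GPU executor
# (the value 8 is an illustrative assumption)
GPU_MAX_HW_QUEUES=8 ./TransferBench sweep

# Or launch all transfers on a single stream to avoid queue serialization
USE_SINGLE_STREAM=1 ./TransferBench p2p
```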