# TransferBench

TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs).

## Requirements

1. ROCm stack (HIP runtime) installed on the system
2. libnuma installed on the system

## Building
To build TransferBench:
* `make`

If ROCm is installed in a folder other than `/opt/rocm/`, set `ROCM_PATH` appropriately.
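
As a minimal sketch of the build step, the snippet below falls back to the default prefix when `ROCM_PATH` is not already set and passes it to `make`. It assumes the Makefile reads `ROCM_PATH` as a make variable, which this README implies but does not spell out. The directory check and the error message are illustrative additions, not part of the project.

```shell
# Use ROCM_PATH from the environment if set; otherwise use the default prefix.
ROCM_PATH="${ROCM_PATH:-/opt/rocm}"

if [ -d "$ROCM_PATH" ]; then
    # Assumption: the Makefile picks up ROCM_PATH when given on the command line.
    make ROCM_PATH="$ROCM_PATH"
else
    echo "ROCm not found at $ROCM_PATH; set ROCM_PATH to the install prefix" >&2
fi
```

For a one-off build against a non-default install, `make ROCM_PATH=/path/to/rocm` (with your actual prefix substituted) should be equivalent under the same assumption.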


## Hints and suggestions
- Running TransferBench with no arguments displays usage instructions and the detected topology
- Several preset configurations can be used in place of a configuration file, including:
  - `p2p`: Peer-to-peer benchmark test
  - `sweep`: Sweep across possible sets of Transfers
  - `rsweep`: Random sweep across possible sets of Transfers
- When the same GPU executor is used in multiple simultaneous Transfers, those Transfers may be
  serialized because of the limit on the number of available hardware queues.
  - The maximum number of hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`
  - Alternatively, running in single stream mode (`USE_SINGLE_STREAM=1`) may avoid this issue
    by launching all Transfers on a single stream instead of on individual streams
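
The hints above might translate into invocations like the following. This is a sketch under several assumptions: the build produces an executable named `./TransferBench` in the current directory, the preset name is passed where a configuration file would otherwise go, and `GPU_MAX_HW_QUEUES` and `USE_SINGLE_STREAM` are read from the environment. The queue count of 8 is an arbitrary example value, not a recommendation.

```shell
# Assumption: the built binary is ./TransferBench in the current directory.
TB=./TransferBench

if [ -x "$TB" ]; then
    # No arguments: print usage instructions and the detected topology.
    "$TB"

    # Run the peer-to-peer preset in place of a configuration file.
    "$TB" p2p

    # Adjust the hardware queue limit for this run (8 is an arbitrary example).
    GPU_MAX_HW_QUEUES=8 "$TB" p2p

    # Or launch all Transfers on a single stream instead of individual streams.
    USE_SINGLE_STREAM=1 "$TB" p2p
else
    echo "$TB not found; build it with make first" >&2
fi
```

Setting the variables on the command line, rather than exporting them, keeps each setting scoped to a single run so results from different configurations are easier to compare.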