# TransferBench

TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified
CPU and GPU devices.

## Requirements

1. ROCm stack installed on the system (HIP runtime)
2. libnuma installed on the system
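
A quick way to check both requirements before building might look like the sketch below. It assumes `hipcc` is either on `PATH` or under the default `/opt/rocm/bin`, and that `ldconfig` is available to query the linker cache; neither assumption comes from TransferBench itself, so a "missing" result may be a false negative on some systems.

```shell
# Look for the HIP compiler on PATH, then in the default ROCm location (assumed).
if command -v hipcc >/dev/null 2>&1 || [ -x /opt/rocm/bin/hipcc ]; then
  hip_status="found"
else
  hip_status="missing"
fi

# Look for libnuma in the dynamic linker cache (Linux only; assumes ldconfig is on PATH).
if command -v ldconfig >/dev/null 2>&1 && ldconfig -p 2>/dev/null | grep -q 'libnuma'; then
  numa_status="found"
else
  numa_status="missing"
fi

echo "hipcc:   ${hip_status}"
echo "libnuma: ${numa_status}"
```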

## Documentation

Run the steps below to build documentation locally.

```shell
cd docs

pip3 install -r .sphinx/requirements.txt

python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```

## Building

To build TransferBench using the Makefile:

```shell
make
```

To build TransferBench using CMake:

```shell
mkdir build
cd build
CXX=/opt/rocm/bin/hipcc cmake ..
make
```

If ROCm is installed in a folder other than `/opt/rocm/`, set `ROCM_PATH` to that folder.
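
As a rough sketch of a Makefile build against a non-default ROCm install: the path `/opt/rocm-custom` is hypothetical, and the example assumes the build picks up `ROCM_PATH` from the environment.

```shell
# Hypothetical install prefix; substitute the folder where ROCm actually lives.
ROCM_PATH="${ROCM_PATH:-/opt/rocm-custom}"
export ROCM_PATH

if [ -x "${ROCM_PATH}/bin/hipcc" ]; then
  # Makefile build, with ROCM_PATH taken from the environment (assumed behavior)
  make
else
  echo "hipcc not found under ${ROCM_PATH}; check ROCM_PATH" >&2
fi
```

For the CMake build, the compiler path would presumably need to change to match, e.g. `CXX="${ROCM_PATH}/bin/hipcc" cmake ..`.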

## NVIDIA platform support

TransferBench can also be built to run on NVIDIA platforms, either via HIP or with native nvcc.

To build with HIP for NVIDIA (requires a HIP-compatible CUDA version to be installed, e.g. CUDA 11.5):
```
   CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make
```

To build with native nvcc (builds TransferBenchCuda):
```
   make
```

## Hints and suggestions
- Running TransferBench with no arguments displays usage instructions and detected topology information.
- Several preset configurations can be used instead of a configuration file, including:
  - `p2p`: peer-to-peer benchmark test
  - `sweep`: sweep across possible sets of Transfers
  - `rsweep`: random sweep across possible sets of Transfers
- When the same GPU executor is used in multiple simultaneous Transfers, the Transfers may be
  serialized due to the maximum number of hardware queues available.
  - The maximum number of hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`.
  - Alternatively, running in single-stream mode (`USE_SINGLE_STREAM=1`) may avoid this issue
    by launching all Transfers on a single stream instead of individual streams.
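
The hints above could be tried out roughly as follows. This is a sketch under several assumptions: the binary is at `./TransferBench` (the `TB` variable is a hypothetical helper), a preset name is passed in place of the configuration file path, and both settings are read from the environment. The queue count of 4 is an arbitrary illustrative value, not a recommendation.

```shell
# Assumed path to the built binary; adjust to where your build placed it.
TB="${TB:-./TransferBench}"

if [ -x "$TB" ]; then
  # No arguments: print usage instructions and detected topology
  "$TB"

  # Run the peer-to-peer preset in place of a configuration file
  "$TB" p2p

  # Same preset, with all Transfers launched on a single stream
  USE_SINGLE_STREAM=1 "$TB" p2p

  # Same preset, with a different hardware queue limit (illustrative value)
  GPU_MAX_HW_QUEUES=4 "$TB" p2p
else
  echo "TransferBench binary not found at $TB; build it first"
fi
```

Comparing the plain `p2p` run against the two variants is one way to see whether queue limits are serializing Transfers on a given system.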