# TransferBench

TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified
CPU and GPU devices.

Documentation for TransferBench is available at
[https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html](https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html).

## Requirements

* You must have a ROCm stack installed on your system (HIP runtime)
* You must have `libnuma` installed on your system
* AMD IOMMU must be enabled and set to passthrough for AMD Instinct cards
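
The `libnuma` and IOMMU requirements above can be checked with a short pre-flight script. This is a hypothetical sketch, not part of TransferBench: it assumes passthrough was configured with the `iommu=pt` kernel parameter and that `ldconfig` is available. A system that enables passthrough another way (for example, through the kernel build configuration) is reported as "not found" even though it may be set up correctly.

```shell
# Hypothetical pre-flight check; the helper names are illustrative only.

# Return 0 if the given kernel command line contains the iommu=pt parameter.
has_iommu_passthrough() {
  case " $1 " in
    *" iommu=pt "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if the dynamic linker cache lists libnuma.
has_libnuma() {
  ldconfig -p 2>/dev/null | grep -q 'libnuma\.so'
}

if has_iommu_passthrough "$(cat /proc/cmdline 2>/dev/null)"; then
  echo "IOMMU passthrough: iommu=pt found on the kernel command line"
else
  echo "IOMMU passthrough: iommu=pt not found on the kernel command line"
fi

if has_libnuma; then
  echo "libnuma: found"
else
  echo "libnuma: not found in the linker cache"
fi
```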

## Documentation

To build documentation locally, use the following code:

```shell
cd docs

pip3 install -r .sphinx/requirements.txt

python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```

## Building TransferBench

You can build TransferBench using Makefile or CMake.

* Makefile:

  ```shell
  make
  ```

* CMake:

  ```shell
  mkdir build
  cd build
  CXX=/opt/rocm/bin/hipcc cmake ..
  make
  ```

  If ROCm is not installed in `/opt/rocm/`, you must set `ROCM_PATH` to the correct location.
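
For a non-default ROCm location, the CMake invocation might be prepared as in the sketch below. It rests on two assumptions that this README does not confirm: that the build reads `ROCM_PATH` from the environment, and that `hipcc` sits under `bin/` of the ROCm root, mirroring the default `/opt/rocm/bin/hipcc`. The `hipcc_path` helper is hypothetical.

```shell
# Print the hipcc path for a given ROCm root, defaulting to /opt/rocm.
# A trailing slash on the root is stripped so the result has no "//".
hipcc_path() {
  root="${1:-/opt/rocm}"
  printf '%s/bin/hipcc\n' "${root%/}"
}

# Keep an already-set ROCM_PATH; otherwise fall back to the default location.
export ROCM_PATH="${ROCM_PATH:-/opt/rocm}"

# Show the configure command that would be run from the build directory.
echo "CXX=$(hipcc_path "$ROCM_PATH") cmake .."
```

Setting `ROCM_PATH` in the shell before running the snippet changes the compiler path it prints.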

## NVIDIA platform support

You can build TransferBench to run on NVIDIA platforms via HIP or native NVCC.

Use the following code to build with HIP for NVIDIA (note that you must have a HIP-compatible CUDA
version installed, e.g., CUDA 11.5):

```shell
CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make
```

Use the following code to build with native NVCC (builds `TransferBenchCuda`):

```shell
make
```

## Things to note

* Running TransferBench with no arguments displays usage instructions and detected topology
  information
* You can use several preset configurations instead of a configuration file:
  * `a2a`         : All-to-all benchmark test
  * `cmdline`     : Take Transfers to run from the command line instead of from a file
  * `healthcheck` : Simple health check (supported on MI300 series only)
  * `p2p`         : Peer-to-peer benchmark test
  * `pcopy`       : Benchmark parallel copies from a single GPU to other GPUs
  * `rsweep`      : Random sweep across possible sets of transfers
  * `rwrite`      : Benchmark parallel remote writes from a single GPU to other GPUs
  * `scaling`     : GPU subexecutor scaling tests
  * `schmoo`      : Local/remote read/write/copy between two GPUs
  * `sweep`       : Sweep across possible sets of transfers

* When the same GPU executor is used in multiple simultaneous transfers on separate streams (`USE_SINGLE_STREAM=0`),
  the transfers may be serialized because of the limit on the number of available hardware queues
  * The maximum number of hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`
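
As an illustration of the two environment variables above, the sketch below assembles a command line for a preset run. It assumes the preset name is passed as the first argument in place of a configuration file and that the binary is `./TransferBench` in the current directory. The queue count of 8 is an arbitrary example, not a recommended value, and `preset_command` is a hypothetical helper.

```shell
# Print a command line that runs a preset with separate streams and an
# explicit hardware-queue limit (second argument, 8 if omitted).
preset_command() {
  preset="$1"
  queues="${2:-8}"
  printf 'USE_SINGLE_STREAM=0 GPU_MAX_HW_QUEUES=%s ./TransferBench %s\n' "$queues" "$preset"
}

# Show the command for the peer-to-peer preset; copy and run it as needed.
preset_command p2p
```

Comparing results across a few `GPU_MAX_HW_QUEUES` values is one way to see whether the queue limit is what serializes the transfers.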