Commit 7e118b0e in one/TransferBench
Authored Nov 02, 2023 by Lisa Delaney

readme updates

Parent: 70aff133

Showing 1 changed file with 45 additions and 36 deletions

README.md (+45 -36)
@@ -3,16 +3,19 @@
 TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified
 CPU and GPU devices.

+Documentation for TransferBench is available at
+[https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html](https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html).

 ## Requirements

-1. ROCm stack installed on the system (HIP runtime)
-2. libnuma installed on system
+* You must have a ROCm stack installed on your system (HIP runtime)
+* You must have `libnuma` installed on your system

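The two requirements listed in this hunk can be checked before attempting a build. The following is a minimal sketch and not part of the README: the function names are hypothetical, it assumes `hipcc` is either on `PATH` or under `${ROCM_PATH:-/opt/rocm}/bin`, and it assumes a glibc-based Linux system where `ldconfig -p` lists the installed shared libraries.

```shell
# Succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the dynamic linker cache lists libnuma.
# Assumption: glibc-based Linux with `ldconfig -p` available.
have_libnuma() {
    ldconfig -p 2>/dev/null | grep 'libnuma\.so' >/dev/null
}

# Prints one line per requirement; returns non-zero if either is missing.
check_requirements() {
    rc=0
    # Assumption: hipcc lives under <ROCm root>/bin when it is not on PATH.
    hipcc_path="${ROCM_PATH:-/opt/rocm}/bin/hipcc"
    if [ -x "$hipcc_path" ] || have_cmd hipcc; then
        echo "HIP compiler: found"
    else
        echo "HIP compiler: missing"
        rc=1
    fi
    if have_libnuma; then
        echo "libnuma: found"
    else
        echo "libnuma: missing"
        rc=1
    fi
    return "$rc"
}
```

Calling `check_requirements` reports each requirement on its own line, so it can serve as a guard in front of `make`.
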
 ## Documentation

-Run the steps below to build documentation locally.
+To build documentation locally, use the following code:

-```
+```shell
 cd docs

 pip3 install -r .sphinx/requirements.txt
@@ -20,48 +23,54 @@ pip3 install -r .sphinx/requirements.txt
 python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
 ```

-## Building
-To build TransferBench using Makefile:
-```shell
-make
-```
-To build TransferBench using CMake:
-```shell
-mkdir build
-cd build
-CXX=/opt/rocm/bin/hipcc cmake ..
-make
-```
-If ROCm is installed in a folder other than `/opt/rocm/`, set ROCM_PATH appropriately
+## Building TransferBench
+You can build TransferBench using Makefile or CMake.
+* Makefile:
+    ```shell
+    make
+    ```
+* CMake:
+    ```shell
+    mkdir build
+    cd build
+    CXX=/opt/rocm/bin/hipcc cmake ..
+    make
+    ```
+If ROCm is not installed in `/opt/rocm/`, you must set `ROCM_PATH` to the correct location.

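The `ROCM_PATH` note in this hunk does not show what the adjusted command looks like. A minimal sketch, under two assumptions that the README does not confirm: that the build reads `ROCM_PATH` from the environment, and that `hipcc` is located at `<ROCm root>/bin/hipcc`. The helper names are hypothetical, and the command is printed rather than run so it can be inspected first.

```shell
# Use ROCM_PATH from the environment if set; otherwise fall back to /opt/rocm.
rocm_root() {
    printf '%s\n' "${ROCM_PATH:-/opt/rocm}"
}

# Print (do not run) the CMake configure line for the resolved ROCm root.
# Assumption: hipcc is located at <ROCm root>/bin/hipcc.
cmake_configure_cmd() {
    root="$(rocm_root)"
    printf 'ROCM_PATH=%s CXX=%s/bin/hipcc cmake ..\n' "$root" "$root"
}

cmake_configure_cmd
```

With `ROCM_PATH` exported to a non-default install location, the printed line points both `ROCM_PATH` and `CXX` at that location, which keeps the two settings from drifting apart.
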
 ## NVIDIA platform support

-TransferBench may also be built to run on NVIDIA platforms either via HIP, or native nvcc
+You can build TransferBench to run on NVIDIA platforms via HIP or native NVCC.

-To build with HIP for NVIDIA (requires HIP-compatible CUDA version installed e.g. CUDA 11.5):
-```
+Use the following code to build with HIP for NVIDIA (note that you must have a HIP-compatible CUDA
+version installed, e.g., CUDA 11.5):
+```shell
 CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make
 ```

-To build with native nvcc: (Builds TransferBenchCuda)
-```
+Use the following code to build with native NVCC (builds `TransferBenchCuda`):
+```shell
 make
 ```

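Because the HIP-on-NVIDIA build depends on `CUDA_PATH` pointing at a real CUDA installation, it can help to validate the path before invoking `make`. A minimal sketch with a hypothetical helper name; it relies on the standard CUDA layout where `nvcc` sits in `<CUDA root>/bin`, and it only prints the build command from the hunk above instead of running it.

```shell
# Print (do not run) the HIP-on-NVIDIA build command for a CUDA install.
# Fails with a message on stderr if the directory does not contain bin/nvcc.
hip_nvidia_build_cmd() {
    cuda_path="$1"
    if [ ! -x "$cuda_path/bin/nvcc" ]; then
        echo "nvcc not found under $cuda_path" >&2
        return 1
    fi
    printf 'CUDA_PATH=%s HIP_PLATFORM=nvidia make\n' "$cuda_path"
}
```

For example, `hip_nvidia_build_cmd /usr/local/cuda-11.5` (an illustrative path) prints the command only when that directory contains an executable `bin/nvcc`.
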
-## Hints and suggestions
-- Running TransferBench with no arguments will display usage instructions and detected topology information
-- There are several preset configurations that can be used instead of a configuration file
-  including:
-  - p2p - Peer to peer benchmark test
-  - sweep - Sweep across possible sets of Transfers
-  - rsweep - Random sweep across possible sets of Transfers
-- When using the same GPU executor in multiple simultaneous Transfers, performance may be
-  serialized due to the maximum number of hardware queues available.
-  - The number of maximum hardware queues can be adjusted via GPU_MAX_HW_QUEUES
-  - Alternatively, running in single stream mode (USE_SINGLE_STREAM=1) may avoid this issue
-    by launching all Transfers on a single stream instead of individual streams
+## Things to note
+* Running TransferBench with no arguments displays usage instructions and detected topology
+  information
+* You can use several preset configurations instead of a configuration file:
+  * `p2p`: Peer-to-peer benchmark test
+  * `sweep`: Sweep across possible sets of transfers
+  * `rsweep`: Random sweep across possible sets of transfers
+* When using the same GPU executor in multiple simultaneous transfers, performance may be
+  serialized due to the maximum number of hardware queues available
+  * The number of maximum hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`
+  * Alternatively, running in single-stream mode (`USE_SINGLE_STREAM`=1) may avoid this issue
+    by launching all transfers on a single stream, rather than on individual streams

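The notes in this hunk name two settings, `GPU_MAX_HW_QUEUES` and `USE_SINGLE_STREAM`, without showing an invocation. A minimal sketch of how they might be combined with a preset, assuming both are read from the environment, that the preset name is passed where a configuration file would otherwise go, and that the binary is at `./TransferBench`. The helper names, the `TB_BIN` variable, and the queue count of 8 are illustrative only. The commands are printed, not executed.

```shell
# Path to the built binary; ./TransferBench is an assumption about where
# the build leaves the executable.
TB_BIN="${TB_BIN:-./TransferBench}"

# Print the command line for a preset run in single-stream mode.
single_stream_cmd() {
    printf 'USE_SINGLE_STREAM=1 %s %s\n' "$TB_BIN" "$1"
}

# Print the command line for a preset run with a custom hardware-queue limit.
# $1 is the queue limit, $2 is the preset name.
hw_queues_cmd() {
    printf 'GPU_MAX_HW_QUEUES=%s %s %s\n' "$1" "$TB_BIN" "$2"
}

single_stream_cmd p2p
hw_queues_cmd 8 sweep
```

Printing the commands first makes it easy to compare a default run against a single-stream run or a run with a different queue limit when checking whether transfers on a shared GPU executor are being serialized.
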