one / TransferBench
Commit dda6ebe5 (Unverified)
Authored Nov 08, 2023 by Lisa; committed by GitHub on Nov 08, 2023
Merge pull request #62 from LisaDelaney/readme-updates
changelog updates
Parents: 004710fb, 97b5e7fc
Showing 2 changed files with 404 additions and 250 deletions:
CHANGELOG.md (+354, -213)
README.md (+50, -37)
CHANGELOG.md @ dda6ebe5
This diff is collapsed.
README.md @ dda6ebe5
 # TransferBench
-TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs).
+TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified
+CPU and GPU devices.
+Documentation for TransferBench is available at
+[https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html](https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html).
 ## Requirements
-1. ROCm stack installed on the system (HIP runtime)
-2. libnuma installed on system
+* You must have a ROCm stack installed on your system (HIP runtime)
+* You must have `libnuma` installed on your system
 ## Documentation
-Run the steps below to build documentation locally.
+To build documentation locally, use the following code:
-```
+```shell
 cd docs
 pip3 install -r .sphinx/requirements.txt
@@ -19,45 +23,54 @@ pip3 install -r .sphinx/requirements.txt
 python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
 ```
-## Building
-To build TransferBench using Makefile:
-```shell
-$ make
-```
-To build TransferBench using cmake:
-```shell
-$ mkdir build
-$ cd build
-$ CXX=/opt/rocm/bin/hipcc cmake ..
-$ make
-```
-If ROCm is installed in a folder other than `/opt/rocm/`, set ROCM_PATH appropriately
+## Building TransferBench
+You can build TransferBench using Makefile or CMake.
+* Makefile:
+```shell
+make
+```
+* CMake:
+```shell
+mkdir build
+cd build
+CXX=/opt/rocm/bin/hipcc cmake ..
+make
+```
+If ROCm is not installed in `/opt/rocm/`, you must set `ROCM_PATH` to the correct location.
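The CMake steps in the updated README can be sketched as a small script. This is a minimal sketch under stated assumptions: the helper names `hipcc_path` and `build_with_cmake` are hypothetical, and deriving `CXX` from `ROCM_PATH` is an assumption, since the README only shows `CXX=/opt/rocm/bin/hipcc` and says to set `ROCM_PATH` when ROCm is installed elsewhere.

```shell
# Hypothetical helper (not part of TransferBench): print the hipcc path
# implied by ROCM_PATH, defaulting to the /opt/rocm location from the README.
hipcc_path() {
    rocm_root="${ROCM_PATH:-/opt/rocm}"
    # Strip one trailing slash so "/opt/rocm/" and "/opt/rocm" give the same result.
    printf '%s/bin/hipcc\n' "${rocm_root%/}"
}

# Hypothetical wrapper around the README's CMake steps; call it from the
# repository root. Assumption: CXX should point at hipcc under ROCM_PATH.
# The subshell keeps the caller's working directory unchanged.
build_with_cmake() {
    (
        mkdir -p build &&
        cd build &&
        CXX="$(hipcc_path)" cmake .. &&
        make
    )
}
```

With a non-default install, something like `ROCM_PATH=<path_to_ROCm> build_with_cmake` would pass the location through; whether the project's CMake files read `ROCM_PATH` in exactly this way is not shown in the diff.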
 ## NVIDIA platform support
-TransferBench may also be built to run on NVIDIA platforms either via HIP, or native nvcc
-To build with HIP for NVIDIA (requires HIP-compatible CUDA version installed e.g. CUDA 11.5):
-```
-CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make`
-```
-To build with native nvcc: (Builds TransferBenchCuda)
-```
-make
-```
+You can build TransferBench to run on NVIDIA platforms via HIP or native NVCC.
+Use the following code to build with HIP for NVIDIA (note that you must have a HIP-compatible CUDA
+version installed, e.g., CUDA 11.5):
+```shell
+CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make`
+```
+Use the following code to build with native NVCC (builds `TransferBenchCuda`):
+```shell
+make
+```
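Both sides of the diff end the HIP-for-NVIDIA command with a trailing backtick, which looks like a typo: a POSIX shell would read it as the start of an unterminated command substitution. The sketch below runs the same command without it. The wrapper name `build_hip_nvidia` and the `MAKE` override are hypothetical conveniences, not part of TransferBench.

```shell
# Hypothetical wrapper (not part of TransferBench): run the HIP-on-NVIDIA
# build with CUDA_PATH and HIP_PLATFORM set only for the make process.
# MAKE can override the make program; it defaults to "make".
build_hip_nvidia() {
    if [ -z "$1" ]; then
        echo "usage: build_hip_nvidia <path_to_CUDA>" >&2
        return 1
    fi
    CUDA_PATH="$1" HIP_PLATFORM=nvidia "${MAKE:-make}"
}
```

Calling `build_hip_nvidia <path_to_CUDA>` from the repository root is then equivalent to typing `CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make` by hand.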
-## Hints and suggestions
-- Running TransferBench with no arguments will display usage instructions and detected topology information
-- There are several preset configurations that can be used instead of a configuration file
-  including:
-  - p2p - Peer to peer benchmark test
-  - sweep - Sweep across possible sets of Transfers
-  - rsweep - Random sweep across possible sets of Transfers
-- When using the same GPU executor in multiple simultaneous Transfers, performance may be
-  serialized due to the maximum number of hardware queues available.
-  - The number of maximum hardware queues can be adjusted via GPU_MAX_HW_QUEUES
-  - Alternatively, running in single stream mode (USE_SINGLE_STREAM=1) may avoid this issue
-    by launching all Transfers on a single stream instead of individual streams
+## Things to note
+* Running TransferBench with no arguments displays usage instructions and detected topology
+  information
+* You can use several preset configurations instead of a configuration file:
+  * `p2p`: Peer-to-peer benchmark test
+  * `sweep`: Sweep across possible sets of transfers
+  * `rsweep`: Random sweep across possible sets of transfers
+* When using the same GPU executor in multiple simultaneous transfers, performance may be
+  serialized due to the maximum number of hardware queues available
+  * The number of maximum hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`
+  * Alternatively, running in single-stream mode (`USE_SINGLE_STREAM`=1) may avoid this issue
+    by launching all transfers on a single stream, rather than on individual streams
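The two environment variables named in the notes above can be applied per run, as in the following sketch. It assumes the built binary is `./TransferBench` in the current directory and that a preset name is passed where a configuration file would otherwise go; the function names and the `TB` variable are hypothetical.

```shell
# Assumption: the built binary is ./TransferBench in the current directory;
# set TB beforehand to point somewhere else.
TB="${TB:-./TransferBench}"

# Run TransferBench in single-stream mode, passing any arguments through.
run_single_stream() {
    USE_SINGLE_STREAM=1 "$TB" "$@"
}

# Run with an explicit hardware-queue limit: the first argument is the
# GPU_MAX_HW_QUEUES value, the rest are passed to TransferBench.
run_with_max_queues() {
    max_queues="$1"
    shift
    GPU_MAX_HW_QUEUES="$max_queues" "$TB" "$@"
}
```

For example, `run_single_stream p2p` would run the peer-to-peer preset on a single stream, and `run_with_max_queues 8 sweep` would run the sweep preset with a queue limit of 8. The value 8 is purely illustrative; a suitable limit depends on the hardware.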