"src/torio/git@developer.sourcefind.cn:OpenDAS/torchaudio.git" did not exist on "09887246bb60d3fe5c2032fb8884d6b32ecf5241"
Commit dda6ebe5, authored by Lisa, committed by GitHub (parents: 004710fb, 97b5e7fc)

Merge pull request #62 from LisaDelaney/readme-updates: changelog updates
# TransferBench

TransferBench is a simple utility capable of benchmarking simultaneous copies between user-specified
CPU and GPU devices.

Documentation for TransferBench is available at
[https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html](https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html).
## Requirements

* You must have a ROCm stack installed on your system (HIP runtime)
* You must have `libnuma` installed on your system
## Documentation

To build the documentation locally, use the following code:

```shell
cd docs
pip3 install -r .sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
## Building TransferBench

You can build TransferBench using Makefile or CMake.

* Makefile:

  ```shell
  make
  ```

* CMake:

  ```shell
  mkdir build
  cd build
  CXX=/opt/rocm/bin/hipcc cmake ..
  make
  ```

If ROCm is not installed in `/opt/rocm/`, you must set `ROCM_PATH` to the correct location.
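As a minimal sketch of that note, assuming a hypothetical ROCm installation under `/opt/rocm-5.7` (the version in the path is illustrative, not from this README):

```shell
# Hypothetical ROCm prefix; adjust to match your installation
export ROCM_PATH=/opt/rocm-5.7

# Same CMake build as above, but pointing at the non-default ROCm location
mkdir build
cd build
CXX="$ROCM_PATH/bin/hipcc" cmake ..
make
```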
## NVIDIA platform support

You can build TransferBench to run on NVIDIA platforms via HIP or native NVCC.

Use the following code to build with HIP for NVIDIA (note that you must have a HIP-compatible CUDA
version installed, e.g., CUDA 11.5):

```shell
CUDA_PATH=<path_to_CUDA> HIP_PLATFORM=nvidia make
```

Use the following code to build with native NVCC (builds `TransferBenchCuda`):

```shell
make
```
## Things to note

* Running TransferBench with no arguments displays usage instructions and detected topology
  information
* You can use several preset configurations instead of a configuration file:
  * `p2p`: Peer-to-peer benchmark test
  * `sweep`: Sweep across possible sets of transfers
  * `rsweep`: Random sweep across possible sets of transfers
* When using the same GPU executor in multiple simultaneous transfers, performance may be
  serialized due to the limited number of hardware queues available
  * The maximum number of hardware queues can be adjusted via `GPU_MAX_HW_QUEUES`
  * Alternatively, running in single-stream mode (`USE_SINGLE_STREAM=1`) may avoid this issue
    by launching all transfers on a single stream, rather than on individual streams
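The points above can be sketched as shell invocations. This is a hedged example: the preset names come from this README, but the binary path and the environment-variable values (`8`, `1`) are illustrative assumptions.

```shell
# Run the peer-to-peer preset instead of supplying a configuration file
./TransferBench p2p

# Raise the hardware-queue limit for runs that reuse the same GPU executor
# (the value 8 is an illustrative assumption)
GPU_MAX_HW_QUEUES=8 ./TransferBench sweep

# Or launch all transfers on a single stream to avoid queue serialization
USE_SINGLE_STREAM=1 ./TransferBench p2p
```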