Unverified Commit 4e2be38c authored by gilbertlee-amd's avatar gilbertlee-amd Committed by GitHub
Browse files

Update rocm-rel-6.4 to use TransferBench v1.62.00 (#183)

* updating metadata

* v1.58.00 Fixing DMA copy-on-engine (#152)

* Leo's review

* Update use-transferbench.rst

* Update Doxyfile

* Refining API library

* Update TransferBench.hpp

* Update TransferBench.hpp

* Update TransferBench.hpp

* Bump rocm-docs-core from 1.9.2 to 1.11.0 in /docs/sphinx (#153)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.2 to 1.11.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.2...v1.11.0

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core from 1.11.0 to 1.12.0 in /docs/sphinx (#155)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.11.0...v1.12.0

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* External CI: enable CI triggers

* Apply suggestions from code review
Co-authored-by: default avatarMustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Update LICENSE.md

* Bump rocm-docs-core from 1.12.0 to 1.13.0 in /docs/sphinx (#160)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.0...v1.13.0

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* TransferBench V1.59 (#162)

Adding NIC execution capabilities, various bug fixes introduced by header-only-library refactor
---------
Co-authored-by: default avatarMustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Adding ability to specify A2A_MODE=numSrcs:numDsts (#164)
Co-authored-by: default avatarMustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Fixing specific DMA engine transfers, enabling GFX_SINGLE_TEAM=1 by default (#166)

* Bump rocm-docs-core from 1.13.0 to 1.15.0 in /docs/sphinx (#165)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.15.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.13.0...v1.15.0

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core from 1.15.0 to 1.17.0 in /docs/sphinx (#171)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.15.0 to 1.17.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.15.0...v1.17.0

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* TransferBench v1.61 (#174)
Co-authored-by: default avatarMustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Bump rocm-docs-core from 1.17.0 to 1.18.1 in /docs/sphinx (#176)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.18.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.18.1

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core from 1.18.1 to 1.18.2 in /docs/sphinx (#177)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.1 to 1.18.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.1...v1.18.2

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.18.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* TransferBench v1.62.00 (#181)

* Adding non-temporal loads and stores via GFX_TEMPORAL
* Adding additional summary details to a2a preset
* Add SHOW_MIN_ONLY for a2asweep preset
* Adding new P CPU memory type which is indexed by closest GPU

* Bump rocm-docs-core from 1.18.2 to 1.20.0 in /docs/sphinx (#180)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.2 to 1.20.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.2...v1.20.0

)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------
Signed-off-by: default avatardependabot[bot] <support@github.com>
Co-authored-by: default avatarsrawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: default avatarDaniel Su <danielsu@amd.com>
Co-authored-by: default avatarMustafa Abduljabbar <mustafa.abduljabbar@amd.com>
parent 3ea2f226
......@@ -24,6 +24,8 @@ THE SOFTWARE.
// Included after EnvVars and Executors
#include "AllToAll.hpp"
#include "AllToAllN.hpp"
#include "AllToAllSweep.hpp"
#include "HealthCheck.hpp"
#include "OneToAll.hpp"
#include "PeerToPeer.hpp"
......@@ -38,14 +40,16 @@ typedef void (*PresetFunc)(EnvVars& ev,
std::map<std::string, std::pair<PresetFunc, std::string>> presetFuncMap =
{
{"a2a", {AllToAllPreset, "Tests parallel transfers between all pairs of GPU devices"}},
{"healthcheck", {HealthCheckPreset,"Simple bandwidth health check (MI300X series only)"}},
{"one2all", {OneToAllPreset, "Test all subsets of parallel transfers from one GPU to all others"}},
{"p2p" , {PeerToPeerPreset, "Peer-to-peer device memory bandwidth test"}},
{"rsweep", {SweepPreset, "Randomly sweep through sets of Transfers"}},
{"scaling", {ScalingPreset, "Run scaling test from one GPU to other devices"}},
{"schmoo", {SchmooPreset, "Scaling tests for local/remote read/write/copy"}},
{"sweep", {SweepPreset, "Ordered sweep through sets of Transfers"}},
{"a2a", {AllToAllPreset, "Tests parallel transfers between all pairs of GPU devices"}},
{"a2a_n", {AllToAllRdmaPreset, "Tests parallel transfers between all pairs of GPU devices using Nearest NIC RDMA transfers"}},
{"a2asweep", {AllToAllSweepPreset, "Test GFX-based all-to-all transfers swept across different CU and GFX unroll counts"}},
{"healthcheck", {HealthCheckPreset, "Simple bandwidth health check (MI300X series only)"}},
{"one2all", {OneToAllPreset, "Test all subsets of parallel transfers from one GPU to all others"}},
{"p2p" , {PeerToPeerPreset, " Peer-to-peer device memory bandwidth test"}},
{"rsweep", {SweepPreset, "Randomly sweep through sets of Transfers"}},
{"scaling", {ScalingPreset, "Run scaling test from one GPU to other devices"}},
{"schmoo", {SchmooPreset, "Scaling tests for local/remote read/write/copy"}},
{"sweep", {SweepPreset, "Ordered sweep through sets of Transfers"}},
};
void DisplayPresets()
......
......@@ -22,19 +22,21 @@ THE SOFTWARE.
void LogTransfers(FILE *fp, int const testNum, std::vector<Transfer> const& transfers)
{
fprintf(fp, "# Test %d\n", testNum);
fprintf(fp, "%d", -1 * (int)transfers.size());
for (auto const& transfer : transfers)
{
fprintf(fp, " (%s->%c%d->%s %d %lu)",
MemDevicesToStr(transfer.srcs).c_str(),
ExeTypeStr[transfer.exeDevice.exeType], transfer.exeDevice.exeIndex,
MemDevicesToStr(transfer.dsts).c_str(),
transfer.numSubExecs,
transfer.numBytes);
if (fp) {
fprintf(fp, "# Test %d\n", testNum);
fprintf(fp, "%d", -1 * (int)transfers.size());
for (auto const& transfer : transfers)
{
fprintf(fp, " (%s->%c%d->%s %d %lu)",
MemDevicesToStr(transfer.srcs).c_str(),
ExeTypeStr[transfer.exeDevice.exeType], transfer.exeDevice.exeIndex,
MemDevicesToStr(transfer.dsts).c_str(),
transfer.numSubExecs,
transfer.numBytes);
}
fprintf(fp, "\n");
fflush(fp);
}
fprintf(fp, "\n");
fflush(fp);
}
void SweepPreset(EnvVars& ev,
......@@ -54,6 +56,7 @@ void SweepPreset(EnvVars& ev,
int numGpuSubExecs = EnvVars::GetEnvVar("NUM_GPU_SE" , 4);
std::string sweepDst = EnvVars::GetEnvVar("SWEEP_DST" , "CG");
std::string sweepExe = EnvVars::GetEnvVar("SWEEP_EXE" , "CDG");
std::string sweepFile = EnvVars::GetEnvVar("SWEEP_FILE" , "/tmp/lastSweep.cfg");
int sweepMax = EnvVars::GetEnvVar("SWEEP_MAX" , 24);
int sweepMin = EnvVars::GetEnvVar("SWEEP_MIN" , 1);
int sweepRandBytes = EnvVars::GetEnvVar("SWEEP_RAND_BYTES" , 0);
......@@ -78,6 +81,7 @@ void SweepPreset(EnvVars& ev,
ev.Print("NUM_GPU_SE", numGpuSubExecs, "Using %d subExecutors/CUs per GPU executed Transfer", numGpuSubExecs);
ev.Print("SWEEP_DST", sweepDst.c_str(), "Destination Memory Types to sweep");
ev.Print("SWEEP_EXE", sweepExe.c_str(), "Executor Types to sweep");
ev.Print("SWEEP_FILE", sweepFile.c_str(),"File to store the executing sweep configuration");
ev.Print("SWEEP_MAX", sweepMax, "Max simultaneous transfers (0 = no limit)");
ev.Print("SWEEP_MIN", sweepMin, "Min simultaenous transfers");
ev.Print("SWEEP_RAND_BYTES", sweepRandBytes, "Using %s number of bytes per Transfer", (sweepRandBytes ? "random" : "constant"));
......@@ -283,10 +287,14 @@ void SweepPreset(EnvVars& ev,
std::uniform_int_distribution<int> distribution(sweepMin, maxParallelTransfers);
// Log sweep to configuration file
FILE *fp = fopen("lastSweep.cfg", "w");
char absPath[1024];
auto const res = realpath(sweepFile.c_str(), absPath);
FILE *fp = fopen(sweepFile.c_str(), "w");
if (!fp) {
printf("[ERROR] Unable to open lastSweep.cfg. Check permissions\n");
exit(1);
printf("[WARN] Unable to open %s. Skipping output of sweep configuration file\n", res ? absPath : sweepFile.c_str());
} else {
printf("Sweep configuration saved to: %s\n", res ? absPath : sweepFile.c_str());
}
// Create bitmask of numPossible triplets, of which M will be chosen
......@@ -333,7 +341,7 @@ void SweepPreset(EnvVars& ev,
// Check for test limit
if (numTestsRun == sweepTestLimit) {
printf("Test limit reached\n");
printf("Sweep Test limit reached\n");
break;
}
......@@ -341,7 +349,7 @@ void SweepPreset(EnvVars& ev,
auto cpuDelta = std::chrono::high_resolution_clock::now() - cpuStart;
double totalCpuTime = std::chrono::duration_cast<std::chrono::duration<double>>(cpuDelta).count();
if (sweepTimeLimit && totalCpuTime > sweepTimeLimit) {
printf("Time limit exceeded\n");
printf("Sweep Time limit exceeded\n");
break;
}
......@@ -357,5 +365,5 @@ void SweepPreset(EnvVars& ev,
bitmask[i] = (i < M) ? 1 : 0;
}
}
fclose(fp);
if (fp) fclose(fp);
}
......@@ -38,21 +38,56 @@ static int RemappedCpuIndex(int origIdx)
return remappingCpu[origIdx];
}
static void PrintNicToGPUTopo(bool outputToCsv)
{
#ifdef NIC_EXEC_ENABLED
printf(" NIC | Device Name | Active | PCIe Bus ID | NUMA | Closest GPU(s) | GID Index | GID Descriptor\n");
if(!outputToCsv)
printf("-----+-------------+--------+--------------+------+----------------+-----------+-------------------\n");
int numGpus = TransferBench::GetNumExecutors(EXE_GPU_GFX);
auto const& ibvDeviceList = GetIbvDeviceList();
for (int i = 0; i < ibvDeviceList.size(); i++) {
std::string closestGpusStr = "";
for (int j = 0; j < numGpus; j++) {
if (TransferBench::GetClosestNicToGpu(j) == i) {
if (closestGpusStr != "") closestGpusStr += ",";
closestGpusStr += std::to_string(j);
}
}
printf(" %-3d | %-11s | %-6s | %-12s | %-4d | %-14s | %-9s | %-20s\n",
i, ibvDeviceList[i].name.c_str(),
ibvDeviceList[i].hasActivePort ? "Yes" : "No",
ibvDeviceList[i].busId.c_str(),
ibvDeviceList[i].numaNode,
closestGpusStr.c_str(),
ibvDeviceList[i].isRoce && ibvDeviceList[i].hasActivePort? std::to_string(ibvDeviceList[i].gidIndex).c_str() : "N/A",
ibvDeviceList[i].isRoce && ibvDeviceList[i].hasActivePort? ibvDeviceList[i].gidDescriptor.c_str() : "N/A"
);
}
printf("\n");
#endif
}
void DisplayTopology(bool outputToCsv)
{
int numCpus = TransferBench::GetNumExecutors(EXE_CPU);
int numGpus = TransferBench::GetNumExecutors(EXE_GPU_GFX);
int numNics = TransferBench::GetNumExecutors(EXE_NIC);
char sep = (outputToCsv ? ',' : '|');
if (outputToCsv) {
printf("NumCpus,%d\n", numCpus);
printf("NumGpus,%d\n", numGpus);
printf("NumNics,%d\n", numNics);
} else {
printf("\nDetected Topology:\n");
printf("==================\n");
printf(" %d configured CPU NUMA node(s) [%d total]\n", numCpus, numa_max_node() + 1);
printf(" %d GPU device(s)\n", numGpus);
printf(" %d Supported NIC device(s)\n", numNics);
}
// Print out detected CPU topology
......@@ -91,8 +126,10 @@ void DisplayTopology(bool outputToCsv)
}
printf("\n");
// Print out detected GPU topology
// Print out detected NIC topology
PrintNicToGPUTopo(outputToCsv);
// Print out detected GPU topology
#if defined(__NVCC__)
for (int i = 0; i < numGpus; i++) {
hipDeviceProp_t prop;
......@@ -118,12 +155,12 @@ void DisplayTopology(bool outputToCsv)
printf(" %c", sep);
for (int j = 0; j < numGpus; j++)
printf(" GPU %02d %c", j, sep);
printf(" PCIe Bus ID %c #CUs %c NUMA %c #DMA %c #XCC\n", sep, sep, sep, sep);
printf(" PCIe Bus ID %c #CUs %c NUMA %c #DMA %c #XCC %c NIC\n", sep, sep, sep, sep, sep);
if (!outputToCsv) {
for (int j = 0; j <= numGpus; j++)
printf("--------+");
printf("--------------+------+------+------+------\n");
printf("--------------+------+------+------+------+------\n");
}
// Loop over each GPU device
......@@ -149,12 +186,13 @@ void DisplayTopology(bool outputToCsv)
char pciBusId[20];
HIP_CALL(hipDeviceGetPCIBusId(pciBusId, 20, i));
printf(" %11s %c %4d %c %4d %c %4d %c %4d\n",
printf(" %-11s %c %-4d %c %-4d %c %-4d %c %-4d %c %-4d\n",
pciBusId, sep,
TransferBench::GetNumSubExecutors({EXE_GPU_GFX, i}), sep,
TransferBench::GetClosestCpuNumaToGpu(i), sep,
TransferBench::GetNumExecutorSubIndices({EXE_GPU_DMA, i}), sep,
TransferBench::GetNumExecutorSubIndices({EXE_GPU_GFX, i}));
TransferBench::GetNumExecutorSubIndices({EXE_GPU_GFX, i}), sep,
TransferBench::GetClosestNicToGpu(i));
}
#endif
}
This diff is collapsed.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment