Unverified commit 4e2be38c authored by gilbertlee-amd, committed by GitHub

Update rocm-rel-6.4 to use TransferBench v1.62.00 (#183)

* updating metadata

* v1.58.00 Fixing DMA copy-on-engine (#152)

* Leo's review

* Update use-transferbench.rst

* Update Doxyfile

* Refining API library

* Update TransferBench.hpp

* Update TransferBench.hpp

* Update TransferBench.hpp

* Bump rocm-docs-core from 1.9.2 to 1.11.0 in /docs/sphinx (#153)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.2 to 1.11.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.2...v1.11.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core from 1.11.0 to 1.12.0 in /docs/sphinx (#155)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* External CI: enable CI triggers

* Apply suggestions from code review
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Update LICENSE.md

* Bump rocm-docs-core from 1.12.0 to 1.13.0 in /docs/sphinx (#160)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.0...v1.13.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* TransferBench V1.59 (#162)

Adding NIC execution capabilities and fixing various bugs introduced by the header-only-library refactor
---------
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Adding ability to specify A2A_MODE=numSrcs:numDsts (#164)
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Fixing specific DMA engine transfers, enabling GFX_SINGLE_TEAM=1 by default (#166)

* Bump rocm-docs-core from 1.13.0 to 1.15.0 in /docs/sphinx (#165)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.15.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.13.0...v1.15.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core from 1.15.0 to 1.17.0 in /docs/sphinx (#171)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.15.0 to 1.17.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.15.0...v1.17.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* TransferBench v1.61 (#174)
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>

* Bump rocm-docs-core from 1.17.0 to 1.18.1 in /docs/sphinx (#176)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.18.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.18.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core from 1.18.1 to 1.18.2 in /docs/sphinx (#177)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.1 to 1.18.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.1...v1.18.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.18.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* TransferBench v1.62.00 (#181)

* Adding non-temporal loads and stores via GFX_TEMPORAL
* Adding additional summary details to a2a preset
* Add SHOW_MIN_ONLY for a2asweep preset
* Adding new P CPU memory type which is indexed by closest GPU

* Bump rocm-docs-core from 1.18.2 to 1.20.0 in /docs/sphinx (#180)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.2 to 1.20.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.2...v1.20.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>
parent 3ea2f226
@@ -24,6 +24,8 @@ THE SOFTWARE.
 // Included after EnvVars and Executors
 #include "AllToAll.hpp"
+#include "AllToAllN.hpp"
+#include "AllToAllSweep.hpp"
 #include "HealthCheck.hpp"
 #include "OneToAll.hpp"
 #include "PeerToPeer.hpp"
@@ -38,14 +40,16 @@ typedef void (*PresetFunc)(EnvVars& ev,
 std::map<std::string, std::pair<PresetFunc, std::string>> presetFuncMap =
 {
   {"a2a", {AllToAllPreset, "Tests parallel transfers between all pairs of GPU devices"}},
-  {"healthcheck", {HealthCheckPreset,"Simple bandwidth health check (MI300X series only)"}},
-  {"one2all", {OneToAllPreset, "Test all subsets of parallel transfers from one GPU to all others"}},
-  {"p2p" , {PeerToPeerPreset, "Peer-to-peer device memory bandwidth test"}},
-  {"rsweep", {SweepPreset, "Randomly sweep through sets of Transfers"}},
-  {"scaling", {ScalingPreset, "Run scaling test from one GPU to other devices"}},
-  {"schmoo", {SchmooPreset, "Scaling tests for local/remote read/write/copy"}},
-  {"sweep", {SweepPreset, "Ordered sweep through sets of Transfers"}},
+  {"a2a_n", {AllToAllRdmaPreset, "Tests parallel transfers between all pairs of GPU devices using Nearest NIC RDMA transfers"}},
+  {"a2asweep", {AllToAllSweepPreset, "Test GFX-based all-to-all transfers swept across different CU and GFX unroll counts"}},
+  {"healthcheck", {HealthCheckPreset, "Simple bandwidth health check (MI300X series only)"}},
+  {"one2all", {OneToAllPreset, "Test all subsets of parallel transfers from one GPU to all others"}},
+  {"p2p" , {PeerToPeerPreset, " Peer-to-peer device memory bandwidth test"}},
+  {"rsweep", {SweepPreset, "Randomly sweep through sets of Transfers"}},
+  {"scaling", {ScalingPreset, "Run scaling test from one GPU to other devices"}},
+  {"schmoo", {SchmooPreset, "Scaling tests for local/remote read/write/copy"}},
+  {"sweep", {SweepPreset, "Ordered sweep through sets of Transfers"}},
 };
 void DisplayPresets()
......
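The preset table above is a name-keyed dispatch map: each entry pairs a function pointer with a one-line description. A minimal sketch of the same pattern follows, assuming simplified stand-in presets; `DemoAllToAll`, `DemoPeerToPeer`, `RunPresetByName` and `ListPresets` are hypothetical, and the real `PresetFunc` takes `EnvVars&` plus further arguments not shown in the diff.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Hypothetical preset signature; the real PresetFunc differs.
using PresetFn = int (*)(int numGpus);

static int DemoAllToAll(int numGpus)   { return numGpus * (numGpus - 1); } // ordered GPU pairs
static int DemoPeerToPeer(int numGpus) { return numGpus * numGpus; }       // full src x dst matrix

// Name -> (function pointer, description), the same shape as presetFuncMap.
static const std::map<std::string, std::pair<PresetFn, std::string>> demoPresets = {
  {"a2a", {DemoAllToAll,   "All pairs of GPUs"}},
  {"p2p", {DemoPeerToPeer, "Peer-to-peer matrix"}},
};

// Looks up `name`; returns false for an unknown preset, otherwise runs it.
bool RunPresetByName(const std::string& name, int numGpus, int& result) {
  auto it = demoPresets.find(name);
  if (it == demoPresets.end()) return false;
  result = it->second.first(numGpus);
  return true;
}

// Prints every registered preset with its description.
void ListPresets(FILE* out) {
  for (auto const& [name, entry] : demoPresets)
    fprintf(out, "%15s - %s\n", name.c_str(), entry.second.c_str());
}
```

Because `std::map` iterates its keys in sorted order, a loop like `ListPresets` prints entries alphabetically regardless of the initializer's order; whether `DisplayPresets` iterates the map this way is not visible in the diff.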
@@ -22,19 +22,21 @@ THE SOFTWARE.
 void LogTransfers(FILE *fp, int const testNum, std::vector<Transfer> const& transfers)
 {
-  fprintf(fp, "# Test %d\n", testNum);
-  fprintf(fp, "%d", -1 * (int)transfers.size());
-  for (auto const& transfer : transfers)
-  {
-    fprintf(fp, " (%s->%c%d->%s %d %lu)",
-            MemDevicesToStr(transfer.srcs).c_str(),
-            ExeTypeStr[transfer.exeDevice.exeType], transfer.exeDevice.exeIndex,
-            MemDevicesToStr(transfer.dsts).c_str(),
-            transfer.numSubExecs,
-            transfer.numBytes);
-  }
-  fprintf(fp, "\n");
-  fflush(fp);
+  if (fp) {
+    fprintf(fp, "# Test %d\n", testNum);
+    fprintf(fp, "%d", -1 * (int)transfers.size());
+    for (auto const& transfer : transfers)
+    {
+      fprintf(fp, " (%s->%c%d->%s %d %lu)",
+              MemDevicesToStr(transfer.srcs).c_str(),
+              ExeTypeStr[transfer.exeDevice.exeType], transfer.exeDevice.exeIndex,
+              MemDevicesToStr(transfer.dsts).c_str(),
+              transfer.numSubExecs,
+              transfer.numBytes);
+    }
+    fprintf(fp, "\n");
+    fflush(fp);
+  }
 }
 void SweepPreset(EnvVars& ev,
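The updated `LogTransfers` writes only when `fp` is non-null, so a sweep can proceed without a log file. The sketch below isolates the line format it writes (a negative transfer count followed by `(src->exe->dst subExecs bytes)` groups), using a hypothetical, simplified `DemoTransfer` struct with preformatted source and destination strings in place of TransferBench's `MemDevice`/`ExeDevice` types.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical simplified transfer record (not TransferBench's Transfer type).
struct DemoTransfer {
  std::string srcs;       // e.g. "G0"
  char exeType;           // executor type letter, e.g. 'G'
  int exeIndex;           // executor index
  std::string dsts;       // e.g. "G1"
  int numSubExecs;
  unsigned long numBytes;
};

// Builds the line body written after "# Test N": -count then one group per transfer.
std::string FormatTransferLine(const std::vector<DemoTransfer>& transfers) {
  std::string line = std::to_string(-1 * (int)transfers.size());
  char buf[256];
  for (auto const& t : transfers) {
    snprintf(buf, sizeof(buf), " (%s->%c%d->%s %d %lu)",
             t.srcs.c_str(), t.exeType, t.exeIndex, t.dsts.c_str(),
             t.numSubExecs, t.numBytes);
    line += buf;
  }
  return line;
}

// Same null guard as the diff: a missing log file is skipped, not fatal.
// Returns true only if something was written.
bool LogTest(FILE* fp, int testNum, const std::vector<DemoTransfer>& transfers) {
  if (!fp) return false;
  fprintf(fp, "# Test %d\n%s\n", testNum, FormatTransferLine(transfers).c_str());
  fflush(fp);
  return true;
}
```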
@@ -54,6 +56,7 @@ void SweepPreset(EnvVars& ev,
   int numGpuSubExecs = EnvVars::GetEnvVar("NUM_GPU_SE" , 4);
   std::string sweepDst = EnvVars::GetEnvVar("SWEEP_DST" , "CG");
   std::string sweepExe = EnvVars::GetEnvVar("SWEEP_EXE" , "CDG");
+  std::string sweepFile = EnvVars::GetEnvVar("SWEEP_FILE" , "/tmp/lastSweep.cfg");
   int sweepMax = EnvVars::GetEnvVar("SWEEP_MAX" , 24);
   int sweepMin = EnvVars::GetEnvVar("SWEEP_MIN" , 1);
   int sweepRandBytes = EnvVars::GetEnvVar("SWEEP_RAND_BYTES" , 0);
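`SWEEP_FILE` follows the same read-with-default pattern as the other sweep variables. A minimal sketch of that pattern with `std::getenv`, assuming (since `EnvVars::GetEnvVar` itself is not shown) that an unset variable yields the default and, for the hypothetical integer overload, that unparsable text also falls back to the default:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helpers; EnvVars::GetEnvVar's exact parsing rules are not in the diff.
// String form: the variable's value if set, otherwise `def`.
std::string GetEnvOr(const char* name, const std::string& def) {
  const char* val = std::getenv(name);
  return val ? std::string(val) : def;
}

// Integer form: falls back to `def` if unset, empty, or not a whole base-10 number.
int GetEnvOr(const char* name, int def) {
  const char* val = std::getenv(name);
  if (!val || *val == '\0') return def;
  char* end = nullptr;
  long parsed = std::strtol(val, &end, 10);
  return (*end == '\0') ? static_cast<int>(parsed) : def;
}
```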
@@ -78,6 +81,7 @@ void SweepPreset(EnvVars& ev,
   ev.Print("NUM_GPU_SE", numGpuSubExecs, "Using %d subExecutors/CUs per GPU executed Transfer", numGpuSubExecs);
   ev.Print("SWEEP_DST", sweepDst.c_str(), "Destination Memory Types to sweep");
   ev.Print("SWEEP_EXE", sweepExe.c_str(), "Executor Types to sweep");
+  ev.Print("SWEEP_FILE", sweepFile.c_str(),"File to store the executing sweep configuration");
   ev.Print("SWEEP_MAX", sweepMax, "Max simultaneous transfers (0 = no limit)");
   ev.Print("SWEEP_MIN", sweepMin, "Min simultaenous transfers");
   ev.Print("SWEEP_RAND_BYTES", sweepRandBytes, "Using %s number of bytes per Transfer", (sweepRandBytes ? "random" : "constant"));
@@ -283,10 +287,14 @@ void SweepPreset(EnvVars& ev,
   std::uniform_int_distribution<int> distribution(sweepMin, maxParallelTransfers);
   // Log sweep to configuration file
-  FILE *fp = fopen("lastSweep.cfg", "w");
+  char absPath[1024];
+  auto const res = realpath(sweepFile.c_str(), absPath);
+  FILE *fp = fopen(sweepFile.c_str(), "w");
   if (!fp) {
-    printf("[ERROR] Unable to open lastSweep.cfg. Check permissions\n");
-    exit(1);
+    printf("[WARN] Unable to open %s. Skipping output of sweep configuration file\n", res ? absPath : sweepFile.c_str());
+  } else {
+    printf("Sweep configuration saved to: %s\n", res ? absPath : sweepFile.c_str());
   }
   // Create bitmask of numPossible triplets, of which M will be chosen
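The new code resolves `sweepFile` with POSIX `realpath()` before calling `fopen()`. `realpath()` fails (with `ENOENT`) when the file does not exist yet, for example on a first run, so `res` is null and both messages fall back to the unresolved path. POSIX also lets `realpath()` write up to `PATH_MAX` bytes into a caller-supplied buffer, and `PATH_MAX` is 4096 on Linux, larger than the 1024-byte `absPath`. The sketch below is a hypothetical `OpenSweepLog` helper, not TransferBench's code: it resolves the path only after the file has been created, uses a `PATH_MAX` buffer, and, like the diff, treats an unopenable file as a warning rather than calling `exit(1)`.

```cpp
#include <cassert>
#include <climits>   // PATH_MAX (Linux/glibc)
#include <cstdio>
#include <cstdlib>   // realpath() is POSIX, declared in <stdlib.h>
#include <string>

// Hypothetical helper. Opens `path` for writing and sets `shownPath` to the
// absolute path when it can be resolved, otherwise to `path` unchanged.
FILE* OpenSweepLog(const std::string& path, std::string& shownPath) {
  shownPath = path;
  FILE* fp = std::fopen(path.c_str(), "w");
  if (!fp) {
    // Non-fatal, as in the diff: warn and let the caller skip logging.
    std::printf("[WARN] Unable to open %s. Skipping output of sweep configuration file\n",
                path.c_str());
    return nullptr;
  }
  // The file now exists, so realpath() can resolve it; PATH_MAX bounds its output.
  char absPath[PATH_MAX];
  if (realpath(path.c_str(), absPath) != nullptr) shownPath = absPath;
  std::printf("Sweep configuration saved to: %s\n", shownPath.c_str());
  return fp;
}
```

Passing a null pointer as the second argument to `realpath()` (POSIX.1-2008) is an alternative that allocates the result, which the caller must then `free()`.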
@@ -333,7 +341,7 @@ void SweepPreset(EnvVars& ev,
     // Check for test limit
     if (numTestsRun == sweepTestLimit) {
-      printf("Test limit reached\n");
+      printf("Sweep Test limit reached\n");
       break;
     }
@@ -341,7 +349,7 @@ void SweepPreset(EnvVars& ev,
     auto cpuDelta = std::chrono::high_resolution_clock::now() - cpuStart;
     double totalCpuTime = std::chrono::duration_cast<std::chrono::duration<double>>(cpuDelta).count();
     if (sweepTimeLimit && totalCpuTime > sweepTimeLimit) {
-      printf("Time limit exceeded\n");
+      printf("Sweep Time limit exceeded\n");
       break;
     }
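The sweep loop stops on whichever limit is hit first: a count of tests run or elapsed wall-clock seconds. A minimal sketch of that control flow, with a hypothetical `RunSweepWithLimits` driver and a caller-supplied `runOne` callback standing in for executing one set of Transfers; it assumes 0 disables either limit (the diff shows that guard only for `sweepTimeLimit`).

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical driver: runs up to maxIters sweep steps and reports why it stopped.
// Assumption: a limit of 0 disables that check.
template <typename RunOne>
std::string RunSweepWithLimits(int testLimit, double timeLimitSec, int maxIters, RunOne runOne) {
  auto const start = std::chrono::steady_clock::now();
  int numTestsRun = 0;
  for (int iter = 0; iter < maxIters; iter++) {
    runOne(iter);  // stands in for executing one set of Transfers
    numTestsRun++;
    // Check for test limit
    if (testLimit && numTestsRun == testLimit) return "Sweep Test limit reached";
    // Check for time limit (seconds since the sweep began)
    double elapsed = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    if (timeLimitSec > 0 && elapsed > timeLimitSec) return "Sweep Time limit exceeded";
  }
  return "Sweep finished";
}
```

The sketch measures with `std::chrono::steady_clock`, which is monotonic; the diff uses `high_resolution_clock`, which some standard libraries define as an alias of `system_clock` and which can therefore jump if the system time is adjusted.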
@@ -357,5 +365,5 @@ void SweepPreset(EnvVars& ev,
       bitmask[i] = (i < M) ? 1 : 0;
     }
   }
-  fclose(fp);
+  if (fp) fclose(fp);
 }
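The bitmask reset in this hunk (`bitmask[i] = (i < M) ? 1 : 0`) and the earlier comment about choosing M of `numPossible` triplets describe M-of-N combination enumeration. One common way to visit every combination is `std::prev_permutation` over such a 0/1 mask; the diff does not show how TransferBench advances its mask, so treat this `AllSubsets` helper as a hypothetical illustration of the technique.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Enumerates every way to choose M of N items (indices 0..N-1).
// Starting from M ones followed by N-M zeros, the lexicographically largest
// arrangement, std::prev_permutation visits each of the C(N, M) masks once
// and returns false after the smallest one (all ones at the end).
std::vector<std::vector<int>> AllSubsets(int N, int M) {
  std::vector<int> bitmask(N);
  for (int i = 0; i < N; i++) bitmask[i] = (i < M) ? 1 : 0;
  std::vector<std::vector<int>> subsets;
  do {
    std::vector<int> chosen;
    for (int i = 0; i < N; i++)
      if (bitmask[i]) chosen.push_back(i);
    subsets.push_back(chosen);
  } while (std::prev_permutation(bitmask.begin(), bitmask.end()));
  return subsets;
}
```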
@@ -38,21 +38,56 @@ static int RemappedCpuIndex(int origIdx)
   return remappingCpu[origIdx];
 }
+static void PrintNicToGPUTopo(bool outputToCsv)
+{
+#ifdef NIC_EXEC_ENABLED
+  printf(" NIC | Device Name | Active | PCIe Bus ID | NUMA | Closest GPU(s) | GID Index | GID Descriptor\n");
+  if(!outputToCsv)
+    printf("-----+-------------+--------+--------------+------+----------------+-----------+-------------------\n");
+  int numGpus = TransferBench::GetNumExecutors(EXE_GPU_GFX);
+  auto const& ibvDeviceList = GetIbvDeviceList();
+  for (int i = 0; i < ibvDeviceList.size(); i++) {
+    std::string closestGpusStr = "";
+    for (int j = 0; j < numGpus; j++) {
+      if (TransferBench::GetClosestNicToGpu(j) == i) {
+        if (closestGpusStr != "") closestGpusStr += ",";
+        closestGpusStr += std::to_string(j);
+      }
+    }
+    printf(" %-3d | %-11s | %-6s | %-12s | %-4d | %-14s | %-9s | %-20s\n",
+           i, ibvDeviceList[i].name.c_str(),
+           ibvDeviceList[i].hasActivePort ? "Yes" : "No",
+           ibvDeviceList[i].busId.c_str(),
+           ibvDeviceList[i].numaNode,
+           closestGpusStr.c_str(),
+           ibvDeviceList[i].isRoce && ibvDeviceList[i].hasActivePort? std::to_string(ibvDeviceList[i].gidIndex).c_str() : "N/A",
+           ibvDeviceList[i].isRoce && ibvDeviceList[i].hasActivePort? ibvDeviceList[i].gidDescriptor.c_str() : "N/A"
+           );
+  }
+  printf("\n");
+#endif
+}
 void DisplayTopology(bool outputToCsv)
 {
   int numCpus = TransferBench::GetNumExecutors(EXE_CPU);
   int numGpus = TransferBench::GetNumExecutors(EXE_GPU_GFX);
+  int numNics = TransferBench::GetNumExecutors(EXE_NIC);
   char sep = (outputToCsv ? ',' : '|');
   if (outputToCsv) {
     printf("NumCpus,%d\n", numCpus);
     printf("NumGpus,%d\n", numGpus);
+    printf("NumNics,%d\n", numNics);
   } else {
     printf("\nDetected Topology:\n");
     printf("==================\n");
     printf(" %d configured CPU NUMA node(s) [%d total]\n", numCpus, numa_max_node() + 1);
     printf(" %d GPU device(s)\n", numGpus);
+    printf(" %d Supported NIC device(s)\n", numNics);
   }
   // Print out detected CPU topology
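`PrintNicToGPUTopo` fills each row's "Closest GPU(s)" column by asking, for every GPU, whether its closest NIC is the current one. A sketch of that lookup as a pure function, where the vector `closestNicOfGpu` stands in for `TransferBench::GetClosestNicToGpu(j)`, plus an alternative that inverts the mapping once for all NICs; both helpers are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Comma-separated indices of GPUs whose closest NIC is `nicIdx`,
// built the same way as closestGpusStr in PrintNicToGPUTopo.
std::string ClosestGpusForNic(const std::vector<int>& closestNicOfGpu, int nicIdx) {
  std::string s;
  for (int j = 0; j < (int)closestNicOfGpu.size(); j++) {
    if (closestNicOfGpu[j] != nicIdx) continue;
    if (!s.empty()) s += ",";
    s += std::to_string(j);
  }
  return s;
}

// Alternative: invert the GPU->NIC mapping in one pass, producing every NIC's
// list at once instead of rescanning all GPUs per NIC.
std::vector<std::string> ClosestGpusPerNic(const std::vector<int>& closestNicOfGpu, int numNics) {
  std::vector<std::string> out(numNics);
  for (int j = 0; j < (int)closestNicOfGpu.size(); j++) {
    int nic = closestNicOfGpu[j];
    if (nic < 0 || nic >= numNics) continue;  // skip out-of-range indices (e.g. a negative sentinel, if used)
    if (!out[nic].empty()) out[nic] += ",";
    out[nic] += std::to_string(j);
  }
  return out;
}
```

With a handful of NICs and GPUs the per-NIC rescan in the diff is cheap; the inverted form only matters if the lookup becomes expensive.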
@@ -91,8 +126,10 @@ void DisplayTopology(bool outputToCsv)
   }
   printf("\n");
-  // Print out detected GPU topology
+  // Print out detected NIC topology
+  PrintNicToGPUTopo(outputToCsv);
+  // Print out detected GPU topology
 #if defined(__NVCC__)
   for (int i = 0; i < numGpus; i++) {
     hipDeviceProp_t prop;
@@ -118,12 +155,12 @@ void DisplayTopology(bool outputToCsv)
   printf(" %c", sep);
   for (int j = 0; j < numGpus; j++)
     printf(" GPU %02d %c", j, sep);
-  printf(" PCIe Bus ID %c #CUs %c NUMA %c #DMA %c #XCC\n", sep, sep, sep, sep);
+  printf(" PCIe Bus ID %c #CUs %c NUMA %c #DMA %c #XCC %c NIC\n", sep, sep, sep, sep, sep);
   if (!outputToCsv) {
     for (int j = 0; j <= numGpus; j++)
       printf("--------+");
-    printf("--------------+------+------+------+------\n");
+    printf("--------------+------+------+------+------+------\n");
   }
   // Loop over each GPU device
@@ -149,12 +186,13 @@ void DisplayTopology(bool outputToCsv)
     char pciBusId[20];
     HIP_CALL(hipDeviceGetPCIBusId(pciBusId, 20, i));
-    printf(" %11s %c %4d %c %4d %c %4d %c %4d\n",
+    printf(" %-11s %c %-4d %c %-4d %c %-4d %c %-4d %c %-4d\n",
            pciBusId, sep,
            TransferBench::GetNumSubExecutors({EXE_GPU_GFX, i}), sep,
            TransferBench::GetClosestCpuNumaToGpu(i), sep,
            TransferBench::GetNumExecutorSubIndices({EXE_GPU_DMA, i}), sep,
-           TransferBench::GetNumExecutorSubIndices({EXE_GPU_GFX, i}));
+           TransferBench::GetNumExecutorSubIndices({EXE_GPU_GFX, i}), sep,
+           TransferBench::GetClosestNicToGpu(i));
   }
 #endif
 }
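The per-GPU row switches from `%11s`/`%4d` (right-justified) to `%-11s`/`%-4d` (left-justified) and gains a NIC column. A printf field width is a minimum, not a limit, so a 12-character bus ID such as `0000:0c:00.0` still prints in full and shifts the rest of that row by one character. The hypothetical `FormatGpuRow` below reproduces the new format string (without the trailing newline) with `snprintf` on made-up sample values.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper reproducing the updated per-GPU row format.
// '-' left-justifies each field within its minimum width; widths never truncate.
std::string FormatGpuRow(const char* busId, int numCus, int numaNode,
                         int numDma, int numXcc, int closestNic, char sep) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), " %-11s %c %-4d %c %-4d %c %-4d %c %-4d %c %-4d",
                busId, sep, numCus, sep, numaNode, sep, numDma, sep, numXcc, sep, closestNic);
  return std::string(buf);
}
```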
This diff is collapsed.