Weile authored
- Instantiates optimized kernels for common Transfer types:
  - Copy (1 src → 1 dst): Optimized single-source data copy
  - Read-only (1 src → 0 dst): Optimized memory read validation
  - Write-only (0 src → 1 dst): Optimized memory write/initialization
- Compiler eliminates dead code loops for these specialized cases, improving performance by up to 7% for all-to-all workloads on MI3xx machines
- Update CHANGELOG

44140eeb
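The specialization described in this commit can be illustrated with a minimal host-side C++ sketch. It is not the project's kernel code: `RunTransfer`, `Dispatch`, the `-1` "runtime count" convention, and the zero fill used when there are no sources are all hypothetical names and assumptions chosen for illustration. The idea is that when the source and destination counts are template constants, the inner loops have compile-time trip counts, so a loop with zero iterations can be dropped entirely by the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a transfer kernel (plain C++, not HIP).
// NUM_SRCS / NUM_DSTS >= 0 fix the counts at compile time;
// -1 means "fall back to the runtime value" (assumed convention).
template <int NUM_SRCS, int NUM_DSTS>
void RunTransfer(const float* const* srcs, int numSrcs,
                 float* const* dsts, int numDsts,
                 std::size_t n, float* checksum)
{
    const int srcCount = (NUM_SRCS >= 0) ? NUM_SRCS : numSrcs;
    const int dstCount = (NUM_DSTS >= 0) ? NUM_DSTS : numDsts;

    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // With no sources, the element value stays 0 (assumed fill value).
        float val = 0.0f;
        // For <1, 0> or <0, 1>, one of these loops has a constant trip
        // count of zero and can be removed as dead code.
        for (int s = 0; s < srcCount; ++s) val += srcs[s][i];
        for (int d = 0; d < dstCount; ++d) dsts[d][i] = val;
        total += val;
    }
    if (checksum != nullptr) *checksum = total;
}

// Selects a specialized instantiation for the three common shapes and
// uses the generic runtime-count version for everything else.
void Dispatch(const float* const* srcs, int numSrcs,
              float* const* dsts, int numDsts,
              std::size_t n, float* checksum)
{
    assert(numSrcs >= 0 && numDsts >= 0);
    if (numSrcs == 1 && numDsts == 1) {
        RunTransfer<1, 1>(srcs, numSrcs, dsts, numDsts, n, checksum);   // copy
    } else if (numSrcs == 1 && numDsts == 0) {
        RunTransfer<1, 0>(srcs, numSrcs, dsts, numDsts, n, checksum);   // read-only
    } else if (numSrcs == 0 && numDsts == 1) {
        RunTransfer<0, 1>(srcs, numSrcs, dsts, numDsts, n, checksum);   // write-only
    } else {
        RunTransfer<-1, -1>(srcs, numSrcs, dsts, numDsts, n, checksum); // generic
    }
}
```

A caller would pass arrays of source and destination pointers to `Dispatch`. The read-only case returns a checksum so the reads have an observable effect and are not optimized away themselves.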
To find the state of this project's repository at the time of any of these versions, check out the tags.