- 04 Sep, 2025 1 commit
gilbertlee-amd authored
* Added BLOCKSIZES to a2asweep preset to allow sweeping over threadblock sizes
* Fixing src initialization when using BYTE_OFFSET
* Adding FILL_COMPRESS functionality to allow for different input data patterns
* Updating CHANGELOG regarding GFX_BLOCKSIZE limit increase to 1024
-
- 09 Jun, 2025 1 commit
gilbertlee-amd authored
* Adding non-temporal loads and stores via GFX_TEMPORAL
* Adding additional summary details to a2a preset
* Add SHOW_MIN_ONLY for a2asweep preset
* Adding new P CPU memory type which is indexed by closest GPU
-
- 28 Feb, 2025 1 commit
gilbertlee-amd authored
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>