- 07 Feb, 2025 5 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 06 Feb, 2025 2 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 05 Feb, 2025 4 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Illia Silin authored
-
Andriy Roshchenko authored
-
- 04 Feb, 2025 9 commits
-
-
Illia Silin authored
* enable batched_gemm_softmax_gemm_perm_wmma for gfx12
* disable instances with blocksize=256 in attention examples
* debugging
* debug
* fixed lds_enabled
* debugging
* Fix and add limit to skiplds feature
* Enable skipLds feature and fix compilation bugs
* add ck_tile definitions for gfx12
* fix clang format and test/wmma_op
* update instances cmake for gfx12
* disable the test_wmma_op on gfx12
* fix the builds for gfx950
* add gfx12 and gfx950 to default target list
* clean-up cmake file
* Initial introduction of OFP8 data types.
* Renamed FP8 and BF8 tests into FP8_FNUZ and BF8_FNUZ.
* Implementation of ConvertFP32Nearest in test_fp8_ocp.
* Remove dependence on possibly undeclared alias.
* Implement FP8OCP test for stochastic rounding mode.
* Implement FP8OCP tests for half_t type conversions.
* enable bf16 atomic add on gfx950
* Implement ConvertFP32Nearest test.
* Implement ConvertFP32Stochastic test.
...
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Bartłomiej Kocot authored
* Add Grouped Convolution docs
* Add gemm docs
* Update docs
* fix
-
Bartłomiej Kocot authored
-
Bartłomiej Kocot authored
* Fix pk_int4 cast and add pk_int4 dtype in ck tile
* fixes
* Improvements
* fix typo
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 03 Feb, 2025 2 commits
-
-
Andriy Roshchenko authored
-
Illia Silin authored
Merge gfx950 changes into develop branch.
-
- 01 Feb, 2025 7 commits
-
-
Ben Richard authored
* Honor BUILD_SHARED_LIBS
* Add .so versioning when building shared libraries
-
illsilin authored
-
Illia Silin authored
Add FP6/BF6 vector type support
-
Illia Silin authored
Fix gemm gemm on gfx950
-
jefyang1 authored
-
jefyang1 authored
-
Andriy Roshchenko authored
-
- 31 Jan, 2025 11 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Andriy Roshchenko authored
-
Rostyslav Geyyer authored
-
Andriy Roshchenko authored
Test the functionality of V_MFMA_F32_16X16X128_F8F6F4 and V_MFMA_F32_32X32X64_F8F6F4 instructions. (#293)
* Introduced MFMA tests
* Verified f8f6f4 MFMA Instructions
-
Rostyslav Geyyer authored
-
arai713 authored
* updating codegen build for MIOpen access: adding .cmake for codegen component
* updating CMake
* adding in header guards for some headers due to issues with hiprtc compilation in MIOpen
* some more header guards
* putting env file in header guard
* cleaning up some includes
* updated types file for hiprtc purposes
* fixed types file: bit-wise/memcpy issue
* updating multiple utility files to deal with standard header inclusion for hiprtc
* added some more header guards in the utility files, replacing some standard header functionality
* added some more header guards
* fixing some conflicts in utility files, another round of header guards
* fixing errors in data type file
* resolved conflict errors in a few utility files
* added header guards/replicated functionality in device files
* resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd
* resolved issues with standard ...
-
Illia Silin authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-