turneram authored
* Add initial ck_gemm code
* Format
* Add additional src files
* Format
* Add include
* Simplify fuse_ck
* Format
* Rename var
* Enable pass
* Update ck version
* Fix include
* Add group stride
* Disable warnings for ck headers
* Format
* Add unpack array
* Add interface to enable tuning
* Format
* Update compile_ops to handle tuning config
* Format
* Add some comments
* Move time_op to migraphx_gpu
* Add benchmarking
* Refactor
* Format
* Add lift class macro
* Use device name
* Format
* Generate configs
* Format
* Pass tuning parameter
* Move data type to is_ck_gemm matcher
* Format
* Add problem_cache to avoid retuning same configs
* Format
* Format
* Mark the problems
* Format
* Use is_null
* Format
* Resize vector
* Only tune with exhaustive tuning
* Format
* Use assert
* Format
* Tidy fixes
* More tidy fixes
* Format
* Add license to missing files
* Format
* Use transform
* Format
* Fix tidy
* Format
* Fix cppcheck issues
* Format
* Add static_assert
* Add ops header
* Add assertion in batcher
* Format
* Improve the batch fold check
* Format
* Add where op workaround for CK
* Skip if any input is not a supported ck type
* Format
* Check batch is standard
* Format
* Remove redundant static keyword
* Update commit hash
* Fix error when running without --exhaustive-tune
* Formatting
* Formatting
* Remove fuse_ck_gemm_softmax_gemm
* Update ck hash
* Correct spelling mistake
* Remove commented-out logic from fuse_ck
* Remove unused include and add comment
* Formatting
* Remove redundant get_shape and remove ck_gemm from names
* Formatting
* Allow for mixed types with int8 gemms
* Formatting
* Add back find_package from merge
* Update CK commit hash and add gfx940 to fuse_ops supported archs
* Formatting
* Update CK hash
b8898d7e