1. 11 Feb, 2025 1 commit
  2. 10 Feb, 2025 1 commit
    • Added Int4 mixed batch gemm support (#1839) · d9f1ead3
      Mingtao Gu authored
      
      
      * remove redundant kernels.
      
      * added batched_gemm_xdl_fp16int4_b_scale_v3
      
      * Enabled the split K.
      
      * added the batched_gemm_b_scale ckProfiler, meet function issue
      
      * fix some typos
      
      * fix ckProfiler build issue
      
      * fix some bugs
      
      * updated some debug info
      
      * comment some code
      
      * Fix
      
      * fixed some bugs and refactor the code
      
      * fixed a function bug.
      
      * formatted files.
      
      * formatted
      
      * uncommented the ckProfiler CMakeLists
      
      * fixed.
      
      * fix ckProfiler for batched_gemm_b_scale
      
      ---------
      Co-authored-by: mtgu0705 <mtgu@amd.com>
      Co-authored-by: aska-0096 <haocwang@amd.com>
      Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
  3. 07 Feb, 2025 7 commits
    • Daniel Su's avatar
      a8c5bd9b
    • Add a host mx gemm reference kernel (#1864) · 37bfa01c
      Rostyslav Geyyer authored
      * Add mx gemm reference kernel
      
      * Update copyright year
      
      * Update mx gemm example
      
      * Use element-wise ops in the reference gemm
    • CK Tile - small fix to hotloop scheduler & KPack value. (#1867) · 0c15de6a
      Adam Osewski authored
      * Use SmemPack in HotLoop scheduler
      
      * Additional debug print information
      
      * Change KPack value.
      
      Hardcode for now, as without AK1/BK1 there's no good way to determine
      its value.
      
      * Fix HotLoopScheduler MFMA instr parameters.
    • Support for dtypes (fp8, bf8, bf16 and fp16) for the ck_tile/03_gemm example. (#1845) · ab5d0278
      kylasa authored
      
      
      * Support bf16/fp8/bf8 datatypes for ck_tile/gemm
      
      * remove commented out code.
      
      * Addressing code review comments and enabling universal_gemm for all the supported data types.
      
      * Merge conflict resolution.
      
      * Solve the memory pipeline compilation error. Merge with the new change of CShuffle
      
      * finish the feature, pass the tests
      
      * Fix the pipeline and add the benchmark script for other data types
      
      ---------
      Co-authored-by: ThomasNing <thomas.ning@amd.com>
    • jakpiase's avatar
      9b5dfba2
    • restore cron trigger (#1863) · 82cda34d
      Illia Silin authored
    • Merge from internal (#1857) · 555244e7
      Illia Silin authored
      * enable batched_gemm_softmax_gemm_perm_wmma for gfx12
      
      * disable instances with blocksize=256 in attention examples
      
      * debugging
      
      * debug
      
      * fixed lds_enabled
      
      * debugging
      
      * Fix and add limit to skiplds feature
      
      * Enable skipLds feature and fix compilation bugs
      
      * add ck_tile definitions for gfx12
      
      * fix clang format and test/wmma_op
      
      * update instances cmake for gfx12
      
      * disable the test_wmma_op on gfx12
      
      * fix the builds for gfx950
      
      * add gfx12 and gfx950 to default target list
      
      * clean-up cmake file
      
      * Initial introduction of OFP8 data types.
      
      * Renamed FP8 and BF8 tests into FP8_FNUZ and BF8_FNUZ.
      
      * Implementation of ConvertFP32Nearest in test_fp8_ocp.
      
      * Remove dependence on possibly undeclared alias.
      
      * Implement FP8OCP test for stochastic rounding mode.
      
      * Implement FP8OCP tests for half_t type conversions.
      
      * enable bf16 atomic add on gfx950
      
      * Implement ConvertFP32Nearest test.
      
      * Implement ConvertFP32Stochastic test.
      
      * Implement ConvertFP16Nearest and ConvertFP16...
  4. 04 Feb, 2025 3 commits
  5. 01 Feb, 2025 1 commit
  6. 31 Jan, 2025 3 commits
    • Codegen hipRTC compilation (#1579) · 2e3183af
      arai713 authored
      
      
      * updating codegen build for MIOpen access: adding .cmake for codegen component
      
      * updating CMake
      
      * adding in header guards for some headers due to issues with hiprtc compilation in MIOpen
      
      * some more header guards
      
      * putting env file in header guard
      
      * cleaning up some includes
      
      * updated types file for hiprtc purposes
      
      * fixed types file: bit-wise/memcpy issue
      
      * updating multiple utility files to deal with standard header inclusion for hiprtc
      
      * added some more header guards in the utility files, replacing some standard header functionality
      
      * added some more header guards
      
      * fixing some conflicts in utility files, another round of header guards
      
      * fixing errors in data type file
      
      * resolved conflict errors in a few utility files
      
      * added header guards/replicated functionality in device files
      
      * resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd
      
      * resolved issues with standard headers in device files: device_base.hpp, device_grouped_conv_fwd_multiple_abd.hpp, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp
      
      * added header guards for gridwise gemm files: gridwise_gemm_multiple_abd_xdl_cshuffle.hpp and gridwise_gemm_multiple_d_xdl_cshuffle.hpp
      
      * fixed issue with numerics header, removed from transform_conv_fwd_to_gemm and added to device_column_to_image_impl, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3, device_image_to_column_impl
      
      * replaced standard header usage and added header guards in block to ctile map and gridwise_gemm_pipeline_selector
      
      * resolved errors in device_gemm_xdl_splitk_c_shuffle files in regards to replacement of standard headers in previous commit
      
      * added replicated functionality for standard header methods in utility files
      
      * replaced standard header functionality in threadwise tensor slice transfer files and added header guards in element_wise_operation.hpp
      
      * temp fix for namespace error in MIOpen
      
      * remove standard header usage in codegen device op
      
      * removed standard header usage in elementwise files, resolved namespace errors
      
      * formatting fix
      
      * changed codegen argument to ON for testing
      
      * temporarily removing codegen compiler flag for testing purposes
      
      * added codegen flag again, set default to ON
      
      * set codegen flag default back to OFF
      
      * replaced enable_if_t standard header usage in data_type.hpp
      
      * added some debug prints to pinpoint issues in MIOpen
      
      * added print outs to debug in MIOpen
      
      * removed debug print outs from device op
      
      * resolved stdexcept include error
      
      * formatting fix
      
      * adding includes to new fp8 file to resolve ck::enable_if_t errors
      
      * made changes to amd_wave_read_first_lane
      
      * updated functionality in type utility file
      
      * fixed end of file issue
      
      * resolved errors in type utility file, added functionality to array utility file
      
      * fixed standard header usage replication in data_type file, resolves error with failing examples on navi3x
      
      * formatting fix
      
      * replaced standard header usage in amd_ck_fp8 file
      
      * added include to random_gen file
      
      * removed and replicated standard header usage from data_type and type_convert files for fp8 changes
      
      * replicated standard unsigned integer types in random_gen
      
      * resolved comments from review: put calls to reinterpret_cast for size_t in header guards
      
      * updated/added copyright headers
      
      * removed duplicate header
      
      * fixed typo in header guard
      
      * updated copyright headers
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
    • fix ck_tile gemm scripts (#1851) · 2ab8bf4c
      Illia Silin authored
    • Enable ck_tile gemms build in CI by default. (#1850) · 7cf89316
      Illia Silin authored
      * turn on the ck_tile gemm tests by default
      
      * enable ck_tile gemms CI build by default
  7. 30 Jan, 2025 4 commits
  8. 29 Jan, 2025 1 commit
  9. 28 Jan, 2025 1 commit
  10. 27 Jan, 2025 2 commits
    • Add OCP FP8 support in CK_TILE (#1829) · 35aebe59
      Andriy Roshchenko authored
      * Add OCP FP8 to CK_TILE
      
      * Validate OCP FP8 in FMHA FWD under VALID=1
    • [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835) · 39dc25a9
      Adam Osewski authored
      
      
      * Refactor universal gemm policy.
      
      * Adapt example to refactor changes.
      
      * Introduce static encoding pattern
      
      * Adding shuffled encoding patterns.
      
      * Fix err in reverse tuple.
      
      * Add transpose_tile2d
      
      * Small refactoring + doc
      
      * Enable reading on contiguous dimension in all layouts.
      
      * Transpose A/B register tile if needed for comp v3 pipeline.
      
      * Take contiguous dim size when calculating dram vector load size.
      
      * A/B smem pack size taken from WarpGemm attributes
      
      * Update B LDS layout and setup tile distribution pattern at class level.
      
      * Fix static assert.
      
      * Fix errors in examples.
      
      * Formatting & fix IsTranspose
      
      * Fix VectorSize & refactor.
      
      * Add error logging messages.
      
      * Fix VecLoadSize and TransposeC for mem pipeline.
      
      * Update unit-tests & disable mem pipeline.
      
      * Clang format
      
      * Update include/ck_tile/core/tensor/tile_window.hpp
      Co-authored-by: jakpiase <jakub.piasecki@amd.com>
      
      * Fix compilation and reviewers comments.
      
      * Refactor unit-test. Fallback to non-universal gemm.
      
      Need to use GemmPipelineAGmemBGmemCRegV1 for now,
      since GemmKernel is now supporting also non-K major vector reads.
      
      ---------
      Co-authored-by: jakpiase <jakub.piasecki@amd.com>
  11. 24 Jan, 2025 2 commits
  12. 22 Jan, 2025 3 commits
  13. 21 Jan, 2025 2 commits
    • Simplify static_cast if-lands (#1828) · 3db77bc4
      Mateusz Ozga authored
    • CK-Tile Grouped GEMM refactor and post PR fixes (#1756) · 3c93d3c4
      Mateusz Ozga authored
      * Grouped gemm simple code refactor
      
      * Offset invoker
      
      * Invoke generic Run, and replace name of partitioner variable
      
      * Tests fix type
      
      * Removed namespaces
      
      * Add template param to avoid implicit cast
      
      * Remove generic function
      
      * Constant value
      
      * underline enum to int16_t
      
      * Generalize partitioner function
      
      * Remove whitespaces
      
      * Rename function
      
      * Using support
      
      * Clang-format
      
      * Clang-format
      
      * Fn-partitioner description fn
      
      * Typo
      
      * Typo 2
      
      * Better description
      
      * Better description
      
      * Refactor after review
      
      * Use ctr instead of set fn
      
      * Invoke ctr and typo
      
      * Comments
      
      * Remove unnecessary comment
      
      * Review, remove modulo
  14. 20 Jan, 2025 2 commits
  15. 19 Jan, 2025 1 commit
  16. 18 Jan, 2025 1 commit
  17. 17 Jan, 2025 2 commits
  18. 16 Jan, 2025 2 commits
  19. 15 Jan, 2025 1 commit