1. 08 Sep, 2023 14 commits
  2. 07 Sep, 2023 6 commits
  3. 06 Sep, 2023 3 commits
    • Bartlomiej Wroblewski's avatar
      Redesign the DPP8 GEMM kernel to use warp-wise component (#863) · 37a8c1f7
      Bartlomiej Wroblewski authored
      * Redesign the DPP8 GEMM kernel to use warp-wise component
      
      * Review: Improve error messages
      
      * Review: Remove unnecessary empty lines
      
      * Review: Fix M, N per thread names
      
      * Review: Rename mfma_input_type to dpp_input_type
      
      * Review: Fix tensor adaptor; remove unnecessary element
      
      * Review: Remove calls to dpp_gemm's MakeCDescriptor
      
      * Review: Add blockwise doc, change function names to include dimension names
      
      * Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
      
      * Review: Add __restrict__ keywords
      
      * Review: Use MatrixPadder for padding A, B, C matrices
      
      * Review: Remove hardcoded datatypes
      
      * Review: Change names from FloatX to XDataType
      
      * Review: Introduce AK0 and BK0 instead of a single K0
      
      * Review: Remove construction of dpp_datatypes object
      
      * Review: Rename DppInstrRunner to DppLanegroupGemm
      37a8c1f7
    • zjing14's avatar
      added padding of K into gemm_v2r3 (#887) · 3786bfe1
      zjing14 authored
      
      
      * added kpad support into v2r3
      
      * add generic instances
      
      * fixed comments
      
      * fixed mnk padding
      
      * Update device_batched_gemm_xdl.hpp
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      3786bfe1
    • zjing14's avatar
      Fixed fp8 gemm (#882) · a61b8b78
      zjing14 authored
      
      
      * add generic instances; fixed initi with fp8
      
      * fixed comment
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      a61b8b78
  4. 05 Sep, 2023 6 commits
  5. 04 Sep, 2023 1 commit
  6. 31 Aug, 2023 3 commits
    • zjing14's avatar
      Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
      zjing14 authored
      
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add a instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      f5ec04f0
    • rocking's avatar
      MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de
      rocking authored
      * Add maxpool instances
      
      * Rename index pool to max pool.
      
      * Add maxpool bwd bf16 instances
      
      * Add avg pool bwd instances
      
      * Rename avgpool and maxpool to avg_pool3d and max_pool
      
      * Add bf16 pool fwd instances
      
      * Add max pool bwd to ckProfiler
      
      * Add avg pool3d bwd to ckProfiler
      
      * Add avg pool bwd test
      
      * Fix bug of reference pool fwd (dilation)
      
      * Fix bug of max pool bwd  (dilation and initZero)
      
      * Support bf16 compute data type
      
      * Force compute type be f32. Because atomicAdd only support f32
      
      * Add max pool bwd test
      
      * Rename folder
      
      * Rename pool
      
      * Add max pool bwd client example
      
      * Add avg pool bwd client example
      
      * Add missing workspace
      
      * clang format
      
      * Rename macro
      
      * remove useless header
      
      * remove useless layout
      866377de
    • Illia Silin's avatar
      fix gemm_streamk example on mi300 (#875) · bf1912ed
      Illia Silin authored
      bf1912ed
  7. 30 Aug, 2023 1 commit
  8. 29 Aug, 2023 1 commit
  9. 28 Aug, 2023 1 commit
  10. 23 Aug, 2023 4 commits