1. 08 Oct, 2024 1 commit
    • Add a gpu gemm reference kernel (#1528) · aa932445
      Rostyslav Geyyer authored
      
      
      * Add a gpu gemm reference kernel
      
      * Switch to gpu reference in gemm examples
      
      * Remove redundant arguments
      
      * Update all related examples
      
      * Update more examples
      
      * Try fewer threads per block
      
      * Try even fewer threads per block
      
      * Add support for all matrix layouts
      
      * Increase block size
      
      * Clean up
      
      * Remove hardcoded strides
      
      * Clean up
      
      * Try a column-major case
      
      * Revert back to row-major
      
      * Run both CPU and GPU verification
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  2. 06 Sep, 2023 1 commit
    • Redesign the DPP8 GEMM kernel to use warp-wise component (#863) · 37a8c1f7
      Bartlomiej Wroblewski authored
      * Redesign the DPP8 GEMM kernel to use warp-wise component
      
      * Review: Improve error messages
      
      * Review: Remove unnecessary empty lines
      
      * Review: Fix M, N per thread names
      
      * Review: Rename mfma_input_type to dpp_input_type
      
      * Review: Fix tensor adaptor; remove unnecessary element
      
      * Review: Remove calls to dpp_gemm's MakeCDescriptor
      
      * Review: Add blockwise doc, change function names to include dimension names
      
      * Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
      
      * Review: Add __restrict__ keywords
      
      * Review: Use MatrixPadder for padding A, B, C matrices
      
      * Review: Remove hardcoded datatypes
      
      * Review: Change names from FloatX to XDataType
      
      * Review: Introduce AK0 and BK0 instead of a single K0
      
      * Review: Remove construction of dpp_datatypes object
      
      * Review: Rename DppInstrRunner to DppLanegroupGemm