1. 06 Dec, 2024 1 commit
  2. 05 Dec, 2024 2 commits
  3. 04 Dec, 2024 3 commits
  4. 30 Nov, 2024 1 commit
    • Merge the int4 kernel and profiling in one commit for RTP. · 40054f53
      mtgu0705 authored
      Add int4+scale based on Zhang, Jing's pk_i4. Compiles and passes the functional test.
      Modify the kernel to 128x128x128, and use mfma_32x32x4
      Move the weight permute from host to device
      
      Modified the scale init method.
      
      Modified the init method; the functional test fails and needs debugging.
      
      Added init method
      
      Support group=128 for Llama2-7B-int4
      
      Move the weight permute from host to device
      
      Add ckProfiler for GEMM b scale (int4)
      
      Add reference function.
      
      Add pipeline v4 (2 LDS pingpong)
      
      Add more int4-Gemm kernel profiling instances.
      
      Modify the int4-Gemm kernel instances
      
      Move the pk_i4 permute in kernel
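The "int4+scale" and "group=128" items in commit 40054f53 suggest group-wise weight dequantization, which can be sketched on the host as follows. This is only an illustration, not the kernel from the commit: the names `unpack_int4` and `dequant_int4` are hypothetical, and the nibble order (low nibble first) and signed two's-complement encoding are assumptions that may not match the actual pk_i4 layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sign-extend a 4-bit two's-complement nibble.
inline int unpack_int4(std::uint8_t nibble)
{
    const int v = nibble & 0xF;
    return v >= 8 ? v - 16 : v;
}

// Hypothetical reference dequantization along the K dimension.
// packed: two int4 values per byte, low nibble first (assumption).
// scales: one float per group of `group_size` consecutive elements.
inline std::vector<float> dequant_int4(const std::vector<std::uint8_t>& packed,
                                       const std::vector<float>& scales,
                                       std::size_t group_size = 128)
{
    std::vector<float> out(packed.size() * 2);
    // Every element must be covered by a scale.
    assert(scales.size() * group_size >= out.size());
    for(std::size_t k = 0; k < out.size(); ++k)
    {
        const std::uint8_t byte = packed[k / 2];
        const std::uint8_t nib  = (k % 2 == 0) ? (byte & 0xF) : (byte >> 4);
        // Each element is multiplied by the scale of the group it belongs to.
        out[k] = static_cast<float>(unpack_int4(nib)) * scales[k / group_size];
    }
    return out;
}
```

With a group size of 128, elements 0 to 127 share `scales[0]`, elements 128 to 255 share `scales[1]`, and so on.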
  5. 27 Oct, 2024 1 commit
  6. 24 Oct, 2024 2 commits
  7. 23 Oct, 2024 6 commits
  8. 22 Oct, 2024 3 commits
  9. 21 Oct, 2024 3 commits
  10. 20 Oct, 2024 2 commits
  11. 18 Oct, 2024 2 commits
  12. 16 Oct, 2024 1 commit
  13. 15 Oct, 2024 3 commits
  14. 14 Oct, 2024 1 commit
  15. 13 Oct, 2024 1 commit
  16. 11 Oct, 2024 1 commit
  17. 09 Oct, 2024 1 commit
  18. 08 Oct, 2024 2 commits
  19. 07 Oct, 2024 3 commits
  20. 04 Oct, 2024 1 commit
    • Adding seed and offset pointer support to the philox random number generator. (#1523) · c24fae23
      kylasa authored
      
      
      * Adding seed and offset pointer support to the philox random number generator.
      
      * Separating seed and offset pointer checks with different condition statements.
      
      * Changes include adding support for device seed and offset pointers; a union is used to store seed/offset values and device pointers to minimize device SGPR usage.
      
      * Correcting a typo in the readme file
      
      * Re-format files using remod.py
      
      * Use STL type for API parameters
      
      * Use simpler struct design for drop_seed & drop_offset
      
      * Undo unnecessary changes
      
      * Sync kargs style for fmha_fwd.hpp/.cpp
      
      * Use templated union to reduce code
      
      * Use structured binding to make code more readable
      
      ---------
      Co-authored-by: Sudhir Kylasa <sukylasa@amd.com>
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
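The seed/offset items in commit c24fae23 (a union holding either values or device pointers, an STL type for the API parameter, and structured bindings) can be illustrated with a host-only sketch. All names here (`ValueOrPointer`, `DropoutArgs`, `SeedOffset`, `make_args`, `load_seed_offset`) are hypothetical and are not taken from the repository; the real kernel arguments may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <variant>

// Hypothetical templated union: an immediate value and a pointer to it
// share the same storage, so only one slot is needed per parameter.
template <typename T>
union ValueOrPointer
{
    T val;
    const T* ptr;
};

// Hypothetical kernel-argument block for a Philox-style generator.
struct DropoutArgs
{
    ValueOrPointer<std::uint64_t> seed;
    ValueOrPointer<std::uint64_t> offset;
    bool is_ptr; // true when both unions currently hold pointers
};

using Values   = std::pair<std::uint64_t, std::uint64_t>;
using Pointers = std::pair<const std::uint64_t*, const std::uint64_t*>;
// Host-facing parameter: either {seed, offset} values or pointers to them.
using SeedOffset = std::variant<Values, Pointers>;

inline DropoutArgs make_args(const SeedOffset& so)
{
    DropoutArgs args{};
    if(std::holds_alternative<Values>(so))
    {
        const auto& [seed, offset] = std::get<Values>(so);
        args.seed.val   = seed;
        args.offset.val = offset;
        args.is_ptr     = false;
    }
    else
    {
        const auto& [seed_ptr, offset_ptr] = std::get<Pointers>(so);
        args.seed.ptr   = seed_ptr;
        args.offset.ptr = offset_ptr;
        args.is_ptr     = true;
    }
    return args;
}

// Resolve the effective {seed, offset}, dereferencing only in pointer mode.
inline Values load_seed_offset(const DropoutArgs& args)
{
    if(args.is_ptr)
        return {*args.seed.ptr, *args.offset.ptr};
    return {args.seed.val, args.offset.val};
}
```

In this sketch the pointer variant lets the caller update the seed and offset in memory after the arguments are built, while the value variant avoids the extra load.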