  1. 15 Oct, 2024 1 commit
  2. 14 Oct, 2024 1 commit
  3. 12 Oct, 2024 2 commits
  4. 11 Oct, 2024 2 commits
  5. 10 Oct, 2024 2 commits
  6. 08 Oct, 2024 4 commits
  7. 07 Oct, 2024 2 commits
  8. 04 Oct, 2024 1 commit
    • Adding seed and offset pointer support to the philox random number generator. (#1523) · c24fae23
      kylasa authored
      
      
      * Adding seed and offset pointer support to the philox random number generator.
      
      * Separating seed and offset pointer checks with different condition statements.
      
      * Changes include adding support for device seed and offset pointers; a union is used to store either the seed/offset values or the device pointers, to minimize device SGPR usage.
      
      * Correcting a typo in the readme file
      
      * Re-format files using remod.py
      
      * Use STL type for API parameters
      
      * Use simpler struct design for drop_seed & drop_offset
      
      * Undo unnecessary changes
      
      * Sync kargs style for fmha_fwd.hpp/.cpp
      
      * Use templated union to reduce code
      
      * Use structured binding to make code more readable
      
      ---------
      Co-authored-by: Sudhir Kylasa <sukylasa@amd.com>
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
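For context on what the seed and offset feed into, here is a minimal sketch of the Philox4x32-10 counter-based generator in its Random123 formulation. The mapping of the 64-bit seed to the key and of the 64-bit offset and subsequence to the counter is an assumption (a cuRAND-style convention), and `philox_random` is a hypothetical name, not CK's API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Philox4x32-10 sketch. Names are illustrative; CK's own generator may
// arrange seed/offset/subsequence differently.
using Ctr = std::array<std::uint32_t, 4>;
using Key = std::array<std::uint32_t, 2>;

constexpr std::uint32_t kM0 = 0xD2511F53u, kM1 = 0xCD9E8D57u;  // round multipliers
constexpr std::uint32_t kW0 = 0x9E3779B9u, kW1 = 0xBB67AE85u;  // Weyl key increments

// One Philox round: two 32x32->64 multiplies, then mix halves with the key.
inline Ctr philox_round(const Ctr& c, const Key& k) {
    const std::uint64_t p0 = std::uint64_t{kM0} * c[0];
    const std::uint64_t p1 = std::uint64_t{kM1} * c[2];
    const auto hi0 = static_cast<std::uint32_t>(p0 >> 32);
    const auto lo0 = static_cast<std::uint32_t>(p0);
    const auto hi1 = static_cast<std::uint32_t>(p1 >> 32);
    const auto lo1 = static_cast<std::uint32_t>(p1);
    return {hi1 ^ c[1] ^ k[0], lo1, hi0 ^ c[3] ^ k[1], lo0};
}

// Ten rounds; the key is bumped before every round after the first.
inline Ctr philox4x32_10(Ctr c, Key k) {
    for (int r = 0; r < 10; ++r) {
        if (r > 0) {
            k[0] += kW0;
            k[1] += kW1;
        }
        c = philox_round(c, k);
    }
    return c;
}

// Hypothetical mapping: seed -> key, (offset, subsequence) -> counter.
inline Ctr philox_random(std::uint64_t seed, std::uint64_t subsequence,
                         std::uint64_t offset) {
    assert(true);  // no preconditions: every 64-bit input is valid
    const Key k{static_cast<std::uint32_t>(seed),
                static_cast<std::uint32_t>(seed >> 32)};
    const Ctr c{static_cast<std::uint32_t>(offset),
                static_cast<std::uint32_t>(offset >> 32),
                static_cast<std::uint32_t>(subsequence),
                static_cast<std::uint32_t>(subsequence >> 32)};
    return philox4x32_10(c, k);
}
```

For a fixed key, Philox is a bijection on the counter, so changing the offset (or subsequence) always produces a different block of four 32-bit values; this is what lets a per-launch offset, whether passed by value or read through a device pointer, move a kernel to a fresh part of the stream.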
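The union-based design this commit describes can be illustrated with a minimal sketch. All names here (`ValueOrPointer`, `DropoutKargs`, `SeedOffsetArg`, `make_kargs`, `resolve`) are hypothetical, not CK's actual kernel-argument structs. An STL `std::variant` carries either the values or the pointers at the API boundary, a templated union keeps each kernel argument to a single 64-bit slot on a 64-bit target, and structured bindings unpack the pairs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <variant>

// Hypothetical names throughout; this is not CK's actual kargs layout.
// One union slot holds either a host-provided value or a device pointer.
template <typename T>
union ValueOrPointer {
    T val;
    const T* ptr;
};

struct DropoutKargs {
    bool from_host;  // true: `val` members are active; false: `ptr` members are
    ValueOrPointer<std::uint64_t> seed;
    ValueOrPointer<std::uint64_t> offset;
};

// API boundary: an STL variant holds either (seed, offset) values or pointers.
using SeedOffsetArg =
    std::variant<std::pair<std::uint64_t, std::uint64_t>,
                 std::pair<const std::uint64_t*, const std::uint64_t*>>;

DropoutKargs make_kargs(const SeedOffsetArg& arg) {
    DropoutKargs k{};
    if (const auto* values = std::get_if<0>(&arg)) {
        const auto [seed, offset] = *values;  // structured binding
        k.from_host = true;
        k.seed.val = seed;
        k.offset.val = offset;
    } else {
        const auto [seed_ptr, offset_ptr] = std::get<1>(arg);
        assert(seed_ptr != nullptr && offset_ptr != nullptr);
        k.from_host = false;
        k.seed.ptr = seed_ptr;
        k.offset.ptr = offset_ptr;
    }
    return k;
}

// Kernel side: read whichever union member is active.
std::pair<std::uint64_t, std::uint64_t> resolve(const DropoutKargs& k) {
    if (k.from_host) return {k.seed.val, k.offset.val};
    return {*k.seed.ptr, *k.offset.ptr};
}
```

In a real kernel the pointers would refer to device memory and be dereferenced on the GPU; host memory stands in here so the sketch runs on the CPU.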
  9. 01 Oct, 2024 2 commits
    • [CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels (#1527) · a1c07e8d
      Po Yen Chen authored
      * Use same layout for o_acc and o tensor
      
      * Use better param names in partitioner
      
      * Remove redundant kargs 'max_seqlen_q'
      
      * Use better param names in splitkv kernel
      
      * Add comment for additional kernel arguments
      
      * Sync empty-loop early-return logic between pipelines
      
      * Pass more arguments to cmake in scripts
      
      * Align backslashes
      
      * Fix wrong o_acc tensor view strides
      
      * Change o_acc layout if o_perm=0
      
      * Handle whole row masked via attn_bias
      
      * Use vector width = 1 for o_acc
      
      * Use more even split sizes
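A fully masked row (for example, one masked entirely through attn_bias) is a corner case when partial split-kv results are merged. The sketch below shows the usual log-sum-exp combination of per-split outputs from the general split-KV (flash-decoding) scheme, not CK's combine kernel; `combine_splits` and `CombinedRow` are hypothetical names. When every split saw only masked scores, all per-split LSE values are `-inf`, and a naive `exp(lse - max_lse)` would evaluate `exp(-inf - (-inf))`, which is NaN, so the sketch returns zeros instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical combine step. Each split s produced a partial output row o[s]
// (already normalized by its own softmax sum) and a log-sum-exp lse[s].
struct CombinedRow {
    std::vector<float> o;
    float lse;
};

CombinedRow combine_splits(const std::vector<std::vector<float>>& o,
                           const std::vector<float>& lse) {
    assert(!o.empty() && o.size() == lse.size());
    const std::size_t dim = o[0].size();
    const float neg_inf = -std::numeric_limits<float>::infinity();

    float max_lse = neg_inf;
    for (float l : lse) max_lse = std::max(max_lse, l);

    // Whole row masked: avoid exp(-inf - (-inf)) = NaN and emit zeros.
    if (max_lse == neg_inf) return {std::vector<float>(dim, 0.0f), neg_inf};

    float sum = 0.0f;
    for (float l : lse) sum += std::exp(l - max_lse);  // stable log-sum-exp
    const float total_lse = max_lse + std::log(sum);

    std::vector<float> out(dim, 0.0f);
    for (std::size_t s = 0; s < o.size(); ++s) {
        const float w = std::exp(lse[s] - total_lse);  // split's share of softmax mass
        for (std::size_t d = 0; d < dim; ++d) out[d] += w * o[s][d];
    }
    return {out, total_lse};
}
```

Splits that are individually empty or masked (lse = `-inf`) simply get weight zero; only the all-masked row needs the explicit guard.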
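"More even split sizes" can be made concrete by comparing two ways of distributing KV tiles over splits. Both helpers are hypothetical (not CK's partitioner), and which scheme the commit actually adopts is not stated above; the sketch only shows why ceil-div partitioning can leave trailing splits short or empty.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Ceil-div partitioning: leading splits get ceil(num_tiles / num_splits)
// tiles, so the remainder lands at the end and trailing splits can be
// short or even empty.
std::vector<int> ceil_split_sizes(int num_tiles, int num_splits) {
    assert(num_tiles >= 0 && num_splits > 0);
    const int per_split = (num_tiles + num_splits - 1) / num_splits;
    std::vector<int> sizes(num_splits);
    for (int i = 0; i < num_splits; ++i) {
        const int begin = std::min(i * per_split, num_tiles);
        const int end = std::min(begin + per_split, num_tiles);
        sizes[i] = end - begin;
    }
    return sizes;
}

// Even partitioning: sizes differ by at most one tile; the first
// (num_tiles % num_splits) splits take the extra tile.
std::vector<int> even_split_sizes(int num_tiles, int num_splits) {
    assert(num_tiles >= 0 && num_splits > 0);
    std::vector<int> sizes(num_splits, num_tiles / num_splits);
    for (int i = 0; i < num_tiles % num_splits; ++i) ++sizes[i];
    return sizes;
}
```

For 9 tiles over 4 splits, the ceil-div scheme yields 3, 3, 3, 0 (one split has no loop iterations at all), while the even scheme yields 3, 2, 2, 2.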
    • Complex Contraction CK Bilinear Example (#1061) · 4cd1dc7f
      M.Emin Ozturk authored
      
      
      * complex type contraction
      
      * bug fix
      
      * update
      
      * Tensor Contraction Complex Data Type is working
      
      * 4D Kernel
      
      * some change
      
      * validation check in progress
      
      * validation issue
      
      * fp32 verification error is fixed
      
      * fp32 and fp64 are done
      
      * remove old files
      
      * remove cmake files
      
      * remove cmake files
      
      * Readme
      
      * img verification
      
      * CMakeList
      
      * number changed
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
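A host-side reference for validating a complex bilinear contraction can be sketched as follows. It assumes the bilinear form E = alpha * (A contracted with B) + beta * D, contiguous row-major tensors with B indexed as [n][k], and `std::complex<double>` data; `bilinear_contraction` and the index layout are illustrative assumptions, not the example's code. For contiguous tensors the grouped M, N, and K indices of a 4D contraction can each be merged into one index, so the reference reduces to a GEMM-like loop.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hypothetical reference:
//   E[m][n] = alpha * sum_k A[m][k] * B[n][k] + beta * D[m][n]
// A 4D contraction A[m0][m1][k0][k1] x B[n0][n1][k0][k1] over contiguous
// row-major tensors maps onto this with m = m0*M1 + m1, n = n0*N1 + n1,
// k = k0*K1 + k1.
std::vector<cd> bilinear_contraction(const std::vector<cd>& A,  // M x K
                                     const std::vector<cd>& B,  // N x K
                                     const std::vector<cd>& D,  // M x N
                                     std::size_t M, std::size_t N, std::size_t K,
                                     cd alpha, cd beta) {
    assert(A.size() == M * K && B.size() == N * K && D.size() == M * N);
    std::vector<cd> E(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            cd acc{0.0, 0.0};  // complex accumulator
            for (std::size_t k = 0; k < K; ++k) acc += A[m * K + k] * B[n * K + k];
            E[m * N + n] = alpha * acc + beta * D[m * N + n];
        }
    }
    return E;
}
```

For 1 x 1 x 1 inputs A = 1+2i, B = 3-i, D = 2, alpha = 1, beta = i, the result is (1+2i)(3-i) + 2i = 5+7i.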
  10. 29 Sep, 2024 1 commit
  11. 27 Sep, 2024 2 commits
  12. 26 Sep, 2024 1 commit
  13. 23 Sep, 2024 2 commits
  14. 21 Sep, 2024 2 commits
  15. 20 Sep, 2024 1 commit
  16. 19 Sep, 2024 3 commits
  17. 18 Sep, 2024 3 commits
  18. 14 Sep, 2024 1 commit
  19. 13 Sep, 2024 1 commit
    • Customize filesystem in CK for legacy systems (#1509) · 81bc1496
      Jun Liu authored
      
      
      * Legacy support: customized filesystem
      
      * Update cmakefile for python alternative path
      
      * fix build issues
      
      * CK has no boost dependency
      
      * More fixes to issues found on legacy systems
      
      * fix clang format issue
      
      * Check if blob is correctly generated in cmake
      
      * fix the python issues
      
      * add a compiler flag for codegen when using alternative python
      
      * use target_link_options instead of target_compile_options
      
      ---------
      Co-authored-by: illsilin <Illia.Silin@amd.com>
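One common way to provide a filesystem facility on legacy toolchains is to pick the standard or the experimental implementation at compile time and expose a single namespace alias. The sketch below shows that general pattern; it is an assumption about the approach, not CK's customized filesystem header, and `is_nonempty_file` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Select an implementation once; the rest of the code only uses `fs`.
#if __has_include(<filesystem>) && __cplusplus >= 201703L
#include <filesystem>
namespace fs = std::filesystem;
#elif __has_include(<experimental/filesystem>)
#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;
#else
#error "no usable filesystem library"
#endif

// Hypothetical helper: true if `file` is an existing, non-empty regular file.
// Uses the error_code overloads so a missing path does not throw.
bool is_nonempty_file(const fs::path& file) {
    assert(!file.empty() && "expected a non-empty path");
    std::error_code ec;
    if (!fs::is_regular_file(file, ec)) return false;
    const auto size = fs::file_size(file, ec);
    return !ec && size > 0;
}
```

With libstdc++, `std::experimental::filesystem` (and `std::filesystem` before GCC 9) also requires linking `stdc++fs`.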
  20. 09 Sep, 2024 1 commit
  21. 07 Sep, 2024 1 commit
    • Ck tile gemm example (#1488) · caacd388
      Thomas Ning authored
      
      
      * Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout
      
      * Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future.
      
      * Fix: Clang Format, API fixed from fmha
      
      * fix with better naming convention
      
      * Revert the fmha pipeline code
      
      * Fixed: Addressed the review comments and merged the GEMM shapes of the GEMM and FMHA operators into one.
      
      * clang format with the reference_gemm file
      
      * Re-format with clang-format via remod.py
      
      * Changed the format and variable name of the kernel gemm_shape and partitioner
      
      ---------
      Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>
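Since this commit adds a matrix-layout feature set, a layout-aware host reference is a natural way to check a tiled kernel's output. The sketch below is hypothetical: `host_gemm_ref`, `Layout`, and `offset_of` are illustrative names, not the interface of the reference_gemm file mentioned above. Each operand's layout only changes how a (row, column) pair maps to a linear offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Layout { RowMajor, ColumnMajor };

// Linear offset of element (r, c) in a rows x cols matrix stored in layout `l`.
inline std::size_t offset_of(Layout l, std::size_t r, std::size_t c,
                             std::size_t rows, std::size_t cols) {
    return l == Layout::RowMajor ? r * cols + c : c * rows + r;
}

// Hypothetical reference: C (M x N, row-major) = A (M x K) * B (K x N),
// where A and B may each be row- or column-major.
std::vector<float> host_gemm_ref(const std::vector<float>& a, Layout a_layout,
                                 const std::vector<float>& b, Layout b_layout,
                                 std::size_t M, std::size_t N, std::size_t K) {
    assert(a.size() == M * K && b.size() == K * N);
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;  // fp32 accumulator
            for (std::size_t k = 0; k < K; ++k)
                acc += a[offset_of(a_layout, m, k, M, K)] *
                       b[offset_of(b_layout, k, n, K, N)];
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

With A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], passing the operands as {1, 2, 3, 4} and {5, 6, 7, 8} (row-major) or as {1, 3, 2, 4} and {5, 7, 6, 8} (column-major) yields the same C = {19, 22, 43, 50}.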
  22. 05 Sep, 2024 4 commits