1. 06 Aug, 2024 3 commits
    • adding mha as static lib (#1366) · 840c5397
      bibek authored

      * adding mha as static lib
      
      * add fmha fwd compile options
      
      * typo
      
      * fix python version
      
      * python version to 3
      
      * increase path length
      
      * add max path flag in mha cmake
      
      * fix long path issue
      
      * mha currently only runs in gfx94x
      
      * only build mha in mi300
      
      * populate gpu_list
      
      * add mha compile flags
      
      * avoid building mha on gpus other than gfx94x
      
      * some comments and include ck_tile in rocm
      
      * use rocm_install
      
      * place ck_tile in include
      
      * correct ck_tile path
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
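
      Per the messages above, the gfx94x gating lives in the CMake build; as a device-code analogue only (the enabling macro name here is hypothetical, while the per-target defines are the ones the HIP/Clang toolchain emits during AMD GPU compilation), the same restriction can be sketched as:

      // Sketch, not the PR's code: restrict fmha compilation to gfx94x by
      // keying off the per-architecture macros defined for AMD GPU device
      // compilation. CK_FMHA_FWD_ENABLED is a hypothetical name.
      #if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
      #define CK_FMHA_FWD_ENABLED 1
      #else
      #define CK_FMHA_FWD_ENABLED 0
      #endif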
    • Fix for beta!=0 in reduce (#1440) · b74d4d4d
      jakpiase authored
      * fix for beta!=0 in reduce
      
      * add reviewers suggestions
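
      For context, the fix concerns the usual alpha/beta output convention: with beta != 0 the kernel must read the existing output and blend it into the result rather than overwrite it. A minimal host-side model of those semantics (names illustrative, not CK's API):

      #include <cstddef>

      // out[r] = alpha * sum(in[r, :]) + beta * out[r]; the beta != 0 path
      // requires loading the prior contents of out.
      void reduce_rows_ref(const float* in, float* out, std::size_t rows,
                           std::size_t cols, float alpha, float beta)
      {
          for(std::size_t r = 0; r < rows; ++r)
          {
              float acc = 0.f;
              for(std::size_t c = 0; c < cols; ++c)
                  acc += in[r * cols + c];
              out[r] = alpha * acc + (beta == 0.f ? 0.f : beta * out[r]);
          }
      }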
    • Add Grouped Conv Fwd Large Tensor kernel (#1432) · 4ec5c52a
      Bartłomiej Kocot authored
      * Support 64 bit indexing
      
      * Add new grouped conv fwd kernel for large tensors
      
      * Add instances large tensor
      
      * Fixes for transform conv to gemm
      
      * Fixes
      
      * fixes
      
      * Remove unneeded instances
      
      * Fix examples
      
      * Remove unneeded ds arrays
      
      * Fix tests
      
      * Add 2GB check in gridwise dl
      
      * Fixes
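
      The "2GB check" and 64-bit indexing above address offset overflow: 32-bit offsets wrap once a tensor's byte span reaches 2^31, so such tensors must be routed to the large-tensor kernel. A guard in that spirit (hypothetical helper, not the gridwise code itself):

      #include <cstdint>

      // A tensor whose byte span reaches 2^31 cannot be addressed with
      // int32 offsets; route it to the large-tensor kernel instead.
      constexpr std::int64_t kMax32BitSpan = std::int64_t{1} << 31;

      inline bool fits_32bit_indexing(std::int64_t num_elements,
                                      std::int64_t bytes_per_element)
      {
          return num_elements * bytes_per_element < kMax32BitSpan;
      }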
  2. 05 Aug, 2024 2 commits
  3. 01 Aug, 2024 1 commit
  4. 31 Jul, 2024 4 commits
  5. 30 Jul, 2024 2 commits
  6. 26 Jul, 2024 2 commits
  7. 25 Jul, 2024 2 commits
  8. 24 Jul, 2024 3 commits
  9. 23 Jul, 2024 1 commit
  10. 22 Jul, 2024 1 commit
  11. 19 Jul, 2024 3 commits
    • [GEMM] F8 GEMM, performance optimized. (#1384) · 8c90f25b
      Haocong WANG authored

      * add ab_scale init support
      
      * enabled interwave
      
      * add scale type; update isSupport
      
      * adjust example
      
      * clean
      
      * enable f8 pure gemm rcr ckprofiler
      
      * Add gemm_multiply_multiply instances
      
      * clang format
      
      * Optimize for ScaleBlockMNK=128
      
      * enable abscale f8 gemm ck profiler
      
      * Add pure f8 gemm test suite
      
      * Reverting to the state of the project at f60fd77
      
      * update copyright
      
      * clang format
      
      * update copyright
      
      ---------
      Co-authored-by: root <jizhan@amd.com>
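
      The ab_scale entries above point at block-scaled f8 GEMM: each operand carries one scale per 128-wide block (the ScaleBlockMNK=128 mentioned above), applied to that block's partial product. A simplified reference that blocks only the K dimension (layout and names are illustrative; float stands in for the f8 storage type; K is assumed divisible by 128):

      #include <cstddef>
      #include <vector>

      constexpr std::size_t kScaleBlockK = 128; // mirrors ScaleBlockMNK=128

      // One A scale per (row, K-block) and one B scale per (K-block, col);
      // the two scales multiply the K-block's partial dot product.
      void scaled_gemm_ref(const std::vector<float>& a,       // M x K, row-major
                           const std::vector<float>& b,       // K x N, row-major
                           const std::vector<float>& a_scale, // M x (K/128)
                           const std::vector<float>& b_scale, // (K/128) x N
                           std::vector<float>& c,             // M x N
                           std::size_t M, std::size_t N, std::size_t K)
      {
          const std::size_t kblocks = K / kScaleBlockK;
          for(std::size_t m = 0; m < M; ++m)
              for(std::size_t n = 0; n < N; ++n)
              {
                  float acc = 0.f;
                  for(std::size_t kb = 0; kb < kblocks; ++kb)
                  {
                      float partial = 0.f;
                      for(std::size_t k = kb * kScaleBlockK; k < (kb + 1) * kScaleBlockK; ++k)
                          partial += a[m * K + k] * b[k * N + n];
                      acc += partial * a_scale[m * kblocks + kb] * b_scale[kb * N + n];
                  }
                  c[m * N + n] = acc;
              }
      }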
    • Universal gemm splitk using reduce (with multi-d) (#1341) · c544eb4d
      ltqin authored

      * init for reduce_threadwise multi_d
      
      * add reduce_threadwise_multi_d
      
      * add reduce_multi_d
      
      * clean
      
      * start adding another splitk device op
      
      * add reduce template parameter to SplitKBatchOffset
      
      * add reduce c matrix
      
      * clean up code
      
      * change example data type to bf16
      
      * add bf16Ai8B example
      
      * remove reduce template parameter
      
      * add splitk atomic status to v4
      
      * example add multi d parameters
      
      * device op add multi-d parameters
      
      * add multi-d to reduce
      
      * fix kbatch=1 bug
      
      * change B layout to col in bf16Ai8B example
      
      * remove float adding struct
      
      * change multi-d interface
      
      * change file and class name
      
      * remove multi-d of bf16Ai8B example
      
      * change IsReduce function to IsReduceAdd
      
      * change example layout to RRR from RCR
      
      * set ds stride according to layout
      
      * reset parameter layout
      
      * add gemm universal reduce instance
      
      * add reduce factory
      
      * add profile_gemm_universal_reduce
      
      * add reduce to profiler
      
      * fix reduce instance
      
      * fix profiler reduce compiling bug
      
      * format
      
      * format library instance code
      
      * add mem instance for reduce library
      
      * fix call instance names
      
      * add workspace for reduce in ckProfiler
      
      * format
      
      * add mnpadding to reduce library instance
      
      * add fp16 instance to reduce of profiler
      
      * change copyright time
      
      * restore profiler cmake file
      
      * add reduce text to instances
      
      * add DsLayout and DsDataType to instances template parameter
      
      * fixed gemm_reduce_multi_d
      
      * add an example without multi_d
      
      * Update common.hpp
      
      * Update gtest.cmake
      
      * Update gemm_xdl_splitk_reduce_bf16.cpp
      
      * clean
      
      * Update gtest.cmake
      
      * format
      
      * fix api
      
      * format
      
      * default parameter change to RRR
      
      * add vector_len for multi_d
      
      * format
      
      * Update gtest.cmake
      
      * fix bf16Ai8B elementwise op
      
      * add ReduceDataType
      
      * move ReduceDataType to end position
      
      * format
      
      * remove googletest git method address
      
      * fix copyright time
      
      * update init data
      
      ---------
      Co-authored-by: root <jizhan@amd.com>
      Co-authored-by: letaoqin <letaoqin@amd.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
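
      The device op these messages describe runs each k-batch as an independent partial GEMM into a workspace, then a reduce pass sums the partials and applies the multi-d elementwise epilogue. A host-side model of that final pass (interface hypothetical, one D tensor, plain add as the elementwise op):

      #include <cstddef>
      #include <vector>

      // Each k-batch has written its partial C into a workspace slice; the
      // reduce pass sums across k-batches, then blends in the D tensor.
      void splitk_reduce_ref(const std::vector<float>& workspace, // kbatch x (M*N)
                             const std::vector<float>& d,         // M*N
                             std::vector<float>& e,               // M*N output
                             std::size_t kbatch, std::size_t mn)
      {
          for(std::size_t i = 0; i < mn; ++i)
          {
              float acc = 0.f;
              for(std::size_t kb = 0; kb < kbatch; ++kb) // reduce across k-batches
                  acc += workspace[kb * mn + i];
              e[i] = acc + d[i]; // multi-d elementwise op, here a simple add
          }
      }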
    • Refactor transform conv to gemm fwd (#1391) · 70a814f1
      Bartłomiej Kocot authored
      * Refactor transform conv to gemm fwd
      
      * fixes codegen
      
      * wmma fixes
      
      * fix wmma
      
      * Fix copyright
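
      For reference, the transform being refactored is the standard implicit-GEMM view of forward convolution: output pixels form GEMM M, output channels form GEMM N, and the filter window times input channels forms GEMM K. A sketch of the 2D dimension mapping (struct and function names are illustrative, not the transform's interface):

      #include <cstddef>

      struct ConvToGemmDims
      {
          std::size_t gemm_m, gemm_n, gemm_k;
      };

      // GemmM = N*Ho*Wo, GemmN = K, GemmK = C*Y*X for 2D forward conv.
      inline ConvToGemmDims conv_fwd_as_gemm(std::size_t N,  // batch
                                             std::size_t K,  // output channels
                                             std::size_t C,  // input channels
                                             std::size_t Ho, std::size_t Wo, // output H/W
                                             std::size_t Y,  std::size_t X)  // filter H/W
      {
          return {N * Ho * Wo, K, C * Y * X};
      }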
  12. 18 Jul, 2024 1 commit
  13. 17 Jul, 2024 1 commit
  14. 16 Jul, 2024 5 commits
  15. 12 Jul, 2024 2 commits
  16. 11 Jul, 2024 3 commits
  17. 10 Jul, 2024 2 commits
  18. 09 Jul, 2024 2 commits