  1. 26 Oct, 2024 1 commit
  2. 25 Oct, 2024 2 commits
    • Generic threshold calculation (#1546) · 9385caa3
      aledudek authored
      * Calculate generic relative threshold pool3dfwd
      
      * Calculate absolute error threshold pool3d fwd
      
      * Generic threshold calculation take max input for relative error pool3dfwd
      
      * Remove max possible value for error calculation at runtime
      
      * Remove debug print in pool3dfwd
      
      * Pool3d fwd adjusted types in generic threshold calculation
      
      * Generic threshold calculation take into account number of accumulations and accdatatype
      
      * Generic threshold fix final error formula
      
      * Generic threshold calculation - num of accs fix
      
      * Generic threshold calculation - adjust absolute error
      
      * Generic threshold calculation - OutDataType in absolute error
      9385caa3
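The bullets above describe a tolerance derived from the data types involved and the number of accumulations. A minimal sketch of that idea follows, assuming a half-unit-in-the-last-place rounding model. The function names, the `EPSILON` table, and the exact formula are illustrative assumptions, not the code merged in #1546.

```python
# Hypothetical sketch of a data-type-driven error threshold.
# Machine epsilons (2**-mantissa_bits) for a few common float formats.
EPSILON = {
    "fp64": 2.0 ** -52,
    "fp32": 2.0 ** -23,
    "fp16": 2.0 ** -10,
    "bf16": 2.0 ** -7,
}


def relative_threshold(compute_type, out_type, acc_type, num_accumulations=1):
    """Estimate a relative tolerance from rounding error alone (assumed model)."""
    # Rounding once to the compute type or the output type costs at most
    # half an epsilon in relative terms; take the worse of the two.
    midway_error = max(0.5 * EPSILON[compute_type], 0.5 * EPSILON[out_type])
    # Each accumulation may add another half epsilon of the accumulator type.
    acc_error = 0.5 * EPSILON[acc_type] * num_accumulations
    return max(midway_error, acc_error)


def absolute_threshold(max_value, compute_type, out_type, acc_type,
                       num_accumulations=1):
    """Scale the relative tolerance by the largest magnitude expected."""
    return abs(max_value) * relative_threshold(
        compute_type, out_type, acc_type, num_accumulations)


if __name__ == "__main__":
    # A short fp16 reduction is dominated by the fp16 output rounding;
    # a very long one is dominated by the fp32 accumulator.
    print(relative_threshold("fp16", "fp16", "fp32", num_accumulations=27))
    print(relative_threshold("fp16", "fp16", "fp32", num_accumulations=1 << 20))
```

Under this model the threshold grows linearly with the accumulation count once accumulator rounding dominates, which is one plausible reason for a test to pass the count in rather than hard-code a tolerance.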
    • hot_fix epsilon pos (#1597) · 9183ce69
      dummycoderfe authored
      
      Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
      9183ce69
  3. 22 Oct, 2024 2 commits
    • update layernorm (#1570) · 0394f8a7
      ltqin authored
      * port layernorm
      
      * change warp_welford.hpp
      
      * Update warpshuffle
      
      * 1. Add save mean and save std back
      2. Move construction of tensor_view and tile_window to operator()
      
      * refine welford max count calculation
      
      * unify layernorm api
      
      * Rename file
      
      * Remove save mean and inv std
      
      * Revert "refine welford max count calculation"
      
      This reverts commit 02236580.
      
      * Fix order of parameter
      
      * refine welford max count calculation again
      
      * Remove fp32 instances
      
      * Fix bug of padding
      
      * refactor api
      
      * Support bf16
      
      * Extract common function
      
      * Refine arg of operator()
      
      * Add kMThreadPerBlock to template parameter
      
      * clang format
      
      * Refine variable name
      
      * Refine file name
      
      * remove redundant line
      
      * refactor layernorm2d pipeline and add block-per-block utility
      
      * fix name
      
      * rename more
      
      * add more block-per-tile instance
      
      * remove duplicated define
      
      * update instance for 2048, 1024 case
      
      * support up to 2048 now
      
      * opt loading
      
      * add n1536
      
      * Add two pass pipeline
      
      * format
      
      * Fix incorrect type
      
      * parallel compilation
      
      * Use smaller N
      
      * fix 2p pass
      
      * Support Repeat_M in distribution
      
      * Refine naming
      
      * Add reduce example
      
      ---------
      Co-authored-by: letaoqin <letaoqin@amd.com>
      Co-authored-by: aska-0096 <haocwang@amd.com>
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      0394f8a7
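Several bullets in this commit refer to Welford statistics (`warp_welford.hpp`, "welford max count calculation"). As a reference point, here is a minimal sketch of Welford's running mean/variance update plus the pairwise merge that lets independent partial results be combined, applied to one layernorm row. All function names and the `num_partials` parameter are assumptions for illustration, and the strided split merely stands in for per-thread work; this is not the kernel from #1570.

```python
import math


def welford_update(state, x):
    """Fold one value into a (count, mean, m2) running state."""
    count, mean, m2 = state
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)  # uses the updated mean
    return count, mean, m2


def welford_merge(a, b):
    """Combine two partial (count, mean, m2) states into one."""
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2


def layernorm_row(row, eps=1e-5, num_partials=4):
    """Normalize one row using merged partial Welford statistics."""
    total = (0, 0.0, 0.0)
    for p in range(num_partials):
        partial = (0, 0.0, 0.0)
        for x in row[p::num_partials]:  # strided slice per hypothetical worker
            partial = welford_update(partial, x)
        total = welford_merge(total, partial)
    count, mean, m2 = total
    var = m2 / count  # population variance, as layernorm uses
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row], mean, var


if __name__ == "__main__":
    normalized, mean, var = layernorm_row([1.0, 2.0, 3.0, 4.0], num_partials=2)
    print(mean, var, normalized)
```

The merge step is what makes the statistic usable across threads or warps: each worker only needs its own count, mean, and sum of squared deviations.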
  4. 21 Oct, 2024 1 commit
    • [CK_TILE] Optimize fmha splitkv & splitkv combine kernels (#1577) · 95e722a3
      Po Yen Chen authored
      * Use smaller width for lse_accum dist tensor
      
      * Update pipeline comment
      
      * Fix wrong distribution for lse_accum
      
      * Remove duplicate dim in lse_accum dist encoding
      
      * Decide fmha splitkv combine kernel kBlockSize by kM0
      
      * Remove assumption of MPerThread=1
      
      * Add log<4> & log<8> specialization
      
      * Enlarge occupancy array
      
      * Fix vector size for small tile
      
      * Add support for kMaxSplits=8
      
      * Re-format gemm.hpp
      
      * Use 16x16x16 warp gemm for fwd_splitkv
      
      * Centralize policy code changes
      
      * Leave fp8/bf8 tile settings unchanged
      95e722a3
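The commit above touches a "splitkv combine" kernel and an `lse_accum` tensor. The general technique behind such a combine step is to merge per-split attention outputs using their log-sum-exp (LSE) values. The sketch below shows that arithmetic for a single query in plain Python; the function names are hypothetical, the usual 1/sqrt(d) score scaling is omitted for brevity, and nothing here reflects the actual tiling or distribution code in #1577.

```python
import math


def attention_split(q, keys, values):
    """Softmax attention over one key/value split.

    Returns the normalized output vector and the log-sum-exp of the scores.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    lse = top + math.log(total)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total
           for d in range(dim)]
    return out, lse


def combine_splits(partials):
    """Merge (output, lse) pairs from independent splits into one result."""
    lses = [lse for _, lse in partials]
    top = max(lses)
    lse_total = top + math.log(sum(math.exp(l - top) for l in lses))
    dim = len(partials[0][0])
    # Each split is re-weighted by its share of the total softmax mass.
    out = [sum(math.exp(lse - lse_total) * o[d] for o, lse in partials)
           for d in range(dim)]
    return out, lse_total


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.0], [-0.5, 0.7]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    full, _ = attention_split(q, keys, values)
    halves = [attention_split(q, keys[:2], values[:2]),
              attention_split(q, keys[2:], values[2:])]
    merged, _ = combine_splits(halves)
    print(full, merged)  # the two results should agree up to rounding
```

Because only one LSE scalar per split and query row is needed for the merge, the accumulator for these values can be much narrower than the output tile.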
  5. 16 Oct, 2024 1 commit
    • [CK_TILE] Improve headdim96 performance for fmha-bwd (#1573) · 14c3cfb1
      Qianfeng authored
      
      
      * Add kQKHeaddimForGemmN and kVHeaddimForGemmN in order to support headdim 96
      
      * Remove the use of MakeKRegBlockDescriptor and MakeVRegBlockDescriptor
      
      * Fix in bwd_pipeline_default_policy
      
      * Remove kQKHeaddim and rename kQKHeaddimForGemmN to kQKHeaddim in the bwd kernel and pipelines
      
      * Replace kVHeaddimForGemmN by kVHeaddim and kDoDvHeaddim
      
      * Update to hd96 tile settings
      
      * Add smoke test scripts for fmha-bwd hd96
      
      * Revert "Add smoke test scripts for fmha-bwd hd96"
      
      This reverts commit 7ca7e1a93dc65eb99ce3ff4e82693589830e42a2.
      
      * Remove hd96 tile settings in fmha_bwd codegen to save compiling
      
      * Fix lost code line in bwd_pipeline_default_policy
      
      * Merge kDoDvHeaddim/kPadHeadDimDoDv to kVHeaddim/kPadHeadDimV and remove TileFmhaBwdTraits
      
      * Rename KRegSliceBlockDescriptor/VRegSliceBlockDescriptor to KRegBlockDescriptor/VRegBlockDescriptor
      
      * tiny adjustments
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
      Co-authored-by: danyao12 <Dan.Yao@amd.com>
      14c3cfb1
  6. 15 Oct, 2024 2 commits
  7. 14 Oct, 2024 3 commits
  8. 12 Oct, 2024 1 commit
  9. 10 Oct, 2024 1 commit
    • Ck tile gemm cshuffle & CK Tile GEMM restructure (#1535) · 6f27bc98
      Thomas Ning authored
      
      
      * Make the cshuffle compilable
      
      * Modify the reference on gpu and cpu. Correct access of cshuffle
      
      * fix the cpu reference code
      
      * Complete the in tile shuffle logic
      
      * restructure the kernel template input
      
      * change the naming pattern of ck_tile gemm pipeline
      
      * Re-format files using remod.py
      
      * Solve the fmha conflict with gemm
      
      * Address comments from Carlus
      
      ---------
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
      6f27bc98
  10. 09 Oct, 2024 1 commit
  11. 08 Oct, 2024 2 commits
  12. 07 Oct, 2024 3 commits
  13. 04 Oct, 2024 2 commits
  14. 02 Oct, 2024 1 commit
  15. 01 Oct, 2024 2 commits
  16. 27 Sep, 2024 1 commit
  17. 26 Sep, 2024 1 commit
  18. 25 Sep, 2024 1 commit
  19. 22 Sep, 2024 1 commit
  20. 20 Sep, 2024 2 commits
  21. 18 Sep, 2024 1 commit
  22. 14 Sep, 2024 1 commit
  23. 13 Sep, 2024 1 commit
    • Customize filesystem in CK for legacy systems (#1509) · 81bc1496
      Jun Liu authored
      
      
      * Legacy support: customized filesystem
      
      * Update cmakefile for python alternative path
      
      * fix build issues
      
      * CK has no boost dependency
      
      * More fixes to issues found on legacy systems
      
      * fix clang format issue
      
      * Check if blob is correctly generated in cmake
      
      * fix the python issues
      
      * add a compiler flag for codegen when using alternative python
      
      * use target_link_options instead of target_compile_options
      
      ---------
      Co-authored-by: illsilin <Illia.Silin@amd.com>
      81bc1496
  24. 12 Sep, 2024 1 commit
  25. 11 Sep, 2024 2 commits
  26. 10 Sep, 2024 1 commit
  27. 07 Sep, 2024 1 commit
    • Ck tile gemm example (#1488) · caacd388
      Thomas Ning authored
      
      
      * Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout
      
      * Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future.
      
      * Fix: Clang Format, API fixed from fmha
      
      * fix with better naming convention
      
      * revert back the pipeline code of fmha
      
      * Fixed: Addressed the comments and merge the GEMM shape of GEMM Operator and FMHA Operator to one.
      
      * clang format with the reference_gemm file
      
      * convert the clang format with the remod.py
      
      * Changed the format and variable name of the kernel gemm_shape and partitioner
      
      ---------
      Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>
      caacd388
  28. 05 Sep, 2024 1 commit