- 21 Oct, 2024 14 commits
-
-
rocking authored
-
carlushuang authored
-
rocking authored
-
rocking authored
-
rocking authored
-
letaoqin authored
-
rocking authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
Po Yen Chen authored
* Use smaller width for lse_accum dist tensor
* Update pipeline comment
* Fix wrong distribution for lse_accum
* Remove duplicate dim in lse_accum dist encoding
* Decide fmha splitkv combine kernel kBlockSize by kM0
* Remove assumption of MPerThread=1
* Add log<4> & log<8> specialization
* Enlarge occupancy array
* Fix vector size for small tile
* Add support for kMaxSplits=8
* Re-format gemm.hpp
* Use 16x16x16 warp gemm for fwd_splitkv
* Centralize policy code changes
* Leave fp8/bf8 tile settings unchanged
-
carlushuang authored
-
- 20 Oct, 2024 5 commits
-
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
- 18 Oct, 2024 2 commits
-
-
Haocong WANG authored
-
Illia Silin authored
-
- 17 Oct, 2024 4 commits
- 16 Oct, 2024 14 commits
-
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
Qianfeng authored
* Add kQKHeaddimForGemmN and kVHeaddimForGemmN in order to support headdim 96
* Remove the using of MakeKRegBlockDescriptor and MakeVRegBlockDescriptor
* Fix in bwd_piple_default_policy
* Remove kQKHeaddim and rename kQKHeaddimForGemmN to kQKHeaddim in the bwd kernel and pipelines
* Replace kVHeaddimForGemmN by kVHeaddim and kDoDvHeaddim
* Update to hd96 tile settings
* Add smoke test scripts for fmha-bwd hd96
* Revert "Add smoke test scripts for fmha-bwd hd96" This reverts commit 7ca7e1a93dc65eb99ce3ff4e82693589830e42a2.
* Remove hd96 tile settings in fmha_bwd codegen to save compiling
* Fix lost code line in bwd_pipeline_default_policy
* Merge kDoDvHeaddim/kPadHeadDimDoDv to kVHeaddim/kPadHeadDimV and remove TileFmhaBwdTraits
* Rename KRegSliceBlockDescriptor/VRegSliceBlockDescriptor to KRegBlockDescriptor/VRegBlockDescriptor
* tiny adjustments
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: danyao12 <Dan.Yao@amd.com>
-
- 15 Oct, 2024 1 commit
-
-
Paul Fultz II authored
* Build codegen as standalone
* Add exception for device tests
* Use local filesystem header
* add a codegen test CI stage and daily build
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-