- 21 Oct, 2024 6 commits
-
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
Po Yen Chen authored
* Use smaller width for lse_accum dist tensor
* Update pipeline comment
* Fix wrong distribution for lse_accum
* Remove duplicate dim in lse_accum dist encoding
* Decide fmha splitkv combine kernel kBlockSize by kM0
* Remove assumption of MPerThread=1
* Add log<4> & log<8> specialization
* Enlarge occupancy array
* Fix vector size for small tile
* Add support for kMaxSplits=8
* Re-format gemm.hpp
* Use 16x16x16 warp gemm for fwd_splitkv
* Centralize policy code changes
* Leave fp8/bf8 tile settings unchanged
-
carlushuang authored
-
- 20 Oct, 2024 5 commits
-
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
- 18 Oct, 2024 2 commits
-
-
Haocong WANG authored
-
Illia Silin authored
-
- 17 Oct, 2024 4 commits
- 16 Oct, 2024 14 commits
-
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
rocking authored
-
Qianfeng authored
* Add kQKHeaddimForGemmN and kVHeaddimForGemmN in order to support headdim 96
* Remove the using of MakeKRegBlockDescriptor and MakeVRegBlockDescriptor
* Fix in bwd_piple_default_policy
* Remove kQKHeaddim and rename kQKHeaddimForGemmN to kQKHeaddim in the bwd kernel and pipelines
* Replace kVHeaddimForGemmN by kVHeaddim and kDoDvHeaddim
* Update to hd96 tile settings
* Add smoke test scripts for fmha-bwd hd96
* Revert "Add smoke test scripts for fmha-bwd hd96"
  This reverts commit 7ca7e1a93dc65eb99ce3ff4e82693589830e42a2.
* Remove hd96 tile settings in fmha_bwd codegen to save compiling
* Fix lost code line in bwd_pipeline_default_policy
* Merge kDoDvHeaddim/kPadHeadDimDoDv to kVHeaddim/kPadHeadDimV and remove TileFmhaBwdTraits
* Rename KRegSliceBlockDescriptor/VRegSliceBlockDescriptor to KRegBlockDescriptor/VRegBlockDescriptor
* tiny adjustments
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: danyao12 <Dan.Yao@amd.com>
-
- 15 Oct, 2024 3 commits
-
-
Paul Fultz II authored
* Build codegen as standalone
* Add exception for device tests
* Use local filesystem header
* add a codegen test CI stage and daily build
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
Bartłomiej Kocot authored
* [CK_TILE] Add block universal gemm pipeline policy
* Fixes
* fixes2
* Fixes3
* fixeS
-
Po Yen Chen authored
-
- 14 Oct, 2024 5 commits
-
-
Rostyslav Geyyer authored
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT so far
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
Bartłomiej Kocot authored
* Add transpose scale amax example
* fixes
* Tune reduce instance
-
rocking authored
-
rocking authored
2. Move construction of tensor_view and tile_window to operator()
-
Thomas Ning authored
* decouple the calling from gemm_pipeline
* clang format
-
- 12 Oct, 2024 1 commit
-
-
aska-0096 authored
-