- 28 Nov, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK TILE] Add gemm compute pipeline v3 * Enable universal gemm compute pipeline. * Rename example and add compute pipeline. * Introduce ag bg cr pipeline impl base. * Refactor to reuse code. * Cleaning * Formatting. --------- Co-authored-by:
Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by:
Adam Osewski <Adam.Osewski@amd.com>
-
- 27 Nov, 2024 3 commits
-
-
jakpiase authored
* add interwave scheduler for gemm mem pipeline * Fix merge artifacts. * Refactor unit tests. * Switch to interwave scheduler for mem example --------- Co-authored-by:
Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by:
Adam Osewski <Adam.Osewski@amd.com>
-
Illia Silin authored
-
Adam Osewski authored
* Few small fixes. * New GroupedGemm instances (BF16) * Unify and refactor GroupedGEMM device API. * Adapt changes to new API. * Adapt grouped gemm profiler. * Accept multiple kbatches for grouped gemm profiler. - delete obsolete two stage as it is now covered by grouped gemm * Update unit test for grouped gemm. * Fix thresholds for BF16 and F8. Unblock tests. * Fix few instances. * Multiple small fixes. * Adapt to new API, check dynamic casting. * Uncomment few data types in grouped gemm profiler. * Fix call to SetDeviceArgs. * Fix profile grouped gemm multiply tile loop. * Fix grouped gemm tile loop kernel args in client examples. * Review comments.
-
- 26 Nov, 2024 7 commits
-
-
Illia Silin authored
-
rocking authored
* Fix cmake example build * Support max3 in smoothquant one pass * support max3 in two pass * support max3 in add_rmsnorm_rdquant
-
Adam Osewski authored
* Fix loop order. * Fix loop order in pipeline v4
-
jakpiase authored
* add check for bf16 splitk support for grouped gemm splitk * Update if condition --------- Co-authored-by:Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
Po Yen Chen authored
* Allow getting batch size from splitkv tile partitioner * Fix wrong paged-kvcache impl for group mode * Fix wrong example code for page-kvcache * Undo changes in fmha_fwd.cpp * Always use 2D block table * Add is_gappy kernel argument for paged-kvcache The is_gappy argument is used for differentiating seqstart_k_ptr usage in flash-attention & xformers * Remove out-of-date comments * Remove no-longer used method * Fix wrong # page-block calculation * Fix wrong comment --------- Co-authored-by:Qianfeng <qianfeng.zhang@amd.com>
-
Adam Osewski authored
* Block universal gemm. * Universal block gemm with interwave scheduler - draft. * Refactoring * Move a/b_warp_tiles into BlockGemmImpl * set BlockGemmImpl as a class member * Change tile size for more suitable to memory bound cases. * Introduce kKPerThread to WarpGemm * Add documentation comment. * Fix Interwave scheduler block gemm. * Add compute/memory friendly tile configuration. * Clean * New tile configurations in gemm mem example. * Add more static checks and fix loop order in block gemm. * Add more static checks and use warp gemm mfma dispatcher. * Add default scheduler block gemm. * Remove logging in example.
-
carlushuang authored
* moe pipeline * update code * compile OK * update * update cpu reference * update pipeline_gemm0 * compiler ok * update pipeline * rename to ex pipeline * block-asm * update * update * update first gemm ok * compute correct * update file structure * update README * update * update * update code * update API * return unsupport case * add comment * update readme * update * uncomment * update * fix build err --------- Co-authored-by:valarLip <340077269@qq.com>
-
- 25 Nov, 2024 4 commits
-
-
Po Yen Chen authored
* Fix mis-matched tuple<> elem types * Rename MakeKargs() as MakeKargsImpl() --------- Co-authored-by:Qianfeng <qianfeng.zhang@amd.com>
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.5 to 1.9.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.9.0/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.5...v1.9.0 ) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
carlushuang authored
* update MOCK_ID for moe-sorting * add moe-smoothquant * update a comment * fix format * hot fix * update topk in overflow case * update comments * update bf16 cvt --------- Co-authored-by:valarLip <340077269@qq.com>
-
Qianfeng authored
* Change in fwd-splitkv kernel to support num_splits=1 case * Update in codegen fwd-splitkv to make num_splits > 1 cases pass * Specify instance traits in dispatch * Fix link error for fp8 kernels --------- Co-authored-by:Po Yen Chen <PoYen.Chen@amd.com>
-
- 23 Nov, 2024 2 commits
-
-
Illia Silin authored
-
Illia Silin authored
-
- 22 Nov, 2024 2 commits
-
-
schung-amd authored
* Add overloads for MakeKargs Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples. * Add overloads for MakeKargs Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples. * Re-format files using ck_tile remod.py --------- Co-authored-by:Po Yen Chen <PoYen.Chen@amd.com>
-
Illia Silin authored
-
- 21 Nov, 2024 3 commits
-
-
Harisankar Sadasivan authored
* universal streamk fp8 changes & ckprofiler instances * revert strides to -1 and verification options * fp8 exclusion on pre-gfx94 for universal_streamk * PR review based revisions: permissions reverted, removed hip err checks --------- Co-authored-by:Illia Silin <98187287+illsilin@users.noreply.github.com>
-
Po Yen Chen authored
* Generate group mode paged-attn kernel * Enable paged-kvcache + group mode support * Add missing header: fused_moe.hpp * Add comment to explain kernel arg usage * Make error message more clear * Add comment for confusing data member names * Add more comment for confusing variable names * Fix typo in option description
-
Po Yen Chen authored
-
- 20 Nov, 2024 2 commits
-
-
Illia Silin authored
* reduce the docker image size and layers * clean up docker file * fix linker error for client example 24 * install CK into the default /opt/rocm/ path * restore installing CK to alternative path in CI * add linking for utility lib
-
Haocong WANG authored
-
- 19 Nov, 2024 2 commits
-
-
Illia Silin authored
* add more fp32 dl gemm instances * update the dates
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.4 to 1.8.5. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.5/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.4...v1.8.5 ) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 18 Nov, 2024 2 commits
-
-
Illia Silin authored
* add bf16 gemms for gfx11/gfx12 * reduce the input values in test_gemm * add int8 wmma gemm instances for gfx11/gfx12 * add example gemm_wmma_int8 * fix bug in gemm_wmma_int8 test * increase bf16 gemm test tolerance * update the dates and clean-up commented-out instances
-
Bartłomiej Kocot authored
* Batched GEMM Multiple D based on Universal GEMM Co-authored-by:
Jing Zhang <jizhan@fb.com> * CI fixes Co-authored-by:
Jing Zhang <jizhan@fb.com> --------- Co-authored-by:
Jing Zhang <jizhan@fb.com>
-
- 15 Nov, 2024 3 commits
-
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.3 to 1.8.4. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.4/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.3...v1.8.4 ) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Illia Silin authored
-
Illia Silin authored
-
- 14 Nov, 2024 2 commits
-
-
Andriy Roshchenko authored
* Improve test verbosity. * BUGFIX: Add missing initialization for reduction buffer * Change default initialization method Performance may be affected for fp32 and int8 examples. * Improve test verbosity * Cleanup
-
feli authored
Co-authored-by:dummycoderfe <noplydummmycoder@163.com>
-
- 13 Nov, 2024 3 commits
-
-
Illia Silin authored
-
Taylor Ding authored
-
Bartłomiej Kocot authored
* [CK TILE] Update gemm universal pipeline * Fixes * fix * Rebase
-
- 12 Nov, 2024 2 commits
-
-
Illia Silin authored
-
Thomas Ning authored
* Finished the feature * Modified the test file * Test case update * addresss comment * Addressed the review comment * Fixed the CI error
-
- 11 Nov, 2024 2 commits
-
-
Illia Silin authored
-
valarLip authored
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor * align CK coding style * extend strides to layernrom expample * clang-format...
-