- 02 Oct, 2023 1 commit
-
-
Umang Yadav authored
-
- 31 May, 2023 1 commit
-
-
Illia Silin authored
-
- 01 Jul, 2022 1 commit
-
-
Anthony Chang authored
* dump lds content in appropriate precision type * add squared add reduction op; allows sq sum * initial stub from regular gemm impl * layernorm example code & host verification * initial layernorm implementation * tidy up * make C0 precision type consistent with C * clang-tidy and additional comments * tighten up example code * account for extra flops/bytes from normalization * clang-format * c0 bias/beta/gamma now have its own precision type * AccElemOp for gemm outputs prior to feeding to layernorm * update workgroup mapping * rename kernel template param to reflect its dual use * use LDS mem pool for reduction workspace * change cshuffle precision type to f16; clean up * clang-format * correct naming * explicit cast * fully implemented gemm + bias + activation + add + norm * activation in correct order * reflect reduction API's recent change * amend * clean up; add comment * keep up with recent changes in reduction API * format * resolve merge conflicts Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 25 Jun, 2022 1 commit
-
-
Chao Liu authored
-
- 09 Mar, 2022 1 commit
-
-
Chao Liu authored
* delete obselete files * move files * build * update cmake * update cmake * fix build * reorg examples * update cmake for example and test
-
- 28 Feb, 2022 1 commit
-
-
Anthony Chang authored
* add gitignore * host tensor: allow generating sequentially increasing value in a given dimension * gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor - remove dangling header include - modify example gemm_xdl accordingly - infer KPack value from M/NPerXdl - device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1 (API for conv2d fwd stays the same for now until we decide to expose individual K0s for activation and weight) * add LDS data dump utility * profiler: reflect API change for distinct K0/K1 for A/B matrices * profiler: add conflict-free LDS write FP16 kernel instances * fix accidental perf regression * address feedback; cosmetic changes * clang-format for new files * format Co-authored-by:Chao Liu <chao.liu2@amd.com>
-