"docs/archive_en_US/Tutorial/SetupNniDeveloperEnvironment.md" did not exist on "1d893dda8755fc2e29167584cd1fc18a44872c1b"
- 26 Jul, 2022 3 commits
- 25 Jul, 2022 6 commits
- 24 Jul, 2022 1 commit
  - Chao Liu authored
- 22 Jul, 2022 2 commits
  - zjing14 authored
    * add batched_gemm_multiD
    * add ds
    * rename file
    * add batched_gemm_bias example
    * add batch_strides into bmm_c_permute
    * clean
    * rename example_28 to example_29
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
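The batched GEMM with a fused bias "D" tensor described in this commit can be sketched as a plain host reference. This is a minimal illustration only; the function name and the layout assumptions (row-major `[G][M][K]` x `[G][K][N]`, bias broadcast over rows) are hypothetical and not the library's device API:

```python
def batched_gemm_bias(a, b, bias):
    """Naive host reference for a batched GEMM with a bias tensor fused
    into the epilogue: E[g] = A[g] @ B[g] + bias (broadcast over rows).
    a: [G][M][K], b: [G][K][N], bias: [N].
    Hypothetical reference code, not the library's device kernel."""
    G, M, K = len(a), len(a[0]), len(a[0][0])
    N = len(b[0][0])
    out = [[[0.0] * N for _ in range(M)] for _ in range(G)]
    for g in range(G):
        for m in range(M):
            for n in range(N):
                acc = 0.0
                for k in range(K):
                    acc += a[g][m][k] * b[g][k][n]
                # bias add fused into the epilogue, one pass over the output
                out[g][m][n] = acc + bias[n]
    return out
```

A device implementation fuses the bias add into the GEMM epilogue so the output tensor is written only once; the host loop above mirrors that by adding `bias[n]` at store time.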
  - Chao Liu authored
- 21 Jul, 2022 3 commits
  - Chao Liu authored
  - zjing14 authored
    * replace gridwise_v2r3 with multiD
    * adjust parameters
    * add instances
    * fix test_grouped_gemm
    * fix standalone softmax race condition around blockwise reduction
    * fix CI
    * fix comment: remove redundant workspace
    * use instanceFactory
    * add test layout
    * add empty Ds
    * add bias example
    * use array
    * separate examples
    Co-authored-by: Anthony Chang <ac.chang@outlook.com>
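The standalone softmax touched by this commit is a row-wise, max-subtracted softmax. A host-side sketch follows (the helper name and 2-D row-major input are assumptions for illustration; on the GPU the per-row max and sum are blockwise reductions, which is where the race condition was fixed):

```python
import math

def softmax_rows(x):
    """Row-wise numerically stable softmax: subtract the row max before
    exponentiating so large inputs do not overflow. Host reference only;
    the device kernel computes the max and the sum as blockwise reductions."""
    out = []
    for row in x:
        m = max(row)                          # reduction 1: row max
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)                         # reduction 2: row sum
        out.append([e / s for e in exps])
    return out
```

Subtracting the row max keeps every exponent at or below zero, so even rows of very large values stay finite.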
  - Chao Liu authored
- 20 Jul, 2022 4 commits
- 19 Jul, 2022 2 commits
- 18 Jul, 2022 8 commits
- 17 Jul, 2022 3 commits
- 14 Jul, 2022 3 commits
- 13 Jul, 2022 2 commits
  - rocking5566 authored
    * Implement layernorm kernel and deviceOp
    * Verify GPU kernel with host code
    * Separate gamma and beta from affine; check that the argument is valid
    * Clean
    * Sync the naming
    * Support sweep-once mode if the K-dimension data fits inside one block
    * [What] Get length from upper length. [Why] If we get the length directly, we may get the length after padding.
    * Use only one block in the K dimension, which simplifies the indexing of global reads/writes
    * Use a 1d descriptor for gamma and beta
    * Add accElementwiseOp
    * Extract layernorm host code
    * Support different YVectorDim in GridwiseLayernorm
    * Rename XSrcVectorDim to XYSrcVectorDim, because we use the same parameter in the deviceOp
    * Gamma and beta can share the VGPR
    * Add tests for fp32 and fp16
    * Fix a concurrency bug and add a test case that could fail originally
    * Propagate NaN for layernorm
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
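The layernorm this commit implements normalizes over the reduced (K) dimension and applies separate gamma and beta affine parameters; NaN inputs propagate to the output. A host-verification sketch, in the spirit of the "verify GPU kernel with host code" step (the function name and `eps` default are assumptions, not the repository's API):

```python
import math

def layernorm(x, gamma, beta, eps=1e-5):
    """Reference layernorm over one row with separate gamma/beta affine
    parameters. A NaN anywhere in x propagates through mean and variance
    to every output element. Hypothetical host reference, not the kernel."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)        # eps guards a zero variance
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]
```

Because mean and variance are reductions over the whole row, a single NaN poisons both, which is the propagation behavior the last commit bullet asks for.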
  - Chao Liu authored
- 12 Jul, 2022 3 commits