- 06 Dec, 2024 1 commit
-
-
aska-0096 authored
-
- 05 Dec, 2024 2 commits
- 04 Dec, 2024 2 commits
- 30 Nov, 2024 1 commit
-
-
mtgu0705 authored
Add int4+scale based on Zhang, Jing pk_i4.
Compile pass, function pass.
Modify the kernel to 128x128x128, and use mfma_32x32x4.
Move the weight permute from host to device.
Modified the scale init method.
Modified the init method; the function check fails and needs debugging.
Added init method.
Support group=128 for Llama2-7B-int4.
Move the weight permute from host to device.
Add ckProfiler for GEMM b scale (int4).
Add reference function.
Add pipeline v4 (2 LDS pingpong).
Add more int4-Gemm kernel profiling instances.
Modify the int4-Gemm kernel instances.
Move the pk_i4 permute in kernel.
-
- 27 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 24 Oct, 2024 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 23 Oct, 2024 7 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 22 Oct, 2024 5 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 21 Oct, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 20 Oct, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 18 Oct, 2024 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 16 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 15 Oct, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 14 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 13 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 11 Oct, 2024 1 commit
-
-
Jing Zhang authored
-
- 09 Oct, 2024 2 commits
-
-
Illia Silin authored
-
Christopher Millette authored
-
- 08 Oct, 2024 2 commits
-
-
Rostyslav Geyyer authored
* Add a gpu gemm reference kernel
* Switch to gpu reference in gemm examples
* Remove redundant arguments
* Update all related examples
* Update more examples
* Try fewer threads per block
* Try even fewer threads per block
* Add support for all matrix layouts
* Increase block size
* Clean up
* Remove hardcoded strides
* Clean up
* Try a column-major case
* Revert back to row-major
* Run both CPU and GPU verification

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
Po Yen Chen authored
* Fix text alignment of ArgParser::print()
* Update example README files
* Clarify make-ck-dev.sh <arch> usage
* Only keep some of the arguments from '-?' output
* Undo command line output changes in README
* Only keep existing arguments in doc and update description
* Fix text alignment
* Make cmake-ck-*.sh compatible with 'sh' command
-