- 02 Aug, 2024 1 commit
-
-
danyao12 authored
-
- 30 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 28 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 27 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 26 Jul, 2024 2 commits
- 25 Jul, 2024 4 commits
-
-
Qianfeng Zhang authored
-
zjing14 authored
* add rotating_buff for gemm_multi_d * format * Update flush_cache.hpp * Update gtest.cmake --------- Co-authored-by:
Jing Zhang <jizhan@fb.com> Co-authored-by:
Haocong WANG <haocwang@amd.com>
-
danyao12 authored
-
danyao12 authored
-
- 24 Jul, 2024 3 commits
-
-
Andriy Roshchenko authored
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412) * Add CMakePresets configurations. * Add binary elementwise ConvScaleAdd and an example. * Numerical verification of results. Observed significant irregularities in F8 to F32 type conversions: ```log ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000 ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000 ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000 ``` * Implemented ConvScaleAdd + Example. * Add ConvScale+Bias Instances * Add Client Example for ConvScale+Bias * Fix number of bytes in an example.. * Cleanup.
-
Bartłomiej Kocot authored
* Add support for half_t and bfloat to reduction operations * Fix bhalf convert * Next fix bf16
-
danyao12 authored
-
- 23 Jul, 2024 2 commits
- 22 Jul, 2024 3 commits
-
-
Bartłomiej Kocot authored
-
danyao12 authored
-
danyao12 authored
-
- 21 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 20 Jul, 2024 4 commits
- 19 Jul, 2024 4 commits
-
-
Haocong WANG authored
* add ab_scale init support * enabled interwave * add scale type; update isSupport * adjust example * clean * enable f8 pure gemm rcr ckprofiler * Add gemm_multiply_multiply instances * clang format * Optimize for ScaleBlockMNK=128 * enable abscale f8 gemm ck profiler * Add pure f8 gemm test suite * Reverting to the state of project at f60fd77 * update copyright * clang format * update copyright --------- Co-authored-by:root <jizhan@amd.com>
-
ltqin authored
* init for reduce_threadwise multi_d * add reduce_threadwise_multi_d * add reduce_multi_d * clean * start add an other splitk device op * add reduce template parameter to SplitKBatchOffset * add reduce c matrix * clean up code * change example data type to bf16 * add bf16Ai8B example * remove reduce template parameter * add splitk atomic status to v4 * example add multi d parameters * device op add multi-d parameters * add multi-d to reduce * fix kbach=1 bug * change B layout to col in bf16Ai8B example * remove float adding struct * change multi-d interface * change file and class name * remove multi-d of bf16Ai8B example * change IsReduce function to IsReduceAdd * change example layout to RRR from RCR * according layout to set ds stride * reset parameter layout * add gemm universal reduce instance * add reduce factory * add profile_gemm_universal_reduce * add reduce to profiler * fix reduce instance * fix profiler reduce compiling bug * format * format library instance code * add mem instance for reduce library * fix call instance names * add workspace for reduce in ckProfiler * format * add mnpading to reduce library instance * add fp16 instance to reduce of profiler * change copyright time * restore profiler cmake file * add reduce text to instances * add DsLayout and DsDataType to instances template parameter * fixed gemm_reduce_multi_d * add an example without multi_d * Update common.hpp * Update gtest.cmake * Update gemm_xdl_splitk_reduce_bf16.cpp * clean * Update gtest.cmake * format * fixe api * format * default parameter change to RRR * add vector_len for multi_d * format * Update gtest.cmake * fix bf16A iBB elementwiseop * add ReduceDataType * move ReduceDataType to end position * format * remove googletest git method address * fix copyright time * update init data --------- Co-authored-by:
root <jizhan@amd.com> Co-authored-by:
letaoqin <letaoqin@amd.com> Co-authored-by:
Jing Zhang <jizhan@meta.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
Bartłomiej Kocot authored
* Refactor transform conv to gemm fwd * fixes codegen * wmma fixes * fix wmma * Fix copyright
-
danyao12 authored
-
- 18 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 17 Jul, 2024 1 commit
-
-
Qianfeng authored
-
- 16 Jul, 2024 2 commits
-
-
Andriy Roshchenko authored
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. (#1386) * Add CMakePresets configurations. * Add ConvScale+ReLU Functor and an Example * Account for ReLU FLOPs. * Add instances of 3D convolutions with ConvscaleRelu operation. * Implement Client Example * Cleanup
-
danyao12 authored
-
- 15 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 12 Jul, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Support access per groups and filter3x3 in grouped conv fwd * Fixes for large cases * Fixes for large tensors
-
danyao12 authored
-
- 11 Jul, 2024 2 commits
- 10 Jul, 2024 1 commit
-
-
danyao12 authored
-
- 08 Jul, 2024 1 commit
-
-
carlushuang authored
* wa prec, remove sgpr offset for inline asm * macro for set tile * ignore unused param if no kernel instances in host API * fix more prec issue * cache buffer resource * fix * support pre-nop * clear tile by vector type members * add workaround to reduce scratch memory * conditionally enable workaround code * enable workaround start from certain build version * fallback set_tile() implementation from certain build version * undo template argument changes * put dummy asm in load_raw() * fix comments, refactor s_nop inside buffer_load --------- Co-authored-by:PoYen, Chen <PoYen.Chen@amd.com>
-
- 06 Jul, 2024 1 commit
-
-
Harisankar Sadasivan authored
* universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). * Update README.md * fixing clang-format issues * removed conflicts in struct members between streamk and universal streamk * corrected arg parsing for streamk and universal streamk * added stream-k policies for 3 tile and 4 tile * fixed argument type issue with parsing cmd args * changes suggested in PR review are made- removing comments and correcting copyright * file permissions updated * added default value support for grid_size and streamk-policy selection set to -1 * print messages for arguments * print messages for arguments * print messages for arguments1
-
- 04 Jul, 2024 1 commit
-
-
jakpiase authored
* Implemented smfmac xdlops * add reviewer comments
-