- 27 Oct, 2023 3 commits
-
Lisa Delaney authored
-
illsilin authored
-
illsilin authored
-
- 25 Oct, 2023 2 commits
- 24 Oct, 2023 5 commits
- 23 Oct, 2023 2 commits
- 22 Oct, 2023 4 commits
- 21 Oct, 2023 1 commit
-
illsilin authored
-
- 20 Oct, 2023 6 commits
- 19 Oct, 2023 5 commits
-
Illia Silin authored
* apply the patch for dl kernels on gfx11
* build DL kernels on navi32 CI
-
Qianfeng authored
* reinterpret_cast to const char* in dumpBufferToFile to be compatible with both const and non-const input pointers
* Add seed input to GeneratorTensor_4 for normal_distribution generator
* Add GetTypeString() for DeviceElementwiseImpl
* Add HIP_CHECK_ERROR macro
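The first item in the commit above can be illustrated with a minimal sketch. It is not the library's `dumpBufferToFile`: the name `dump_buffer_to_file_sketch`, its signature, and its return value are assumptions made for illustration. It shows only why casting to `const char*` accepts both const and non-const input pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (NOT the library's dumpBufferToFile): writes `count`
// elements of type T to `path` as raw bytes. Returns true on success.
template <typename T>
bool dump_buffer_to_file_sketch(const std::string& path, T* data, std::size_t count)
{
    assert(data != nullptr || count == 0);

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if(!out)
    {
        return false;
    }

    // Casting to const char* compiles whether T is const-qualified or not.
    // reinterpret_cast<char*>(data) would be rejected when T is const,
    // because reinterpret_cast may add constness but never remove it.
    out.write(reinterpret_cast<const char*>(data),
              static_cast<std::streamsize>(count * sizeof(T)));
    return out.good();
}
```

Calling it with a `const float*` deduces `T = const float`, and with a `float*` deduces `T = float`; the same cast is valid in both cases.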
-
Bartłomiej Kocot authored
* Extend available elementwise operations with conv examples
* Fixes
* Remove not needed convert
* Update CMakeFile and dir name
-
Po Yen Chen authored
* Avoid force setting ENABLE_PIPELINE_V2_OPT to OFF
* Remove compilation option variable MAX_ILP_OPTS
-
Bartlomiej Wroblewski authored
-
- 18 Oct, 2023 4 commits
-
rocking authored
* save mean and inverse std in normalization
* Save mean and inverse std in splitK
* Vector save mean and inv std
* Modify instance for save mean and std
* simplify the layernorm example
* Save mean and std in groupnorm example
* Save mean and inv std in ckProfiler and test
* Remove compute data type from base class
* Save mean and inv std in client example
* Add changelog
* clang format
* Fix compile error
* Refine naming
* Avoid error in bf16
* revert changelog
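As a rough host-side illustration of what "save mean and inverse std" means for a normalization, the sketch below normalizes one row and returns the two statistics alongside the output. It is not the library's kernel or API: `RowStats`, `normalize_row_sketch`, and the `epsilon` parameter are names assumed here, and the scale and bias (gamma and beta) terms are left out for brevity. Saved statistics of this kind are typically kept so that a later pass, such as a backward pass, does not have to recompute them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the per-row statistics that get saved.
struct RowStats
{
    float mean;
    float inv_std;
};

// Hypothetical reference routine (NOT the library's implementation):
// computes y[i] = (x[i] - mean) * inv_std for one row and returns the
// mean and inverse standard deviation so the caller can store them.
inline RowStats normalize_row_sketch(const std::vector<float>& x,
                                     std::vector<float>& y,
                                     float epsilon)
{
    assert(!x.empty());
    const float n = static_cast<float>(x.size());

    float sum = 0.f;
    for(float v : x)
    {
        sum += v;
    }
    const float mean = sum / n;

    float sq_sum = 0.f;
    for(float v : x)
    {
        sq_sum += (v - mean) * (v - mean);
    }
    const float variance = sq_sum / n; // biased (population) variance

    const float inv_std = 1.f / std::sqrt(variance + epsilon);

    y.resize(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = (x[i] - mean) * inv_std;
    }
    return RowStats{mean, inv_std};
}
```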
-
zjing14 authored
Co-authored-by: Jing Zhang <jizha@amd.com>
-
zjing14 authored
* Add a condition to build fp8 instances
* simplified buffer_load/store
* add bfp8/fp8
* fixed
* remove all f8/bf8 condition include folder
* fixed cmake conditions
* fixed DTYPES=fp16/bfp16
* fix
* fixed buffer_load
* fixed buffer_store
* fix
* clean example cmake files
* fixed ci
* fixed cit
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
-
zjing14 authored
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
* Update CMakeLists.txt
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 17 Oct, 2023 8 commits
-
illsilin authored
-
illsilin authored
-
illsilin authored
-
zjing14 authored
* add ab_elementwise
* fixed ci
* fixed a merge issue
* fixed pr comments
* fixed a conflict
* remove 61_example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
Bartłomiej Kocot authored
* Add grouped conv bwd weight wmma
* Update README, changelog, profiler
* Minor fixes
* Fix grouped conv bwd wei dl kernel
* Minor fixes
* Minor stylistic fixes
-
illsilin authored
-
illsilin authored
-
illsilin authored
-