- 12 Sep, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Refactor f8_t to add bf8_t * Add check_err impl for f8_t * Update fp8 test * Format * Revert the fix * Update vector_type implementation * Add bf8 test * Add bf8, use BitInt types * Add bf8 conversion methods * Update type_convert for fp8/bf8 * Add check_err fp8/bf8 support * Add subnorm fp8 tests * Add subnorm bf8 tests * Fix conversion * Add bf8 cmake bindings * Add macros to enable build with disabled fp8/bf8 * Remove is_native method * Update flag combination for mixed precision instances * Add more flag checks * Add another flag to a client example * Add type traits, decouple f8/bf8 casting * Clean up * Decouple fp8 and bf8 flags * Remove more redundant flags * Remove leftover comments
-
- 22 Aug, 2023 2 commits
-
-
zjing14 authored
* updated regular gemm * update ckProfiler * fixed gtests --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
Rostyslav Geyyer authored
* Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 31 May, 2023 1 commit
-
-
Illia Silin authored
-
- 30 May, 2023 1 commit
-
-
Adam Osewski authored
* Add license header. * Reduce number of logged output. Add constant initialization. * Add functional tests for grouped_gemm with different kbatch value. * Add debug log informations + remove unused code. * Don't pass kbatch to CalculateKPadded. * Turn on logging in grouped gemm and gemm splitk profiler * Debug: limit number of test cases to run; * Log more information and initialize with constant value. * Turn on DEBUG_LOG * Add more debug log informations. * Limit the number of instances to compile. * Use GridwiseGemmPipeline * Use KBatch to calculate K0 * Multiple DebugLog messages. * Unit tests for multiple KBatch values. * Refactoring * Disable logging * extract out of if statement KBatch update. * Uncomment instances. * Disable DebugLog. * Use Kbatch when calculate KPadded. * Fix CGridDesc padding. * Use available helper functions. * Uncomment code commented for debuggin. * Remove unnecessary debug log messages. * Uncomment previously commented code for debug purposes. * Add KBatch info to profiler output summary log. * Add gtests for gemm splitk using ckProfiler API. * Add more test-cases for different data layout. * Add more test cases for gemm splitk * Remove old test. * Unit tests for MKNK ggemm interface. * Fix and add more unit-tests. * Constepxr everything! * Increase error threshold for fp16 and splitk. Since we're using fp16 atomic add for splitk there's a known precision loss. --------- Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 02 May, 2023 1 commit
-
-
zjing14 authored
-
- 24 Apr, 2023 1 commit
-
-
zjing14 authored
-
- 01 Dec, 2022 1 commit
-
-
Po Yen Chen authored
* Re-structure ckProfiler source files * Rename profiler.cpp to main.cpp * Modularize ckProfiler operations * Add description for profiler operations * Use longer name to avoid name collision * Use macro to delay expansion * Use std::move() to avoid object copying * Prohibit users from calling dtor * Use macro to eliminate redundant code * Make friend function hidden * Add missing include directive <iostream> * Fix wrong include directives * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test Co-authored-by:Qianfeng Zhang <Qianfeng.Zhang@amd.com>
-
- 11 Nov, 2022 1 commit
-
-
Po Yen Chen authored
* Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Rangify check_err() By rangifying check_err(), we can not only compare values between std::vector<>s, but also compare any ranges which have same value type. * Allow constructing Tensor<> like a HostTensorDescriptor * Simplify Tensor<> object construction logics * Remove more unnecessary 'HostTensorDescriptor' objects * Re-format example code * Re-write more HostTensorDescriptor ctor call
-
- 29 Jul, 2022 1 commit
-
-
Chao Liu authored
* convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test
-
- 01 Jul, 2022 1 commit
-
-
Chao Liu authored
* interface for GEMM and GEMM+add+add+fastgelu * rename namespace * instance factory * fix build * fix build; add GEMM client example * clean
-
- 27 Jun, 2022 1 commit
-
-
Chao Liu authored
* add client example * clean * clean * reorg * clean up profiler * reorg * clea * fix profiler * function for getinstances * update client example * update client example * update client example * update * update example * update Jenkins file * update cmake * update Jenkins
-