GroupedGEMM + Gelu client example/instances/profiler (#614)

* Grouped gemm + Gelu instances.
* Device Instance Factory for GroupedGemm+Gelu
* Client example
* Rangify fill helper functions.
* Fix name clash.
* Profiler for grouped_gemm+gelu
* No need to use full namespace name.
* Add check for MRaw divisible by vector load.
* Ugly fix for big errors.
* Add grouped_gemm+gelu to profiler CMakelists.
* Store in argument additional info.
* Information about Mraw, Nraw, Kraw values.
* Use FastGelu instead of Gelu.
* Change client ex to use FastGelu
* Remove relaxed error precision.
* Remove duplicate output elementwise-op

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>