    GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7
    Adam Osewski authored
    
    
    * Grouped gemm + Gelu instances.
    
    * Device Instance Factory for GroupedGemm+Gelu
    
    * Client example
    
    * Rangify fill helper functions.
    
    * Fix name clash.
    
    * Profiler for grouped_gemm+gelu
    
    * No need to use full namespace name.
    
    * Add check for MRaw divisible by vector load.
    
    * Ugly fix for big errors.
    
    * Add grouped_gemm+gelu to profiler CMakeLists.
    
    * Store in argument additional info.
    
    * Information about MRaw, NRaw, KRaw values.
    
    * Use FastGelu instead of Gelu.
    
    * Change client example to use FastGelu.
    
    * Remove relaxed error precision.
    
    * Remove duplicate output elementwise-op.
    
    ---------
    Co-authored-by: Adam Osewski <aosewski@amd.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
CMakeLists.txt 3.99 KB