1. 08 Sep, 2022 15 commits
  2. 07 Sep, 2022 15 commits
  3. 06 Sep, 2022 10 commits
    • Po-Yen, Chen's avatar
      b41e6019
    • Remove no-longer used type argument · d356c871
      Po-Yen, Chen authored
      d356c871
    • Add 'GridwisePermute' kernel · 7a6dbadc
      Po-Yen, Chen authored
      This kernel is a clone of 'GridwiseElementwise_1D'
      7a6dbadc
    • Fused attention instances & padding tests (#395) · 868e5c55
      Anthony Chang authored
      * modify comment
      
      * trim unnecessary check
      
      * add gemm spec in kernel name
      
      * add TNTT gemm_gemm + atten kernel instances
      
      * refactor attention padding to better fit in unit tests
      
      This streamlines usage: "ResetNaNToMinusInf" is now hidden from the user-facing device op.
      Also added compile-time conditionals that load OOB values as NaN only when padding is enabled
      
      * add adhoc padding test for atten
      
      * shrink input value range for attention kernel validation to avoid occasional errors of about 1e-3

      Still unsure whether this kind of deterministic floating-point accuracy issue is expected
      or not. May want to try the exact same approach as the GPU kernel in the host reference
      GEMM+Softmax+GEMM function to see if the accuracy discrepancy goes away. Until then,
      shrink the input value range, as it is less likely to produce errors of around 1e-3.
      
      * attention kernel proper granular padding for all 4 dims
      
      * IsSupportedArgument checks
      
      * test more padded cases
      
      * block PadK specialization in attention kernels
      
      * workaround clang crash for gfx908
      
      (gfx908 only) workaround for a compiler crash in fused kernels on mainline #9110; #10738 seems OK.
      The error message was "fatal error: error in backend: Error while trying to spill VGPR0 from class
      VGPR_32: Cannot scavenge register without an emergency spill slot!"
      This falls back to a less ideal way of handling NPadding in the fused attention kernel
      
      * comment out kernels giving wrong results on MI100; MI200 doesn't seem affected
      868e5c55
    • GemmGemm TNNT instances (#399) · fe52c94c
      Anthony Chang authored
      * add gemm_gemm TNNT instance
      
      * sanitize Gemm1KPack
      
      * disable instances that failed validation on mi100
      fe52c94c
    • Po-Yen, Chen's avatar
      fa21bcde
    • Passing 'axes' to 'DevicePermute' · 5b63400a
      Po-Yen, Chen authored
      5b63400a
    • Softmax client example (#396) · 3da5c19e
      Adam Osewski authored
      * Update Softmax device operation interface.
      
      * Update ckProfiler.
      
      * Update Softmax UT.
      
      * Update example.
      
      * Client example.
      
      * Clang format
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      3da5c19e
    • Po-Yen, Chen's avatar
      50f5ce49
    • Remove unnecessary include directives · 2377c2e8
      Po-Yen, Chen authored
      2377c2e8
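
The padding note in commit 868e5c55 (out-of-bounds values loaded as NaN, with "ResetNaNToMinusInf" applied inside the op) can be illustrated on the host with a small softmax sketch. This is not the library's code: `load_or_nan` and `padded_softmax` are hypothetical names, and the sketch assumes each row has at least one valid element and that valid inputs contain no NaN of their own.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: read slot i of a row, returning NaN when i lies past
// the valid length (stands in for an out-of-bounds load from a padded tile).
inline float load_or_nan(const std::vector<float>& row, std::size_t i)
{
    return i < row.size() ? row[i] : std::numeric_limits<float>::quiet_NaN();
}

// Softmax over a tile of padded_len slots, of which only row.size() are valid.
// Assumes row is non-empty and padded_len >= row.size().
std::vector<float> padded_softmax(const std::vector<float>& row, std::size_t padded_len)
{
    const float neg_inf = -std::numeric_limits<float>::infinity();

    // Reset NaN (padding) to -inf so padded slots cannot win the max and
    // contribute exp(-inf) == 0 to the normalising sum.
    std::vector<float> tile(padded_len);
    for(std::size_t i = 0; i < padded_len; ++i)
    {
        const float v = load_or_nan(row, i);
        tile[i]       = std::isnan(v) ? neg_inf : v;
    }

    // Subtract the row maximum before exponentiating for numerical stability.
    float max_v = neg_inf;
    for(float v : tile)
        max_v = std::max(max_v, v);

    float sum = 0.f;
    for(float& v : tile)
    {
        v = std::exp(v - max_v);
        sum += v;
    }
    for(float& v : tile)
        v /= sum;

    return tile;
}
```

Calling `padded_softmax({1.f, 2.f, 3.f}, 8)` yields the same probabilities for the three valid slots as an unpadded softmax, and zeros in the five padded slots.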