1. 28 Mar, 2022 3 commits
  2. 04 Mar, 2022 2 commits
  3. 27 Sep, 2021 1 commit
  4. 20 Nov, 2020 1 commit
    • Fuse skip layernorm (#683) · 1bfb147d
      Paul Fultz II authored
      
      
      * Unify the vectorized and non-vectorized path
      
      * Formatting
      
      * Make fusion easily extendable
      
      * Add skip layernorm fusion
      
      * Formatting
      
      * Call correct layernorm function
      
      * Fix compile errors
      
      * Add DCE
      
      * Add test for skip layernorm
      
      * Formatting
      
      * Remove unused typedef
      
      * Formatting
      
      * Fix tidy issues
      
      * Formatting
      Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
  5. 25 Aug, 2020 1 commit
    • Improve layernorm performance (#613) · 56b3bf58
      Paul Fultz II authored
      * Use increment instead of division to compute register offset
      
      * Formatting
      
      * Limit layernorm to 1024 elements
      
      * Formatting
      
      * Add verification to driver
      
      * Formatting
      
      * Remove early return
      
      * Use block_size 256
      
      * Vectorize the kernel
      
      * Formatting
      
      * Convert to vector type
      
      * Add layernorm tests
      
      * Formatting
      
      * Formatting
      
      * Refactor layernorm to run both algos
      
      * Formatting
      
      * Fix compile error
      
      * Fix tidy warnings
      
      * Formatting
      
      * Add layernorm function
      
      * Formatting
  6. 14 Aug, 2020 1 commit
    • Layernorm onnx support (#599) · 2c5d5fee
      kahmed10 authored
      
      
      * fix pad calc
      
      * bert tf passes correctness
      
      * formatting
      
      * add test
      
      * formatting
      
      * remove comment
      
      * add inline
      
      * formatting
      
      * fix order for literal
      
      * formatting
      
      * test no mul_add
      
      * formatting
      
      * debug layernorm
      
      * debug layernorm
      
      * manual merge
      
      * more progress
      
      * formatting
      
      * remove miopen batchnorm
      
      * remove headers
      
      * Fix compile error with no dpp reductions
      
      * fix indices
      
      * formatting
      
      * change matcher
      
      * formatting
      
      * remove binds
      
      * formatting
      
      * disable tf matcher
      
      * formatting
      
      * use fast div
      
      * formatting
      
      * fix matcher
      
      * formatting
      
      * remove comment
      
      * move find_matches
      
      * add assert
      
      * formatting
      
      * fix deepcode issue
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>