1. 28 Sep, 2023 1 commit
    • Umang Yadav's avatar
      Add options to set tolerances inside MIGraphX driver (#2213) · 69d8d789
      Umang Yadav authored
      MIGraphX verification by default uses normalized RMS error as the basis for the verification.  This change adds some logic to allow migraphx to do "np.allclose" type of elementwise verification using atol and rtol.
      
      Commit also includes changes to consistently pass "gold" or "expected" results as the second argument for "verify_range()" calls.  Default RMS tolerance inside driver is set to 0.001 which IMO is high for FP32 compared to what we had earlier. Need better defaults
      69d8d789
  2. 14 Sep, 2023 1 commit
  3. 16 Jul, 2023 1 commit
  4. 22 Jun, 2022 1 commit
  5. 04 Nov, 2020 1 commit
    • Paul Fultz II's avatar
      Split cpu and reference implementation (#671) · 500d9441
      Paul Fultz II authored
      
      
      * Add all_targets cmake target
      
      * Rename target
      
      * Add ref target
      
      * Rename tests
      
      * Refactor compiler target
      
      * Formatting
      
      * Verify for every target
      
      * Formatting
      
      * Add verify test suite
      
      * Formatting
      
      * Add initial test programs
      
      * Formatting
      
      * Add rnn tests
      
      * Formatting
      
      * Validate gpu
      
      * Formatting
      
      * Remove old gpu tests
      
      * Fix gpu tests
      
      * Fix ref error
      
      * Fix tidy issues
      
      * Formatting
      
      * Tidy fixes
      
      * Fix header in python api
      
      * Rename to ref
      
      * Use ref in verify_onnx
      
      * Fix tidy issue
      
      * Build with verbose on
      
      * Fix typo
      
      * Remove verbose
      
      * rename some cpu prefix to ref
      Co-authored-by: default avatarShucai Xiao <Shucai.Xiao@amd.com>
      500d9441
  6. 25 Aug, 2020 1 commit
    • Paul Fultz II's avatar
      Improve layernorm performance (#613) · 56b3bf58
      Paul Fultz II authored
      * Use increment instead of division to compute register offset
      
      * Formatting
      
      * Limit layernorm to 1024 elements
      
      * Formatting
      
      * Add verification to driver
      
      * Formatting
      
      * Remove early return
      
      * Use block_size 256
      
      * Vectorize the kernel
      
      * Formatting
      
      * Convert to vector type
      
      * Add layernorm tests
      
      * Formatting
      
      * Formatting
      
      * Refactor layernorm to run both algos
      
      * Formatting
      
      * Fix compile error
      
      * Fix tidy warnings
      
      * Formatting
      
      * Add layernorm function
      
      * Formatting
      56b3bf58
  7. 27 Nov, 2018 1 commit
  8. 14 Nov, 2018 1 commit
  9. 06 Nov, 2018 4 commits
  10. 18 Sep, 2018 1 commit
  11. 17 Sep, 2018 3 commits
  12. 30 Aug, 2018 1 commit
  13. 29 Aug, 2018 2 commits