1. 04 Nov, 2019 1 commit
  2. 30 Oct, 2019 1 commit
    • Enable scheduler for 1 stream (#399) · ca17bcd6
      Paul Fultz II authored
      * Enable scheduler for 1 stream
      
      * Formatting
      
      * Improve performance of sorting
      
      * Formatting
      
      * Adjust the weight calculation
      
      * Formatting
      
      * Simplify formula
      
      * Formatting
      
      * Avoid division by zero
      
      * Fix scheduler test
      
      * Check for either 1 or 2
      
      * Check for waits when order may change
      
      * Formatting
  3. 28 Oct, 2019 1 commit
  4. 25 Oct, 2019 2 commits
  5. 24 Oct, 2019 1 commit
  6. 21 Oct, 2019 1 commit
  7. 16 Oct, 2019 3 commits
  8. 15 Oct, 2019 2 commits
  9. 10 Oct, 2019 1 commit
  10. 09 Oct, 2019 1 commit
    • Fix bug in bert accuracy (#385) · a797f890
      Paul Fultz II authored
      * Fix bug in bert accuracy
      
      * Formatting
      
      * add another test
      
      * Fix add and overflow
      
      * Formatting
      
      * Fix bug in shape_for_each
      
      * Use front instead of iterator
      
      * Use result.front()
      
      * Split add_unary files
      
      * Formatting
      
      * Fix incorrect last index
      
      * Remove comment
      
      * Inline function
      
      * Fix carry check
      
      * Fix metadata errors
      
      * Formatting
      
      * Reflow
      
      * Reflow
  11. 07 Oct, 2019 1 commit
  12. 04 Oct, 2019 1 commit
    • Add_clip fusion (#370) · 1398bcc1
      kahmed10 authored
      * initial testing of add_clip fusion
      
      * formatting
      
      * clipped relu fusion
      
      * formatting
      
      * remove some executables, add fusion test
      
      * formatting
      
      * remove clipped_relu code
      
      * fix clang-tidy
      
      * revert changes to cmake files
      
      * remove fusion from weight map
      
      * formatting
      
      * fix syntax error
      
      * formatting
      
      * fix syntax error
      
      * fix syntax error
      
      * formatting
  13. 03 Oct, 2019 2 commits
    • bug_fix_for_gemm_copy (#378) · 84a3f56e
      Shucai Xiao authored
      * fixed a bug related to removing gemm copy
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix unit test failure
      
      * fix review comments
      
      * clang format
    • Improve contiguous and concat performance (#368) · 9b55685c
      Paul Fultz II authored
      * Add env to trace nary device functions
      
      * Formatting
      
      * Improve contiguous and concat performance
      
      * Formatting
      
      * Remove unused variable
      
      * Formatting
      
      * Fix gpu tests
      
      * Formatting
      
      * Add more test for transposed concat
      
      * Formatting
      
      * Compute offset and not index
      
      * Compute multi-index once
      
      * Formatting
      
      * Fix transposed inputs
      
      * Formatting
      
      * Use product order for comparisons of hip_array
      
      * Formatting
      
      * Add missing s parameter
      
      * Formatting
      
      * Don't invert permutation
      
      * Fix tidy warnings
      
      * Formatting
      
      * Remove incorrect license
      
      * Use a single integer for stride
      
      * Formatting
      
      * Fix tidy issue
  14. 27 Sep, 2019 1 commit
    • Ceil floor operators (#375) · 7d06cdbd
      Shucai Xiao authored
      * add two operators ceil and floor
      
      * clang format
      
      * add unit test for the ceil and floor operators
      
      * remove unintended code
  15. 26 Sep, 2019 1 commit
  16. 25 Sep, 2019 1 commit
    • Reduce_min/max operators (#363) · 3962c2ad
      Shucai Xiao authored
      * first version of refactoring reduce operators.
      
      * clang format
      
      * refactor the gpu implementation of the reduce_mean operator
      
      * clang format
      
      * refactor gpu implementation of the reduce_sum operator
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * fix review comments
      
      * clang format
      
      * fix a Jenkins error
      
      * fixed review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * add implementation of reduce_min and reduce_max
      
      * clang format
      
      * add unit test for reduce_min/max operator
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * fix review comments
  17. 19 Sep, 2019 1 commit
  18. 18 Sep, 2019 1 commit
    • Remove gemm copy and simplify rocblas call (#356) · a0f9b785
      Shucai Xiao authored
      * Remove extra copy in gemm
      
      * combine rocblas gemm call
      
      * clang format
      
      * fix a bug in calling rocblas function
      
      * clang format
      
      * backup of temporary changes
      
      * clang format
      
      * unify the gemm call to avoid multiple gpu implementations
      
      * clang format
      
      * remove unnecessary code
      
      * backup temp changes
      
      * clang format
      
      * fix cppcheck error
      
      * code backup
      
      * clang format
      
      * remove unnecessary synchronization function
      
      * clang format
      
      * fix bugs
      
      * clang format
      
      * more optimization related to gemm
      
      * clang format
      
      * code cleanup
      
      * implementation that can achieve better performance
      
      * clang format
      
      * temp changes to try performance
      
      * clang format
      
      * revert to previous commits
      
      * fixed review comments
      
      * clang format
      
      * fix review comments
  19. 16 Sep, 2019 3 commits
    • Add flags to driver to run quantization (#361) · f445d962
      Paul Fultz II authored
      * Add flags to quantize in driver
      
      * Formatting
      
      * Fix compile error
    • Add fusions for sigmoid and tanh (#354) · ef5e7ce0
      kahmed10 authored
      * add tests, fix bug in ternary op
      
      * formatting
      
      * uncomment fusion
    • Refactor reduce ops (#350) · 307c40cd
      Shucai Xiao authored
      * first version of refactoring reduce operators.
      
      * clang format
      
      * refactor the gpu implementation of the reduce_mean operator
      
      * clang format
      
      * refactor gpu implementation of the reduce_sum operator
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * fix review comments
      
      * clang format
      
      * fix a Jenkins error
      
      * fixed review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
  20. 04 Sep, 2019 4 commits
  21. 03 Sep, 2019 6 commits
  22. 01 Sep, 2019 2 commits
  23. 31 Aug, 2019 1 commit
  24. 30 Aug, 2019 1 commit