1. 29 Jul, 2023 1 commit
  2. 21 Jul, 2023 1 commit
    • Make global workitems multiple of local workitems (#1976) · 3216fe52
      Umang Yadav authored
      HIP requires the number of global work items to be a multiple of the number of local work items. If it is not, the results are not guaranteed to be correct.
      Fixes #1977
      Fixes #1644
      MIGraphX CI has moved to rocm-5.6, which doesn't require hipRTC workarounds.
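      The rounding that this commit message describes can be sketched as below. This is only an illustration, not the code from the commit: the helper name `round_up_global` is hypothetical, and the sketch assumes that the kernel bounds-checks the extra work items introduced by padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from MIGraphX): round the global work-item count
// up to the nearest multiple of the local (block) size.
inline std::size_t round_up_global(std::size_t global, std::size_t local)
{
    assert(local > 0); // a zero block size is not a valid launch configuration
    return ((global + local - 1) / local) * local;
}
```

      For example, 1000 global work items with a local size of 256 would be padded to 1024, so the kernel would need to skip indices 1000 through 1023.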
  3. 08 Jun, 2023 1 commit
  4. 17 May, 2023 1 commit
  5. 31 Jan, 2023 1 commit
    • hipRTC fixes (#1531) · 91cc7242
      Umang Yadav authored
      Added a CMake flag for hipRTC: MIGRAPHX_USE_HIPRTC.
      Added stages in Jenkins for hipRTC.
      Fixes for some of the pending issues from hipRTC.
  6. 28 Oct, 2022 1 commit
  7. 19 Sep, 2022 1 commit
    • Improve layernorm and reductions performance (#1348) · 97a1ed2d
      Paul Fultz II authored
      Compute mean and variance in same reduction
      Set block size to numbers divisible by 32 instead of powers of 2
      Global is also set exactly instead of being divisible by block size
      More exact matching of global/local can help get rid of branching/loops
      Reduce vectors first before doing dpp_reduce
      Explicitly vectorize array operators since the compiler doesn't always vectorize them
      Still uses the old for loop when it's computing at compile-time, since neither reinterpret_cast nor all the vector types are supported
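      Computing mean and variance in the same reduction can be sketched on the host as a single pass that accumulates both the sum and the sum of squares. This is a minimal illustration, not the GPU kernel from the commit: the names `mean_var` and `mean_variance` are hypothetical, and the formula Var = E[x^2] - E[x]^2 is an assumption, since the commit message does not say which formulation is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct mean_var
{
    double mean;
    double variance;
};

// One pass over the data accumulates both sums, so a single reduction
// yields the mean and the (population) variance.
inline mean_var mean_variance(const std::vector<double>& xs)
{
    assert(!xs.empty());
    double sum    = 0.0;
    double sum_sq = 0.0;
    for(double x : xs)
    {
        sum += x;
        sum_sq += x * x;
    }
    const double n    = static_cast<double>(xs.size());
    const double mean = sum / n;
    // Assumed formulation: Var = E[x^2] - E[x]^2
    return {mean, sum_sq / n - mean * mean};
}
```

      This formulation can lose precision when the mean is large relative to the spread of the data, so it is a trade-off rather than a universally safe choice.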
  8. 11 Jul, 2022 1 commit
  9. 22 Jun, 2022 1 commit
  10. 10 Jun, 2022 1 commit
  11. 29 Mar, 2022 1 commit
    • Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
      Paul Fultz II authored
      This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign use this infrastructure. Scatternd does not, since it requires a standard shape.
      
      This also makes it easier to add new runtime compiled kernels in the future.
  12. 28 Jan, 2022 1 commit
  13. 07 Dec, 2021 1 commit
  14. 19 Aug, 2021 1 commit
  15. 10 Aug, 2021 1 commit
    • Add option to compile with hiprtc (#892) · 91c9ebbc
      Paul Fultz II authored
      * Add hiprtc compile option
      * Add cross compile test
      * Update error reporting
      * Add tests for errors and warnings
      * Fix tidy warning
      * Add comment to ifdefs
      * Skip null character at end of log
      * Assert there is null at the end
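      The last two items (skip the null character at the end of the log, and assert that it is there) can be sketched as follows. This is an assumption about the general shape of such handling, not the code from the commit: the function name `log_to_string` and the use of `std::vector<char>` as the log buffer are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: convert a compiler log buffer, which is expected to be
// null-terminated, into a std::string without the trailing null character.
inline std::string log_to_string(const std::vector<char>& buffer)
{
    if(buffer.empty())
        return {};
    // Assert there is a null at the end before dropping it.
    assert(buffer.back() == '\0');
    return std::string(buffer.begin(), buffer.end() - 1);
}
```

      Dropping the terminator keeps an embedded null out of the resulting string, which would otherwise show up when the log is compared or printed.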
  16. 05 Aug, 2021 1 commit
    • Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666
      Paul Fultz II authored
      
      
      * Add method to compile pointwise
      
      * Formatting
      
      * Add lambda
      
      * Add semicolon
      
      * Rename variable
      
      * Add driver to run jit kernels
      
      * Formatting
      
      * Add context
      
      * Formatting
      
      * Make separate driver folder
      
      * Add more general gpu driver
      
      * Formatting
      
      * Print out wall time
      
      * Formatting
      
      * Run multiple times and skip first run
      
      * Formatting
      
      * Separate time_op
      
      * Run an op for comparison
      
      * Formatting
      
      * Add debug asserts
      
      * Formatting
      
      * Change parameter name
      
      * Formatting
      
      * Fix argument order
      
      * Formatting
      
      * Add preloading
      
      * Formatting
      
      * Allow a different data type
      
      * Formatting
      
      * Pipeline transformations
      
      * Formatting
      
      * Add vectorization
      
      * Formatting
      
      * Reduce dims
      
      * Formatting
      
      * Compile with launch params as constant
      
      * Formatting
      
      * Make sure buffer can be vectorized
      
      * Formatting
      
      * Enable vectorization and preloading
      
      * Formatting
      
      * Add print header
      
      * Formatting
      
      * Avoid allocating too large of LDS
      
      * Formatting
      
      * Add some vec functions to a separate header
      
      * Formatting
      
      * Add stride loops
      
      * Formatting
      
      * Improve the transform pipeline
      
      * Formatting
      
      * Add const
      
      * Fix shape check
      
      * Formatting
      
      * Just check stride axis is zero
      
      * Remove extra finc_vector_axis overload
      
      * Simplify some more functions
      
      * Formatting
      
      * Remove some more extra functions
      
      * Formatting
      
      * Simplify more decltypes
      
      * Add another const
      
      * Fix test
      
      * Get buffer pointer different for older compilers
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
  17. 14 Jul, 2021 1 commit
  18. 27 Apr, 2021 1 commit
    • Add tuple type to shape (#800) · 66aa4cc8
      Paul Fultz II authored
      
      
      * Add definitions for all pointwise operators
      
      * Formatting
      
      * Add cpp generator class
      
      * Formatting
      
      * Move compilation to core
      
      * Formatting
      
      * Add clock to tmp name
      
      * Add dynamic loader
      
      * Formatting
      
      * Add tests for code gen
      
      * Formatting
      
      * Add test for literals
      
      * Formatting
      
      * Use with_char
      
      * Add missing header
      
      * Fix mismerge
      
      * Ignore tidy warning
      
      * Fix gcc 5 errors
      
      * Apply fixits
      
      * Skip signed bitwise of status
      
      * Remove unused parameters
      
      * Explicitly add c++14 flag
      
      * Fix tidy warning
      
      * Add tuple type to shape class
      
      * Formatting
      
      * Make data member private
      
      * Formatting
      
      * Add sub arguments
      
      * Formatting
      
      * Turn clang format off
      
      * Disable clang-format
      
      * Improve visiting tuples
      
      * Formatting
      
      * Add more argument tests
      
      * Formatting
      
      * Handle tuple in load
      
      * Formatting
      
      * Remove .o files
      
      * Add tuple type to api
      
      * Formatting
      
      * Fix tidy warnings
      
      * Fix tidy warnings
      
      * Add a test for share method
      
      * Formatting
      
      * Add a test cpp_type
      
      * Suppress tidy warning
      Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
  19. 26 Mar, 2021 1 commit
    • Add initial code generation (#762) · 581d31b0
      Paul Fultz II authored
      
      
      * Add code object op
      
      * Formatting
      
      * Add more value tests
      
      * Formatting
      
      * Fix from_value conversion from binary
      
      * Formatting
      
      * Don't use offload copy
      
      * Remove iostream header
      
      * Fix compilation errors
      
      * Formatting
      
      * Rename var
      
      * Add missing files
      
      * Formatting
      
      * Remove duplicate variable
      
      * Remove comment
      
      * Template the function so sfinae will work
      
      * Formatting
      
      * Use template specialization since ADL is broken on hcc
      
      * Formatting
      
      * Annotate the constructor with HD for hcc
      
      * Make variable const
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>