1. 06 Oct, 2023 1 commit
  2. 29 Sep, 2023 1 commit
  3. 13 Sep, 2023 1 commit
  4. 17 May, 2023 1 commit
  5. 30 Mar, 2023 1 commit
  6. 16 Feb, 2023 1 commit
  7. 31 Jan, 2023 1 commit
    • hipRTC fixes (#1531) · 91cc7242
      Umang Yadav authored
      Added a CMake flag for hipRTC: MIGRAPHX_USE_HIPRTC.
      Added stages in Jenkins for hipRTC.
      Fixed some of the pending issues with hipRTC.
      91cc7242
  8. 11 Dec, 2022 1 commit
    • change target flag (#1488) · b41c1f01
      Umang Yadav authored
      HIP changed in previous ROCm releases to use --offload-arch instead of --cuda-gpu-arch.
      
      This should be backwards compatible. hipRTC also supports --offload-arch.
      b41c1f01
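As a rough illustration of the flag change described in this commit, a helper that builds the target option might look like the sketch below. The function name `make_target_flag` and the `use_legacy_flag` switch are hypothetical and are not taken from the repository; only the two flag spellings come from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the project's API): builds the GPU target option
// that is passed to the HIP compiler for a given architecture name.
std::string make_target_flag(const std::string& arch, bool use_legacy_flag = false)
{
    assert(!arch.empty());
    // --offload-arch is the newer spelling; --cuda-gpu-arch is the older one.
    const std::string prefix = use_legacy_flag ? "--cuda-gpu-arch=" : "--offload-arch=";
    return prefix + arch;
}
```

Calling `make_target_flag("gfx906")` would yield `--offload-arch=gfx906`, while passing `true` as the second argument keeps the older spelling.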
  9. 08 Jul, 2022 1 commit
  10. 22 Jun, 2022 1 commit
  11. 26 May, 2022 1 commit
  12. 09 May, 2022 1 commit
  13. 05 May, 2022 1 commit
    • Cppcheck fixes (#1195) · d582425b
      Paul Fultz II authored
      Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. It also fixes the cppcheck errors that were already there.
      d582425b
  14. 17 Apr, 2022 1 commit
    • Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is a significant improvement on larger tensors; with half, it is almost 50% faster:
      
      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms
      Also for non-trivial layouts this can sometimes be over 2x faster:
      
      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms
      Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such kernels. I plan to address that in a future PR.
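The memory access point can be illustrated with a small host-side sketch. It assumes one common reading of the two names, which the commit message does not spell out: in a block reduce the threads of a block cooperate on a single output element, and in a lane reduce each thread reduces its own output element. The function names and the 2D row-major layout are illustrative assumptions, not the project's kernels.

```cpp
#include <cassert>
#include <cstddef>

// Assumed setup: a row-major [rows, cols] tensor reduced over axis 0,
// so consecutive elements along the reduction axis are `cols` apart.

// Cooperative scheme: the threads of a block share output column c,
// and thread t starts by reading row t of that column.
std::size_t block_reduce_offset(std::size_t t, std::size_t c, std::size_t cols)
{
    assert(c < cols);
    return t * cols + c;
}

// Per-thread scheme: thread t owns output column t and walks down
// the rows itself, reading row r at each step.
std::size_t lane_reduce_offset(std::size_t t, std::size_t r, std::size_t cols)
{
    assert(t < cols);
    return r * cols + t;
}
```

Under these assumptions, neighbouring threads in the cooperative scheme read addresses `cols` elements apart, while in the per-thread scheme they read adjacent elements, which is the access pattern GPUs coalesce well.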
      
      Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
      f9a5b81e
  15. 29 Mar, 2022 1 commit
    • Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
      Paul Fultz II authored
      This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign use this infrastructure. Scatternd does not, since it requires a standard shape.
      
      This also makes it easier to add new runtime compiled kernels in the future.
      661046c6
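The idea of compiling every kernel in parallel can be sketched with standard C++ futures. This is only a minimal illustration: `compiled_kernel`, `compile_all`, and the `compile_one` callback are hypothetical names, and the real compile_ops pipeline may be organised quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the result of compiling one kernel.
struct compiled_kernel
{
    std::string name;
    std::size_t source_size;
};

// Start one asynchronous task per kernel source, then collect the
// results in the same order as the inputs.
std::vector<compiled_kernel>
compile_all(const std::vector<std::string>& sources,
            const std::function<compiled_kernel(const std::string&)>& compile_one)
{
    assert(compile_one);
    std::vector<std::future<compiled_kernel>> jobs;
    jobs.reserve(sources.size());
    for(const auto& src : sources)
        jobs.push_back(std::async(std::launch::async,
                                  [&compile_one, &src] { return compile_one(src); }));

    std::vector<compiled_kernel> results;
    results.reserve(jobs.size());
    for(auto& job : jobs)
        results.push_back(job.get()); // blocks until that task has finished
    return results;
}
```

Because every kernel goes through the same entry point, adding a new runtime compiled kernel only means supplying another source to the list, which matches the motivation given in the commit message.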
  16. 28 Mar, 2022 1 commit
  17. 03 Mar, 2022 1 commit
  18. 28 Oct, 2021 1 commit
  19. 10 Aug, 2021 1 commit
    • Add option to compile with hiprtc (#892) · 91c9ebbc
      Paul Fultz II authored
      * Add hiprtc compile option
      * Add cross compile test
      * Update error reporting
      * Add tests for errors and warnings
      * Fix tidy warning
      * Add comment to ifdefs
      * Skip null character at end of log
      * Assert there is null at the end
      91c9ebbc
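The last two items, skipping the null character at the end of the log and asserting that it is there, can be sketched as follows. The helper name `log_to_string` is hypothetical, and the sketch works on a plain byte buffer rather than calling the hipRTC API, on the assumption that the log is returned as a null-terminated character array.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: converts a raw compiler log buffer, assumed to end
// with '\0', into a std::string without the terminator.
std::string log_to_string(const std::vector<char>& buffer)
{
    if(buffer.empty())
        return {};
    // Check that the terminator is really there before dropping it.
    assert(buffer.back() == '\0');
    return std::string(buffer.begin(), buffer.end() - 1);
}
```

Dropping the terminator keeps the null byte from becoming part of the error message when the log is appended to other text.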
  20. 05 Aug, 2021 1 commit
    • Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666
      Paul Fultz II authored
      
      
      * Add method to compile pointwise
      
      * Formatting
      
      * Add lambda
      
      * Add semicolon
      
      * Rename variable
      
      * Add driver to run jit kernels
      
      * Formatting
      
      * Add context
      
      * Formatting
      
      * Make separate driver folder
      
      * Add more general gpu driver
      
      * Formatting
      
      * Print out wall time
      
      * Formatting
      
      * Run multiple times and skip first run
      
      * Formatting
      
      * Separate time_op
      
      * Run an op for comparison
      
      * Formatting
      
      * Add debug asserts
      
      * Formatting
      
      * Change parameter name
      
      * Formatting
      
      * Fix argument order
      
      * Formatting
      
      * Add preloading
      
      * Formatting
      
      * Allow a different data type
      
      * Formatting
      
      * Pipeline transformations
      
      * Formatting
      
      * Add vectorization
      
      * Formatting
      
      * Reduce dims
      
      * Formatting
      
      * Compile with launch params as constant
      
      * Formatting
      
      * Make sure buffer can be vectorized
      
      * Formatting
      
      * Enable vectorization and preloading
      
      * Formatting
      
      * Add print header
      
      * Formatting
      
      * Avoid allocating too much LDS
      
      * Formatting
      
      * Add some vec functions to a separate header
      
      * Formatting
      
      * Add stride loops
      
      * Formatting
      
      * Improve the transform pipeline
      
      * Formatting
      
      * Add const
      
      * Fix shape check
      
      * Formatting
      
      * Just check stride axis is zero
      
      * Remove extra find_vector_axis overload
      
      * Simplify some more functions
      
      * Formatting
      
      * Remove some more extra functions
      
      * Formatting
      
      * Simplify more decltypes
      
      * Add another const
      
      * Fix test
      
      * Get buffer pointer different for older compilers
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      29fa2666
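The timing approach named in this commit (run multiple times, skip the first run, report wall time) can be sketched on the host as below. The name `time_op_ms` and its signature are hypothetical and are not the driver's actual `time_op`. A real GPU measurement would also need to wait for the device to finish before reading the clock, which this sketch leaves out.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall time in milliseconds of `op` over
// `runs` timed calls. One extra warm-up call is made first and excluded,
// since a first run often includes one-time setup cost.
double time_op_ms(const std::function<void()>& op, std::size_t runs)
{
    assert(op);
    assert(runs > 0);
    op(); // warm-up run, not timed

    const auto start = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i < runs; i++)
        op();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(runs);
}
```

Timing a generated kernel and a reference op with the same helper gives a like-for-like comparison, in the spirit of the "Run an op for comparison" item.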
  21. 19 Apr, 2021 1 commit
    • Add code generation for pointwise operators (#780) · 35d1bcc2
      Paul Fultz II authored
      * Add definitions for all pointwise operators
      
      * Formatting
      
      * Add cpp generator class
      
      * Formatting
      
      * Move compilation to core
      
      * Formatting
      
      * Add clock to tmp name
      
      * Add dynamic loader
      
      * Formatting
      
      * Add tests for code gen
      
      * Formatting
      
      * Add test for literals
      
      * Formatting
      
      * Use with_char
      
      * Add missing header
      
      * Fix mismerge
      
      * Ignore tidy warning
      
      * Fix gcc 5 errors
      
      * Apply fixits
      
      * Skip signed bitwise of status
      
      * Remove unused parameters
      
      * Explicitly add c++14 flag
      
      * Fix tidy warning
      
      * Remove .o files
      35d1bcc2
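To give a feel for what generating code for a pointwise operator involves, here is a deliberately small sketch that emits the source text of a scalar function from an expression string. The function `generate_pointwise_function` and its fixed `float` type are assumptions made for illustration; the cpp generator class added in this commit is not shown in the log and is likely more general.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical generator: emits the source text of a C++ function that
// evaluates `expression` over the named float parameters.
std::string generate_pointwise_function(const std::string& name,
                                        const std::vector<std::string>& params,
                                        const std::string& expression)
{
    assert(!name.empty());
    std::string args;
    for(const auto& p : params)
    {
        if(!args.empty())
            args += ", ";
        args += "float " + p;
    }
    return "float " + name + "(" + args + ") { return " + expression + "; }";
}
```

The returned string could then be written to a temporary file, compiled, and loaded at run time, which is roughly the flow the "Add clock to tmp name" and "Add dynamic loader" items hint at.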
  22. 26 Mar, 2021 1 commit
    • Add initial code generation (#762) · 581d31b0
      Paul Fultz II authored
      
      
      * Add code object op
      
      * Formatting
      
      * Add more value tests
      
      * Formatting
      
      * Fix from_value conversion from binary
      
      * Formatting
      
      * Don't use offload copy
      
      * Remove iostream header
      
      * Fix compilation errors
      
      * Formatting
      
      * Rename var
      
      * Add missing files
      
      * Formatting
      
      * Remove duplicate variable
      
      * Remove comment
      
      * Template the function so sfinae will work
      
      * Formatting
      
      * Use template specialization since ADL is broken on hcc
      
      * Formatting
      
      * Annotate the constructor with HD for hcc
      
      * Make variable const
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
      581d31b0
  23. 18 Jan, 2021 1 commit
  24. 09 Nov, 2020 1 commit
    • Add hip compilation (#664) · f71af72a
      Paul Fultz II authored
      
      
      * Add compiler flags
      
      * Add missing include
      
      * Add filesystem header
      
      * Formatting
      
      * Add tmp_dir to run
      
      * Formatting
      
      * Kernel compilation and launching
      
      * Formatting
      
      * Separate pack_args
      
      * Formatting
      
      * Add alignment tests
      
      * Formatting
      
      * Add compile test
      
      * Formatting
      
      * Complete compile test
      
      * Formatting
      
      * Use is_regular_file free function
      
      * Fix is_regular_file call
      
      * Fix tidy issues
      
      * Fix tidy
      
      * Fix tidy issue
      
      * Print size in read_buffer to debug issue on jenkins
      
      * Add hip flags before src file
      
      * Fix reading output files
      
      * Fix unused variable warning
      
      * Formatting
      
      * Formatting
      
      * Disable tidy check
      Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
      f71af72a
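The pack_args and alignment test items suggest that kernel arguments are packed into a single byte buffer with each argument placed at an offset that respects its alignment. A minimal sketch of that technique follows. The `kernel_argument` type and the `pack_arguments` function are hypothetical and may differ from the project's own pack_args.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one kernel argument: its raw bytes,
// size, and required alignment.
struct kernel_argument
{
    const void* data;
    std::size_t size;
    std::size_t align;
};

// Copy each argument into one byte buffer, inserting zero padding so that
// every argument starts at a multiple of its alignment.
std::vector<char> pack_arguments(const std::vector<kernel_argument>& args)
{
    std::vector<char> buffer;
    for(const auto& arg : args)
    {
        assert(arg.align > 0);
        // Round the current end of the buffer up to the next aligned offset.
        const std::size_t offset = (buffer.size() + arg.align - 1) / arg.align * arg.align;
        buffer.resize(offset + arg.size);
        std::memcpy(buffer.data() + offset, arg.data, arg.size);
    }
    return buffer;
}
```

For a `char` followed by an `int`, the `int` lands at offset `alignof(int)` rather than offset 1, which is the kind of case an alignment test would check.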