1. 16 Feb, 2023 1 commit
  2. 31 Jan, 2023 1 commit
  3. 03 May, 2022 1 commit
  4. 17 Apr, 2022 1 commit
    • Paul Fultz II's avatar
      Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is significant improvement on larger tensors with half almost 50% faster:
      
      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms
      Also for non-trivial layouts this can sometimes be over 2x faster:
      
      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms
      Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR.
      
      Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
      f9a5b81e
  5. 27 Sep, 2021 1 commit
  6. 23 Jun, 2021 1 commit
  7. 08 Dec, 2020 1 commit
    • Paul Fultz II's avatar
      Refactor to use make_op almost everywhere (#696) · 8d21fdc9
      Paul Fultz II authored
      * Load op when serializing
      
      * Formatting
      
      * Add missing clip field
      
      * Use make_op almost everywhere
      
      * Formatting
      
      * More make ops for rnns
      
      * Get rid of spaces
      
      * Formatting
      
      * Remove operators headers
      
      * Formatting
      
      * Remove unused op headers
      
      * Increase line threshold
      8d21fdc9
  8. 09 Nov, 2020 1 commit
    • Paul Fultz II's avatar
      Add hip compilation (#664) · f71af72a
      Paul Fultz II authored
      
      
      * Add compiler flags
      
      * Add missing include
      
      * Add filesystem header
      
      * Formatting
      
      * Add tmp_dir to run
      
      * Formatting
      
      * Kernel compilation and launching
      
      * Formatting
      
      * Seperate pack_args
      
      * Formatting
      
      * Add alignment tests
      
      * Formatting
      
      * Add compile test
      
      * Formatting
      
      * Complete compile test
      
      * Formatting
      
      * Use is_regular_file free function
      
      * Fix is_regular_file call
      
      * Fix tidy issues
      
      * Fix tidy
      
      * Fix tidy issue
      
      * Print size in read_buffer to debug issue on jenkins
      
      * Add hip flags before src file
      
      * Fix reading output files
      
      * Fix unsued variable warning
      
      * Formatting
      
      * Formatting
      
      * Disable tidy check
      Co-authored-by: default avatarShucai Xiao <shucai.xiao@amd.com>
      Co-authored-by: default avatarmvermeulen <5479696+mvermeulen@users.noreply.github.com>
      f71af72a
  9. 30 Sep, 2020 1 commit
    • Paul Fultz II's avatar
      Add hip clang builds to jenkins (#651) · f28a62ea
      Paul Fultz II authored
      * Make global variables const
      
      * Tidy fixes
      
      * Disable some lints
      
      * Formatting
      
      * Fix tidy const
      
      * Formatting
      
      * Add missing const keywords
      
      * Formatting
      
      * More fixes
      
      * Fix remaining tidy issues
      
      * Formatting
      
      * Fix rocblas function call
      
      * Formatting
      
      * Fix nodiscard warnings
      
      * Formatting
      
      * Use named parameters
      
      * Remove overload
      
      * Add overload
      
      * Remove noncps
      
      * Use named param for node
      
      * Add auto register header
      
      * Use named parameters
      
      * Refactor jenkinsfile
      
      * Fix shadow
      
      * Add missing body variable
      
      * Add more const methods
      
      * Add hip-clang docker builds
      
      * Remove comments
      
      * Add clang-format
      
      * Add more const
      
      * Formatting
      
      * Rename stage
      
      * Disable check
      
      * Add another const
      
      * Add python 2 dev packages
      
      * Add sphinx to dockerfile
      f28a62ea
  10. 18 Aug, 2020 1 commit
    • Paul Fultz II's avatar
      Register all operators in migraphx (#604) · e8be8548
      Paul Fultz II authored
      * Register ops for main migraphx
      
      * Formatting
      
      * Register cpu ops
      
      * Formatting
      
      * Show list of operators in the driver
      
      * Formatting
      
      * Simplify regiter
      
      * Try to register gpu ops
      
      * Fix compiler errors
      
      * Register rest of the gpu operators
      
      * Add some tests
      
      * Formatting
      
      * Fix gcc compiler warnings
      
      * Formatting
      
      * Fix tidy warnings
      
      * Fix compile error
      
      * Use correct op name
      
      * Register layer norm
      
      * Use const ref
      
      * Make run const
      e8be8548
  11. 02 May, 2019 1 commit
  12. 11 Dec, 2018 2 commits
  13. 28 Nov, 2018 1 commit
  14. 06 Nov, 2018 1 commit
  15. 12 Sep, 2018 1 commit
  16. 18 Aug, 2018 1 commit
  17. 02 Jul, 2018 1 commit
  18. 04 Jun, 2018 1 commit
  19. 23 Apr, 2018 5 commits