- 16 Feb, 2023 1 commit
-
-
Umang Yadav authored
* deprecate HCC
-
- 31 Jan, 2023 1 commit
-
-
Chris Austen authored
upgrade to ROCm 5.4.2 in CI
-
- 03 May, 2022 1 commit
-
-
Paul Fultz II authored
Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous without one.
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is a significant improvement on larger tensors with half, almost 50% faster:
lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
For non-trivial layouts this can sometimes be over 2x faster:
lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such kernels. I plan to address that in a future PR.
Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable, which will print out the assembly when the kernel compiles.
-
- 27 Sep, 2021 1 commit
-
-
kahmed10 authored
Checks wavefront size, then changes implementation and number of threads for DPP reduce
-
- 23 Jun, 2021 1 commit
-
-
Paul Fultz II authored
-
- 08 Dec, 2020 1 commit
-
-
Paul Fultz II authored
* Load op when serializing
* Formatting
* Add missing clip field
* Use make_op almost everywhere
* Formatting
* More make ops for rnns
* Get rid of spaces
* Formatting
* Remove operators headers
* Formatting
* Remove unused op headers
* Increase line threshold
-
- 09 Nov, 2020 1 commit
-
-
Paul Fultz II authored
* Add compiler flags
* Add missing include
* Add filesystem header
* Formatting
* Add tmp_dir to run
* Formatting
* Kernel compilation and launching
* Formatting
* Separate pack_args
* Formatting
* Add alignment tests
* Formatting
* Add compile test
* Formatting
* Complete compile test
* Formatting
* Use is_regular_file free function
* Fix is_regular_file call
* Fix tidy issues
* Fix tidy
* Fix tidy issue
* Print size in read_buffer to debug issue on jenkins
* Add hip flags before src file
* Fix reading output files
* Fix unused variable warning
* Formatting
* Formatting
* Disable tidy check
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 30 Sep, 2020 1 commit
-
-
Paul Fultz II authored
* Make global variables const
* Tidy fixes
* Disable some lints
* Formatting
* Fix tidy const
* Formatting
* Add missing const keywords
* Formatting
* More fixes
* Fix remaining tidy issues
* Formatting
* Fix rocblas function call
* Formatting
* Fix nodiscard warnings
* Formatting
* Use named parameters
* Remove overload
* Add overload
* Remove noncps
* Use named param for node
* Add auto register header
* Use named parameters
* Refactor jenkinsfile
* Fix shadow
* Add missing body variable
* Add more const methods
* Add hip-clang docker builds
* Remove comments
* Add clang-format
* Add more const
* Formatting
* Rename stage
* Disable check
* Add another const
* Add python 2 dev packages
* Add sphinx to dockerfile
-
- 18 Aug, 2020 1 commit
-
-
Paul Fultz II authored
* Register ops for main migraphx
* Formatting
* Register cpu ops
* Formatting
* Show list of operators in the driver
* Formatting
* Simplify register
* Try to register gpu ops
* Fix compiler errors
* Register rest of the gpu operators
* Add some tests
* Formatting
* Fix gcc compiler warnings
* Formatting
* Fix tidy warnings
* Fix compile error
* Use correct op name
* Register layer norm
* Use const ref
* Make run const
-
- 02 May, 2019 1 commit
-
-
Paul authored
-
- 11 Dec, 2018 2 commits
- 28 Nov, 2018 1 commit
-
-
Paul authored
-
- 06 Nov, 2018 1 commit
-
-
Shucai Xiao authored
-
- 12 Sep, 2018 1 commit
-
-
Paul authored
-
- 18 Aug, 2018 1 commit
-
-
Paul authored
-
- 02 Jul, 2018 1 commit
-
-
Paul authored
-
- 04 Jun, 2018 1 commit
-
-
Paul authored
-
- 23 Apr, 2018 5 commits