"...targets/git@developer.sourcefind.cn:gaoqiong/migraphx.git" did not exist on "6cdd0c9ccfaf92bde0abb7dcdbe303e1574b194f"
- 16 Feb, 2023 1 commit
-
-
Umang Yadav authored
* deprecate HCC
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is significant improvement on larger tensors with half almost 50% faster: lens: [1024, 384, 768] gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms gpu::reduce_sum[axes={2}]: 1.73126ms Also for non-trivial layouts this can sometimes be over 2x faster: lens: [64, 1024, 768, 4] gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms gpu::reduce_sum[axes={1}]: 2.63375ms Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR. Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
-
- 28 Mar, 2022 1 commit
-
-
Paul Fultz II authored
* Use ccache for runtime compilation
-
- 06 Jul, 2021 1 commit
-
-
Paul Fultz II authored
* Improve handling of exceptions in test driver * Formatting * Auto print exception * Formatting * Fork each test case * Formatting * Exclude gcc 5 debug build * Fix tidy issues * Add color * Formatting * Create driver class * Formatting * Customize test_case names * Formatting * Report status from forked processes * Formatting * Update the verify driver * Formatting * Print out failed tests * Formatting * Fix tidy issues * Formatting * Expect passing * Improve failure reporting on non-linux systems * Fix ifdef * Flush code code cov * Formatting * Fix tidy * Check if weak symbols is linked * Formatting * Add continue flag * Formatting * Set exe name * Use stringstream and use quotes Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 19 Apr, 2021 1 commit
-
-
Paul Fultz II authored
* Add definitions for all pointwise operators * Formatting * Add cpp generator class * Formatting * Move compilation to core * Formatting * Add clock to tmp name * Add dynamic loader * Formatting * Add tests for code gen * Formatting * Add test for literals * Formatting * Use with_char * Add missing header * Fix mismerge * Ignore tidy warning * Fxx gcc 5 errors * Apply fixits * Skip signed bitwise of status * Remove unused parameters * Explicitly add c++14 flag * Fix tidy warning * Remove .o files
-