- 15 Sep, 2022 1 commit
-
-
Lixun Zhang authored
* Replaced `find_library` with `find_package` to locate MLIR static library * Unified the include dir for headers and remove backward compatibility * Embedded the external/include dir into the exported library
-
- 02 Aug, 2022 1 commit
-
-
jungpark-mlir authored
-
- 03 Jul, 2022 1 commit
-
-
Paul Fultz II authored
* Add mlir c api * Formatting * Create a type attribute * Formatting * Parse module * Formatting * Add mlir dump function * Add test case * Formatting * Fix tidy issues * Update mlit version * Update to newer mlir * Format * Move mlir to the gpu and update the test * Formatting * Fix bug when appending module * Format * Remove old cmake flag * Update message * Add return * Format * Add mlir_compile * Format * Register dialect * Handle unsinged integers * Dont provide output for return instruction * Format * Add code to insert memrefs * Format * Add mlir verification * Formatting * Enable pointwise_fusion * Disable eliminate_data_type * Set kernal name * Format * Fix device name * Formatting * Fix output arg * Format * Updates * Upate hash * Add fuse_mlir pass * Format * Add fuse mlir * Format * Update mlir * Sort parameter names * Format * Reenable disabled passes * Remove old mlir conv * Remove asym default padding * Add more verbose tracing * Format * Fix compilation errors * Format * Whitelist operators * Format * Add namespace * Format * Update triple * Format * Use func dialect * Format * Use func.return * Format * Upgrade mlir version * Add comment * Handle symetrical padding * Format * Cleanup debug output * Format * List failed tests * Move mlir compile to jit pipeline * Format * Update version * Add source locations * Format * Correctly add module * Format * Update failed tests * Fix failures when mlir is disabled * Format * Update mlir version * Check type for fp32 * Format * Remove failed test * Update mlir in driver * Tidy fixes * Foramt * Tidy fixes * Format * Fix const * Remove from requirements * Fix cmake version * Fix tidy warning * Use another ifdef * Fix tidy * Other tidy fix * Format * Update hash * Add missing license files * Format * Format * Fix fnction name
-
- 23 Jun, 2022 1 commit
-
-
kahmed10 authored
* remove eliminate workspace * remove sync device and other tags
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 10 Jun, 2022 1 commit
-
-
Paul Fultz II authored
Consolidate the vectorize and preload Add vectorization to reduction Co-authored-by:kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 11 May, 2022 1 commit
-
-
Paul Fultz II authored
Fuse layernorm and added triadd_layernorm fusion. This is a prep performance booster
-
- 10 May, 2022 1 commit
-
-
Paul authored
-
- 06 May, 2022 1 commit
-
-
Chris Austen authored
Move to CI containers to rocm 5.0.2 upgrade to 20.04 free up some more file space in github action environments
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is significant improvement on larger tensors with half almost 50% faster: lens: [1024, 384, 768] gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms gpu::reduce_sum[axes={2}]: 1.73126ms Also for non-trivial layouts this can sometimes be over 2x faster: lens: [64, 1024, 768, 4] gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms gpu::reduce_sum[axes={1}]: 2.63375ms Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR. Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
-
- 29 Mar, 2022 1 commit
-
-
Paul Fultz II authored
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign are using this infrastructure. Scatternd is not since it does require standard shape. This also makes it easier to add new runtime compiled kernels in the future.
-
- 28 Mar, 2022 1 commit
-
-
Paul Fultz II authored
* Use ccache for runtime compilation
-
- 03 Mar, 2022 1 commit
-
-
turneram authored
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
-
- 24 Feb, 2022 1 commit
-
-
Paul Fultz II authored
Make doc/CMakeLists.txt standalone Switch to use rocm-cmake modules for document generation Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run Add STRINGS property for build type to make it easier to switch build types with ccmake Various fixes and improvements
-
- 21 Jan, 2022 1 commit
-
-
Paul Fultz II authored
* Improve handling of generator expressions when getting the flags for hip
-
- 24 Nov, 2021 1 commit
-
-
Paul Fultz II authored
* Check jit kernels files with clang-tidy
-
- 11 Nov, 2021 1 commit
-
-
Paul Fultz II authored
This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. Its disabled by default since MIOpen fusions need to be refactored. This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math passes with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
-
- 28 Oct, 2021 1 commit
-
-
Shucai Xiao authored
GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
-
- 08 Oct, 2021 1 commit
-
-
Shucai Xiao authored
This PR is for the nonzero operator with static output shape. Co-authored-by:
Paul Fultz II <pfultz2@yahoo.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 01 Oct, 2021 1 commit
-
-
turneram authored
Add multinomial op to onnx parser with ref and GPU implementations. The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative density function. The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution. Resolves #821 Co-authored-by:Shucai Xiao <shucai@gmail.com>
-
- 16 Sep, 2021 1 commit
-
-
Shucai Xiao authored
Add Loop operator for opset version 13. Notes: 1) Default max iteration number is 10 if no max iteration number is provided 2) To change the max iter number, a user can set the max_loop_iterations in the onnx_option struct when parsing a model. 3) The returned shape of the scan output is from the max_loop_iterations even the actual loop num is less than that. This issue also applies to other operators like NonZero and NonMaxSuppression. A issue #948 is created to track this and to be resolved later. Co-authored-by:
Paul <pfultz2@yahoo.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 02 Sep, 2021 2 commits
-
-
turneram authored
Implement the Where operator for the CPU and GPU. This is for better performance.
-
Shucai Xiao authored
* add topk operator doe ref, cpu and gpu * Hash modules for quicker lookup of modules * add onnx unit test * add unit tests for the topk operator Co-authored-by:
Paul <pfultz2@yahoo.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 01 Sep, 2021 1 commit
-
-
Chris Austen authored
In ROCm 4.5.0 hip compile flags are coming in differently. This has caused some parsing issues for the HIP_COMPILER_FLAGS variable. As an example ROCm 4.3.0: --offload-arch=gfx900 ROCm 4.5.0: <$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900> Using existing code... $<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900> Becomes... $<$<COMPILE_LANGUAGE:CXX>:SHELL: There are two problems with that. 1) The "<" is not balanced with a "> due to the regex consuming the ">" 2) There is still a `SHELL:` label. This commit repairs both. I took the regex parsing code from ROCmSoftwarePlatform/MIOpen/blame/develop/CMakeLists.txt but improved it to support handling of target features like <$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900:xxx+> Co-authored-by:Paul Fultz II <pfultz2@yahoo.com>
-
- 10 Aug, 2021 1 commit
-
-
Paul Fultz II authored
* Add hiprtc compile option * Add cross compile test * Update error reporting * Add tests for errors and warnings * Fix tidy warning * Add comment to ifdefs * Skip null character at end of log * Assert there is null at the end
-
- 05 Aug, 2021 1 commit
-
-
Paul Fultz II authored
* Add method to compile pointwise * Formatting * Add lambda * Add semicolon * Rename variable * Add driver to run jit kernels * Formatting * Add context * Formatting * Make seperate driver folder * Add more general gpu driver * Formatting * Print out wll time * Formatting * Run multiple times and skip first run * Formatting * Seperate time_op * Run an op for comparison * Formatting * Add debug asserts * Formatting * Change parameer name * Formatting * Fix argument order * Formatting * Add preloading * Formatting * Allow a different data type * Formatting * Pipeline transformations * Formatting * Add vectorization * Formatting * Reduce dims * Formatting * Compile with launch params as constant * Formatting * Make sure buffer can be vecotrized * Formatting * Enable vectorization and preloading * Formatting * Add print header * Formatting * Avoid allocating to large of LDS * Formatting * Add some vec functions to a seperate header * Formatting * Add stride loops * Formatting * Improve the transform pipeline * Formatting * Add const * Fix shape check * Formatting * Just check stride axis is zero * Remove extra finc_vector_axis overload * Simplify some mroe functions * Formatting * Remove some more extra functions * Formatting * Simplify more decltypes * Add another const * Fix test * Get buffer pointer different for older compilers Co-authored-by:
Shucai Xiao <shucai@gmail.com> Co-authored-by:
Chris Austen <causten@users.noreply.github.com>
-
- 14 Jul, 2021 1 commit
-
-
Paul Fultz II authored
* Unify device_name function * Formatting Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 08 Jul, 2021 2 commits
-
-
Paul Fultz II authored
* Add initial scan operator * Formatting * Fix with a working test * Fix bugs * Formatting * Formatting * Simplify * Formatting * Use non-power of 2 for test * Make pointer Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
Paul Fultz II authored
* Add preallocate method * Add preallocate_param pass * Preallocate buffers on the cpu * Formatting * Preallocate on the gpu * Add missing cpp file * Formatting * Add lifetime function * Formatting * Always allocate * Fix tidy warning * Add const * Add missing lifetime annotations Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 25 Jun, 2021 1 commit
-
-
Shucai Xiao authored
-
- 08 Jun, 2021 1 commit
-
-
Cagri Eryilmaz authored
* init reverseOp branch: ref op + ref test. WIP * first passing basic test * cleanup * additional axis implementation * additional test * ref op implementation vec to int for axis * ref op test change for axis * initial gpu files and test * updates to implementation and test * fixed some issues * clang format * cleanup * formatting * removing comments * remove local size, back to default * update tests: replace with std functions * multiple axis for reverse op * fix a build error * clang format * more tests * fix a bug for the reverse device function * clang format * fix a bug * clang format * ref test updates, multiaxis * formatting Co-authored-by:
Shucai Xiao <Shucai.Xiao@amd.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 29 Apr, 2021 1 commit
-
-
SJW authored
* MLIR MIOpen Dialect integration (phase 1) (#768) * Added Findmlir.cmake (using environment variables to import) * Added mlir_conv pass to GPU target * Apply to any gpu::convolution if supported by MLIR * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution * Capture binary in dictionary for matching convolutions * Build a code_object_op with the binary and execution dimensions * Substitute for the gpu::convolution * Changed the parameters for the code_object to reflect the generated MLIR kernel * Expanded out MemRefDescriptor fields in param list * Also updated for MLIR C-API changes * * fixed global_size calculation * MLIR MIOpen Dialect integration (phase 1) (#768) * Added Findmlir.cmake (using environment variables to import) * Added mlir_conv pass to GPU target * Apply to any gpu::convolution if supported by MLIR * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution * Capture binary in dictionary for matching convolutions * Build a code_object_op with the binary and execution dimensions * Substitute for the gpu::convolution * Changed the parameters for the code_object to reflect the generated MLIR kernel * Expanded out MemRefDescriptor fields in param list * Also updated for MLIR C-API changes * * Added command line option: --enable_mlir * * fixed command line switch * updated for new MLIR API changes * * Added cget llvm-project-mlir to import MIIR API libraries into Dockerfile * removed cmake Findmlir * updated for changes in MIIR C-API * * updated CMakeLists.txt to allow disable of MLIR import * fixed memory leaks and removed copies * updated for 5D memrefs * * formatting * * fixed review comments * * fixed merge issues * hip gcnDeviceName now includes specifiers at the end * use major/minor values instead * * disable MLIR by default * * removed command-line switch --enable-mlir * * fix unused when MLIR disabled * * enable jenkins enable/test MLIR * * format * * fixed clang-tidy * * added new type Co-authored-by:
Paul Fultz II <pfultz2@yahoo.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 27 Apr, 2021 1 commit
-
-
Paul Fultz II authored
-
- 09 Apr, 2021 1 commit
-
-
Paul Fultz II authored
* Fix tidy warnings for 4.1 * Formatting * Upgrade to 4.1 in docker * Remove hcc build and enable ubsan on clang debug * Add missing openmp package * Construct directly * Construct directly * Upgrade rocm-cmake version
-
- 05 Apr, 2021 1 commit
-
-
Shucai Xiao authored
* code cleanup * clang format * backup code * clang format * remove unnecessary code * clang format * add module print function * code backup * refine the module::print function * refine the module:to_value() function * code backup * backup code changes * code backup * remove to_value and from_value function from the module class * rename a function * rename the if operator * refine the if operator * refine the print function of module and program * code backup * code backup * fix a build warning * fix overload of compute_shape function * code backup * fix unit test error * fix cppcheck error * fix the issue related to the overload of compute_shape * fix review comments * fix cppcheck error * change the return name of if_op to be if * clang format * fix two unit tests * clang format * rename variables * clang format * remove the unused compute_op function * clang format * add lowering of if operator and compute_op function * clang format * add parsing if operator in onnx file * clang format * fix clang tidy format * clang format * add the gpu implementation of the if operator * enhance the validate function and uncomment a unit test * clang format * remove unnecessary code * add sub_module processing in ref passes * clang format * clang format * fix a hang issue related to the valid function * fix an issue in replace_refs * clang format * fix review comments * clang format * fix cppcheck error * clang format * add a unit test for more code coverage * clang format * fix review comments and add test for more code coverage * clang format * fix cppcheck error * clang format * fix cppcheck error * fix a cppcheck error * clang format * backup code * clang format * fix cppcheck error * clang format * some code refinement * clang format * code backup to handle submodules in module compilation * clang format * code backup * clang format * code backup * clang format * fix a bug related to literal id * fix a bug in gpu execution * change the way of compiling a graph * clang format * backup more changes * clang format * refine pass log information * remove unnecessary code * clang format * temp changes backup * clang format * add module name prefix to scratch memory id in hip_memory_allocation * clang format * change to copy the cond input by inserting a copy instruction * clang format * change to use the if output argument as the submodule output so can remove a gpu_copy * clang format * consider submodule in some compile passes * clang format * fix review comments * clang format * fix issues related to scratch memory * clang format * remove unnecessary code * fix cppcheck error * clang format * reslove the implicit dependencies issue related to submodule * clang format * fix cppcheck error * clang format * backup temp changes * clang format * fixed an bug in the has_instruction function * clang format * fix the return value of the gpu implementation of the if operator * fix a bug in the compute_shape function in the gpu implementation * add an if onnx unit test * clang format * add more unit tests * clang format * tmp code backup * clang format * fix a sync problem related to copy cond argument from gpu to cpu * clang format * change the compile offload copy flag setting * clang format * enable copy from cpu to be able to do synchronous copy * clang format * add more unit tests * add more unit tests * add more ref unit tests * clang format * fixed a bug error * tmp code backup * clang format * fixed an onnx verify unit test * add more unit tests * clang format * reverse a change * fix cppcheck error * fix cppcheck error * fix to print all instructions in program execution * clang format * fix bugs related to memory coloring and offload copy to be true * clang format * remove unnecessary include header file * sort test cases in ref_cpu_ops alphabetically * clang format * add a flag to disable cpu target in verification test * change the way to disable some tests * clang format * disable verify unit test of the if operators * add a function call to have more code coverage * fix a build error * fix review comments * fix review comments * clang format * add a api gpu unit test for more code coverage * clang format * change to use instruction.size() as node index * move the calc_implicit_deps function to module class as a member function * clang format * move the offload_copy flag setting to lowering * clang format * assign the module_eval lambda function to a variable to simplify code * clang format * move the compute function from ref/gpu implementation to the main if operator * clang format * fix cpp check error * add a unit test for more code coverage * clang format * add unit test to calculate implicit deps * add a python unit test * clang format * refine a unit test to have more code coverage * clang format * chang the way of wrap up arguments for sub modules * clang format * fix some build errors * code cleanup * refine unit tests to have more code coverage * clang format * refine unit test to have more code coverage * code backup * clang format * add memory coloring test * refine memory coloring unit test * clang format * remove an unnecessary line * remove an unused line * remove an unnecessary parameter in the lambda function * clang format * refine a unit test * remove an unnecessary line * refine unit tests to have more code coverage * clang format * combine two lines * add one more unit test for more code coverage * clang format * add one more unit test * clang format * fix review comments * refine a print out information * fix review comments * clang format * change the sync copy to using a gpu device sync * clang format * remove unnecessary code Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 26 Mar, 2021 1 commit
-
-
Paul Fultz II authored
* Add code object op * Formattting * Add more value tests * Formatting * Fix from_value conversion from binary * Formatting * Dont use offload copy * Remove iostream header * Fix compilation errors * Formatting * Rename var * Add missing files * Formatting * Remove duplicate variable * Remove comment * Template the function so sfinae will work * Formatting * Use template specialization since ADL is broken on hcc * Formatting * Annotate the constructor with HD for hcc * Make variable const Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 26 Feb, 2021 1 commit
-
-
Cagri Eryilmaz authored
* changes for not operator * changed name of the op from unary_not to not * Added tests for op and onnx parsing * reordering not_test in onnx_test.cpp * not operator -- gpu implementation * added bool test for not operator * Added test and missing links for not operator on GPU * typo fix * adding .onnx test files for not operator * formatting Co-authored-by:
Shucai Xiao <shucai@gmail.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 25 Feb, 2021 1 commit
-
-
Paul Fultz II authored
* Add code object op * Formattting * Add more value tests * Formatting * Fix from_value conversion from binary * Formatting * Dont use offload copy * Remove iostream header Co-authored-by:mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 19 Jan, 2021 1 commit
-
-
Shucai Xiao authored
* add the and operator * clang format * add unit tests for the and operator * clang format * change the and name to logical_and and add the logical_or, logical_xor * clang format * add onnx unit tests for or and xor * add more unit tests
-
- 07 Jan, 2021 1 commit
-
-
Paul Fultz II authored
-