- 16 Feb, 2023 1 commit
-
-
Umang Yadav authored
* deprecate HCC
-
- 10 Feb, 2023 1 commit
-
-
Umang Yadav authored
-
- 31 Jan, 2023 1 commit
-
-
Umang Yadav authored
Added a CMake flag for hipRTC, MIGRAPHX_USE_HIPRTC. Added stages in Jenkins for hipRTC. Fixes for some of the pending issues from hipRTC.
-
- 20 Nov, 2022 1 commit
-
-
Paul Fultz II authored
-
- 18 Nov, 2022 1 commit
-
-
Umang Yadav authored
Disabling it until the int8 fix is in mainline from MIOpen, and also so that QA tests can run migraphx-driver and unit tests from MIGraphX.
-
- 02 Nov, 2022 1 commit
-
-
Paul Fultz II authored
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
-
- 19 Oct, 2022 2 commits
-
-
Charlie Lin authored
Refactor dynamic compute
  - add a compute_output_shape object that implicitly converts to a new dyn_output or shape object
  - dyn_output object can handle computing the static output shape of an operator given the input arguments shapes
  - change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to use dyn_output object
Dynamic ref unary functions
  - Included these changes to have an example of the refactored dynamic compute being used
  - Changes to unary base class to handle dynamic shapes
  - Changed elu and leaky_relu to use unary base class and pointwise JIT
-
Umang Yadav authored
* use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 13 Oct, 2022 1 commit
-
-
Charlie Lin authored
Rewrites the TF batch-norm-like operators in terms of other MIGX operators. Removes the code related to batch_norm_inference.
-
- 06 Oct, 2022 1 commit
-
-
charlie authored
-
- 29 Sep, 2022 1 commit
-
-
Umang Yadav authored
Improvements/Additions to be made: changes for the quant_convolution, changes for the deconvolution, Macros for MIOpen status checks
-
- 23 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Remove device functions * Update tests
-
- 15 Sep, 2022 1 commit
-
-
Lixun Zhang authored
* Replaced `find_library` with `find_package` to locate MLIR static library * Unified the include dir for headers and removed backward compatibility * Embedded the external/include dir into the exported library
-
- 02 Aug, 2022 1 commit
-
-
jungpark-mlir authored
-
- 03 Jul, 2022 1 commit
-
-
Paul Fultz II authored
* Add mlir c api * Formatting * Create a type attribute * Formatting * Parse module * Formatting * Add mlir dump function * Add test case * Formatting * Fix tidy issues * Update mlir version * Update to newer mlir * Format * Move mlir to the gpu and update the test * Formatting * Fix bug when appending module * Format * Remove old cmake flag * Update message * Add return * Format * Add mlir_compile * Format * Register dialect * Handle unsigned integers * Don't provide output for return instruction * Format * Add code to insert memrefs * Format * Add mlir verification * Formatting * Enable pointwise_fusion * Disable eliminate_data_type * Set kernel name * Format * Fix device name * Formatting * Fix output arg * Format * Updates * Update hash * Add fuse_mlir pass * Format * Add fuse mlir * Format * Update mlir * Sort parameter names * Format * Reenable disabled passes * Remove old mlir conv * Remove asym default padding * Add more verbose tracing * Format * Fix compilation errors * Format * Whitelist operators * Format * Add namespace * Format * Update triple * Format * Use func dialect * Format * Use func.return * Format * Upgrade mlir version * Add comment * Handle symmetrical padding * Format * Cleanup debug output * Format * List failed tests * Move mlir compile to jit pipeline * Format * Update version * Add source locations * Format * Correctly add module * Format * Update failed tests * Fix failures when mlir is disabled * Format * Update mlir version * Check type for fp32 * Format * Remove failed test * Update mlir in driver * Tidy fixes * Format * Tidy fixes * Format * Fix const * Remove from requirements * Fix cmake version * Fix tidy warning * Use another ifdef * Fix tidy * Other tidy fix * Format * Update hash * Add missing license files * Format * Format * Fix function name
-
- 23 Jun, 2022 1 commit
-
-
kahmed10 authored
* remove eliminate workspace * remove sync device and other tags
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 10 Jun, 2022 1 commit
-
-
Paul Fultz II authored
Consolidate the vectorize and preload. Add vectorization to reduction.
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 11 May, 2022 1 commit
-
-
Paul Fultz II authored
Fuse layernorm and added triadd_layernorm fusion. This is a prep performance booster
-
- 06 May, 2022 1 commit
-
-
Chris Austen authored
Move CI containers to ROCm 5.0.2; upgrade to 20.04; free up some more file space in GitHub Actions environments.
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is a significant improvement on larger tensors with half, almost 50% faster:
  lens: [1024, 384, 768]
  gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
  gpu::reduce_sum[axes={2}]: 1.73126ms
Also, for non-trivial layouts this can sometimes be over 2x faster:
  lens: [64, 1024, 768, 4]
  gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
  gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for this type of kernel. I plan to address that in a future PR. Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
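The quoted speedups can be checked from the listed timings. The following is a minimal sketch; only the millisecond values come from the commit message, while the dictionary layout and the `speedup` helper are illustrative names.

```python
# Timings in milliseconds, copied from the commit message.
# The dictionary keys and structure are illustrative only.
timings = {
    "lens [1024, 384, 768], axes={2}": {"code_object": 1.16685, "reduce_sum": 1.73126},
    "lens [64, 1024, 768, 4], axes={1}": {"code_object": 1.1706, "reduce_sum": 2.63375},
}


def speedup(old_ms, new_ms):
    """Return how many times faster the new kernel is than the old one."""
    return old_ms / new_ms


# Ratio of the existing reduce_sum time to the JIT-compiled kernel time.
speedups = {
    name: speedup(t["reduce_sum"], t["code_object"]) for name, t in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios come out at roughly 1.48x and 2.25x, which matches the "almost 50% faster" and "over 2x faster" wording.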
-
- 29 Mar, 2022 1 commit
-
-
Paul Fultz II authored
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign are using this infrastructure. ScatterND is not, since it requires a standard shape. This also makes it easier to add new runtime compiled kernels in the future.
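The general idea of compiling independent kernels in parallel can be sketched as follows. This is not the MIGraphX implementation: `compile_kernel` is a hypothetical stand-in for an expensive compile step, and the kernel names are used only as example inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_kernel(source):
    # Hypothetical stand-in for an expensive, independent compile step
    # (for example, invoking an external compiler on one kernel source).
    return f"binary({source})"


def compile_all(sources, max_workers=4):
    # Submit every kernel to the pool at once; map() returns the results
    # in the same order as the inputs, regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compile_kernel, sources))


kernels = compile_all(["pointwise_add", "pointwise_mul", "roialign"])
print(kernels)
```

Because each kernel is compiled independently, adding a new runtime-compiled kernel only means adding another entry to the work list.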
-
- 28 Mar, 2022 1 commit
-
-
Paul Fultz II authored
* Use ccache for runtime compilation
-
- 03 Mar, 2022 1 commit
-
-
turneram authored
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
-
- 24 Feb, 2022 1 commit
-
-
Paul Fultz II authored
* Make doc/CMakeLists.txt standalone * Switch to use rocm-cmake modules for document generation * Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run * Add STRINGS property for build type to make it easier to switch build types with ccmake * Various fixes and improvements
-
- 21 Jan, 2022 1 commit
-
-
Paul Fultz II authored
* Improve handling of generator expressions when getting the flags for hip
-
- 24 Nov, 2021 1 commit
-
-
Paul Fultz II authored
* Check jit kernels files with clang-tidy
-
- 11 Nov, 2021 1 commit
-
-
Paul Fultz II authored
This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It is disabled by default since MIOpen fusions need to be refactored. This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
-
- 28 Oct, 2021 1 commit
-
-
Shucai Xiao authored
GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
-
- 08 Oct, 2021 1 commit
-
-
Shucai Xiao authored
This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 01 Oct, 2021 1 commit
-
-
turneram authored
Add multinomial op to onnx parser with ref and GPU implementations. The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution. Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>
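The sampling scheme described in this commit can be sketched for a single batch row as follows. This is an illustrative reading, not the MIGraphX code: the function name is hypothetical, and "the sum of the CDF" is interpreted here as the final CDF element, i.e. the total of the unnormalized weights.

```python
import math
from itertools import accumulate


def multinomial_sample(log_probs, uniform_values):
    """Draw class indices from log-probabilities using precomputed uniforms.

    log_probs: unnormalized log-probabilities, one per class.
    uniform_values: random values in [0, 1), one per requested sample.
    """
    # Build an unnormalized CDF: exponentiate, then take a running sum.
    weights = [math.exp(lp) for lp in log_probs]
    cdf = list(accumulate(weights))
    total = cdf[-1]  # Assumption: this is what "the sum of the CDF" means.

    samples = []
    for u in uniform_values:
        threshold = u * total
        # Index of the first CDF entry strictly greater than the threshold.
        # The default guards against rounding when u is very close to 1.
        index = next((i for i, c in enumerate(cdf) if c > threshold), len(cdf) - 1)
        samples.append(index)
    return samples


log_probs = [math.log(0.2), math.log(0.3), math.log(0.5)]
print(multinomial_sample(log_probs, [0.0, 0.1, 0.25, 0.6, 0.99]))
```

Scaling the uniform values by the total avoids having to normalize the weights first, so the input does not need to be a proper probability distribution.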
-
- 16 Sep, 2021 1 commit
-
-
Shucai Xiao authored
Add Loop operator for opset version 13. Notes:
1) The default max iteration number is 10 if no max iteration number is provided.
2) To change the max iteration number, a user can set max_loop_iterations in the onnx_option struct when parsing a model.
3) The returned shape of the scan output is derived from max_loop_iterations even if the actual loop count is less than that. This issue also applies to other operators like NonZero and NonMaxSuppression. Issue #948 was created to track this, to be resolved later.
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
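The static scan-output behaviour in note 3 can be pictured with a small conceptual model. This is not MIGraphX code: the function, the `body` callback convention, and the `None` filler for unused slots are all assumptions made for illustration.

```python
def run_loop_with_static_scan(body, max_loop_iterations=10):
    """Run `body` until it asks to stop, recording into a fixed-size buffer.

    body(i) returns (keep_going, value) for iteration i.
    The returned buffer always has max_loop_iterations slots.
    """
    # The scan output is allocated up front, so its length does not
    # depend on how many iterations actually execute.
    scan_output = [None] * max_loop_iterations  # None is a placeholder filler
    executed = 0
    for i in range(max_loop_iterations):
        keep_going, value = body(i)
        scan_output[i] = value
        executed += 1
        if not keep_going:
            break
    return scan_output, executed


# The body stops after its third iteration, but the output keeps 10 slots.
scan, count = run_loop_with_static_scan(lambda i: (i < 2, i * i))
print(len(scan), count, scan[:count])
```

A caller therefore has to track the executed iteration count separately to know which leading entries of the scan output are meaningful.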
-
- 02 Sep, 2021 2 commits
-
-
turneram authored
Implement the Where operator for the CPU and GPU. This is for better performance.
-
Shucai Xiao authored
* add topk operator for ref, cpu and gpu * Hash modules for quicker lookup of modules * add onnx unit test * add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 01 Sep, 2021 1 commit
-
-
Chris Austen authored
In ROCm 4.5.0, HIP compile flags are coming in differently. This has caused some parsing issues for the HIP_COMPILER_FLAGS variable. As an example:
  ROCm 4.3.0: --offload-arch=gfx900
  ROCm 4.5.0: <$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>
Using existing code...
  $<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>
Becomes...
  $<$<COMPILE_LANGUAGE:CXX>:SHELL:
There are two problems with that. 1) The "<" is not balanced with a ">" due to the regex consuming the ">". 2) There is still a `SHELL:` label. This commit repairs both. I took the regex parsing code from ROCmSoftwarePlatform/MIOpen/blame/develop/CMakeLists.txt but improved it to support handling of target features like <$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900:xxx+>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
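The kind of unwrapping this fix needs can be illustrated with a regular expression. The sketch below is in Python rather than CMake and is not the regex used in the commit; the pattern and the `unwrap_flag` helper are hypothetical, written only to show how the generator-expression wrapper and the `SHELL:` label can be removed together while keeping target features intact.

```python
import re

# Matches a flag wrapped as $<$<COMPILE_LANGUAGE:CXX>:SHELL:FLAG>,
# with the SHELL: label optional. The greedy group backtracks so that
# only the final ">" is treated as the closing bracket.
WRAPPED_FLAG = re.compile(r"\$<\$<COMPILE_LANGUAGE:CXX>:(?:SHELL:)?(.*)>")


def unwrap_flag(flag):
    """Return the bare compiler flag, or the input unchanged if not wrapped."""
    match = WRAPPED_FLAG.fullmatch(flag)
    return match.group(1) if match else flag


for raw in [
    "--offload-arch=gfx900",
    "$<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>",
    "$<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900:xxx+>",
]:
    print(unwrap_flag(raw))
```

Capturing the flag and discarding the wrapper in one match avoids both problems described in the commit: no unbalanced bracket is left behind, and the `SHELL:` label goes away with the wrapper.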
-
- 10 Aug, 2021 1 commit
-
-
Paul Fultz II authored
* Add hiprtc compile option * Add cross compile test * Update error reporting * Add tests for errors and warnings * Fix tidy warning * Add comment to ifdefs * Skip null character at end of log * Assert there is null at the end
-
- 05 Aug, 2021 1 commit
-
-
Paul Fultz II authored
* Add method to compile pointwise * Formatting * Add lambda * Add semicolon * Rename variable * Add driver to run jit kernels * Formatting * Add context * Formatting * Make separate driver folder * Add more general gpu driver * Formatting * Print out wall time * Formatting * Run multiple times and skip first run * Formatting * Separate time_op * Run an op for comparison * Formatting * Add debug asserts * Formatting * Change parameter name * Formatting * Fix argument order * Formatting * Add preloading * Formatting * Allow a different data type * Formatting * Pipeline transformations * Formatting * Add vectorization * Formatting * Reduce dims * Formatting * Compile with launch params as constant * Formatting * Make sure buffer can be vectorized * Formatting * Enable vectorization and preloading * Formatting * Add print header * Formatting * Avoid allocating too large of LDS * Formatting * Add some vec functions to a separate header * Formatting * Add stride loops * Formatting * Improve the transform pipeline * Formatting * Add const * Fix shape check * Formatting * Just check stride axis is zero * Remove extra finc_vector_axis overload * Simplify some more functions * Formatting * Remove some more extra functions * Formatting * Simplify more decltypes * Add another const * Fix test * Get buffer pointer different for older compilers
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 14 Jul, 2021 1 commit
-
-
Paul Fultz II authored
* Unify device_name function * Formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 08 Jul, 2021 2 commits
-
-
Paul Fultz II authored
* Add initial scan operator * Formatting * Fix with a working test * Fix bugs * Formatting * Formatting * Simplify * Formatting * Use non-power of 2 for test * Make pointer
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
Paul Fultz II authored
* Add preallocate method * Add preallocate_param pass * Preallocate buffers on the cpu * Formatting * Preallocate on the gpu * Add missing cpp file * Formatting * Add lifetime function * Formatting * Always allocate * Fix tidy warning * Add const * Add missing lifetime annotations
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-