- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 07 Jun, 2022 1 commit
-
-
Zhuoran Yin authored
Prioritize int8 over int8x4 when it is applicable. Amend return to continue in apply loop. Add error handling in case int8x4 compilation fails.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
- 02 Jun, 2022 1 commit
-
-
Paul Fultz II authored
-
- 26 May, 2022 1 commit
-
-
Paul Fultz II authored
* Upgrade to cppcheck 2.8
-
- 29 Apr, 2022 1 commit
-
-
turneram authored
Add ref and gpu implementations for ONNX op GatherND Resolves #1032
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is a significant improvement on larger tensors with half, almost 50% faster:
lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also, for non-trivial layouts this can sometimes be over 2x faster:
lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for that type of kernel. I plan to address that in a future PR.
Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
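As a quick check of the figures quoted in this message, the speedup is the old gpu::reduce_sum time divided by the new code_object kernel time. The helper name `speedup` is made up for illustration; only the four timings come from the message.

```python
def speedup(old_ms: float, new_ms: float) -> float:
    """Return how many times faster the new kernel is than the old one."""
    return old_ms / new_ms


# Timings copied from the commit message, in milliseconds.
half_large = speedup(old_ms=1.73126, new_ms=1.16685)   # lens [1024, 384, 768], axes={2}
non_trivial = speedup(old_ms=2.63375, new_ms=1.1706)   # lens [64, 1024, 768, 4], axes={1}

print(f"{half_large:.2f}x and {non_trivial:.2f}x")
```

The first ratio is a little under 1.5x, which matches "almost 50% faster", and the second is a little over 2x.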
-
- 14 Apr, 2022 1 commit
-
-
bpickrel authored
Issue 1127. Updates the math.hpp header file to provide overloads of various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, so the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type. Defined a new template, made instances of the template for those math operations that the hip library contains, and added verify tests for the sqrt operator for three cases:
* tensor size not divisible by 2
* tensor size divisible by 2 but not by 4
* tensor size divisible by 4
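To illustrate why the three test sizes matter, here is a rough Python sketch of processing a buffer two elements at a time, the way a packed half2 op would. It is not the math.hpp code: the function name `sqrt_in_pairs` and the scalar handling of a leftover odd element are assumptions made for this example.

```python
import math


def sqrt_in_pairs(values):
    """Apply sqrt two elements at a time, then handle any odd leftover element.

    Hypothetical stand-in for a packed half2 op; the scalar tail is an assumption.
    """
    out = []
    paired_end = len(values) - len(values) % 2
    # Packed path: each step consumes one pair, like one half2 value.
    for i in range(0, paired_end, 2):
        out.append(math.sqrt(values[i]))
        out.append(math.sqrt(values[i + 1]))
    # Scalar tail: at most one element when the size is odd.
    for v in values[paired_end:]:
        out.append(math.sqrt(v))
    return out


odd_size = sqrt_in_pairs([1.0, 4.0, 9.0])            # size not divisible by 2
even_size = sqrt_in_pairs([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])  # divisible by 2, not by 4
quad_size = sqrt_in_pairs([4.0] * 8)                  # divisible by 4
```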
-
- 11 Apr, 2022 1 commit
-
-
bpickrel authored
Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, and scatter_mul, to mirror the ONNX ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.) Provides both a reference op and an update to ONNX parsing. Tests updated and a new test case added.
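A minimal Python sketch of how the three reduction modes differ, restricted to the 1-D case for brevity. The names `scatter_elements_1d` and `REDUCTIONS` are illustrative and are not taken from the repo.

```python
import operator

# One combine rule per reduction option: none overwrites, add and mul accumulate.
REDUCTIONS = {
    "none": lambda old, new: new,
    "add": operator.add,
    "mul": operator.mul,
}


def scatter_elements_1d(data, indices, updates, reduction="none"):
    """Write updates[i] into a copy of data at indices[i], using the chosen reduction."""
    combine = REDUCTIONS[reduction]
    out = list(data)
    for idx, upd in zip(indices, updates):
        out[idx] = combine(out[idx], upd)
    return out


data = [1, 2, 3, 4]
scattered_none = scatter_elements_1d(data, [1, 3], [10, 20], "none")
scattered_add = scatter_elements_1d(data, [1, 3], [10, 20], "add")
scattered_mul = scatter_elements_1d(data, [1, 3], [10, 20], "mul")
```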
-
- 29 Mar, 2022 1 commit
-
-
Paul Fultz II authored
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign are using this infrastructure. Scatternd is not, since it requires a standard shape. This also makes it easier to add new runtime compiled kernels in the future.
-
- 18 Mar, 2022 1 commit
-
-
turneram authored
Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum
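For reference, a small pure-Python sketch of what the exclusive and reverse modes mean for a 1-D prefix sum, following the ONNX CumSum definition. This is not the gpu kernel, and the function name `cumsum` is just for illustration.

```python
def cumsum(values, exclusive=False, reverse=False):
    """1-D prefix sum with ONNX CumSum-style exclusive and reverse modes."""
    seq = list(reversed(values)) if reverse else list(values)
    out = []
    running = 0
    for v in seq:
        if exclusive:
            # Exclusive: emit the sum of the elements before this one.
            out.append(running)
            running += v
        else:
            # Inclusive: emit the sum up to and including this one.
            running += v
            out.append(running)
    return list(reversed(out)) if reverse else out


inclusive = cumsum([1, 2, 3])
exclusive_only = cumsum([1, 2, 3], exclusive=True)
reverse_only = cumsum([1, 2, 3], reverse=True)
exclusive_reverse = cumsum([1, 2, 3], exclusive=True, reverse=True)
```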
-
- 04 Mar, 2022 1 commit
-
-
bpickrel authored
Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes to support this. Introduces one new source file, op_enums.cpp.
-
- 03 Mar, 2022 1 commit
-
-
turneram authored
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
-
- 02 Mar, 2022 1 commit
-
-
Charlie Lin authored
Implements the IsNaN operator, ref, gpu, and onnx parser.
-
- 24 Feb, 2022 1 commit
-
-
Paul Fultz II authored
* Make doc/CMakeLists.txt standalone * Switch to use rocm-cmake modules for document generation * Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run * Add STRINGS property for build type to make it easier to switch build types with ccmake * Various fixes and improvements
-
- 09 Feb, 2022 1 commit
-
-
Paul Fultz II authored
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION to disable it
-
- 08 Feb, 2022 1 commit
-
-
Charlie Lin authored
Changed MessagePack file extensions to mxr.
-
- 27 Jan, 2022 1 commit
-
-
Umang Yadav authored
Allow non-standard shapes for the arg ops. Non-standard shapes include broadcast, slice, and transpose.
-
- 20 Jan, 2022 1 commit
-
-
Paul Fultz II authored
-
- 17 Jan, 2022 1 commit
-
-
Paul Fultz II authored
Make clip a pointwise op
-
- 02 Dec, 2021 1 commit
-
-
Paul Fultz II authored
Fix pointwise compile error with half sqrt
-
- 25 Nov, 2021 1 commit
-
-
Shucai Xiao authored
Resolves a problem in parsing the ssd-10 model. The problem is that, after contiguous is inserted in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. Then, if the next operator requires a standard input shape, an exception is thrown. For example, take the following model: Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather. As written, it works fine and no contiguous is required. In the auto_contiguous pass, however, a contiguous is inserted after the first transpose. Then we need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input is a transposed shape, and an exception is thrown. The solution is in the recompute_shape() function: if it is called by the auto_contiguous pass, the shape of an instruction has changed, and the new shape is non-standard, we do not recompute the shape of its outputs. The reason is that, since its output shape is non-standard, a contiguous op will be added after the instruction, which will recompute the shapes of later operators.
-
- 10 Nov, 2021 1 commit
-
-
Shucai Xiao authored
This PR turns on a few gemm unit tests with the int8 input data type. Before rocm4.4, the rocblas implementation required the matrix size to be no less than 4 for int8 input data. Because of this limitation, we turned off a few gemm unit tests with int8 input data type. This limitation was removed in rocm4.4, so after we upgrade to rocm4.5, we can turn on these unit tests. We also change the unit test conv_bn_add to add instructions to a module instead of a program.
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 28 Oct, 2021 2 commits
-
-
Shucai Xiao authored
This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
-
Shucai Xiao authored
GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
-
- 20 Oct, 2021 1 commit
-
-
Shucai Xiao authored
Implementation of the roialign operator. For now, we have only the ref implementation. When we run a model on the GPU, execution falls back to the ref implementation.
-
- 08 Oct, 2021 2 commits
-
-
Shucai Xiao authored
This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
Umang Yadav authored
Previously the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs. The aim is to define the dot operator as C = A . B, without alpha or beta. To achieve the same effect as alpha and beta: (1) it multiplies one of the inputs to the dot operator by the alpha value; (2) if beta is present, it multiplies C by beta and then adds the result to the output from step 1.
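The two steps can be sketched in plain Python with nested lists standing in for tensors. The helper names (`matmul`, `scale`, `add`, `dot_with_alpha_beta`) are illustrative only, and choosing A as the input that absorbs alpha is an assumption; the message only says "one of the inputs".

```python
def matmul(a, b):
    """Plain C = A . B for 2-D nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(m, s):
    """Multiply every element of matrix m by scalar s."""
    return [[s * v for v in row] for row in m]


def add(m, n):
    """Elementwise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m, n)]


def dot_with_alpha_beta(a, b, c=None, alpha=1.0, beta=1.0):
    # Step 1: fold alpha into one input (A here, by assumption), then do a plain dot.
    out = matmul(scale(a, alpha), b)
    # Step 2: if a C input is present, scale it by beta and add it to step 1's output.
    if c is not None:
        out = add(out, scale(c, beta))
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
result = dot_with_alpha_beta(a, b, c, alpha=2.0, beta=3.0)
```

With these inputs, the result equals 2 * (A . B) + 3 * C computed directly, so the decomposition gives the same answer as the old alpha/beta form.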
-
- 01 Oct, 2021 1 commit
-
-
turneram authored
Add multinomial op to onnx parser with ref and GPU implementations. The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution. Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>
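A rough single-batch sketch of the sampling rule described here. Two things are assumptions: "the sum of the CDF" is read as the final (total) value of the CDF, and the function name `multinomial_sample` is made up. The random values are passed in explicitly so the example is deterministic.

```python
import math
from itertools import accumulate


def multinomial_sample(log_probs, random_values):
    """Map each random value in [0, 1) to a class index via the CDF."""
    weights = [math.exp(lp) for lp in log_probs]
    cdf = list(accumulate(weights))
    total = cdf[-1]  # assumed meaning of "the sum of the CDF"
    samples = []
    for r in random_values:
        threshold = r * total
        # Index of the first CDF element greater than the scaled random value;
        # fall back to the last class if rounding leaves no such element.
        idx = next((i for i, c in enumerate(cdf) if c > threshold), len(cdf) - 1)
        samples.append(idx)
    return samples


log_probs = [math.log(0.2), math.log(0.3), math.log(0.5)]
samples = multinomial_sample(log_probs, [0.0, 0.1, 0.25, 0.6, 0.99])
```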
-
- 27 Sep, 2021 1 commit
-
-
kahmed10 authored
Checks wavefront size, then changes implementation and number of threads for DPP reduce
-
- 17 Sep, 2021 2 commits
-
-
Paul Fultz II authored
This reverts commit 9e43cb8b.
-
Umang Yadav authored
This PR aims to remove the alpha and beta attributes from the dot operator completely. Previously the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs. The aim is to define the dot operator as C = A . B, without alpha or beta. To achieve the same effect as alpha and beta: (1) it multiplies one of the inputs to the dot operator by the alpha value; (2) if beta is present, it multiplies C by beta and then adds the result to the output from step 1.
-
- 16 Sep, 2021 1 commit
-
-
Shucai Xiao authored
Add Loop operator for opset version 13. Notes: 1) The default max iteration number is 10 if no max iteration number is provided. 2) To change the max iteration number, a user can set max_loop_iterations in the onnx_option struct when parsing a model. 3) The returned shape of the scan output is determined by max_loop_iterations, even if the actual number of loop iterations is less than that. This issue also applies to other operators like NonZero and NonMaxSuppression. Issue #948 was created to track this, to be resolved later.
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
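Note 3 can be pictured with a small Python sketch: the loop runs fewer iterations than the cap, yet the scan output still has max_loop_iterations entries. Everything here is hypothetical, including the names `run_loop` and `body` and the zero padding value; the message does not say what fills the unused entries.

```python
def run_loop(body, state, trip_count, max_loop_iterations=10, pad_value=0):
    """Run body up to trip_count times, capped at max_loop_iterations.

    The scan output always has max_loop_iterations entries; unused slots hold
    pad_value (an assumption made for this sketch).
    """
    scan = []
    for i in range(min(trip_count, max_loop_iterations)):
        state, keep_going, scan_value = body(i, state)
        scan.append(scan_value)
        if not keep_going:
            break
    scan += [pad_value] * (max_loop_iterations - len(scan))
    return state, scan


def body(i, total):
    """Example loop body: accumulate the iteration index and always continue."""
    total += i
    return total, True, total


final_state, scan_output = run_loop(body, state=0, trip_count=3)
```

Here only three iterations run, but `scan_output` still has ten entries, which is the static-shape behaviour the note describes.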
-
- 02 Sep, 2021 2 commits
-
-
turneram authored
Implement the Where operator for the CPU and GPU. This is for better performance.
-
Shucai Xiao authored
* add topk operator for ref, cpu and gpu * Hash modules for quicker lookup of modules * add onnx unit test * add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 24 Aug, 2021 1 commit
-
-
Umang Yadav authored
* rename broadcast and multibroadcast output_lens attribute to out_lens attribute, and change tests and source code to reflect the same * change the reshape attribute from dims to out_lens * change transpose attribute's name from dims to perm to reflect better meaning * use permutation instead of perm for transpose clang formatting * use dims instead of out_lens for reshape clang formatting
-
- 18 Aug, 2021 1 commit
-
-
turneram authored
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 09 Aug, 2021 1 commit
-
-
Cagri Eryilmaz authored
* check whether the divisor is encodable, fall back if needed * verify test for retinaface case
-
- 15 Jul, 2021 1 commit
-
-
turneram authored
* Add operators, refactor parsers, add rewrite passes, add tests * Formatting * Fix cppcheck * Review comments * Formatting * Combine rewrite passes * Formatting * Add ref implementations * Formatting * Review comments * Formatting * Tidy warnings * Apply review comments * Formatting * Fix CI error * Formatting * Increase code coverage * Formatting * Move broadcasting of scales and zero points to onnx parser * Formatting * Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type * Formatting * Increase code coverage * Formatting * Switch certain variables to int64_t * Formatting * Fix overflow in implicit constant conversion * Formatting * Increase code coverage * Formatting * Remove operators.hpp from includes in tf_test.cpp * Formatting * Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes * Formatting * Switch dequantizelinear math from int32 to float * Formatting * Remove changes to operators.hpp * Simplify apply_quantizelinear * Formatting * Add verify test for int32 data * Add rewrite_quantization back to CMakeLists
-
- 13 Jul, 2021 1 commit
-
-
Paul Fultz II authored
* Add build for ubuntu 20.04 * Fix ambiguous overload resolution with stream * Fix warning * Capture by value * Format
-
- 08 Jul, 2021 1 commit
-
-
Paul Fultz II authored
* Add initial scan operator * Formatting * Fix with a working test * Fix bugs * Formatting * Formatting * Simplify * Formatting * Use non-power of 2 for test * Make pointer
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-