- 03 Nov, 2022 1 commit
-
-
Umang Yadav authored
Local threads in multiples of 32 were introduced in #1348, but local thread counts that are not a multiple of 64 are causing correctness issues.
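The constraint described above can be illustrated with a small helper. This is a minimal sketch, assuming the restriction is applied by rounding a requested local size up to the next multiple of 64; the commit message does not say how the restriction was actually implemented, and the name `round_up_to_multiple` is hypothetical.

```python
def round_up_to_multiple(value: int, multiple: int = 64) -> int:
    """Round `value` up to the nearest multiple of `multiple`.

    Hypothetical helper: 64 is taken from the commit message above,
    not from any library API.
    """
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


# A local size of 96 is a multiple of 32 but not of 64,
# so this sketch would bump it to 128.
local_threads = round_up_to_multiple(96)
```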
-
- 02 Nov, 2022 1 commit
-
-
Chris Austen authored
Updated GPU pad to use the JIT version. Added range functions for JIT kernels. Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 28 Oct, 2022 1 commit
-
-
Chris Austen authored
* rearrange default pass list; adjust_allocation must be run after rep… (#1418)
* Regenerate driver models (#1422)
* Add support in mlir for transposed and broadcasted shapes (#1378)
* Add relaxed standard shape assertion (#1416)
Co-authored-by: Brian Pickrell <95253842+bpickrel@users.noreply.github.com>
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: jungpark-mlir <jungwook.park@amd.com>
-
- 13 Oct, 2022 2 commits
-
-
Charlie Lin authored
Removes use_dynamic_same_auto_pad. Changes padding_mode to be used for dynamic padding. Moves compute_padded_shape to pad_calc.cpp, as it will be used in other dynamic padding cases. Fixes the same_lower compute_padded_shape bug and adds a test.
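For context on the same_lower case, the sketch below follows the ONNX definition of SAME_UPPER and SAME_LOWER padding for one spatial dimension. It is not the code in pad_calc.cpp; the function name `same_padding` and its parameters are hypothetical.

```python
import math


def same_padding(input_size, kernel, stride=1, dilation=1, lower=False):
    """Return (pad_begin, pad_end) for one dimension under ONNX SAME_* rules.

    Hypothetical helper illustrating the auto_pad arithmetic only.
    """
    output_size = math.ceil(input_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((output_size - 1) * stride + effective_kernel - input_size, 0)
    small, large = total // 2, total - total // 2
    # SAME_LOWER puts the extra element at the beginning, SAME_UPPER at the end.
    return (large, small) if lower else (small, large)


upper = same_padding(5, kernel=2)              # odd total padding of 1
lower = same_padding(5, kernel=2, lower=True)  # extra pad moves to the front
```

The two modes differ only when the total padding is odd, which is the case most likely to expose a same_lower mistake.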
-
Charlie Lin authored
Rewrites the TF batch-norm-like operators to other MIGX operators. Removes the code related to batch_norm_inference.
-
- 04 Oct, 2022 1 commit
-
-
Paul Fultz II authored
optimize the softmax operator
-
- 27 Sep, 2022 1 commit
-
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 21 Sep, 2022 1 commit
-
-
kahmed10 authored
This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
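To show where epsilon enters the calculation, here is a minimal sketch of layer normalization over a flat list, with epsilon passed in as a variable rather than hard-coded. The function name `layernorm` and the values used are illustrative assumptions, not the matcher or kernel from this PR.

```python
import math


def layernorm(xs, epsilon):
    """Normalize `xs` to zero mean and unit variance, stabilized by `epsilon`."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in xs]


# Different epsilon values give slightly different results,
# so the calculation has to use the matched value.
tight = layernorm([1.0, 2.0, 3.0], epsilon=1e-12)
loose = layernorm([1.0, 2.0, 3.0], epsilon=1e-5)
```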
-
- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Compute mean and variance in the same reduction
* Set block size to numbers divisible by 32 instead of powers of 2
* Global is also set exactly instead of being divisible by block size
* More exact matching of global/local can help get rid of branching/loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators since the compiler doesn't always vectorize them
* Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all the vector types are supported
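Computing mean and variance in the same reduction can be sketched as accumulating the sum and the sum of squares in one pass, then using var = E[x^2] - (E[x])^2. This is an illustration of the idea only, assuming that formulation; the actual GPU kernel is not shown in the commit message, and `mean_and_variance` is a hypothetical name.

```python
def mean_and_variance(xs):
    """Compute mean and (population) variance in a single pass over `xs`."""
    total = 0.0
    total_sq = 0.0
    for x in xs:  # one traversal feeds both accumulators
        total += x
        total_sq += x * x
    n = len(xs)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


mean, variance = mean_and_variance([1.0, 2.0, 3.0, 4.0])
```

A single pass saves a second read of the input, at the cost of some numerical robustness when the mean is large relative to the spread.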
-
- 14 Sep, 2022 2 commits
-
-
turneram authored
The verify tests from PR #1354 were still causing some codecov timeouts after merge. This PR further reduces the problem sizes to avoid these failures.
-
Paul Fultz II authored
* Implement concat using jit compilation
-
- 13 Sep, 2022 1 commit
-
-
turneram authored
Improves performance for 4/6 GEMMs used by huggingface BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension. The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
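One way to see why a non-batched call can replace a batched one here: when B is the same matrix for every batch, the batches of A can be stacked into one tall matrix and multiplied once. The sketch below checks that equivalence with nested lists; it is an assumption about the underlying idea, not the rocBLAS call itself, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply an [M][K] matrix by a [K][N] matrix (nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def batched_gemm(a_batches, b):
    """One GEMM per batch, with B shared (broadcast) across batches."""
    return [matmul(a, b) for a in a_batches]


def flattened_gemm(a_batches, b):
    """A single GEMM on A reshaped from [batch][M][K] to [batch*M][K]."""
    rows = len(a_batches[0])
    flat_a = [row for a in a_batches for row in a]
    flat_c = matmul(flat_a, b)
    # Split the tall result back into per-batch [M][N] blocks.
    return [flat_c[i:i + rows] for i in range(0, len(flat_c), rows)]


a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[1, 0, 2], [0, 1, 3]]
same = batched_gemm(a, b) == flattened_gemm(a, b)
```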
-
- 07 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Fix accuracy bug when vectorizing slices
-
- 06 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Using "not" and "or" improves readability. The cppcheck rule will help ensure we are doing it consistently.
-
- 31 Aug, 2022 1 commit
-
-
turneram authored
The rewrite_gelu pass replaces the gelu formula x * (1/2) * (1 + erf(x/sqrt(2))) with the sigmoid approximation x * sigmoid(1.702 * x).
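The two formulas can be compared directly. The sketch below implements both with the standard library so the size of the approximation error is visible; the function names are hypothetical and this is not the pass itself.

```python
import math


def gelu_erf(x):
    """Exact gelu: x * (1/2) * (1 + erf(x / sqrt(2)))."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_sigmoid(x):
    """Sigmoid approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


# Largest absolute difference over a few sample points.
samples = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
max_diff = max(abs(gelu_erf(x) - gelu_sigmoid(x)) for x in samples)
```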
-
- 27 Aug, 2022 1 commit
-
-
Paul Fultz II authored
This will rewrite dot operators like X(Y + b) to XY + Xb when b is constant, as we can fold the add away. This improves handling of pointwise operators with broadcasts, which helps improve const propagation. Improves gemm fusion with a mul_add. Improves support for broadcast shapes in gemm.
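The rewrite relies on matrix multiplication distributing over addition. A small check with nested lists, using hypothetical helper names and made-up values:

```python
def matmul(a, b):
    """Multiply an [M][K] matrix by a [K][N] matrix (nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Elementwise sum of two matrices with identical shapes."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def distribute_dot(x, y, b):
    """Rewritten form XY + Xb of the original expression X(Y + b)."""
    return add(matmul(x, y), matmul(x, b))


x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]
b = [[1, 1], [2, 2]]  # stands in for the constant term
original = matmul(x, add(y, b))
rewritten = distribute_dot(x, y, b)
```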
-
- 17 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 16 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 25 Jul, 2022 1 commit
-
-
varunsh authored
* Add is_supported to the target
* Add get_target_assignments
* Rename assignment to target_assignments
* Add ref target header to test
* Add fpga target
* Make context const in compute
-
- 06 Jul, 2022 1 commit
-
-
Paul Fultz II authored
* In the verification tests, check that saving and reloading the program yields the same program. This also fixes serialization to always load instructions in the same order. There are also fixes for deconv and quant_conv, which didn't save the solution id and were broken for serialization.
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 07 Jun, 2022 1 commit
-
-
Zhuoran Yin authored
Prioritize int8 over int8x4 when it is applicable. Amend return to continue in apply loop. Add error handling in case int8x4 compilation fails. Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
- 02 Jun, 2022 1 commit
-
-
Paul Fultz II authored
-
- 26 May, 2022 1 commit
-
-
Paul Fultz II authored
* Upgrade to cppcheck 2.8
-
- 29 Apr, 2022 1 commit
-
-
turneram authored
Add ref and gpu implementations for ONNX op GatherND. Resolves #1032.
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is significant improvement on larger tensors with half almost 50% faster:
lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also for non-trivial layouts this can sometimes be over 2x faster:
lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR.
Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
-
- 14 Apr, 2022 1 commit
-
-
bpickrel authored
Issue 1127. Updates the math.hpp header file to perform overloads of various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, and therefore the overloads act on vectors of sizes that are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type. Defined a new template, made instances of the template for those math operations that the hip library contains, and added verify tests for the sqrt operator for three cases:
* tensor size not divisible by 2
* tensor size divisible by 2 but not by 4
* tensor size divisible by 4
-
- 11 Apr, 2022 1 commit
-
-
bpickrel authored
Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, scatter_mul, to mirror the ONNX ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.) Provides both a reference op and an update to ONNX parsing. Tests updated and a new test case added.
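The three reduction options can be illustrated on a 1-D tensor. This is a minimal sketch following the ONNX ScatterElements semantics, not the reference op from this commit; the function name `scatter_elements_1d` is hypothetical.

```python
def scatter_elements_1d(data, indices, updates, reduction="none"):
    """Scatter `updates` into a copy of `data` at `indices` (1-D only).

    reduction: "none" overwrites, "add" accumulates, "mul" multiplies.
    """
    out = list(data)
    for idx, upd in zip(indices, updates):
        if reduction == "none":
            out[idx] = upd
        elif reduction == "add":
            out[idx] += upd
        elif reduction == "mul":
            out[idx] *= upd
        else:
            raise ValueError(f"unknown reduction: {reduction}")
    return out


data = [1, 2, 3, 4]
none_result = scatter_elements_1d(data, [1, 3], [10, 20])
add_result = scatter_elements_1d(data, [1, 3], [10, 20], reduction="add")
mul_result = scatter_elements_1d(data, [1, 3], [10, 20], reduction="mul")
```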
-
- 29 Mar, 2022 1 commit
-
-
Paul Fultz II authored
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign use this infrastructure. Scatternd does not, since it requires standard shape. This also makes it easier to add new runtime-compiled kernels in the future.
-
- 18 Mar, 2022 1 commit
-
-
turneram authored
Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum
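The exclusive and reverse modes follow the ONNX CumSum definition. A minimal sketch on a flat list, with the hypothetical name `cumsum`; the GPU implementation itself is not shown in the commit message.

```python
def cumsum(xs, exclusive=False, reverse=False):
    """Prefix sum of `xs` with ONNX CumSum-style exclusive/reverse flags."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out = []
    running = 0
    for x in seq:
        if exclusive:
            out.append(running)  # sum of the elements before this one
            running += x
        else:
            running += x
            out.append(running)  # sum up to and including this one
    return list(reversed(out)) if reverse else out


inclusive = cumsum([1, 2, 3])
exclusive_only = cumsum([1, 2, 3], exclusive=True)
reverse_only = cumsum([1, 2, 3], reverse=True)
both = cumsum([1, 2, 3], exclusive=True, reverse=True)
```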
-
- 04 Mar, 2022 1 commit
-
-
bpickrel authored
Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes to support this. Introduces one new source file, op_enums.cpp.
-
- 03 Mar, 2022 1 commit
-
-
turneram authored
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
-
- 02 Mar, 2022 1 commit
-
-
Charlie Lin authored
Implements the IsNaN operator, ref, gpu, and onnx parser.
-
- 24 Feb, 2022 1 commit
-
-
Paul Fultz II authored
* Make doc/CMakeLists.txt standalone
* Switch to use rocm-cmake modules for document generation
* Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run
* Add STRINGS property for build type to make it easier to switch build types with ccmake
* Various fixes and improvements
-
- 09 Feb, 2022 1 commit
-
-
Paul Fultz II authored
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION to disable it
-
- 08 Feb, 2022 1 commit
-
-
Charlie Lin authored
Changed MessagePack file extensions to mxr.
-
- 27 Jan, 2022 1 commit
-
-
Umang Yadav authored
Allow non-standard shapes for the arg ops. Non-standard shapes include broadcast, slice, and transpose.
-
- 20 Jan, 2022 1 commit
-
-
Paul Fultz II authored
-
- 17 Jan, 2022 1 commit
-
-
Paul Fultz II authored
Make clip a pointwise op
-
- 02 Dec, 2021 1 commit
-
-
Paul Fultz II authored
Fix pointwise compile error with half sqrt
-