- 28 Jul, 2023 1 commit
-
-
Paul Fultz II authored
* Improve performance of pointwise/reduction kernels when using NHWC layouts * Format * Add nhwc test * Format * Remove inline namespace * Add reduce test
-
- 22 Jul, 2023 1 commit
-
-
Charlie Lin authored
Throwing on these calls catches dynamic shape errors earlier rather than having to backpedal from a bad call
-
- 13 Jul, 2023 1 commit
-
-
Charlie Lin authored
Renames deconvolution -> convolution_backwards to be more consistent with the literature Note: this is not the cross-correlation operator (which is the adjoint of convolution). This is technically a standard convolution operator combined with an upsampling operator rather than a downsampling operator. Adds unit tests for the padding, strides, dilations, and other op attributes. Throws on auto_pad attribute since it has not been implemented Previously it read the attribute and set it but then did nothing with it Extended for dynamic shapes Does not support using asymmetric padding (padding_L != padding_R) and output_shape with dynamic shapes.
-
- 10 Jul, 2023 1 commit
-
-
Brian Pickrell authored
Changes to the way Pooling operation calculates pooling when there's padding. Old code would clip off any padding values before computing; for instance if an Average pooling window contained 0 1 2 where the 0 is padding, the result was 1.5 instead of 1.0. See Issue 1766
-
- 08 Jul, 2023 1 commit
-
-
Artur Wojcik authored
-
- 06 Jul, 2023 1 commit
-
-
Artur Wojcik authored
-
- 02 Jul, 2023 1 commit
-
-
Paul Fultz II authored
Add a CI job to test CK Add MIGRAPHX_TUNE_CK env variable to only do tuning for CK Continue tuning even when there is invalid configs Fix a bug with parallel compilation not using all available threads Add additional test for gemms using half types Removed int32 as supported type since it doesnt pass our test suite
-
- 23 Jun, 2023 1 commit
-
-
Umang Yadav authored
Fixes #1852 Fixes #1847
-
- 01 Jun, 2023 1 commit
-
-
Umang Yadav authored
By converting to fp32 : fp16 3d-unet model accuracy comes out the same as FP32 accuracy. By using reduce_sum method on Fp16 : accuracy comes out ~0.9% lower compared to fp32 while keeping entire model in fp16.
-
- 25 May, 2023 1 commit
-
-
Ted Themistokleous authored
Use std::numeric_limits::min/max() functions plus the appropriate value to encode -inf/inf
-
- 20 May, 2023 1 commit
-
-
Umang Yadav authored
* use half hip functions to compute max and min * add verify test for min and max
-
- 04 May, 2023 1 commit
-
-
Paul Fultz II authored
When multiplying either the input or output across the K dimensions then the multiple can be applied to the constant which can then be folded with propagate_const.
-
- 28 Apr, 2023 1 commit
-
-
Charlie Lin authored
-
- 24 Apr, 2023 2 commits
-
-
Paul Fultz II authored
This fixes #1700
-
Paul Fultz II authored
-
- 07 Apr, 2023 1 commit
-
-
Paul Fultz II authored
Converts can be inserted when the scales and input differ in the onnx file(we are already doing this implicit conversion in the ref implementation). This will also improve the compile-time of quantizelinear.hpp since we can remove the nested visit method.
-
- 05 Apr, 2023 1 commit
-
-
Paul Fultz II authored
This will replace conv(x+a, w) with conv(x, w) + conv(a, w) where a is a constant so conv(a, w) can be replaced with a constant.
-
- 31 Mar, 2023 1 commit
-
-
Charlie Lin authored
Adds a new GPU compiler pass split_single_dyn_dim that handles when one input parameter has a single non-fixed dynamic_dimension. commonly occurs for dynamic batch or BERT sequence length Splits the dynamic shape into several submodules will static input parameters to handle all of the cases in the dynamic_dimension range. Essentially does what I manually did for the select_module verify tests Adds a compile option split_single_dyn_dim that toggles the pass on/off. Defaults to false. Updates verify_program.hpp and run_verify.cpp to allow for the tests to change the compile_options
-
- 29 Mar, 2023 1 commit
-
-
Paul Fultz II authored
-
- 21 Mar, 2023 1 commit
-
-
Charlie Lin authored
Refactor to have select_module use output parameters Disable select_module verify tests on cpu
-
- 18 Mar, 2023 1 commit
-
-
Umang Yadav authored
Fixes #1595
-
- 17 Mar, 2023 2 commits
-
-
Paul Fultz II authored
-
Paul Fultz II authored
This is the original testcase that sparked the error with missing proper const folding. Pushing changes up to this branch and closing out the PR #1622
-
- 10 Mar, 2023 2 commits
-
-
Paul Fultz II authored
-
Paul Fultz II authored
-
- 28 Feb, 2023 1 commit
-
-
Charlie Lin authored
Creates the select_module operator that selects one of the submodules passed to it to run based on the submodule parameters. The submodule is selected by having the exact same static shapes for the arguments to select_module as the parameters in the submodule
-
- 23 Feb, 2023 1 commit
-
-
shivadbhavsar authored
-
- 16 Feb, 2023 1 commit
-
-
Paul Fultz II authored
Avoids double global loads. Strided loops are unrolled which lets store results in array which compiler will use registers for since the index access is constant. Updated to handle large reductions so which results with a better stable diffusion result
-
- 17 Jan, 2023 1 commit
-
-
Paul Fultz II authored
-
- 13 Jan, 2023 1 commit
-
-
shivadbhavsar authored
This PR resolves the bug addressed in #1496.
-
- 11 Jan, 2023 1 commit
-
-
Paul Fultz II authored
* Use cosine to compute half sin
-
- 09 Jan, 2023 1 commit
-
-
Ted Themistokleous authored
JIT implementation of the gather operator Added a few more unit tests to this one as well since I saw some odd behavior during bring up.
-
- 02 Nov, 2022 1 commit
-
-
Paul Fultz II authored
-
- 28 Oct, 2022 1 commit
-
-
Umang Yadav authored
Local Threads of multiples 32 were introduced in #1348 But LocalThreads that are not multiple of 64 are causing correctness issues.
-
- 27 Oct, 2022 1 commit
-
-
kahmed10 authored
updated GPU pad to now use JIT version. added range functions for JIT kernels.
-
- 26 Oct, 2022 1 commit
-
-
Brian Pickrell authored
Fixes an observed regression error on certain Frozen Protobuf models due to PR 1280
-
- 19 Oct, 2022 2 commits
-
-
Charlie Lin authored
Refactor dynamic compute - add a compute_output_shape object that implicitly converts to a new dyn_output or shape object - dyn_output object can handle computing the static output shape of an operator given the input arguments shapes change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to use dyn_output object Dynamic ref unary functions - Included these changes to have an example of the refactored dynamic compute being used - Changes to unary base class to handle dynamic shapes - Changed elu and leaky_relu to use unary base class and pointwise JIT
-
Umang Yadav authored
* use find2.0 for the convolution Co-authored-by:
Vasilii Filippov <DrizztDoUrden@users.noreply.github.com> Co-authored-by:
Chris Austen <causten@users.noreply.github.com>
-
- 13 Oct, 2022 2 commits
-
-
Charlie Lin authored
Removes use_dynamic_same_auto_pad Change padding_mode to be used for dynamic padding Move compute_padded_shape to pad_calc.cpp as it will be used in other dynamic padding cases Fix same_lower compute_padded_shape bug and add a test.
-
Charlie Lin authored
Rewrites the TF batch norm like operators to other MIGX operators Removes the code related to batch_norm_inference
-