- 31 Mar, 2023 1 commit
-
-
Charlie Lin authored
Adds a new GPU compiler pass, split_single_dyn_dim, that handles the case where one input parameter has a single non-fixed dynamic_dimension. This commonly occurs for a dynamic batch size or BERT sequence length. The pass splits the dynamic shape into several submodules with static input parameters to handle all of the cases in the dynamic_dimension range. Essentially does what I manually did for the select_module verify tests. Adds a compile option split_single_dyn_dim that toggles the pass on/off; it defaults to false. Updates verify_program.hpp and run_verify.cpp to allow the tests to change the compile_options.
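The splitting described in this commit can be illustrated with a minimal sketch. This is not the MIGraphX implementation: the `dyn_dim` struct and the `enumerate_static_shapes` helper are hypothetical names, and the sketch only assumes that a dynamic dimension carries an inclusive [min, max] range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dynamic dimension with an inclusive range.
struct dyn_dim
{
    std::size_t min;
    std::size_t max;
    bool is_fixed() const { return min == max; }
};

// Enumerate one static shape per value of the single non-fixed dimension.
// Returns an empty list unless exactly one dimension is non-fixed.
std::vector<std::vector<std::size_t>> enumerate_static_shapes(const std::vector<dyn_dim>& dims)
{
    std::size_t dyn_index = dims.size();
    std::size_t dyn_count = 0;
    for(std::size_t i = 0; i < dims.size(); i++)
    {
        assert(dims[i].min <= dims[i].max);
        if(!dims[i].is_fixed())
        {
            dyn_index = i;
            dyn_count++;
        }
    }
    if(dyn_count != 1)
        return {};

    // Start from the minimum of every dimension, then vary only the dynamic one.
    std::vector<std::size_t> base;
    for(const auto& d : dims)
        base.push_back(d.min);

    std::vector<std::vector<std::size_t>> result;
    for(std::size_t v = dims[dyn_index].min; v <= dims[dyn_index].max; v++)
    {
        base[dyn_index] = v;
        result.push_back(base);
    }
    return result;
}
```

Under these assumptions, a batch dimension with range 1 to 4 yields four static shapes, each of which would correspond to one submodule.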
-
- 29 Mar, 2023 1 commit
-
-
Paul Fultz II authored
-
- 21 Mar, 2023 1 commit
-
-
Charlie Lin authored
Refactor to have select_module use output parameters. Disable select_module verify tests on CPU.
-
- 18 Mar, 2023 1 commit
-
-
Umang Yadav authored
Fixes #1595
-
- 17 Mar, 2023 2 commits
-
-
Paul Fultz II authored
-
Paul Fultz II authored
This is the original test case that sparked the error with missing proper const folding. Pushing changes up to this branch and closing out PR #1622.
-
- 10 Mar, 2023 2 commits
-
-
Paul Fultz II authored
-
Paul Fultz II authored
-
- 28 Feb, 2023 1 commit
-
-
Charlie Lin authored
Creates the select_module operator, which selects one of the submodules passed to it to run based on the submodule parameters. A submodule is selected when the static shapes of the arguments to select_module exactly match the shapes of the submodule's parameters.
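The selection rule can be sketched as an exact comparison of shape lists. This is only an illustration of the matching idea, not the operator's real code: the `submodule` struct, the `static_shape` alias, and `select_submodule` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using static_shape = std::vector<std::size_t>;

// Hypothetical stand-in for a submodule: a name plus its parameter shapes.
struct submodule
{
    std::string name;
    std::vector<static_shape> param_shapes;
};

// Return the first submodule whose parameter shapes equal the argument
// shapes exactly, or nullptr when none matches.
const submodule* select_submodule(const std::vector<submodule>& mods,
                                  const std::vector<static_shape>& arg_shapes)
{
    for(const auto& m : mods)
    {
        if(m.param_shapes == arg_shapes)
            return &m;
    }
    return nullptr;
}
```

Returning a null pointer for the no-match case is a choice made for this sketch; how the real operator reports a missing match is not stated in the commit message.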
-
- 23 Feb, 2023 1 commit
-
-
shivadbhavsar authored
-
- 16 Feb, 2023 1 commit
-
-
Paul Fultz II authored
Avoids double global loads. Strided loops are unrolled, which lets results be stored in an array that the compiler will keep in registers, since the index access is constant. Updated to handle large reductions, which gives a better Stable Diffusion result.
-
- 17 Jan, 2023 1 commit
-
-
Paul Fultz II authored
-
- 13 Jan, 2023 1 commit
-
-
shivadbhavsar authored
This PR resolves the bug addressed in #1496.
-
- 11 Jan, 2023 1 commit
-
-
Paul Fultz II authored
* Use cosine to compute half sin
-
- 09 Jan, 2023 1 commit
-
-
Ted Themistokleous authored
JIT implementation of the gather operator. Added a few more unit tests to this one as well, since I saw some odd behavior during bring-up.
-
- 02 Nov, 2022 1 commit
-
-
Paul Fultz II authored
-
- 28 Oct, 2022 1 commit
-
-
Umang Yadav authored
Local thread counts that are multiples of 32 were introduced in #1348, but LocalThreads values that are not multiples of 64 are causing correctness issues.
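One way to guarantee a local size that is a multiple of 64 is to round the requested value up. The sketch below shows only that arithmetic; `round_up_to_multiple` and `safe_local_threads` are hypothetical helpers, and the commit message does not say whether the actual fix rounds up or picks sizes differently.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the nearest multiple of `multiple` (which must be nonzero).
constexpr std::size_t round_up_to_multiple(std::size_t n, std::size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper: pick a local size that is always a multiple of 64.
// A request of 0 is mapped to 64 so the result is never empty.
constexpr std::size_t safe_local_threads(std::size_t requested)
{
    return requested == 0 ? 64 : round_up_to_multiple(requested, 64);
}

// 96 is a multiple of 32 but not of 64, so it is rounded up.
static_assert(safe_local_threads(96) == 128, "96 should round up to 128");
```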
-
- 27 Oct, 2022 1 commit
-
-
kahmed10 authored
Updated GPU pad to use the JIT version. Added range functions for JIT kernels.
-
- 26 Oct, 2022 1 commit
-
-
Brian Pickrell authored
Fixes an observed regression error on certain Frozen Protobuf models due to PR 1280
-
- 19 Oct, 2022 2 commits
-
-
Charlie Lin authored
Refactor dynamic compute: add a compute_output_shape object that implicitly converts to a new dyn_output or shape object; the dyn_output object can handle computing the static output shape of an operator given the input arguments' shapes; change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to use the dyn_output object. Dynamic ref unary functions: included these changes to have an example of the refactored dynamic compute being used; changes to the unary base class to handle dynamic shapes; changed elu and leaky_relu to use the unary base class and pointwise JIT.
-
Umang Yadav authored
* use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 13 Oct, 2022 2 commits
-
-
Charlie Lin authored
Removes use_dynamic_same_auto_pad. Changes padding_mode to be used for dynamic padding. Moves compute_padded_shape to pad_calc.cpp, as it will be used in other dynamic padding cases. Fixes the same_lower compute_padded_shape bug and adds a test.
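For context, the difference between the "same upper" and "same lower" padding modes can be sketched as follows. This follows the ONNX auto_pad convention (output size is ceil(input / stride), and the odd extra element goes at the end for same_upper or at the beginning for same_lower). The function `same_padding` is a hypothetical helper and is not the compute_padded_shape code from pad_calc.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Compute (begin, end) padding for one spatial axis so that
// output = ceil(input / stride). With same_lower = true the odd extra
// element goes at the beginning, otherwise it goes at the end.
std::pair<std::size_t, std::size_t> same_padding(std::size_t input,
                                                 std::size_t kernel,
                                                 std::size_t stride,
                                                 std::size_t dilation,
                                                 bool same_lower)
{
    assert(input > 0 && kernel > 0 && stride > 0 && dilation > 0);
    const std::size_t output    = (input + stride - 1) / stride;
    const std::size_t effective = (kernel - 1) * dilation + 1;
    const std::size_t needed    = (output - 1) * stride + effective;
    const std::size_t total     = needed > input ? needed - input : 0;
    const std::size_t lesser    = total / 2;
    const std::size_t greater   = total - lesser;
    return same_lower ? std::make_pair(greater, lesser) : std::make_pair(lesser, greater);
}
```

With an input of 5, a kernel of 2, and stride 1, the total padding is 1, so the two modes differ only in which side receives it.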
-
Charlie Lin authored
Rewrites the TF batch-norm-like operators to other MIGX operators. Removes the code related to batch_norm_inference.
-
- 04 Oct, 2022 1 commit
-
-
Paul Fultz II authored
optimize the softmax operator
-
- 27 Sep, 2022 1 commit
-
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 21 Sep, 2022 1 commit
-
-
kahmed10 authored
This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
-
- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Compute mean and variance in the same reduction. Set block size to numbers divisible by 32 instead of powers of 2. Global is also set exactly instead of being divisible by block size. More exact matching of global/local can help get rid of branching/loops. Reduce vectors first before doing dpp_reduce. Explicitly vectorize array operators, since the compiler doesn't always vectorize them. Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all of the vector types are supported there.
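Computing mean and variance in one reduction usually means accumulating the sum and the sum of squares together. The sketch below is a scalar CPU illustration of that idea, assuming the population variance E[x^2] - E[x]^2; the function name `mean_variance` is hypothetical and the GPU kernel itself is not shown in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass over the data accumulates both sum(x) and sum(x*x);
// the population variance then follows from E[x^2] - E[x]^2.
std::pair<double, double> mean_variance(const std::vector<double>& xs)
{
    assert(!xs.empty());
    double sum    = 0.0;
    double sum_sq = 0.0;
    for(double x : xs)
    {
        sum += x;
        sum_sq += x * x;
    }
    const double n    = static_cast<double>(xs.size());
    const double mean = sum / n;
    return {mean, sum_sq / n - mean * mean};
}
```

This form reads the input only once, at the cost of possible precision loss from cancellation when the mean is large relative to the variance.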
-
- 14 Sep, 2022 2 commits
-
-
turneram authored
The verify tests from PR #1354 were still causing some codecov timeouts after merge. This PR further reduces the problem sizes to avoid these failures.
-
Paul Fultz II authored
* Implement concat using jit compilation
-
- 13 Sep, 2022 1 commit
-
-
turneram authored
Improves performance for 4/6 GEMMs used by huggingface BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension. The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
-
- 07 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Fix accuracy bug when vectorizing slices
-
- 06 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Using the not and or keywords improves readability. The cppcheck rule will help ensure we are doing it consistently.
-
- 31 Aug, 2022 1 commit
-
-
turneram authored
Rewrite_gelu pass replaces the gelu formula of x * (1/2) * (1 + erf(x/sqrt(2))) with the sigmoid approximation of x * Sigmoid(x * 1.702)
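The two formulas named in this commit can be compared directly. The sketch below is a scalar illustration with hypothetical function names (`gelu_erf`, `gelu_sigmoid`); it is not the pass itself, which rewrites graph instructions rather than evaluating values.

```cpp
#include <cassert>
#include <cmath>

// Exact GELU: x * (1/2) * (1 + erf(x / sqrt(2)))
double gelu_erf(double x)
{
    return x * 0.5 * (1.0 + std::erf(x / std::sqrt(2.0)));
}

// Sigmoid approximation: x * sigmoid(1.702 * x)
double gelu_sigmoid(double x)
{
    return x / (1.0 + std::exp(-1.702 * x));
}
```

The two functions agree at zero and stay close elsewhere, but they are not identical, so the rewrite trades a small amount of accuracy for a cheaper expression.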
-
- 27 Aug, 2022 1 commit
-
-
Paul Fultz II authored
This will rewrite dot operators like X(Y + b) to XY + Xb when b is constant, as we can fold the add away. This improves handling of pointwise operators with broadcasts, which helps improve const propagation. Improve gemm fusion with a mul_add. Improve support for broadcast shapes in gemm.
-
- 17 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 16 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 25 Jul, 2022 1 commit
-
-
varunsh authored
* Add is_supported to the target
* Add get_target_assignments
* Rename assignment to target_assignments
* Add ref target header to test
* Add fpga target
* Make context const in compute
-
- 06 Jul, 2022 1 commit
-
-
Paul Fultz II authored
* In the verification tests, check that saving and reloading the program yields the same program. This also fixes serialization to always load instructions in the same order. There are also fixes for deconv and quant_conv, which didn't save the solution id and were broken for serialization.
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 07 Jun, 2022 1 commit
-
-
Zhuoran Yin authored
Prioritize int8 over int8x4 when it is applicable. Amend return to continue in apply loop. Add error handling in case int8x4 compilation fails.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-