- 30 Sep, 2022 1 commit
-
-
charlie authored
Not needed, special case with dynamic padding
-
- 29 Sep, 2022 6 commits
-
-
charlie authored
-
charlie authored
-
charlie authored
-
Umang Yadav authored
Improvements/additions still to be made: changes for quant_convolution, changes for deconvolution, and macros for MIOpen status checks.
-
charlie authored
-
Paul Fultz II authored
* Fix invalid program from find_splits
-
- 28 Sep, 2022 3 commits
-
-
charlie authored
-
charlie authored
-
Umang Yadav authored
test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and the rocBLAS version and sets the flag accordingly.
-
- 27 Sep, 2022 4 commits
-
-
charlie authored
-
charlie authored
-
charlie authored
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 26 Sep, 2022 4 commits
-
-
Charlie Lin authored
Rewrites the BatchNormalization ONNX operator into other MIGX operators
* Added handling of the 1D input tensor case (edge case in the ONNX spec)
* Removes the spatial and per_activation functionality (not in the ONNX spec)
* Did not remove the batch_norm_inference related code as the TensorFlow parser still uses it
* Can remove that code when the TF version is updated
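The rewrite described above can be illustrated with the standard inference-time formula, y = (x - mean) / sqrt(var + eps) * scale + bias, written as a chain of elementwise operations. This is a minimal sketch in plain Python, not the parser code: the function name `batch_norm_decomposed` and the N x C list layout are assumptions made for illustration.

```python
import math


def batch_norm_decomposed(x, scale, bias, mean, var, eps=1e-5):
    """Hypothetical helper: batch norm as sub, add, sqrt, div, mul, add.

    x is a list of rows with one value per channel (shape N x C);
    scale, bias, mean and var each hold one value per channel.
    """
    out = []
    for row in x:
        out_row = []
        for c, value in enumerate(row):
            centered = value - mean[c]        # sub
            denom = math.sqrt(var[c] + eps)   # add, sqrt
            out_row.append(centered / denom * scale[c] + bias[c])  # div, mul, add
        out.append(out_row)
    return out
```

Each step maps to one elementwise operator, which is why a dedicated batch norm operator is not needed to express the computation.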
-
charlie authored
-
Paul Fultz II authored
-
Paul Fultz II authored
Upgrade cppcheck to 2.9
-
- 23 Sep, 2022 2 commits
-
-
charlie authored
-
Paul Fultz II authored
* Remove device functions
* Update tests
-
- 22 Sep, 2022 2 commits
- 21 Sep, 2022 2 commits
-
-
kahmed10 authored
This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
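As a rough illustration of why the epsilon has to be carried through, here is a layernorm sketch where epsilon is a parameter rather than a fixed constant. The function name `layernorm` and the flat list input are assumptions for illustration; the actual matcher and kernel are not shown in the commit message.

```python
import math


def layernorm(xs, eps):
    """Hypothetical reference layernorm over one row, with a caller-supplied epsilon."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    # The epsilon found in the matched pattern is used here instead of a hard-coded value.
    return [(x - mean) / math.sqrt(var + eps) for x in xs]
```

Two graphs that differ only in epsilon produce different outputs, so the fused computation needs the matched value.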
-
Charlie Lin authored
Change find_mul_conv to work with multibroadcast also. Checks the strides instead of the broadcast axis.
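One way to read "checks the strides" is that a broadcast dimension has stride 0, so a per-channel multiplier can be recognized from its strides alone, whichever operator produced the broadcast. The sketch below assumes that reading; the function name `is_per_channel_broadcast` and the choice of axis 1 as the channel axis are hypothetical and not taken from the source.

```python
def is_per_channel_broadcast(strides, channel_axis=1):
    """Hypothetical check: True when only the channel axis has a nonzero stride.

    A stride of 0 means the dimension is broadcast, so the same check works
    for both broadcast and multibroadcast outputs.
    """
    return all((stride != 0) == (axis == channel_axis)
               for axis, stride in enumerate(strides))
```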
-
- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Compute mean and variance in the same reduction
* Set block size to numbers divisible by 32 instead of powers of 2
* Global is also set exactly instead of being divisible by block size
* More exact matching of global/local can help get rid of branching/loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators since the compiler doesn't always vectorize them
* Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all the vector types are supported
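A common way to get mean and variance from a single reduction is to accumulate the sum and the sum of squares together, then use var = E[x^2] - mean^2. The commit message does not say which formula the kernel uses, so the sketch below is an assumption, and `mean_variance_single_pass` is a hypothetical name.

```python
def mean_variance_single_pass(xs):
    """Hypothetical helper: one pass accumulating sum and sum of squares."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        # Both accumulators are updated in the same loop, i.e. one reduction.
        total += x
        total_sq += x * x
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, variance
```

This form trades a second pass over the data for some sensitivity to cancellation when the mean is large relative to the spread.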
-
- 16 Sep, 2022 6 commits
-
-
Umang Yadav authored
* fix typo for add_sigmoid
-
charlie authored
-
charlie authored
-
charlie authored
-
charlie authored
Weird bug with ref padding shape; still need to change parse_convolution
-
Umang Yadav authored
* remove deprecated constructor
-
- 15 Sep, 2022 2 commits
-
-
Lixun Zhang authored
* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and remove backward compatibility
* Embedded the external/include dir into the exported library
-
charlie authored
-
- 14 Sep, 2022 3 commits
-
-
Umang Yadav authored
* fix slice_dim1 for case
-
Paul Fultz II authored
* Implement concat using jit compilation
-
shivadbhavsar authored
Expose the underlying data pointer for migraphx.argument
Update Python API documentation
-
- 13 Sep, 2022 1 commit
-
-
turneram authored
Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension. The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
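The reason a non-batched call is valid here can be sketched in plain Python: when every batch multiplies by the same B, stacking the rows of A into one tall matrix and doing a single matrix multiply gives the same result as the batched product. The helper names below are hypothetical, and the sketch assumes A has shape [batch, M, K] and B has shape [K, N]; it says nothing about how rocBLAS is actually invoked.

```python
def matmul(a, b):
    """Plain 2D matrix multiply on nested lists: [M, K] x [K, N] -> [M, N]."""
    return [[sum(a_row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for a_row in a]


def batched_matmul_broadcast_b(a_batches, b):
    """Batched form: the same B is reused for every batch of A."""
    return [matmul(a, b) for a in a_batches]


def flattened_matmul(a_batches, b):
    """Non-batched form: stack all rows of A, multiply once, split back."""
    m = len(a_batches[0])
    flat = [row for a in a_batches for row in a]   # [batch * M, K]
    out = matmul(flat, b)                          # single GEMM
    return [out[i * m:(i + 1) * m] for i in range(len(a_batches))]
```

Both functions return the same nested lists, so the batch dimension can be folded into M whenever B does not vary per batch.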
-
- 08 Sep, 2022 2 commits
-
-
Paul Fultz II authored
* Remove unused headers
-
Charlie Lin authored
Fixes TF literal parsing for relu6. Previously it always made a float-type literal, which breaks for other input types such as float16.
-
- 07 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Fix accuracy bug when vectorizing slices
-