- 15 Feb, 2022 3 commits
Shucai Xiao authored
Shucai Xiao authored
Shucai Xiao authored
- 09 Feb, 2022 5 commits
Paul Fultz II authored
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION environment variable to disable the pointwise fusion.
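A minimal sketch of using such a flag from Python, assuming it is read from the environment and that a value like "1" activates it; the MIGraphX calls below are illustrative usage and model.onnx is a placeholder path.

```python
import os

# Assumption: MIGraphX checks this flag at pass time, so it must be set
# before the program is compiled; "1" is assumed to mean "disabled".
os.environ["MIGRAPHX_DISABLE_POINTWISE_FUSION"] = "1"

import migraphx  # MIGraphX Python bindings (illustrative usage)

prog = migraphx.parse_onnx("model.onnx")       # placeholder model file
prog.compile(migraphx.get_target("gpu"))       # compiled with pointwise fusion turned off
```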
Khalique Ahmed authored
Khalique Ahmed authored
Khalique Ahmed authored
Khalique Ahmed authored
- 08 Feb, 2022 5 commits
Paul Fultz II authored
This was causing incorrect memory coloring, which led to the accuracy failures in the vision models when the pointwise fusions were enabled. Resnet50, Inceptionv3, and Inceptionv4 do verify now in the driver.
Paul Fultz II authored
Enforce types to avoid a compilation error in pointwise fusions. This fixes the compile failure for GPT-2 with fp16 on Navi.
Khalique Ahmed authored
Khalique Ahmed authored
Khalique Ahmed authored
- 04 Feb, 2022 2 commits
Khalique Ahmed authored
Khalique Ahmed authored
- 31 Jan, 2022 1 commit
Khalique Ahmed authored
- 28 Jan, 2022 2 commits
Paul Fultz II authored
* Enable auto vectorization
* Handle vector types with the convert function
* Don't vectorize when it will cause problems with preload
Shucai Xiao authored
- 27 Jan, 2022 1 commit
Umang Yadav authored
Allow non-standard shapes for the arg ops; non-standard shapes include broadcast, slice, and transpose.
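For illustration only (NumPy, not MIGraphX): assuming "non-standard" here means the data is not packed in plain row-major order, the transposed and sliced views below show the kind of strided layouts the arg ops can now accept.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # standard: packed, row-major
t = a.T                                            # transposed view: strided, not packed
s = a[:, 1:]                                       # sliced view: also not packed

print(a.flags["C_CONTIGUOUS"], t.flags["C_CONTIGUOUS"], s.flags["C_CONTIGUOUS"])
# True False False

# An arg op such as argmax can still be evaluated on the strided views by
# walking the elements through their strides rather than a flat copy.
print(np.argmax(t, axis=1), np.argmax(s, axis=1))
```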
- 21 Jan, 2022 1 commit
Paul Fultz II authored
* Improve handling of generator expressions when getting the flags for hip
- 10 Jan, 2022 1 commit
Paul Fultz II authored
* Add matcher for conv_bias pointwise
* Add fusion op
- 09 Dec, 2021 1 commit
Shucai Xiao authored
Changed the number of threads in a block from 256 to 128 and increased the max number of blocks in the kernel from 256 to 1M. For the case where the axis is the last dimension, we removed the computation of the index since it is not required. With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
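A rough arithmetic sketch (Python, not the kernel code) of what the new limits imply, assuming one element handled per thread; the 128 and 1M figures come from the commit above, the rest is illustrative.

```python
import math

BLOCK_SIZE = 128          # threads per block, previously 256
MAX_BLOCKS = 1_000_000    # max blocks per launch, previously 256

def grid_size(nelements):
    # Illustrative grid sizing under the limits quoted above.
    return min(math.ceil(nelements / BLOCK_SIZE), MAX_BLOCKS)

# At one element per thread, the old caps covered at most 256 * 256 = 65,536
# elements in a single launch; the new caps cover up to 128 * 1,000,000.
print(grid_size(65_536))        # 512 blocks
print(grid_size(200_000_000))   # capped at 1,000,000 blocks
```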
- 08 Dec, 2021 1 commit
Paul Fultz II authored
- 07 Dec, 2021 1 commit
Paul Fultz II authored
simple variable rename
- 02 Dec, 2021 1 commit
Paul Fultz II authored
Fix pointwise compile error with half sqrt
- 30 Nov, 2021 2 commits
turneram authored
Fix whitespace bug in fusable_conv matcher and add unit test
Paul Fultz II authored
- 24 Nov, 2021 1 commit
Paul Fultz II authored
* Check jit kernel files with clang-tidy
- 18 Nov, 2021 1 commit
Paul Fultz II authored
Do compilation in parallel
- 11 Nov, 2021 1 commit
Paul Fultz II authored
This enables the pointwise fusions via the MIGRAPHX_ENABLE_POINTWISE_FUSION environment variable. It is disabled by default since the MIOpen fusions need to be refactored. This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
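A hedged example of combining this flag with the driver's verify step mentioned in nearby commits; the driver binary name and the model path are placeholders for whatever is installed locally.

```python
import os
import subprocess

# Illustrative only: run verification with the (default-off) pointwise
# fusions turned on, as described above.
env = dict(os.environ, MIGRAPHX_ENABLE_POINTWISE_FUSION="1")
subprocess.run(["migraphx-driver", "verify", "resnet50.onnx"], env=env, check=True)
```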
- 09 Nov, 2021 1 commit
turneram authored
* Add workaround for devices that do not support miopen conv fusions
- 05 Nov, 2021 1 commit
kahmed10 authored
Move our Dockerfile from ROCm 4.3 to 4.5 and add Navi-based GPUs to the CI infrastructure.
- 28 Oct, 2021 2 commits
Shucai Xiao authored
This PR adds the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
Shucai Xiao authored
GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
- 20 Oct, 2021 1 commit
Shucai Xiao authored
Implementation of the roialign operator. For now, we have only the ref implementation; when a model runs on the GPU, execution falls back to the ref implementation.
- 19 Oct, 2021 1 commit
Paul Fultz II authored
Fixes pthread linking errors on SLES.
- 08 Oct, 2021 2 commits
Shucai Xiao authored
This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
Umang Yadav authored
Previously the dot operator was defined as C = alpha * (A . B) + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication depending on the dimensions of the inputs. The aim is to define the dot operator simply as C = A . B, without alpha or beta. To achieve the same effect as alpha and beta, (1) one of the inputs to the dot operator is multiplied by the alpha value, and (2) if beta is present, C is multiplied by beta and added to the output of step 1.
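A quick NumPy check (not MIGraphX code, values are arbitrary) of the equivalence behind this rewrite: folding alpha into one input and applying beta as a separate multiply and add reproduces the old alpha/beta form.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.random((2, 3)), rng.random((3, 4)), rng.random((2, 4))
alpha, beta = 0.5, 2.0

old_form = alpha * (A @ B) + beta * C   # previous semantics: C = alpha * A . B + beta * C
new_form = (alpha * A) @ B + beta * C   # alpha folded into one input, beta as a separate mul/add
assert np.allclose(old_form, new_form)
```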
- 01 Oct, 2021 1 commit
turneram authored
Add the multinomial op to the onnx parser with ref and GPU implementations. The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function. The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution. Resolves #821.
Co-authored-by: Shucai Xiao <shucai@gmail.com>
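A NumPy sketch of the sampling scheme described above, not the MIGraphX implementation; converting the log-probabilities to unnormalized probabilities with exp is an assumption about the input, while the CDF and first-greater-index steps follow the description.

```python
import numpy as np

def multinomial_sketch(logits, rand):
    # logits: (batch_size, class_size) log-probabilities (assumed unnormalized)
    # rand:   (batch_size, sample_size) random values in [0, 1), like the inserted literal
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    cdf = np.cumsum(probs, axis=1)            # running sum per row
    scaled = rand * cdf[:, -1:]               # scale by the CDF total (its last entry)
    # index of the first CDF entry greater than the scaled value, in [0, class_size)
    return (cdf[:, None, :] <= scaled[:, :, None]).sum(axis=2)

rng = np.random.default_rng(0)
logits = np.log(np.array([[0.1, 0.6, 0.3]]))
print(multinomial_sketch(logits, rng.random((1, 5))))   # five samples from {0, 1, 2}
```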
- 27 Sep, 2021 1 commit
kahmed10 authored
Checks wavefront size, then changes implementation and number of threads for DPP reduce