"test/wmma_op/wmma_op.cpp" did not exist on "8784a72e23538d594ea6b1bd527478fba2962d30"
- 21 Jan, 2022 3 commits
-
-
turneram authored
Add onnx parser and unit tests for Softsign
-
turneram authored
* Add onnx parser and unit test
-
Paul Fultz II authored
* Improve handling of generator expressions when getting the flags for hip
-
- 20 Jan, 2022 2 commits
-
-
Paul Fultz II authored
-
Chris Austen authored
There have been hangs in the CI runs recently. GitHub runner jobs are failing due to exceeding the file system size. Upgrading to 0.0.11 resolves this issue.
-
- 17 Jan, 2022 1 commit
-
-
Paul Fultz II authored
Make clip a pointwise op
-
- 11 Jan, 2022 1 commit
-
-
turneram authored
Add HardSigmoid onnx parser and unit tests
Produces a mathematical equivalent of the ONNX operator through a combination of existing pointwise ops. Resolves #1028
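As a rough illustration of what "a combination of existing pointwise ops" can look like, the ONNX HardSigmoid operator is defined as y = max(0, min(1, alpha * x + beta)) with defaults alpha = 0.2 and beta = 0.5. The sketch below decomposes it into multiply, add, and clip steps in plain Python. The exact ops emitted by the parser are not stated in the commit message, so this particular decomposition and the function name `hard_sigmoid` are assumptions.

```python
def hard_sigmoid(xs, alpha=0.2, beta=0.5):
    """Elementwise max(0, min(1, alpha * x + beta)) over a list of floats.

    Hypothetical helper: mirrors the ONNX definition using three
    pointwise steps (mul, add, clip). Not taken from the MIGraphX source.
    """
    out = []
    for x in xs:
        scaled = alpha * x                      # pointwise multiply by alpha
        shifted = scaled + beta                 # pointwise add of beta
        clipped = min(1.0, max(0.0, shifted))   # pointwise clip to [0, 1]
        out.append(clipped)
    return out
```

Calling `hard_sigmoid([-10.0, 0.0, 10.0])` saturates at 0 for the large negative input and at 1 for the large positive input, with 0.5 in the middle.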
-
- 10 Jan, 2022 1 commit
-
-
Paul Fultz II authored
* Add matcher for conv_bias pointwise
* Add fusion op
-
- 05 Jan, 2022 1 commit
-
-
turneram authored
Fix bug caused by casting time seed to float
-
- 10 Dec, 2021 1 commit
-
-
Cagri authored
nfnet update 3dunet requirements via pip 3dunet requirement and nb-clean
-
- 09 Dec, 2021 2 commits
-
-
Shucai Xiao authored
Changed the number of threads in a block from 256 to 128 and increased the max number of blocks in the kernel from 256 to 1M. When the axis is the last dimension, the index computation is removed since it is not required. With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
-
Paul Fultz II authored
Fuse last instruction in fuse_pointwise
This also fixes a bug with using an invalid iterator.
-
- 08 Dec, 2021 1 commit
-
-
Paul Fultz II authored
-
- 07 Dec, 2021 2 commits
-
-
Paul Fultz II authored
simple variable rename
-
Shucai Xiao authored
1. The previous implementation assumes the input and output .pb files are ordered, but that is not the case. We should instead use the names of the tensors in the input/output .pb files to match the inputs and outputs of the onnx model. (This change applies to the BERT_Squad model.)
2. When parsing a model with a dynamic input shape, the current implementation uses the default batch_size for the unknown dims, which can cause parsing errors in some cases (e.g. the mask_rcnn model). The solution is to first read an input to get its shape, then use these shapes to parse the onnx model.
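The name-based matching described in point 1 can be sketched as follows. This is a minimal illustration, not the actual test-runner code: the function `match_by_name` and the tensor names in the usage note are hypothetical, and loaded tensors are assumed to arrive as `(name, data)` pairs in arbitrary file order.

```python
def match_by_name(model_input_names, loaded_tensors):
    """Return tensor data in the order the model expects, looked up by name.

    model_input_names: names of the model's inputs, in model order.
    loaded_tensors: iterable of (name, data) pairs in arbitrary file order.
    Both the function and its argument layout are assumptions for illustration.
    """
    by_name = {name: data for name, data in loaded_tensors}
    missing = [n for n in model_input_names if n not in by_name]
    if missing:
        # Fail loudly instead of silently pairing data with the wrong input.
        raise KeyError(f"no data file provides tensors: {missing}")
    return [by_name[n] for n in model_input_names]
```

For example, `match_by_name(["input_ids", "segment_ids"], [("segment_ids", [0, 1]), ("input_ids", [5, 6])])` returns the data for `input_ids` first even though its file was read second.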
-
- 05 Dec, 2021 1 commit
-
-
Cagri authored
Adds description for roctx knob of migraphx-driver in documentation.
-
- 02 Dec, 2021 1 commit
-
-
Paul Fultz II authored
Fix pointwise compile error with half sqrt
-
- 30 Nov, 2021 2 commits
-
-
turneram authored
Fix whitespace bug in fusable_conv matcher and add unit test
-
Paul Fultz II authored
-
- 25 Nov, 2021 2 commits
-
-
Shucai Xiao authored
Resolves a problem in parsing the ssd-10 model. After a contiguous is inserted in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. If the next operator then requires a standard input shape, an exception is thrown.
For example, consider the following model: Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather. As written, it works fine and no contiguous is required. In the auto_contiguous pass, however, a contiguous is inserted after the first transpose. We then need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input is a transposed shape, and an exception is thrown.
The solution is in the recompute_shape() function. If it is called by the auto_contiguous pass and the shape of an instruction is changed to a non-standard shape, we do not recompute the shape of its output. The reason is that, since the instruction's output shape is non-standard, a contiguous op will be added after it, and that op will recompute the shapes of later operators.
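To make the "standard" versus "transposed" distinction concrete, the sketch below models a shape as a pair of dimension lengths and strides, and treats a shape as standard when its strides are the packed row-major strides for its lengths. The helper names (`standard_strides`, `transpose`, `is_standard`) are hypothetical and only approximate how a tensor library might track this; they are not the MIGraphX API.

```python
def standard_strides(lens):
    """Packed row-major strides for the given dimension lengths."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def transpose(lens, strides, perm):
    """Permute dimensions without moving data: only lens and strides change."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


def is_standard(lens, strides):
    """A shape is treated as standard when its strides are packed row-major."""
    return strides == standard_strides(lens)


# Input (standard) -> transpose (transposed) -> transpose (standard again)
lens, strides = [2, 3, 4], standard_strides([2, 3, 4])
t_lens, t_strides = transpose(lens, strides, [0, 2, 1])
back_lens, back_strides = transpose(t_lens, t_strides, [0, 2, 1])
```

Here the first transpose yields a non-standard shape, and the second transpose restores a standard one, which is why the original chain ending in gather needs no contiguous. If the first transpose is replaced by an op with a standard output, the second transpose instead produces a transposed shape.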
-
dependabot[bot] authored
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.1 to 2.5.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.1...v2.5.2)
---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 24 Nov, 2021 1 commit
-
-
Paul Fultz II authored
* Check jit kernels files with clang-tidy
-
- 22 Nov, 2021 3 commits
-
-
Cagri authored
This provides a helper script to run rocTX markers with migraphx-driver and reduces the number of steps a user would go through when running the rocTX knob.
Run:
    python roctx.py --run --onnx_file <ONNX_FILE> --migraphx_args "--onnx --gpu --fp16 --batch 16" --out outputfolder
Runs and parses the run output (JSON file). An example output is given below:
                                                      SUM   MIN   MAX
    Marker start: gpu::convolution                   5272    10   563
    Marker start: gpu::add_relu                       605    12    18
    Marker start: gpu::gather                         299   145   154
    Marker start: gpu::mul_add                        227    14    57
    Marker start: gpu::sub                            177    13    42
    Marker start: gpu::concat                         169    22    31
    Marker start: gpu::triadd_relu                    163    15    18
    Marker start: load                                141     0     3
    Marker start: hip::hip_copy_literal               111     0     3
    Marker start: gpu::add                             58    13    17
    Marker start: broadcast                            52     0     3
    Marker start: gpu::convert                         31    15    16
    Marker start: slice                                11     0     1
    Marker start: gpu::pooling                          9     9     9
    Marker start: step                                  2     2     2
    Marker start: @param                                2     0     1
    Marker start: reshape                               1     0     1
    Marker start: hip::hip_allocate_memory              1     1     1
    Marker start: check_context::migraphx::version_...  0   ERR   ERR
    TOTAL TIME: 7331 us
    JSON FILE PATH: [...]/rpl_data_211019_195229_9369/input_results_211019_195229/trace.json
Parse:
    python roctx.py --parse --json_path <JSON PATH FROM RUN>
Note: The parse knob is made available if the user wants to parse an already existing JSON output.
-
kahmed10 authored
Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.
-
Paul authored
-
- 18 Nov, 2021 1 commit
-
-
Paul Fultz II authored
Do compilation in parallel
-
- 17 Nov, 2021 1 commit
-
-
Paul Fultz II authored
Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.
- Update to pass the module inputs correctly to compute_shape
- Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
- Add tests with contiguous and a pointwise module function
- Move the add_pointwise function to a separate header to reuse across different tests
-
- 15 Nov, 2021 1 commit
-
-
kahmed10 authored
Currently we have the option of passing in --batch to the driver to change the batch size when the model has a dynamic dim value. We can use this flag to adjust the perf report's rate.
-
- 11 Nov, 2021 1 commit
-
-
Paul Fultz II authored
This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It's disabled by default since MIOpen fusions need to be refactored. This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
-
- 10 Nov, 2021 1 commit
-
-
Shucai Xiao authored
This PR turns on a few gemm unit tests with the int8 input data type. Before rocm4.4, the rocblas implementation required the matrix size to be no less than 4 for int8 input data. Because of this limitation, we turned off a few gemm unit tests with the int8 input data type. The limitation is removed in rocm4.4, so after we upgrade to rocm4.5, we can turn these unit tests on. We also change the conv_bn_add unit test to add instructions to a module instead of a program.
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 09 Nov, 2021 1 commit
-
-
turneram authored
* Add workaround for devices that do not support miopen conv fusions
-
- 08 Nov, 2021 1 commit
-
-
Paul Fultz II authored
* Install pcre from github since the ftp.pcre.org site is no longer available
-
- 05 Nov, 2021 1 commit
-
-
kahmed10 authored
Moving our Docker file from ROCm 4.3 to 4.5.
Add Navi-based GPUs to the CI infrastructure.
-
- 03 Nov, 2021 1 commit
-
-
Umang Yadav authored
In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape. If there is a trailing binary pointwise operator after DepthToSpace, migraphx can move the binary operator before the contiguous and reshape of the DepthToSpace, so it becomes reshape --> transpose --> binary_op --> contiguous --> reshape. An explicit contiguous isn't required since binary_op outputs a standard shape, so it becomes reshape --> transpose --> binary_op --> reshape. simplify_reshapes already has a matcher that can do this transformation. This PR adds tests for cases like DepthToSpace + binary op. Solves #905.
-
- 28 Oct, 2021 4 commits
-
-
Shucai Xiao authored
This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
-
Umang Yadav authored
In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape. This PR adds a matcher to find d2s + unary pointwise ops. Applying the matcher moves the unary pointwise operation before the contiguous and reshape of the d2s, so it becomes reshape --> transpose --> unary --> contiguous --> reshape. The motivation is that a pointwise module would later be created out of unary --> contiguous --> reshape. Codegen for this pointwise module can write out the buffer such that an explicit contiguous and reshape wouldn't be required. This transformation is not guaranteed to improve performance, since the unary op will operate on a non-standard shape, so we would need some tuning mechanism to make the decision. Related to #905; a PR for binary operations is pending.
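The reordering is valid because DepthToSpace only rearranges elements, so an elementwise op gives the same result whether it runs before or after the rearrangement. The sketch below checks this on a tiny input in plain Python. It is an illustration under assumptions: `depth_to_space` is a hypothetical helper that writes the ONNX DCR-mode reshape/transpose/reshape as a direct index mapping on a single `[C][H][W]` sample, and `relu` stands in for any unary pointwise op. It does not reproduce the matcher itself.

```python
def depth_to_space(x, b):
    """DepthToSpace (DCR ordering) on a nested list shaped [C][H][W].

    Returns a nested list shaped [C // (b*b)][H*b][W*b]. Equivalent to the
    reshape -> transpose -> reshape sequence, written as an index mapping.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    c_out = channels // (b * b)
    out = [[[None] * (width * b) for _ in range(height * b)] for _ in range(c_out)]
    for c in range(c_out):
        for h in range(height):
            for w in range(width):
                for i in range(b):
                    for j in range(b):
                        # Input channel (i*b + j)*c_out + c lands at spatial offset (i, j).
                        out[c][h * b + i][w * b + j] = x[(i * b + j) * c_out + c][h][w]
    return out


def relu(x):
    """Unary pointwise op applied to every element of a [C][H][W] nested list."""
    return [[[max(0, v) for v in row] for row in plane] for plane in x]


x = [[[1]], [[-2]], [[3]], [[-4]]]   # 4 channels, 1x1 spatial
after = relu(depth_to_space(x, 2))   # unary op applied after d2s
before = depth_to_space(relu(x), 2)  # unary op moved ahead of the rearrangement
```

Both orders produce the same 1 x 2 x 2 result, which is what lets the matcher move the unary op earlier without changing the output values. Whether that is faster is a separate question, as the commit message notes.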
-
Shucai Xiao authored
GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
-
kahmed10 authored
Updates the theme of our documentation so that it matches the rest of the ROCm libraries.
-
- 20 Oct, 2021 1 commit
-
-
Shucai Xiao authored
Implementation of the roialign operator. For now, we have only the ref implementation. When we run a model on the GPU, execution falls back to the ref implementation.
-
- 19 Oct, 2021 1 commit
-
-
Paul Fultz II authored
pthread linking errors on SLES.
-