- 09 Oct, 2022 6 commits
- 08 Oct, 2022 2 commits
- 07 Oct, 2022 4 commits
- 04 Oct, 2022 2 commits
-
-
Ted Themistokleous authored
Stream sync changes and associated API level changes
-
Paul Fultz II authored
optimize the softmax operator
-
- 03 Oct, 2022 1 commit
-
-
Umang Yadav authored
Adds two methods to the custom_ops virtual class:
* bool runs_on_offload_target(): set this to true if the custom op runs directly on the GPU. In that case the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it is false, the custom op expects its parameters to reside on the host and writes its result back to host memory.
* output_alias: indicates whether the output of the custom op aliases an input buffer, i.e. interprets the same input buffer with a different shape and strides.
Also updates as_vector() in the C++ API to handle non-standard shapes. This required exposing the element_index to space_index conversion method of the shape class.
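To illustrate why as_vector() needs an element-to-space index conversion for non-standard shapes, here is a minimal sketch. It assumes a shape is described by plain `lens` and `strides` vectors and that logical elements are enumerated in row-major order; `element_to_space_index` and `as_vector_sketch` are hypothetical names, not the MIGraphX API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the MIGraphX API): map the i-th logical element,
// counted in row-major order over `lens`, to its offset in memory using `strides`.
std::size_t element_to_space_index(std::size_t element,
                                   const std::vector<std::size_t>& lens,
                                   const std::vector<std::size_t>& strides)
{
    assert(lens.size() == strides.size());
    std::size_t offset = 0;
    // Peel off coordinates from the innermost (last) dimension outward.
    for(std::size_t d = lens.size(); d-- > 0;)
    {
        offset += (element % lens[d]) * strides[d];
        element /= lens[d];
    }
    return offset;
}

// Hypothetical sketch: gather the logical elements of a possibly
// non-standard (e.g. transposed) buffer into a dense row-major vector.
template <class T>
std::vector<T> as_vector_sketch(const std::vector<T>& buffer,
                                const std::vector<std::size_t>& lens,
                                const std::vector<std::size_t>& strides)
{
    std::size_t n = 1;
    for(auto len : lens)
        n *= len;
    std::vector<T> result;
    result.reserve(n);
    for(std::size_t i = 0; i < n; ++i)
        result.push_back(buffer[element_to_space_index(i, lens, strides)]);
    return result;
}
```

For example, a 2x3 row-major buffer {1, 2, 3, 4, 5, 6} viewed as its 3x2 transpose (lens {3, 2}, strides {1, 3}) comes out as {1, 4, 2, 5, 3, 6}; simply copying the raw buffer would give the wrong order.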
-
- 29 Sep, 2022 1 commit
-
-
Umang Yadav authored
Improvements/additions to be made: changes for quant_convolution, changes for deconvolution, and macros for MIOpen status checks.
-
- 28 Sep, 2022 1 commit
-
-
Umang Yadav authored
test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and rocBLAS version and sets this flag accordingly.
-
- 27 Sep, 2022 1 commit
-
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 26 Sep, 2022 1 commit
-
-
Paul Fultz II authored
-
- 23 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Remove device functions
* Update tests
-
- 21 Sep, 2022 1 commit
-
-
kahmed10 authored
This PR allows other values of epsilon to be matched when finding layernorm. Likewise, the calculation now uses the matched epsilon variable.
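A minimal reference sketch of layernorm over one row, with epsilon passed in as a parameter rather than hard-coded, shows where the matched value ends up in the calculation. The function name `layernorm_row` is hypothetical, and the sketch omits the scale and bias terms.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference layernorm over a single row (no scale/bias):
// y = (x - mean) / sqrt(variance + epsilon), with epsilon supplied by the caller.
std::vector<float> layernorm_row(const std::vector<float>& x, float epsilon)
{
    assert(!x.empty());
    const float n = static_cast<float>(x.size());
    float mean = 0.0f;
    for(float v : x)
        mean += v;
    mean /= n;
    float var = 0.0f;
    for(float v : x)
        var += (v - mean) * (v - mean);
    var /= n;
    const float inv_std = 1.0f / std::sqrt(var + epsilon);
    std::vector<float> y;
    y.reserve(x.size());
    for(float v : x)
        y.push_back((v - mean) * inv_std);
    return y;
}
```

With x = {1, 3} (mean 2, variance 1), epsilon = 0 gives {-1, 1}, while epsilon = 3 gives {-0.5, 0.5}, so a matcher that silently assumed one fixed epsilon would produce different results.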
-
- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Compute mean and variance in the same reduction
* Set block size to numbers divisible by 32 instead of powers of 2
* Global size is also set exactly instead of being divisible by the block size; more exact matching of global/local sizes can help get rid of branching/loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators, since the compiler doesn't always vectorize them; the old for loop is still used when computing at compile time, since neither reinterpret_cast nor all the vector types are supported there
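The first two items can be sketched on the host as follows. This is an illustration under stated assumptions, not the GPU kernel: it accumulates sum(x) and sum(x*x) in a single pass and derives variance as E[x^2] - E[x]^2, and `round_up_to_multiple` is a hypothetical helper showing how a block size divisible by 32 differs from rounding to a power of 2.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass over the data: accumulate sum(x) and sum(x*x) together, then
// mean = E[x] and variance = E[x^2] - E[x]^2. This textbook formulation can
// lose precision when the mean is large relative to the spread of the data.
std::pair<double, double> mean_and_variance(const std::vector<double>& x)
{
    assert(!x.empty());
    double sum = 0.0;
    double sum_sq = 0.0;
    for(double v : x)
    {
        sum += v;
        sum_sq += v * v;
    }
    const double n = static_cast<double>(x.size());
    const double mean = sum / n;
    return {mean, sum_sq / n - mean * mean};
}

// Hypothetical helper: round n up to the next multiple of m (e.g. m = 32).
std::size_t round_up_to_multiple(std::size_t n, std::size_t m)
{
    assert(m > 0);
    return (n + m - 1) / m * m;
}
```

For 65 work items, rounding to a multiple of 32 gives 96, whereas rounding to the next power of 2 would give 128, which leaves more idle lanes.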
-
- 16 Sep, 2022 1 commit
-
-
Umang Yadav authored
* fix typo for add_sigmoid
-
- 15 Sep, 2022 1 commit
-
-
Lixun Zhang authored
* Replaced `find_library` with `find_package` to locate the MLIR static library
* Unified the include dir for headers and removed backward compatibility
* Embedded the external/include dir into the exported library
-
- 14 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Implement concat using jit compilation
-
- 13 Sep, 2022 1 commit
-
-
turneram authored
Improves performance for four of the six GEMMs used by Hugging Face BERT models with batch_size > 1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension. The four added verify tests reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
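The reason a broadcasted B allows a non-batched call can be checked with plain loops. In this sketch, which assumes A is contiguous and row-major and that every batch entry shares the same B, a batched GEMM over `batch` slices of shape m x k gives the same result as one GEMM on A viewed as a (batch*m) x k matrix. The function names are hypothetical and are not rocBLAS calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix multiply: C (m x n) = A (m x k) * B (k x n).
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        std::size_t m, std::size_t k, std::size_t n)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t p = 0; p < k; ++p)
            for(std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Batched GEMM where B is broadcast over the batch: one GEMM per batch entry.
std::vector<float> batched_gemm_broadcast_b(const std::vector<float>& a,
                                            const std::vector<float>& b,
                                            std::size_t batch, std::size_t m,
                                            std::size_t k, std::size_t n)
{
    assert(a.size() == batch * m * k);
    std::vector<float> c;
    c.reserve(batch * m * n);
    for(std::size_t i = 0; i < batch; ++i)
    {
        auto first = a.begin() + static_cast<std::ptrdiff_t>(i * m * k);
        std::vector<float> a_i(first, first + static_cast<std::ptrdiff_t>(m * k));
        auto c_i = gemm(a_i, b, m, k, n);
        c.insert(c.end(), c_i.begin(), c_i.end());
    }
    return c;
}

// Equivalent non-batched form: treat A as a single (batch*m) x k matrix.
std::vector<float> flattened_gemm(const std::vector<float>& a, const std::vector<float>& b,
                                  std::size_t batch, std::size_t m, std::size_t k,
                                  std::size_t n)
{
    return gemm(a, b, batch * m, k, n);
}
```

Because the output slices of the batched form are laid out contiguously in the same order, the single large GEMM writes exactly the same buffer, which is what lets one non-batched call replace the batched one.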
-
- 08 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Remove unused headers
-
- 07 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Fix accuracy bug when vectorizing slices
-
- 06 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Using `not` and `or` (the C++ alternative tokens for `!` and `||`) improves readability. The cppcheck rule helps ensure they are used consistently.
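For readers unfamiliar with the style, `not`, `and`, and `or` are standard C++ spellings of `!`, `&&`, and `||` and compile to the same thing. The two functions below are hypothetical and only show the same condition written both ways.

```cpp
#include <cassert>

// Written with the alternative tokens `not` and `or`.
bool in_range(int x, int lo, int hi) { return not(x < lo or x > hi); }

// The same condition written with the symbolic operators `!` and `||`.
bool in_range_symbolic(int x, int lo, int hi) { return !(x < lo || x > hi); }
```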
-
- 31 Aug, 2022 1 commit
-
-
turneram authored
The rewrite_gelu pass replaces the GELU formula x * (1/2) * (1 + erf(x / sqrt(2))) with the sigmoid approximation x * sigmoid(1.702 * x).
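Both forms can be compared numerically with a short sketch. The function names are hypothetical; `gelu_sigmoid` uses the identity x * sigmoid(a * x) = x / (1 + exp(-a * x)).

```cpp
#include <cassert>
#include <cmath>

// Exact GELU: x * 0.5 * (1 + erf(x / sqrt(2))).
double gelu_erf(double x) { return x * 0.5 * (1.0 + std::erf(x / std::sqrt(2.0))); }

// Sigmoid approximation: x * sigmoid(1.702 * x) = x / (1 + exp(-1.702 * x)).
double gelu_sigmoid(double x) { return x / (1.0 + std::exp(-1.702 * x)); }
```

Over x in [-5, 5] the two stay within a few hundredths of each other (the gap is largest around |x| of about 2), while the approximation needs only an exp and a division instead of erf.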
-
- 27 Aug, 2022 2 commits
-
-
Paul Fultz II authored
* Track kernel time
-
Paul Fultz II authored
This rewrites dot operators of the form X(Y + b) into XY + Xb when b is constant, so the add can be folded away. This improves the handling of pointwise operations with broadcasted operators, which helps const propagation. It also improves gemm fusion with a mul_add and improves support for broadcast shapes in gemm.
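The rewrite relies on matrix multiplication distributing over addition. A minimal sketch with naive helpers (`matmul` and `add` are hypothetical names) checks X(Y + b) == XY + Xb on small integer-valued matrices, where floating-point results are exact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Naive matrix product; assumes a is m x k and b is k x n.
matrix matmul(const matrix& a, const matrix& b)
{
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
    for(std::size_t i = 0; i < a.size(); ++i)
        for(std::size_t p = 0; p < b.size(); ++p)
            for(std::size_t j = 0; j < b[0].size(); ++j)
                c[i][j] += a[i][p] * b[p][j];
    return c;
}

// Elementwise sum of two matrices with identical dimensions.
matrix add(const matrix& a, const matrix& b)
{
    assert(a.size() == b.size());
    matrix c = a;
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        assert(a[i].size() == b[i].size());
        for(std::size_t j = 0; j < a[i].size(); ++j)
            c[i][j] += b[i][j];
    }
    return c;
}
```

With non-integer data the two sides can differ in the last bits because the additions happen in a different order, so the rewrite is exact in real arithmetic but not bit-for-bit in floating point.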
-
- 21 Aug, 2022 1 commit
-
-
varunsh authored
* Update is_supported
* Return object from is_supported
* Return by reference in iterator
-
- 19 Aug, 2022 1 commit
-
-
Charlie Lin authored
remove print from source
-
- 17 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 16 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 12 Aug, 2022 1 commit
-
-
Krzysztof Drewniak authored
Once https://github.com/ROCmSoftwarePlatform/llvm-project-mlir/pull/690 lands, the ABI for MLIR-generated kernels will change. This commit prepares MIGraphX for the change by conditionally selecting the new ABI if MLIR reports a sufficiently high API version in its headers.
-
- 04 Aug, 2022 1 commit
-
-
Charlie Lin authored
* Dynamic shape handling in shape object
* Rewrite empty lens multibroadcast test
* Shape class changes to handle dynamic shapes
* More throw errors for functions that don't make sense for dynamic shapes
* Print output changes
* Serialization changes
* Fixing serialization errors
* Remove const on dyn_dim copy getters
* Dynamic shape tests
* Fix serialize errors
* Add dyn_data struct to avoid ambiguous constructor
* Tidy fix: emplace_back() over for loop
* Tidy fix: use move
* Use std::initializer_list in constructor. Reverts the dyn_data struct change; should get around the ambiguous braced initialization list error
* Avoid typedef
* element_space, min_lens, max_lens, opt_lens change
* Formatting
* Comments fix
* Dynamic bytes() test
* Serialize and reflect changes
* Formatting
* Test the dynamic lens functions
* Progress
* Formatting
* Dynamic conv draft progress
* Add operator<< tests for coverage
* Coverage update
* Add to conv dynamic batch test
* Dynamic image size test
* Dynamic weight handling
* Dyn image shape test change, fix dyn weight cond
* Comment update
* Dynamic weights shape test and fix
* Use ternary operator
* Tidy fixes
* Handle dynamic graph input shapes in ONNX parser
* Formatting
* Handle dynamic shape for convolution
* Formatting
* cppcheck fixes
* Add ONNX test files
* Fix typo
* Disable auto_pad for dynamic input shape
* check_shapes object checks for allowing dynamic shapes
* Fix any_of
* Change to maintain const objectness
* Formatting
* Check shapes allow dynamic
* Refactor compute_shape() call into op.compute(). Allows for per-operator differences in handling dynamic shapes. Fix operation.hpp change to use the generator
* Comment fix
* Refactor normalize_attributes() calls to use max_lens()
* Comment addition
* Update other normalize_attributes() calls
* Change to using constructor and add tests
* Use const member function
* Add more dynamic shape support
* Add tests for error code coverage
* Fix opt shape bug and add shape tests
* Capture all by ref
* Fix typo with img shape calculation
* Add more tests
* Dynamic auto pad attempt (linker error with pad_calc.cpp)
* Fix parse dyn auto_pad. Should only need to use dynamic auto pad when the image shape or kernel shape are dynamic. For a dynamic batch size, the auto pad calculation is the same.
* Fix linking error
* Fix auto_pad bug. Fixed input tensor with auto_pad setting on
* auto_pad ONNX tests
* Fix auto_pad calculation, evaluate in ref_conv, add ref_ops tests
* Add shape tests, fix bugs
* Refactor first two output dynamic len calculation
* Conv MLIR test update
* i64 MLIR test fix
* Fix MLIR test typo

Co-authored-by: Chris Austen <causten@users.noreply.github.com>
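The note that dynamic auto pad is only needed when the image or kernel shape is dynamic can be illustrated with the ONNX SAME_UPPER rule for one spatial dimension: output = ceil(input / stride), total padding = (output - 1) * stride + (kernel - 1) * dilation + 1 - input (clamped at zero here), with any odd extra placed at the end. The batch size never enters the formula. This is a sketch under those assumptions; `pads`, `dyn_dim`, and both functions are hypothetical and not MIGraphX's pad_calc API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct pads
{
    std::size_t begin;
    std::size_t end;
};

// Hypothetical dynamic dimension: every size in [min, max] is possible.
struct dyn_dim
{
    std::size_t min;
    std::size_t max;
};

// SAME_UPPER padding for one spatial dimension, clamped at zero.
pads same_upper_pads(std::size_t in, std::size_t kernel, std::size_t stride,
                     std::size_t dilation)
{
    assert(in > 0 && kernel > 0 && stride > 0 && dilation > 0);
    const std::size_t out = (in + stride - 1) / stride; // ceil(in / stride)
    const std::size_t needed = (out - 1) * stride + (kernel - 1) * dilation + 1;
    const std::size_t total = needed > in ? needed - in : 0;
    return {total / 2, total - total / 2}; // extra padding goes at the end
}

// Padding for every possible size of a dynamic spatial dimension.
std::vector<pads> same_upper_pads_range(dyn_dim dim, std::size_t kernel,
                                        std::size_t stride, std::size_t dilation)
{
    assert(dim.min <= dim.max);
    std::vector<pads> result;
    for(std::size_t in = dim.min; in <= dim.max; ++in)
        result.push_back(same_upper_pads(in, kernel, stride, dilation));
    return result;
}
```

With kernel 3 and stride 2, an image width of 6 needs pads {0, 1} but a width of 7 needs {1, 1}, so a single static padding cannot cover a dynamic image size, whereas changing only the batch size leaves these values untouched.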
-
- 02 Aug, 2022 1 commit
-
-
jungpark-mlir authored
-
- 29 Jul, 2022 1 commit
-
-
Umang Yadav authored
Currently, when copying a host buffer to the device, it first registers (maps) the host buffer pointer into the device's address space. If the host buffer was allocated with hipHostMalloc, it is already implicitly registered in the device's address space, so there is no need to register it again. This PR adds a check for that case.
-