- 31 Oct, 2023 2 commits
-
-
Artur Wojcik authored
-
Artur Wojcik authored
Signed-off-by: Artur Wojcik <artur.wojcik@amd.com>
-
- 19 Oct, 2023 1 commit
-
-
Umang Yadav authored
* Disable -Wunsafe-buffer-usage when compiling gpu code
-
- 13 Oct, 2023 1 commit
-
-
turneram authored
-
- 17 Jul, 2023 1 commit
-
-
Chris Austen authored
* add support for rocm 5.6 in CI
* Disable anonymous namespace check
* add default c'tors to avoid warnings
-
- 02 Jul, 2023 1 commit
-
-
Paul Fultz II authored
Add a CI job to test CK
Add MIGRAPHX_TUNE_CK env variable to only do tuning for CK
Continue tuning even when there are invalid configs
Fix a bug with parallel compilation not using all available threads
Add additional test for gemms using half types
Removed int32 as supported type since it doesn't pass our test suite
-
- 15 Jun, 2023 1 commit
-
-
Umang Yadav authored
-
- 08 Jun, 2023 1 commit
-
-
Paul Fultz II authored
Enable with MIGRAPHX_ENABLE_CK=1 and the --exhaustive-tune flag
-
- 23 May, 2023 1 commit
-
-
Umang Yadav authored
back out changes for rocm-5.5
-
- 20 May, 2023 1 commit
-
-
Umang Yadav authored
* use half hip functions to compute max and min
* add verify test for min and max
-
- 19 May, 2023 1 commit
-
-
Zhuoran Yin authored
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
- 08 May, 2023 1 commit
-
-
Umang Yadav authored
-
- 24 Apr, 2023 2 commits
-
-
Paul Fultz II authored
This fixes #1700
-
Paul Fultz II authored
-
- 06 Apr, 2023 1 commit
-
-
Paul Fultz II authored
Automatically fuse multiple reductions and pointwise operations.
-
- 10 Mar, 2023 1 commit
-
-
Paul Fultz II authored
-
- 23 Feb, 2023 1 commit
-
-
shivadbhavsar authored
-
- 16 Feb, 2023 2 commits
-
-
Paul Fultz II authored
Avoids double global loads. Strided loops are unrolled, which lets results be stored in an array that the compiler will keep in registers since the index access is constant. Updated to handle large reductions, which results in a better stable diffusion result.
-
Umang Yadav authored
* deprecate HCC
-
- 31 Jan, 2023 1 commit
-
-
Umang Yadav authored
Added CMake flag for hipRTC: MIGRAPHX_USE_HIPRTC. Added stages in Jenkins for hipRTC. Fixes for some of the pending issues from hipRTC.
-
- 17 Jan, 2023 1 commit
-
-
Paul Fultz II authored
-
- 11 Jan, 2023 1 commit
-
-
Paul Fultz II authored
* Use cosine to compute half sin
-
- 09 Jan, 2023 1 commit
-
-
Ted Themistokleous authored
JIT implementation of the gather operator. Added a few more unit tests to this one as well since I saw some odd behavior during bring-up.
-
- 07 Dec, 2022 1 commit
-
-
Paul Fultz II authored
* Add implicit_conversion
-
- 02 Nov, 2022 1 commit
-
-
Paul Fultz II authored
-
- 27 Oct, 2022 1 commit
-
-
kahmed10 authored
Updated GPU pad to now use the JIT version. Added range functions for JIT kernels.
-
- 04 Oct, 2022 1 commit
-
-
Paul Fultz II authored
optimize the softmax operator
-
- 27 Sep, 2022 1 commit
-
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 21 Sep, 2022 1 commit
-
-
kahmed10 authored
This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
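For context on why epsilon matters here, below is a minimal CPU sketch of layer normalization with epsilon passed in as a variable rather than hard-coded. It is not the MIGraphX kernel or matcher: the function name `layernorm` and its signature are hypothetical, and it assumes the usual definition y = (x - mean) / sqrt(variance + epsilon) without scale or bias.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper for illustration only, not MIGraphX code.
// Normalizes xs to zero mean and unit variance, with epsilon supplied
// by the caller instead of being a fixed constant.
std::vector<double> layernorm(const std::vector<double>& xs, double epsilon)
{
    assert(!xs.empty());
    const double n = static_cast<double>(xs.size());

    // Mean of the inputs.
    double mean = 0.0;
    for(double x : xs)
        mean += x;
    mean /= n;

    // Population variance of the inputs.
    double var = 0.0;
    for(double x : xs)
        var += (x - mean) * (x - mean);
    var /= n;

    // epsilon keeps the denominator away from zero for constant inputs.
    const double scale = 1.0 / std::sqrt(var + epsilon);

    std::vector<double> out;
    out.reserve(xs.size());
    for(double x : xs)
        out.push_back((x - mean) * scale);
    return out;
}
```

Calling `layernorm(values, 1e-5)` and `layernorm(values, 1e-12)` would then differ only in the epsilon term, which is the kind of variation the matcher is described as accepting.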
-
- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Compute mean and variance in same reduction
Set block size to numbers divisible by 32 instead of powers of 2
Global is also set exactly instead of being divisible by block size
More exact matching of global/local can help get rid of branching/loops
Reduce vectors first before doing dpp_reduce
Explicitly vectorize array operators since the compiler doesn't always vectorize them
Still uses old for loop when it's computing at compile-time since neither reinterpret_cast nor all the vector types are supported
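One common way to compute mean and variance in the same reduction is to accumulate the sum and the sum of squares in a single pass, then use variance = E[x^2] - mean^2. The sketch below is a sequential CPU illustration of that idea only. The commit does not say which formulation it uses, and the name `mean_variance` is hypothetical, not part of MIGraphX.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only, not MIGraphX code.
// Returns {mean, population variance} using one pass over the data.
std::pair<double, double> mean_variance(const std::vector<double>& xs)
{
    assert(!xs.empty());
    double sum    = 0.0; // running sum of x
    double sum_sq = 0.0; // running sum of x * x
    for(double x : xs)
    {
        sum += x;
        sum_sq += x * x;
    }
    const double n    = static_cast<double>(xs.size());
    const double mean = sum / n;
    // variance = E[x^2] - (E[x])^2
    return {mean, sum_sq / n - mean * mean};
}
```

Both accumulators are updated from one read of each element, so the data is traversed once instead of twice. The trade-off is that E[x^2] - mean^2 can lose precision when the mean is large relative to the spread.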
-
- 14 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Implement concat using jit compilation
-
- 06 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Using not and or improves readability. The cppcheck rule will help ensure we are doing it consistently.
-
- 17 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 16 Aug, 2022 1 commit
-
-
Paul Fultz II authored
-
- 11 Jul, 2022 1 commit
-
-
Paul Fultz II authored
* Only run __syncthreads when there is data to preload
* Improve loops
* Add const attribute to improve optimizations
-
- 05 Jul, 2022 1 commit
-
-
Paul Fultz II authored
* Add softmax kernel
-
- 25 Jun, 2022 1 commit
-
-
Paul Fultz II authored
* Jit contiguous
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 10 Jun, 2022 1 commit
-
-
Paul Fultz II authored
Consolidate the vectorize and preload
Add vectorization to reduction
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 24 May, 2022 1 commit
-
-
Paul Fultz II authored
Remove std references in runtime compilation since these are not available when using hiprtc and the headers may not be available on the system
-