1. 27 Sep, 2023 1 commit
  2. 17 Jun, 2023 1 commit
    • turneram's avatar
      Update CK commit hash and add gfx940 to supported archs (#1842) · b8898d7e
      turneram authored
      * Add initial ck_gemm code
      
      * Format
      
      * Add additional src files
      
      * Format
      
      * Add include
      
      * Simplify fuse_ck
      
      * Format
      
      * Rename var
      
      * Enable pass
      
      * Update ck version
      
      * Fix include
      
      * Add group stride
      
      * Disable warnings for ck headers
      
      * Format
      
      * Add unpack array
      
      * Add interface to enable tuning
      
      * Format
      
      * Update compile_ops to handle tuning config
      
      * Format
      
      * Add some comments
      
      * Move time_op to migraphx_gpu
      
      * Add banchmarking
      
      * Refactor
      
      * Format
      
      * Add lift class macro
      
      * Use device name
      
      * Format
      
      * Generate configs
      
      * Format
      
      * Pass tuning parameter
      
      * Move data type to is_ck_gemm matcher
      
      * Format
      
      * Add problem_cache to avoid retuning same configs
      
      * Format
      
      * Format
      
      * Mark the problems
      
      * Format
      
      * Use is_null
      
      * Format
      
      * Resize vector
      
      * Only tune with exaustive tuning
      
      * Format
      
      * Use assert
      
      * FOrmat
      
      * Tidy fixes
      
      * More tidy fixes
      
      * Format
      
      * Add license to missing files
      
      * Format
      
      * Use transform
      
      * Format
      
      * Fix tidy
      
      * Format
      
      * Fix cppcheck issues
      
      * Format
      
      * Add static_assert
      
      * Add ops header
      
      * Add assertion in batcher
      
      * Format
      
      * Improve the batch fold check
      
      * Format
      
      * Add where op workaround for CK
      
      * Skip if any input is not a supported ck type
      
      * Format
      
      * Check batch is standard
      
      * Format
      
      * Remove redundant static keyword
      
      * Update commit hash
      
      * Fix error when running without --exhaustive-tune
      
      * Formatting
      
      * Formatting
      
      * Remove fuse_ck_gemm_softmax_gemm
      
      * Update ck hash
      
      * Correct spelling mistake
      
      * Remove commented out logic from fuse_ck
      
      * Remove unused include and add comment
      
      * Formatting
      
      * Remove redundant get_shape and remove ck_gemm from names
      
      * Formatting
      
      * Allow for mixed types with int8 gemms
      
      * Formatting
      
      * Add back find_package from merge
      
      * Update CK commit hash and add gfx940 to fuse_ops supported archs
      
      * Formatting
      
      * Update CK hash
      b8898d7e
  3. 31 Jan, 2023 1 commit
  4. 02 Nov, 2022 1 commit
  5. 19 Oct, 2022 1 commit
  6. 23 Sep, 2022 1 commit
  7. 16 Sep, 2022 1 commit
  8. 08 Sep, 2022 1 commit
  9. 27 Aug, 2022 1 commit
  10. 17 Aug, 2022 1 commit
  11. 03 Jul, 2022 1 commit
    • Paul Fultz II's avatar
      Add mlir fusion (#1251) · ca8a54fe
      Paul Fultz II authored
      * Add mlir c api
      
      * Formatting
      
      * Create a type attribute
      
      * Formatting
      
      * Parse module
      
      * Formatting
      
      * Add mlir dump function
      
      * Add test case
      
      * Formatting
      
      * Fix tidy issues
      
      * Update mlit version
      
      * Update to newer mlir
      
      * Format
      
      * Move mlir to the gpu and update the test
      
      * Formatting
      
      * Fix bug when appending module
      
      * Format
      
      * Remove old cmake flag
      
      * Update message
      
      * Add return
      
      * Format
      
      * Add mlir_compile
      
      * Format
      
      * Register dialect
      
      * Handle unsinged integers
      
      * Dont provide output for return instruction
      
      * Format
      
      * Add code to insert memrefs
      
      * Format
      
      * Add mlir verification
      
      * Formatting
      
      * Enable pointwise_fusion
      
      * Disable eliminate_data_type
      
      * Set kernal name
      
      * Format
      
      * Fix device name
      
      * Formatting
      
      * Fix output arg
      
      * Format
      
      * Updates
      
      * Upate hash
      
      * Add fuse_mlir pass
      
      * Format
      
      * Add fuse mlir
      
      * Format
      
      * Update mlir
      
      * Sort parameter names
      
      * Format
      
      * Reenable disabled passes
      
      * Remove old mlir conv
      
      * Remove asym default padding
      
      * Add more verbose tracing
      
      * Format
      
      * Fix compilation errors
      
      * Format
      
      * Whitelist operators
      
      * Format
      
      * Add namespace
      
      * Format
      
      * Update triple
      
      * Format
      
      * Use func dialect
      
      * Format
      
      * Use func.return
      
      * Format
      
      * Upgrade mlir version
      
      * Add comment
      
      * Handle symetrical padding
      
      * Format
      
      * Cleanup debug output
      
      * Format
      
      * List failed tests
      
      * Move mlir compile to jit pipeline
      
      * Format
      
      * Update version
      
      * Add source locations
      
      * Format
      
      * Correctly add module
      
      * Format
      
      * Update failed tests
      
      * Fix failures when mlir is disabled
      
      * Format
      
      * Update mlir version
      
      * Check type for fp32
      
      * Format
      
      * Remove failed test
      
      * Update mlir in driver
      
      * Tidy fixes
      
      * Foramt
      
      * Tidy fixes
      
      * Format
      
      * Fix const
      
      * Remove from requirements
      
      * Fix cmake version
      
      * Fix tidy warning
      
      * Use another ifdef
      
      * Fix tidy
      
      * Other tidy fix
      
      * Format
      
      * Update hash
      
      * Add missing license files
      
      * Format
      
      * Format
      
      * Fix fnction name
      ca8a54fe
  12. 25 Jun, 2022 2 commits
  13. 22 Jun, 2022 1 commit
  14. 02 Jun, 2022 1 commit
  15. 26 May, 2022 1 commit
  16. 24 May, 2022 1 commit
  17. 17 May, 2022 1 commit
  18. 11 May, 2022 1 commit
  19. 08 Feb, 2022 1 commit
  20. 10 Jan, 2022 1 commit
  21. 30 Nov, 2021 1 commit
  22. 09 Nov, 2021 1 commit
  23. 08 Oct, 2021 1 commit
    • Umang Yadav's avatar
      Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87
      Umang Yadav authored
      Previously dot operator was defined as C = alpha * A . B + beta * C where * is scalar multiplication and . is dot product or matrix multiplication depending on dimension of the inputs.
      
      Aim is to have the definition of dot operator as C = A . B without having alpha or beta.
      
      In order to achieve the same effect as alpha and beta (1) it multiplies the one of the inputs to the dot operator with alpha value. (2) if beta is present then, multiplies the C with beta and then adds into the output from step 1.
      21193e87
  24. 17 Sep, 2021 2 commits
    • Paul Fultz II's avatar
      985f58b0
    • Umang Yadav's avatar
      Remove alpha and beta attributes from dot operator (#945) · 9e43cb8b
      Umang Yadav authored
      This PR aims to remove alpha and beta attributes from dot operator completely.
      
      Previously dot operator was defined as C = alpha * A . B + beta * C where * is scalar multiplication and . is dot product or matrix multiplication depending on dimension of the inputs.
      
      Aim is to have the definition of dot operator as C = A . B without having alpha or beta.
      
      In order to achieve the same effect as alpha and beta (1) it multiplies the one of the inputs to the dot operator with alpha value. (2) if beta is present then, multiplies the C with beta and then adds into the output from step 1.
      9e43cb8b
  25. 09 Jun, 2021 1 commit
    • kahmed10's avatar
      Asym pad refactor (#791) · 9a5e0c06
      kahmed10 authored
      
      
      * alternative impl
      
      * formatting
      
      * add gpu pass to insert pad
      
      * formatting
      
      * update onnx test, still need cleanup
      
      * formatting
      
      * update tf_test
      
      * modify existing tests
      
      * formatting
      
      * remove print
      
      * code cleanup
      
      * formatting
      
      * code cleanup
      
      * formatting
      
      * fix tidy and cppcheck
      
      * remove variable
      
      * add test
      
      * formatting
      
      * add test and address comments
      
      * formatting
      Co-authored-by: default avatarShucai Xiao <shucai@gmail.com>
      Co-authored-by: default avatarmvermeulen <5479696+mvermeulen@users.noreply.github.com>
      9a5e0c06
  26. 25 Mar, 2021 1 commit
    • Paul Fultz II's avatar
      Add cpu fusion for gelu and layernorm (#761) · 728d083d
      Paul Fultz II authored
      
      
      * Add eliminate_data_type pass
      
      * Formatting
      
      * Auto convert quant ops
      
      * Formatting
      
      * Flip the order of decompose
      
      * Compute max size differently
      
      * Formatting
      
      * Clamp values in convert
      
      * Formatting
      
      * Fix loss of precision in reduce
      
      * Formatting
      
      * Fix bugs in reduction
      
      * Fix accumulator type in reference softmax implementation
      
      * Formatting
      
      * Update convert test
      
      * Remove unused variables
      
      * Remove unnecessary quant_dot check
      
      * Formatting
      
      * Add tests
      
      * Formatting
      
      * Remove unused code
      
      * Remove duplicate ops
      
      * Remove blaze dependency
      
      * Use set since shape::type_t is no hashable on gcc 5
      
      * Formatting
      
      * Add dnnl binary op
      
      * Formatting
      
      * Add binary and eltwise
      
      * Formatting
      
      * Add softmax
      
      * Formatting
      
      * Remove unused operators
      
      * Add missing files
      
      * Formatting
      
      * Add lrn
      
      * Formatting
      
      * Add deconvolution
      
      * Formatting
      
      * Change allocate default
      
      * Add reorder
      
      * Formatting
      
      * Add reductions
      
      * Formatting
      
      * Sort lines
      
      * Change literals in another loop
      
      * Add pow operator
      
      * Formatting
      
      * Add pow operator
      
      * Formatting
      
      * Make sure shapes are packed
      
      * Allow broadcasted inputs
      
      * Remove unused operators
      
      * Simplify functions
      
      * Remove softmax
      
      * Add sub and erf functions
      
      * Formatting
      
      * Fix bug
      
      * Formatting
      
      * Improve parallism
      
      * Formatting
      
      * Allow multiple batch dimensions
      
      * Formatting
      
      * Move literal transforms out of lowering
      
      * Formatting
      
      * Add gather operator
      
      * Sort lines
      
      * Add early exit for carry
      
      * Formatting
      
      * Add missing concat
      
      * Rename macro
      
      * Fix deep nesting
      
      * Formatting
      
      * Fix cppcheck issues
      
      * Remov else
      
      * Move attribute to typedef
      
      * Formatting
      
      * Disable maybe-uninitialized warning since its broken on gcc
      
      * Add constexpr default constructor
      
      * Formatting
      
      * Fix compiler warnings
      
      * Fix adjust_allocation test
      
      * Add layernorm matcher
      
      * Add gelu_erf matcher
      
      * Formatting
      
      * Add gelu_tanh matcher
      
      * Formatting
      
      * Remove match namespace
      
      * Formatting
      
      * Use matcher instead of string
      
      * Formatting
      
      * Add fusions
      
      * Formatting
      
      * Make input a const ref
      
      * Make this explicit for gcc 5
      Co-authored-by: default avatarShucai Xiao <shucai@gmail.com>
      Co-authored-by: default avatarmvermeulen <5479696+mvermeulen@users.noreply.github.com>
      728d083d
  27. 08 Jan, 2021 1 commit
    • Paul Fultz II's avatar
      Revamp CI infrastucture (#706) · ceb4ca09
      Paul Fultz II authored
      
      
      * Add build and test github workflow
      
      * Fix cget command
      
      * Remove def-requirements.txt
      
      * Add tmate session to debug workflow
      
      * Run tmate session after installing dependencies
      
      * Print date periodically
      
      * Add clang tidy action
      
      * Seperate build and run container in two different jobs
      
      * Run bash script
      
      * Remove interactive flag
      
      * Try to mount the files
      
      * Try to use the github workspace
      
      * WIthout double braces
      
      * Use env variable
      
      * Pipe bash script in
      
      * Run using hip-clang
      
      * Use correct path
      
      * Add verbose
      
      * Remove j flag
      
      * Only run for onnx file to debug
      
      * Manually run clang-tidy
      
      * Remove quiet flag
      
      * Print header file
      
      * Printout environment
      
      * Remove extra defines
      
      * Remove fixits and config flag
      
      * Show ldd
      
      * Add tmate session
      
      * Run onnx protobuf first
      
      * Generate proto for tensorflow
      
      * Update cppcheck version
      
      * Fix some cppcheck issues
      
      * Add const
      
      * Cppcheck fixes
      
      * Formatting
      
      * Fix more cppcheck issues
      
      * Run two jobs
      
      * Cache analysis and run format checking
      
      * Fix yaml issues
      
      * Fix yaml issues
      
      * Fix indentation
      
      * Switch to hip-clang for main docker file
      
      * Use hip-clang in the readme
      
      * Fixes for jenkins
      
      * Use ccache to build
      
      * Combine file
      
      * Set restore keys
      
      * Change stage name
      
      * Build with ccache
      
      * Add missing dependency for ccache
      
      * Build debug with codecov
      
      * Fix workflow syntax
      
      * Fix list
      
      * Use quotes
      
      * Got to correct build path
      
      * Install lcov
      
      * Use sudo
      
      * Echo all commands
      
      * Setup tmate
      
      * Add verbose output
      
      * Build with cmake directly
      
      * Add pthread flag
      
      * Remove python config
      
      * Continue on error
      
      * Use on or off for cmake flag
      
      * Use always upload cache
      
      * Verbose output
      
      * Verbose output from build
      
      * Build one target
      
      * Reduce debug symbols
      
      * Increase garbage collection
      
      * Remove dmesg
      
      * Increase it to 20
      
      * Update rocm cmake version
      
      * Remove jobs from jenkins
      
      * Run on all 3 ubuntus
      
      * Remove gcc 5 jobs
      
      * Dont add flag on 16.04
      
      * Only upload coverage on 18.04
      
      * Dont build for ubuntu 20.04
      
      * Use matrix.os
      
      * Use O2 for hip-clang since lower optimizations are broken
      
      * Use rocm 3.0
      
      * Pass ccache as cmake variable instead of env variable
      
      * Build miopen from source
      
      * Show ccache statistics
      
      * Print log information
      
      * Set compression level
      
      * Use hash dir
      
      * Set hashdir
      
      * Install clang ocl from system
      
      * Up compression level
      
      * Add locale
      
      * Increase cache size to 1G
      
      * Lower compression level to 9
      
      * Remove split dwarf
      
      * Remove Og
      
      * Add back Og
      
      * Seperate debug and codecov
      
      * Add missing backlash
      
      * Garbage collect more often
      
      * Add missing locales package
      
      * Use Os
      
      * Install onednn in docker and run tests
      
      * Include target headers in tests
      
      * Increase timeout
      
      * Remove if condtion
      
      * Make flag public
      
      * Suppress memory leaks in onednn
      
      * Use equal
      
      * Add gh annotations
      
      * Update rocm-cmake version
      
      * Add ldconfig
      Co-authored-by: default avatarShucai Xiao <shucai@gmail.com>
      ceb4ca09
  28. 06 Jan, 2021 1 commit
    • Shucai Xiao's avatar
      Module impl (#678) · c9b86f1c
      Shucai Xiao authored
      
      
      * add an api get_main_module
      
      * clang format
      
      * modify onnx unit test for module
      
      * clang format
      
      * refactor ops unit test with the get_main_module
      
      * clang format
      
      * code backup
      
      * clang format
      
      * refine module c api
      
      * add python api for module
      
      * clang format
      
      * fix a python api issue
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * refine unit tests changes
      
      * clang format
      
      * code backup
      
      * code backup
      
      * clang format
      
      * defer some changes to later PRs
      
      * change return of get_main_module from ref to pointer
      
      * clang format
      
      * add unit tests for the get_main_module_api
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * add more unit tests for more code change coverage
      
      * clang format
      
      * fixed a unit test error
      
      * clang format
      
      * fix unit test
      
      * clang format
      
      * code backup
      
      * code change for more code coverage
      
      * change program to module in various passes and matcher
      
      * clang format
      
      * modify the pass API
      
      * code backup
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * Add option to no generate a destroy method
      
      * Formatting
      
      * fix some review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * clang format
      
      * code backup
      
      * code backup
      
      * clang format
      
      * fix cppcheck errors
      
      * clang format
      
      * clang format
      
      * fix build errors
      
      * clang format
      
      * modify gpu unit tests to using module
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * Add flag to enable cpu backend
      
      * Make buffers shared
      
      * Enable optimizations
      
      * Formatting
      
      * fix review comments
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * fix a bug related to a unit test
      
      * clang format
      
      * clang format
      
      * fix a build error
      
      * remove unnecessary code
      
      * remove unnecessary files
      
      * code backup
      
      * clang format
      
      * remove the compile function from the module class
      
      * clang format
      
      * clang format
      
      * remove the context parameter from the from_value method of the module class
      
      * code refinement
      
      * clang format
      
      * merge changes from develop branch
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix a build error
      
      * fixed a merge error
      
      * fix cppcheck error
      
      * fixed review comments
      
      * clang format
      
      * fix cppcheck error
      
      * fix a cppcheck error
      
      * fix cppcheck error
      
      * fix build error caused by merge
      
      * Add missing has_op function
      
      * Formatting
      
      * merge changes from develop branch
      
      * fix a cppcheck error
      
      * fixed some review comments
      
      * clang format
      
      * remove the begin/end function of the program class
      
      * clang format
      
      * refine code and fix cppcheck error
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * add unit tests for more code coverage
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix a build error in debug mode
      
      * clang format
      Co-authored-by: default avatarPaul <pfultz2@yahoo.com>
      c9b86f1c
  29. 26 Nov, 2020 1 commit
  30. 20 Nov, 2020 1 commit
    • Paul Fultz II's avatar
      Fuse skip layernorm (#683) · 1bfb147d
      Paul Fultz II authored
      
      
      * Unify the vectorized and non-vectorized path
      
      * Formatting
      
      * Make fusion easily extendable
      
      * Add skip layernorm fusion
      
      * Formatting
      
      * Call correct layernorm function
      
      * Fix compile errors
      
      * Add DCE
      
      * Add test for skip layernorm
      
      * Formatting
      
      * Remove unused typedef
      
      * Formatting
      
      * Fix tidy issues
      
      * Formatting
      Co-authored-by: default avatarShucai Xiao <shucai.xiao@amd.com>
      1bfb147d
  31. 11 Nov, 2020 1 commit
  32. 28 Oct, 2020 1 commit
  33. 08 Oct, 2020 1 commit
    • kahmed10's avatar
      Add build flag for fast math (#639) · a5065265
      kahmed10 authored
      
      
      * add flag
      
      * formatting
      
      * remove env variable
      
      * fix api expression
      
      * add api test
      
      * add api test
      
      * add op test
      
      * formatting
      
      * fix function name
      
      * fix syntax
      
      * formatting
      
      * modify test
      
      * remove test and update doc
      
      * move test to new file
      
      * formatting
      
      * revert test files
      
      * rewrite check
      
      * New
      Co-authored-by: default avatarPaul Fultz II <pfultz2@yahoo.com>
      a5065265
  34. 14 Sep, 2020 1 commit
  35. 25 Aug, 2020 1 commit
    • Paul Fultz II's avatar
      Improve layernorm performance (#613) · 56b3bf58
      Paul Fultz II authored
      * Use increment instead of division to compute register offset
      
      * Formatting
      
      * Limit layernorm to 1024 elements
      
      * Formatting
      
      * Add verification to driver
      
      * Formatting
      
      * Remove early return
      
      * Use block_size 256
      
      * Vectorize the kernel
      
      * Formatting
      
      * Convert to vector type
      
      * Add layernorm tests
      
      * Formatting
      
      * Formatting
      
      * Refactor layernorm to run both algos
      
      * Formatting
      
      * Fix compile error
      
      * Fix tidy warnings
      
      * Formatting
      
      * Add layernorm function
      
      * Formatting
      56b3bf58
  36. 21 Aug, 2020 1 commit
  37. 19 Aug, 2020 1 commit
  38. 18 Aug, 2020 1 commit
    • Paul Fultz II's avatar
      Register all operators in migraphx (#604) · e8be8548
      Paul Fultz II authored
      * Register ops for main migraphx
      
      * Formatting
      
      * Register cpu ops
      
      * Formatting
      
      * Show list of operators in the driver
      
      * Formatting
      
      * Simplify regiter
      
      * Try to register gpu ops
      
      * Fix compiler errors
      
      * Register rest of the gpu operators
      
      * Add some tests
      
      * Formatting
      
      * Fix gcc compiler warnings
      
      * Formatting
      
      * Fix tidy warnings
      
      * Fix compile error
      
      * Use correct op name
      
      * Register layer norm
      
      * Use const ref
      
      * Make run const
      e8be8548