1. 17 Aug, 2022 3 commits
  2. 16 Aug, 2022 2 commits
  3. 12 Aug, 2022 3 commits
  4. 02 Aug, 2022 1 commit
  5. 29 Jul, 2022 1 commit
    • Avoid registering host buffer ptr multiple times during hip copies (#1245) · 7596f3f1
      Umang Yadav authored
      Currently, copying a host buffer to the device first registers (maps) the host buffer pointer into the device's address space.

      If the host buffer was allocated with hipHostMalloc, it is already implicitly registered in the device's address space and does not need to be registered again. This PR adds a check for that case.
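      The check described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: fake_runtime, mark_pinned, register_host, and copy_to_device are hypothetical names invented for illustration, and nothing here calls the real HIP API or reflects the actual change in the PR.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the runtime's bookkeeping; this is NOT the HIP API.
struct fake_runtime
{
    std::set<const void*> registered; // host pointers already mapped for the device
    int register_calls = 0;           // how many explicit registrations happened

    // Models a pinned allocation: such a buffer counts as registered implicitly.
    void mark_pinned(const void* p) { registered.insert(p); }

    bool is_registered(const void* p) const { return registered.count(p) > 0; }

    // Models the explicit register/map step.
    void register_host(const void* p)
    {
        assert(p != nullptr);
        registered.insert(p);
        ++register_calls;
    }

    // Copy path with the check: register only when the pointer is not mapped yet.
    void copy_to_device(const void* host)
    {
        if(!is_registered(host))
            register_host(host);
        // ... the actual host-to-device copy would happen here ...
    }
};
```

      In this toy model, copying from a buffer that was marked as pinned never triggers an explicit registration, while a plain buffer is registered once on its first copy.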
  6. 25 Jul, 2022 1 commit
    • Add onnx mod operator (#1302) · 77e80b8e
      Ted Themistokleous authored
      * Add in changes for onnx Mod operator
      
      Initial implementation of the mod operator, with test cases for integer and floating-point types.

      Floating-point types need fmod from the standard library. half_float::half is specified to use the existing std::fmod() call, judging by the half.hpp implementation.

      fmod_flag should mirror the onnx fmod attribute. Currently, using a floating-point type without the user setting that flag to true results in an exception.
      
      Ref ticket #1283 
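      The two behaviours selected by the flag can be sketched as follows. This is an illustration, not the operator added in this commit: mod_int and mod_float are hypothetical helpers, and the integer rule (result takes the sign of the divisor when the flag is off) is my reading of the ONNX Mod specification rather than something stated in the commit message.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical helper for integer inputs.
// fmod_flag == false: result takes the sign of the divisor (assumed ONNX integer mod).
// fmod_flag == true:  result takes the sign of the dividend, like std::fmod.
int mod_int(int a, int b, bool fmod_flag)
{
    assert(b != 0);
    int r = a % b; // C++ % truncates toward zero, so r has the sign of a
    if(!fmod_flag && r != 0 && ((r < 0) != (b < 0)))
        r += b; // shift the remainder so it has the sign of the divisor
    return r;
}

// Hypothetical helper for floating-point inputs.
// Mirrors the behaviour described above: the flag must be set, otherwise throw.
double mod_float(double a, double b, bool fmod_flag)
{
    if(!fmod_flag)
        throw std::invalid_argument("floating-point mod requires fmod_flag = true");
    return std::fmod(a, b);
}
```

      With these helpers, mod_int(-7, 3, false) gives 2, while mod_int(-7, 3, true) gives -1, which is the same sign convention std::fmod uses.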
  7. 19 Jul, 2022 1 commit
    • Fix op includes (#1308) · 39b307b2
      Charlie Lin authored
      Changes to operator includes:
      
      removed some includes that were not used
      included argument.hpp where clang-tidy wanted it
  8. 12 Jul, 2022 1 commit
  9. 11 Jul, 2022 2 commits
  10. 08 Jul, 2022 1 commit
  11. 06 Jul, 2022 1 commit
    • Verify load and save (#1265) · f2531606
      Paul Fultz II authored
      In the verification tests, check that saving and reloading a program yields the same program. This also fixes serialization so that instructions are always loaded in the same order. There are also fixes for deconv and quant_conv, which did not save the solution id and so were broken for serialization.
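      The round-trip check can be illustrated with a toy model. Everything below is hypothetical (toy_instruction, toy_program, save, load, and verify_round_trip are names made up for this sketch, and the text format is invented); it only shows the idea that a saved and reloaded program must compare equal, including instruction order and the solution id.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for an instruction; both fields are hypothetical.
struct toy_instruction
{
    std::string name; // assumed to contain no whitespace
    int solution_id;
    bool operator==(const toy_instruction& o) const
    {
        return name == o.name && solution_id == o.solution_id;
    }
};

using toy_program = std::vector<toy_instruction>;

// Write the instruction count, then one instruction per line, in program order.
std::string save(const toy_program& p)
{
    std::ostringstream os;
    os << p.size() << '\n';
    for(const auto& ins : p)
        os << ins.name << ' ' << ins.solution_id << '\n';
    return os.str();
}

// Read the instructions back in the same order they were written.
toy_program load(const std::string& data)
{
    std::istringstream is(data);
    std::size_t n = 0;
    is >> n;
    toy_program p(n);
    for(auto& ins : p)
        is >> ins.name >> ins.solution_id;
    return p;
}

// The check itself: a reloaded program must equal the original.
bool verify_round_trip(const toy_program& p)
{
    assert(!p.empty());
    return load(save(p)) == p;
}
```

      If save dropped the solution id, or load returned the instructions in a different order, verify_round_trip would return false for this toy program.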
  12. 05 Jul, 2022 1 commit
  13. 03 Jul, 2022 1 commit
    • Add mlir fusion (#1251) · ca8a54fe
      Paul Fultz II authored
      * Add mlir c api
      
      * Formatting
      
      * Create a type attribute
      
      * Formatting
      
      * Parse module
      
      * Formatting
      
      * Add mlir dump function
      
      * Add test case
      
      * Formatting
      
      * Fix tidy issues
      
      * Update mlir version
      
      * Update to newer mlir
      
      * Format
      
      * Move mlir to the gpu and update the test
      
      * Formatting
      
      * Fix bug when appending module
      
      * Format
      
      * Remove old cmake flag
      
      * Update message
      
      * Add return
      
      * Format
      
      * Add mlir_compile
      
      * Format
      
      * Register dialect
      
      * Handle unsigned integers
      
      * Don't provide output for return instruction
      
      * Format
      
      * Add code to insert memrefs
      
      * Format
      
      * Add mlir verification
      
      * Formatting
      
      * Enable pointwise_fusion
      
      * Disable eliminate_data_type
      
      * Set kernel name
      
      * Format
      
      * Fix device name
      
      * Formatting
      
      * Fix output arg
      
      * Format
      
      * Updates
      
      * Update hash
      
      * Add fuse_mlir pass
      
      * Format
      
      * Add fuse mlir
      
      * Format
      
      * Update mlir
      
      * Sort parameter names
      
      * Format
      
      * Reenable disabled passes
      
      * Remove old mlir conv
      
      * Remove asym default padding
      
      * Add more verbose tracing
      
      * Format
      
      * Fix compilation errors
      
      * Format
      
      * Whitelist operators
      
      * Format
      
      * Add namespace
      
      * Format
      
      * Update triple
      
      * Format
      
      * Use func dialect
      
      * Format
      
      * Use func.return
      
      * Format
      
      * Upgrade mlir version
      
      * Add comment
      
      * Handle symmetrical padding
      
      * Format
      
      * Cleanup debug output
      
      * Format
      
      * List failed tests
      
      * Move mlir compile to jit pipeline
      
      * Format
      
      * Update version
      
      * Add source locations
      
      * Format
      
      * Correctly add module
      
      * Format
      
      * Update failed tests
      
      * Fix failures when mlir is disabled
      
      * Format
      
      * Update mlir version
      
      * Check type for fp32
      
      * Format
      
      * Remove failed test
      
      * Update mlir in driver
      
      * Tidy fixes
      
      * Format
      
      * Tidy fixes
      
      * Format
      
      * Fix const
      
      * Remove from requirements
      
      * Fix cmake version
      
      * Fix tidy warning
      
      * Use another ifdef
      
      * Fix tidy
      
      * Other tidy fix
      
      * Format
      
      * Update hash
      
      * Add missing license files
      
      * Format
      
      * Format
      
      * Fix function name
  14. 25 Jun, 2022 2 commits
  15. 23 Jun, 2022 1 commit
  16. 22 Jun, 2022 1 commit
  17. 20 Jun, 2022 1 commit
  18. 17 Jun, 2022 2 commits
    • Update lowering of Dot operator (#1247) · c99be32c
      Umang Yadav authored
      
      
      * remove code for allocation of C param in dot lowering
      
      * formatting
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
    • Create allocate op and replace_allocate pass (#1183) · add6fb3b
      kahmed10 authored
      
      
      * add allocate op header
      
      * formatting
      
      * add replace_allocate pass
      
      * formatting
      
      * move output param to remove_allocate pass
      
      * formatting
      
      * fix bugs in replace_allocate pass
      
      * formatting
      
      * fix verify if tests
      
      * formatting
      
      * move if op logic
      
      * formatting
      
      * cleanup lowering
      
      * cleanup lowering
      
      * formatting
      
      * fix tidy
      
      * formatting
      
      * fix tidy
      
      * add cpu allocate check
      
      * formatting
      
      * change cpu allocate in pass
      
      * formatting
      
      * add some tests for replace_allocate pass
      
      * formatting
      
      * pass by ref
      
      * fix run_pass
      
      * formatting
      
      * update variable name for module
      
      * update dce to use contains() and fix tidy
      
      * formatting
      
      * update cppcheck
      
      * add if test
      
      * formatting
      
      * add if test
      
      * rename var to mod_output_names
      
      * formatting
      
      * remove conditional
      
      * update allocate op and tests
      
      * formatting
      
      * update replace_allocate tests
      
      * update create_output_names() and conditional in replace_allocate
      
      * formatting
      
      * remove extra variable in replace_allocate
      
      * update tools script for allocation_model
      Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
  19. 10 Jun, 2022 1 commit
  20. 07 Jun, 2022 1 commit
  21. 03 Jun, 2022 1 commit
    • Group code objects by kernel name in perf report summary (#1234) · 7271ddbc
      Paul Fultz II authored
      Break up the gpu::code_object print to show the actual kernels...
      
      gpu::code_object::add_kernel: 0.646121ms, 5%
      gpu::code_object::mul_kernel: 0.623822ms, 5%
      gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
      gpu::code_object::mul_add_kernel: 0.478352ms, 4%
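      The grouping shown in the summary above can be sketched as a simple aggregation. The timing record, summarize, and percent below are hypothetical names used only for illustration; the real perf report code is not shown in this log and may be organized differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one timed instruction from a perf run.
struct timing
{
    std::string op;     // e.g. "gpu::code_object"
    std::string kernel; // e.g. "add_kernel"; empty when the op has no kernel name
    double ms;          // time spent, in milliseconds
};

// Sum the time per "op::kernel" key instead of lumping every kernel under the op name.
std::map<std::string, double> summarize(const std::vector<timing>& timings)
{
    std::map<std::string, double> totals;
    for(const auto& t : timings)
    {
        const std::string key = t.kernel.empty() ? t.op : t.op + "::" + t.kernel;
        totals[key] += t.ms;
    }
    return totals;
}

// Share of the total run time, as printed next to each entry.
double percent(double ms, double total)
{
    assert(total > 0);
    return 100.0 * ms / total;
}
```

      Keying on the op name alone would collapse all of these entries into a single gpu::code_object line, which is what the commit moves away from.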
  22. 02 Jun, 2022 1 commit
  23. 26 May, 2022 1 commit
  24. 24 May, 2022 3 commits
  25. 20 May, 2022 1 commit
    • Rename pointwise ops (#1145) · 4a312201
      kahmed10 authored
      Renames the pointwise kernels so that the names seen when profiling are clearer. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
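      The naming rule from the example can be written as a short helper. This is a sketch of the convention only: pointwise_kernel_name is a hypothetical function, and the "_" separator and "_kernel" suffix are inferred from the add_relu_kernel example rather than taken from the actual implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: join the op names in compile order and append "_kernel".
std::string pointwise_kernel_name(const std::vector<std::string>& ops)
{
    assert(!ops.empty());
    std::string name;
    for(const auto& op : ops)
        name += op + "_";
    return name + "kernel";
}
```

      For the ops add and relu, in that order, this produces add_relu_kernel. Reversing the order gives relu_add_kernel, so the name also records the sequence of the fused ops.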
  26. 17 May, 2022 1 commit
  27. 11 May, 2022 1 commit
  28. 09 May, 2022 1 commit
  29. 06 May, 2022 1 commit
  30. 05 May, 2022 1 commit
    • Cppcheck fixes (#1195) · d582425b
      Paul Fultz II authored
      Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This also fixes the cppcheck errors that were already there.