1. 05 Jul, 2022 2 commits
  2. 03 Jul, 2022 1 commit
    • Paul Fultz II's avatar
      Add mlir fusion (#1251) · ca8a54fe
      Paul Fultz II authored
      * Add mlir c api
      
      * Formatting
      
      * Create a type attribute
      
      * Formatting
      
      * Parse module
      
      * Formatting
      
      * Add mlir dump function
      
      * Add test case
      
      * Formatting
      
      * Fix tidy issues
      
      * Update mlit version
      
      * Update to newer mlir
      
      * Format
      
      * Move mlir to the gpu and update the test
      
      * Formatting
      
      * Fix bug when appending module
      
      * Format
      
      * Remove old cmake flag
      
      * Update message
      
      * Add return
      
      * Format
      
      * Add mlir_compile
      
      * Format
      
      * Register dialect
      
      * Handle unsinged integers
      
      * Dont provide output for return instruction
      
      * Format
      
      * Add code to insert memrefs
      
      * Format
      
      * Add mlir verification
      
      * Formatting
      
      * Enable pointwise_fusion
      
      * Disable eliminate_data_type
      
      * Set kernal name
      
      * Format
      
      * Fix device name
      
      * Formatting
      
      * Fix output arg
      
      * Format
      
      * Updates
      
      * Upate hash
      
      * Add fuse_mlir pass
      
      * Format
      
      * Add fuse mlir
      
      * Format
      
      * Update mlir
      
      * Sort parameter names
      
      * Format
      
      * Reenable disabled passes
      
      * Remove old mlir conv
      
      * Remove asym default padding
      
      * Add more verbose tracing
      
      * Format
      
      * Fix compilation errors
      
      * Format
      
      * Whitelist operators
      
      * Format
      
      * Add namespace
      
      * Format
      
      * Update triple
      
      * Format
      
      * Use func dialect
      
      * Format
      
      * Use func.return
      
      * Format
      
      * Upgrade mlir version
      
      * Add comment
      
      * Handle symetrical padding
      
      * Format
      
      * Cleanup debug output
      
      * Format
      
      * List failed tests
      
      * Move mlir compile to jit pipeline
      
      * Format
      
      * Update version
      
      * Add source locations
      
      * Format
      
      * Correctly add module
      
      * Format
      
      * Update failed tests
      
      * Fix failures when mlir is disabled
      
      * Format
      
      * Update mlir version
      
      * Check type for fp32
      
      * Format
      
      * Remove failed test
      
      * Update mlir in driver
      
      * Tidy fixes
      
      * Foramt
      
      * Tidy fixes
      
      * Format
      
      * Fix const
      
      * Remove from requirements
      
      * Fix cmake version
      
      * Fix tidy warning
      
      * Use another ifdef
      
      * Fix tidy
      
      * Other tidy fix
      
      * Format
      
      * Update hash
      
      * Add missing license files
      
      * Format
      
      * Format
      
      * Fix fnction name
      ca8a54fe
  3. 30 Jun, 2022 1 commit
  4. 29 Jun, 2022 2 commits
  5. 26 Jun, 2022 1 commit
  6. 25 Jun, 2022 2 commits
  7. 24 Jun, 2022 1 commit
  8. 23 Jun, 2022 1 commit
  9. 22 Jun, 2022 1 commit
  10. 20 Jun, 2022 1 commit
  11. 17 Jun, 2022 3 commits
    • Umang Yadav's avatar
      Update lowering of Dot operator (#1247) · c99be32c
      Umang Yadav authored
      
      
      * remove code for allocation of C param in dot lowering
      
      * formatting
      Co-authored-by: default avatarPaul Fultz II <pfultz2@yahoo.com>
      c99be32c
    • Ted Themistokleous's avatar
      Update tf_parser to have add_common_op() for parse_relu6 (#1241) · 421a5621
      Ted Themistokleous authored
      
      
      * [#935] Update tf_parser to have add_common_op() for parse_relu6
      
      Similar to that of the onnx_parser.cpp add a add_common_op template and functionality to support clip based operations. This is done so clip operations can be guarenteed to have the same dimensions.
      
      * fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
      
      * fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
      
      * fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
      
      * fixup! fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
      
      * Formatting
      
      * fixup! Formatting
      Co-authored-by: default avatarUmang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: default avatarPaul Fultz II <pfultz2@yahoo.com>
      421a5621
    • kahmed10's avatar
      Create allocate op and replace_allocate pass (#1183) · add6fb3b
      kahmed10 authored
      
      
      * add allocate op header
      
      * formatting
      
      * add replace_allocate pass
      
      * formatting
      
      * move output param to remove_allocate pass
      
      * formatting
      
      * fix bugs in replace_allocate pass
      
      * formatting
      
      * fix verify if tests
      
      * formatting
      
      * move if op logic
      
      * formatting
      
      * cleanup lowering
      
      * cleanup lowering
      
      * formatting
      
      * fix tidy
      
      * formatting
      
      * fix tidy
      
      * add cpu allocate check
      
      * formatting
      
      * change cpu allocate in pass
      
      * formatting
      
      * add some tests for replace_allocate pass
      
      * formatting
      
      * pass by ref
      
      * fix run_pass
      
      * formatting
      
      * update variable name for module
      
      * update dce to use contains() and fix tidy
      
      * formatting
      
      * update cppcheck
      
      * add if test
      
      * formatting
      
      * add if test
      
      * rename var to mod_output_names
      
      * formatting
      
      * remove conditional
      
      * update allocate op and tests
      
      * formatting
      
      * update replace_allocate tests
      
      * update create_output_names() and conditional in replace_allocate
      
      * formatting
      
      * remove extra variable in replace_allocate
      
      * update tools script for allocation_model
      Co-authored-by: default avatarUmang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: default avatarChris Austen <causten@users.noreply.github.com>
      Co-authored-by: default avatarPaul Fultz II <pfultz2@yahoo.com>
      add6fb3b
  12. 16 Jun, 2022 1 commit
  13. 10 Jun, 2022 1 commit
  14. 07 Jun, 2022 1 commit
  15. 03 Jun, 2022 1 commit
    • Paul Fultz II's avatar
      Group code objects by kernel name in perf report summary (#1234) · 7271ddbc
      Paul Fultz II authored
      Break up the gpu::code_object  print to show the actual kernels...
      
      gpu::code_object::add_kernel: 0.646121ms, 5%
      gpu::code_object::mul_kernel: 0.623822ms, 5%
      gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
      gpu::code_object::mul_add_kernel: 0.478352ms, 4%
      7271ddbc
  16. 02 Jun, 2022 1 commit
  17. 30 May, 2022 1 commit
    • shivadbhavsar's avatar
      Improve eliminate contiguous pass (#1223) · 86061b4d
      shivadbhavsar authored
      Following up on issue #1166 and PR #1220. Using the same approach as in #1220 for parallelizing the eval calls, we can significantly reduce the time spent on eliminate_contiguous pass.
      86061b4d
  18. 26 May, 2022 2 commits
    • shivadbhavsar's avatar
      Parallelize evaluations in propagate_constant (#1220) · bf603a76
      shivadbhavsar authored
      Addressing issue #1166 - propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal and performs the replacement in the same call.
      
      New approach:
      
      Perform single pass though instructions in the module to determine which instructions can be evaluated
      Evaluate selected instructions in parallel
      Replace the selected instructions with the corresponding literal
      bf603a76
    • Paul Fultz II's avatar
      Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
      Paul Fultz II authored
      * Upgrade to cppcheck 2.8
      a401e72a
  19. 24 May, 2022 4 commits
  20. 20 May, 2022 2 commits
  21. 17 May, 2022 1 commit
  22. 11 May, 2022 1 commit
  23. 10 May, 2022 1 commit
  24. 09 May, 2022 1 commit
  25. 06 May, 2022 1 commit
  26. 05 May, 2022 1 commit
    • Paul Fultz II's avatar
      Cppcheck fixes (#1195) · d582425b
      Paul Fultz II authored
      Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This fixes the cppcheck errors that was there already.
      d582425b
  27. 03 May, 2022 1 commit
  28. 29 Apr, 2022 1 commit
  29. 27 Apr, 2022 1 commit
    • Paul Fultz II's avatar
      Add lane reduction (#1180) · 4c72cc95
      Paul Fultz II authored
      With reductions such as {2048, 2, 1456} on axes 1, this is 23x faster than using our new block_reduce, and its even over 100x faster than our original reduce_sum:
      
      # lane
      gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
      # block
      gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
      # original
      gpu::reduce_sum[axes={1}]: 6.73456ms
      There is some basic logic to pick between lane and block reduce automatically.
      4c72cc95
  30. 26 Apr, 2022 1 commit