1. 07 Dec, 2023 1 commit
  2. 08 Nov, 2023 1 commit
      Fix Round operator inaccuracy (#2244) · 48c4453c
      Zakor Gyula authored
      The inaccuracy arose because ONNX Round requires halfway (0.5) cases to be rounded to the nearest even integer.
      std::round rounds halfway cases away from zero, so it gave wrong results for them.
      Replaced std::round with std::nearbyint, which follows the current rounding mode; the default mode (FE_TONEAREST) rounds halfway cases to even.
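The difference between the two rounding functions described in this commit message can be seen in a minimal sketch. The helper names `round_half_away`, `round_half_even`, and `check_round_examples` are hypothetical and are not taken from the repository; the sketch also assumes the floating-point rounding mode has been left at its default (FE_TONEAREST).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: std::round sends halfway cases away from zero.
inline float round_half_away(float x) { return std::round(x); }

// Hypothetical helper: std::nearbyint uses the current rounding mode.
// Assumption: the mode is the default FE_TONEAREST, which sends
// halfway cases to the nearest even integer.
inline float round_half_even(float x) { return std::nearbyint(x); }

// Self-check on a few halfway inputs, all exactly representable as float.
inline void check_round_examples()
{
    assert(round_half_away(0.5f) == 1.0f);
    assert(round_half_even(0.5f) == 0.0f);

    assert(round_half_away(2.5f) == 3.0f);
    assert(round_half_even(2.5f) == 2.0f);

    assert(round_half_away(-2.5f) == -3.0f);
    assert(round_half_even(-2.5f) == -2.0f);

    // Non-halfway inputs agree between the two functions.
    assert(round_half_away(1.4f) == round_half_even(1.4f));
}
```

Calling `check_round_examples()` from a `main` function exercises the cases above. If code elsewhere changes the rounding mode with `std::fesetround`, `std::nearbyint` follows that mode, so the half-to-even behavior holds only under the default.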
  3. 20 Oct, 2023 1 commit
      CK GEMM Int8 Bug Fixes (#2229) · f47e0b5b
      turneram authored
      Adds workarounds to avoid passing capture ops and scalar literals from quantization as arguments to ck_gemm.
  4. 08 Aug, 2023 1 commit
  5. 28 May, 2023 1 commit
  6. 22 Jun, 2022 1 commit
  7. 15 Jul, 2021 1 commit
      Quantize linear ops (#843) · 3282e01a
      turneram authored
      * Add operators, refactor parsers, add rewrite passes, add tests
      
      * Formatting
      
      * Fix cppcheck
      
      * Review comments
      
      * Formatting
      
      * Combine rewrite passes
      
      * Formatting
      
      * Add ref implementations
      
      * Formatting
      
      * Review comments
      
      * Formatting
      
      * Tidy warnings
      
      * Apply review comments
      
      * Formatting
      
      * Fix CI error
      
      * Formatting
      
      * Increase code coverage
      
      * Formatting
      
      * Move broadcasting of scales and zero points to onnx parser
      
      * Formatting
      
      * Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
      
      * Formatting
      
      * Increase code coverage
      
      * Formatting
      
      * Switch certain variables to int64_t
      
      * Formatting
      
      * Fix overflow in implicit constant conversion
      
      * Formatting
      
      * Increase code coverage
      
      * Formatting
      
      * Remove operators.hpp from includes in tf_test.cpp
      
      * Formatting
      
      * Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes
      
      * Formatting
      
      * Switch dequantizelinear math from int32 to float
      
      * Formatting
      
      * Remove changes to operators.hpp
      
      * Simplify apply_quantizelinear
      
      * Formatting
      
      * Add verify test for int32 data
      
      * Add rewrite_quantization back to CMakeLists
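For readers unfamiliar with the two operators this commit introduces, the following is a rough sketch of the per-element math as defined by the ONNX QuantizeLinear and DequantizeLinear operators for an int8 target. It is not the repository's implementation: the function names `quantize_linear` and `dequantize_linear`, the scalar signatures, and the fixed int8 output type are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch of ONNX QuantizeLinear for one element with an int8 output:
//   y = saturate(round_half_even(x / scale) + zero_point)
// Assumption: default rounding mode, so std::nearbyint rounds half to even.
inline std::int8_t quantize_linear(float x, float scale, std::int8_t zero_point)
{
    // Work in double and clamp before the narrowing cast, so the
    // integer conversion never sees an out-of-range value.
    double q = std::nearbyint(static_cast<double>(x) / scale) + zero_point;
    q        = std::min(127.0, std::max(-128.0, q));
    return static_cast<std::int8_t>(q);
}

// Sketch of ONNX DequantizeLinear for one element:
//   y = (x - zero_point) * scale
// The subtraction is done in float rather than in an integer type.
inline float dequantize_linear(std::int8_t q, float scale, std::int8_t zero_point)
{
    return (static_cast<float>(q) - static_cast<float>(zero_point)) * scale;
}
```

As a usage example, `quantize_linear(1.0f, 0.5f, 0)` gives 2 and `dequantize_linear(2, 0.5f, 0)` gives back 1.0f, while an input far outside the representable range, such as `quantize_linear(1000.0f, 0.5f, 0)`, saturates to 127. NaN inputs are not handled in this sketch.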