- 08 Nov, 2023 1 commit
-
-
Zakor Gyula authored
The inaccuracy arose because ONNX Round requires halfway (0.5) cases to be rounded to the nearest even integer. std::round rounds halfway cases away from zero, which gives wrong results for those cases. Replaced std::round with std::nearbyint, which rounds to nearest with ties to even under the default rounding mode.
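The difference between the two functions only shows up on halfway inputs. The snippet below is a minimal sketch of that behaviour, not the code from this commit; the helper names `round_half_away` and `round_half_even` are hypothetical, and it assumes the floating-point rounding mode has been left at its default (`FE_TONEAREST`).

```cpp
#include <cmath>

// Hypothetical helper names, used only for illustration.

// std::round sends halfway cases away from zero:
// 0.5 -> 1, 2.5 -> 3, -2.5 -> -3.
inline float round_half_away(float x) { return std::round(x); }

// std::nearbyint follows the current rounding mode. Under the default
// mode (FE_TONEAREST) halfway cases go to the nearest even integer:
// 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, -2.5 -> -2.
inline float round_half_even(float x) { return std::nearbyint(x); }
```

Because std::nearbyint depends on the rounding mode, code that calls std::fesetround elsewhere would change its result; the sketch assumes nothing does.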
-
- 20 Oct, 2023 1 commit
-
-
turneram authored
Adds workarounds to avoid passing capture ops and scalar literals from quantization as arguments to ck_gemm.
-
- 08 Aug, 2023 1 commit
-
-
kahmed10 authored
* add quant_dot fusion, clip literal opt
-
- 28 May, 2023 1 commit
-
-
Paul Fultz II authored
* Allow quantizing for both int8 and fp16
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 15 Jul, 2021 1 commit
-
-
turneram authored
* Add operators, refactor parsers, add rewrite passes, add tests
* Formatting
* Fix cppcheck
* Review comments
* Formatting
* Combine rewrite passes
* Formatting
* Add ref implementations
* Formatting
* Review comments
* Formatting
* Tidy warnings
* Apply review comments
* Formatting
* Fix CI error
* Formatting
* Increase code coverage
* Formatting
* Move broadcasting of scales and zero points to onnx parser
* Formatting
* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
* Formatting
* Increase code coverage
* Formatting
* Switch certain variables to int64_t
* Formatting
* Fix overflow in implicit constant conversion
* Formatting
* Increase code coverage
* Formatting
* Remove operators.hpp from includes in tf_test.cpp
* Formatting
* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes
* Formatting
* Switch dequantizelinear math from int32 to float
* Formatting
* Remove changes to operators.hpp
* Simplify apply_quantizelinear
* Formatting
* Add verify test for int32 data
* Add rewrite_quantization back to CMakeLists
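For context on the quantizelinear and dequantizelinear items above, here is a rough sketch of the arithmetic as the ONNX operator definitions describe it for int8 output: quantization computes saturate(round(x / scale) + zero_point), and dequantization computes (q - zero_point) * scale. This is not the reference implementation added by this commit; the function names and the fixed int8 type are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helpers following the ONNX operator definitions.

inline std::int8_t quantize_linear(float x, float scale, std::int8_t zero_point)
{
    // Round to nearest, ties to even (default rounding mode), then shift
    // by the zero point.
    float q = std::nearbyint(x / scale) + static_cast<float>(zero_point);
    // Saturate to the int8 range before narrowing.
    q = std::min(127.0f, std::max(-128.0f, q));
    return static_cast<std::int8_t>(q);
}

inline float dequantize_linear(std::int8_t q, float scale, std::int8_t zero_point)
{
    // Subtract in float so the difference cannot overflow a narrow
    // integer type.
    return (static_cast<float>(q) - static_cast<float>(zero_point)) * scale;
}
```

With scale 0.5 and zero point 3, an input of 1.25 divides to 2.5, rounds to 2, and quantizes to 5; dequantizing 5 gives 1.0, so the round trip loses the 0.25 that fell below the quantization step.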
-