Commits · dfc18d6c3fe6540903273e052ad720768d16550e · gaoqiong / MIGraphX

08 Nov, 2023 1 commit

Fix Round operator inaccuracy (#2244) · 48c4453c

Zakor Gyula authored Nov 08, 2023

The inaccuracy was caused by halfway (x.5) cases: ONNX Round requires these to be rounded to the nearest even integer.
std::round rounds halfway cases away from zero (for example, 2.5 becomes 3 instead of 2), so it gave wrong results for these inputs.
Replaced std::round with std::nearbyint, which rounds halfway cases to even under the default floating-point rounding mode (FE_TONEAREST).

48c4453c
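
The difference between the two calls shows up only on exact halfway inputs. The following minimal sketch assumes the default rounding mode has not been changed. The helper names round_away_from_zero and round_half_to_even are hypothetical, and this is not MIGraphX's Round operator implementation.

```cpp
#include <cmath>

// Sketch only: contrasts the two standard-library calls named in the commit
// message; it is not MIGraphX's Round operator implementation.

// std::round rounds halfway cases away from zero: 2.5f -> 3.0f, -2.5f -> -3.0f.
inline float round_away_from_zero(float x) { return std::round(x); }

// std::nearbyint follows the current floating-point rounding mode. Assuming
// the default mode (FE_TONEAREST) is in effect, halfway cases go to the
// nearest even integer: 2.5f -> 2.0f, 3.5f -> 4.0f, -2.5f -> -2.0f.
// This matches the tie-breaking rule of the ONNX Round operator.
inline float round_half_to_even(float x) { return std::nearbyint(x); }
```

Away from halfway cases the two agree (both map 1.4f to 1.0f). std::rint applies the same rounding rule but may raise FE_INEXACT, while std::nearbyint does not raise it.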

20 Oct, 2023 1 commit

CK GEMM Int8 Bug Fixes (#2229) · f47e0b5b

turneram authored Oct 19, 2023

Adds workarounds to avoid passing capture ops and scalar literals from quantization as arguments to ck_gemm.

f47e0b5b

08 Aug, 2023 1 commit

int8 optimizations (#1973) · f787d5bd

kahmed10 authored Aug 08, 2023

* add quant_dot fusion, clip literal opt

f787d5bd

28 May, 2023 1 commit

Enable quantizing both int8 and fp16 in the driver (#1757) · 26c1efa5

Paul Fultz II authored May 28, 2023

* Allow quantizing for both int8 and fp16

26c1efa5

22 Jun, 2022 1 commit

Update license files (#1248) · e44cecbc

Ted Themistokleous authored Jun 22, 2022

Updated each source file in the repo with the existing license.

e44cecbc

15 Jul, 2021 1 commit

Quantize linear ops (#843) · 3282e01a

turneram authored Jul 15, 2021

* Add operators, refactor parsers, add rewrite passes, add tests

* Formatting

* Fix cppcheck

* Review comments

* Formatting

* Combine rewrite passes

* Formatting

* Add ref implementations

* Formatting

* Review comments

* Formatting

* Tidy warnings

* Apply review comments

* Formatting

* Fix CI error

* Formatting

* Increase code coverage

* Formatting

* Move broadcasting of scales and zero points to onnx parser

* Formatting

* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type

* Formatting

* Increase code coverage

* Formatting

* Switch certain variables to int64_t

* Formatting

* Fix overflow in implicit constant conversion

* Formatting

* Increase code coverage

* Formatting

* Remove operators.hpp from includes in tf_test.cpp

* Formatting

* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes

* Formatting

* Switch dequantizelinear math from int32 to float

* Formatting

* Remove changes to operators.hpp

* Simplify apply_quantizelinear

* Formatting

* Add verify test for int32 data

* Add rewrite_quantization back to CMakeLists

3282e01a
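
The operators added here follow the ONNX QuantizeLinear and DequantizeLinear definitions. QuantizeLinear computes y = saturate(round(x / y_scale) + y_zero_point), rounding halfway cases to even. DequantizeLinear computes y = (x - x_zero_point) * x_scale. The following minimal sketch assumes a per-tensor (scalar) scale and zero point and an int8 target type. quantize_linear and dequantize_linear are hypothetical names, not MIGraphX functions, and this is not the reference implementation added in this commit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor sketch of ONNX QuantizeLinear with an int8 output:
// y = saturate(round_half_to_even(x / scale) + zero_point)
std::vector<std::int8_t> quantize_linear(const std::vector<float>& x, float scale,
                                         std::int8_t zero_point)
{
    std::vector<std::int8_t> y;
    y.reserve(x.size());
    for(float v : x)
    {
        // std::nearbyint rounds halfway cases to even under the default
        // rounding mode (FE_TONEAREST). The zero point is added in float so
        // the sum cannot wrap around before it is clamped to the int8 range.
        float q = std::nearbyint(v / scale) + static_cast<float>(zero_point);
        q       = std::clamp(q, -128.0f, 127.0f); // saturate to int8
        y.push_back(static_cast<std::int8_t>(q));
    }
    return y;
}

// Hypothetical per-tensor sketch of ONNX DequantizeLinear:
// y = (x - zero_point) * scale, with the arithmetic done in float.
std::vector<float> dequantize_linear(const std::vector<std::int8_t>& x, float scale,
                                     std::int8_t zero_point)
{
    std::vector<float> y;
    y.reserve(x.size());
    for(std::int8_t v : x)
        y.push_back((static_cast<float>(v) - static_cast<float>(zero_point)) * scale);
    return y;
}
```

With scale 0.5 and zero point 0, the inputs 0.25 and 0.75 map to 0.5 and 1.5 before rounding, so they quantize to 0 and 2 (ties to even). Out-of-range values saturate to -128 or 127.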