1. 09 May, 2023 2 commits
    • ONNX export refactoring (#197) · 83911ddb
      Neta Zmora authored
      
      
      * ONNX export refactoring
      
      * Remove infer_ort (to enable more testing)
      * Add BF16 ORT tests for Q/DQ ops and GELU.
        * Use FP32 I/O instead of BF16 (because ORT doesn't support BF16 I/O) and add casts from FP32 to BF16 (only for subgraph inputs and outputs).
        * We'll need to add more BF16 testing.
      * GEMM:
        * Add cast after DQ to achieve better performance (matmul at sub-FP32 precisions).
        * Fold bias into Gemm operation (=> smaller graphs)
        * Wrap GEMM-GELU with FP32 (TE implements GELU in FP32)
      * Enable tests for cross attention (test_export_multihead_attention)
      * Reduce test thresholds for test_export_layernorm_mlp, test_export_layernorm_linear, test_export_layernorm
      Signed-off-by: Neta Zmora <nzmora@nvidia.com>
      
      * Loosen MHA export validation thresholds for FP16
      Signed-off-by: Neta Zmora <nzmora@nvidia.com>
      
      ---------
      Signed-off-by: Neta Zmora <nzmora@nvidia.com>
  2. 03 May, 2023 1 commit
  3. 02 May, 2023 3 commits
  4. 01 May, 2023 1 commit
  5. 30 Apr, 2023 1 commit
  6. 28 Apr, 2023 4 commits
  7. 27 Apr, 2023 1 commit
  8. 26 Apr, 2023 1 commit
  9. 22 Apr, 2023 2 commits
  10. 21 Apr, 2023 4 commits
  11. 20 Apr, 2023 1 commit
  12. 19 Apr, 2023 1 commit
  13. 18 Apr, 2023 1 commit
  14. 17 Apr, 2023 1 commit
  15. 14 Apr, 2023 1 commit
  16. 13 Apr, 2023 3 commits
  17. 08 Apr, 2023 1 commit
  18. 07 Apr, 2023 3 commits
  19. 05 Apr, 2023 1 commit
  20. 04 Apr, 2023 1 commit
  21. 03 Apr, 2023 1 commit
  22. 30 Mar, 2023 2 commits
  23. 29 Mar, 2023 2 commits
  24. 28 Mar, 2023 1 commit