- 24 Mar, 2022 1 commit
-
-
Shucai Xiao authored
-
- 18 Mar, 2022 1 commit
-
-
turneram authored
Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum
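For readers unfamiliar with the two modes, here is a minimal sketch of what exclusive and reverse mean for a 1-D cumulative sum, following the ONNX CumSum definition. It is an illustration only, not the gpu kernel from this commit, and the function name `cumsum_1d` is hypothetical.

```python
from itertools import accumulate


def cumsum_1d(values, exclusive=False, reverse=False):
    """Reference-style cumulative sum over a 1-D list (hypothetical helper).

    exclusive: each output excludes the element at its own position.
    reverse:   the sum runs from the last element toward the first.
    """
    # Reverse mode is a forward scan over the reversed input, reversed back.
    seq = list(reversed(values)) if reverse else list(values)
    if exclusive:
        # Start from 0 and drop the final total, shifting the sums by one slot.
        scanned = list(accumulate(seq, initial=0))[:-1]
    else:
        scanned = list(accumulate(seq))
    return list(reversed(scanned)) if reverse else scanned


print(cumsum_1d([1, 2, 3]))                                # [1, 3, 6]
print(cumsum_1d([1, 2, 3], exclusive=True))                # [0, 1, 3]
print(cumsum_1d([1, 2, 3], reverse=True))                  # [6, 5, 3]
print(cumsum_1d([1, 2, 3], exclusive=True, reverse=True))  # [5, 3, 0]
```

The two flags are independent, so all four combinations are valid.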
-
- 15 Mar, 2022 1 commit
-
-
Paul Fultz II authored
This adds iterators to tensor_view, which allows kernels to work with non-standard shapes, such as those used by roialign. To improve indexing performance when using the iterators, the shape class was updated to use integral_constants, since the compiler doesn't always fold the const values; an integral_constant at least enforces that in the AST. Finally, since index calculations with single integers are improved, I also updated pointwise to use a single index rather than a multi-index. There is about a 4% improvement in some cases.
-
- 14 Mar, 2022 1 commit
-
-
Shucai Xiao authored
Change the max number of groups in a kernel to 1B for greater performance
-
- 10 Mar, 2022 4 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 09 Mar, 2022 1 commit
-
-
Shucai Xiao authored
-
- 08 Mar, 2022 5 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 07 Mar, 2022 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 04 Mar, 2022 6 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
bpickrel authored
Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes to support this. Introduces one new source file, op_enums.cpp.
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 03 Mar, 2022 5 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Paul Fultz II authored
Boost the max number of workgroups for pointwise ops by matching what we are doing in launch.hpp
-
kahmed10 authored
better performance doing it this way
-
turneram authored
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
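As a rough illustration of what ScatterND computes, the sketch below copies the input and overwrites the positions named by each index tuple, as in the ONNX operator definition. It is not the ref or gpu code from this commit: it uses nested Python lists, assumes `indices` is a flat list of index tuples with no batch dimensions, and the name `scatter_nd` is hypothetical.

```python
import copy


def scatter_nd(data, indices, updates):
    """Return a copy of `data` with updates[i] written at position indices[i].

    Simplified sketch: `indices` is a list of index tuples (shape [N, k]),
    and `updates` holds N replacement values (or sub-lists when k is less
    than the rank of `data`).
    """
    output = copy.deepcopy(data)  # the input itself is left unmodified
    for index, update in zip(indices, updates):
        target = output
        # Walk down to the container that holds the final index position.
        for i in index[:-1]:
            target = target[i]
        target[index[-1]] = copy.deepcopy(update)
    return output


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(scatter_nd(data, [[4], [3], [1], [7]], [9, 10, 11, 12]))
# [1, 11, 3, 10, 9, 6, 7, 12]
```

If the same index appears more than once, this sketch keeps the last update written; the sketch does not attempt any reduction mode.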
-
- 02 Mar, 2022 2 commits
-
-
Charlie Lin authored
Implements the IsNaN operator: ref and gpu implementations and the onnx parser.
-
bpickrel authored
Update the base version of clang-format from 5.0 to 10.0
-
- 01 Mar, 2022 4 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 28 Feb, 2022 7 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-