"src/vscode:/vscode.git/clone" did not exist on "0905762b14c846b4b910d6aa37755d75b9897a9b"
- 10 Mar, 2022 3 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 08 Mar, 2022 5 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 07 Mar, 2022 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 04 Mar, 2022 5 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 03 Mar, 2022 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 02 Mar, 2022 1 commit
-
-
bpickrel authored
Update the base version of clang-format from 5.0 to 10.0
-
- 01 Mar, 2022 4 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 28 Feb, 2022 8 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 26 Feb, 2022 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 08 Feb, 2022 3 commits
-
-
Khalique Ahmed authored
-
Khalique Ahmed authored
-
Khalique Ahmed authored
-
- 31 Jan, 2022 1 commit
-
-
Khalique Ahmed authored
-
- 09 Dec, 2021 1 commit
-
-
Shucai Xiao authored
Changed the number of threads in a block from 256 to 128 Increased the max number of blocks in the kernel from 256 to 1M. For the case that the axis is the last dimension, we removed the computation of index since it is not required. With these change, we can get about 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
-
- 08 Oct, 2021 1 commit
-
-
Shucai Xiao authored
This PR is for the nonzero operator with static output shape. Co-authored-by:
Paul Fultz II <pfultz2@yahoo.com> Co-authored-by:
mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 01 Oct, 2021 1 commit
-
-
turneram authored
Add multinomial op to onnx parser with ref and GPU implementations. The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative density function. The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution. Resolves #821 Co-authored-by:Shucai Xiao <shucai@gmail.com>
-
- 27 Sep, 2021 1 commit
-
-
kahmed10 authored
Checks wavefront size, then changes implementation and number of threads for DPP reduce
-