- 29 May, 2023 1 commit
-
-
aiss authored
-
- 30 Mar, 2023 1 commit
-
-
aiss authored
-
- 25 May, 2022 1 commit
-
-
aiss authored
-
- 28 Feb, 2021 1 commit
-
-
zmx authored
hi, i take a look at the code of column_sum_reduce, i have 2 questions: 1. the goal of column_sum_reduce is to get the column sum of inp matrix with shape[rows, width] and the result shape should be [width],right ? It seems that the judgment condition of pos is not suitable 2. the implementation of cuda kernel based on the asumption that, the thread with same threadIdx.y will group into a thread_block_tile, the blockDim is (32,32), i read the nvidia document https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf , THREAD BLOCK TILE is a subset of threads of a thread block, divided into tiles in row-major order. doesn't it mean thread with the same threadIdx.x will group into a thread_block_tile ? thanks !!!! Co-authored-by:
Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
-
- 01 Dec, 2020 1 commit
-
-
Reza Yazdani authored
* supporting different hidden dimensions * add support for larger hidden dimensions (greater than 8K) * remove empty line * add loop unrolling factor for dropout kernels * update different kernels based on the reviews Co-authored-by:Jeff Rasley <jerasley@microsoft.com>
-
- 22 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
Co-authored-by:Conglong Li <conglong.li@gmail.com>
-
- 29 May, 2020 1 commit
-
-
Jeff Rasley authored
* Transformer kernels release Co-authored-by:
Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by:
Elton Zheng <eltonz@microsoft.com> Co-authored-by:
Reza Yazdani <reyazda@microsoft.com> Co-authored-by:
RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by:
Tunji Ruwase <olruwase@microsoft.com> Co-authored-by:
Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by:
Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by:
Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by:
Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by:
Jeff Rasley <jerasley@microsoft.com> Co-authored-by:
Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by:
Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by:
Elton Zheng <eltonz@microsoft.com> Co-authored-by:
Reza Yazdani <reyazda@microsoft.com> Co-authored-by:
RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by:
Tunji Ruwase <olruwase@microsoft.com> Co-authored-by:
Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by:
Samyam Rajbhandari <samyamr@microsoft.com>
-