- 29 May, 2023 1 commit
-
-
aiss authored
-
- 26 Apr, 2023 1 commit
-
-
aiss authored
-
- 30 Mar, 2023 1 commit
-
-
aiss authored
-
- 10 Aug, 2022 1 commit
-
-
aiss authored
-
- 25 May, 2022 1 commit
-
-
aiss authored
-
- 03 Mar, 2021 1 commit
-
-
Reza Yazdani authored
* fixing buffers in transformer kernel when gelu-checkpoint is enabled * fixing the test issue for other memory optimization flags * fixing a bug for when attn_dropout_checkpoint is enabled
-
- 28 Feb, 2021 1 commit
-
-
zmx authored
Hi, I took a look at the code of column_sum_reduce and have two questions: 1. The goal of column_sum_reduce is to compute the column sums of the inp matrix with shape [rows, width], so the result should have shape [width], right? If so, the condition that checks pos does not seem suitable. 2. The CUDA kernel is implemented on the assumption that threads with the same threadIdx.y are grouped into one thread_block_tile, with blockDim = (32, 32). According to the NVIDIA document https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf , a thread block tile is a subset of the threads of a thread block, divided into tiles in row-major order. Doesn't that mean threads with the same threadIdx.x are grouped into a thread_block_tile? Thanks! Co-authored-by:
Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
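On the second question above, a small host-side sketch may help. It models how CUDA numbers the threads of a block (threadIdx.x varies fastest) and how a row-major split into tiles of 32 consecutive ranks assigns threads to tiles. The names `Dim3`, `thread_rank`, `tile_index`, and `rank_in_tile` are hypothetical helpers written for this illustration; they are not part of the kernel in this repository or of the CUDA API.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 / threadIdx; illustration only.
struct Dim3 {
    int x, y, z;
};

// Linear rank of a thread within its block. CUDA numbers threads with x
// varying fastest: rank = x + y * Dx + z * Dx * Dy.
constexpr int thread_rank(Dim3 idx, Dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Tile a thread lands in when the block is split, in rank order, into
// tiles of tile_size consecutive threads (a row-major split).
constexpr int tile_index(Dim3 idx, Dim3 dim, int tile_size) {
    return thread_rank(idx, dim) / tile_size;
}

// Position of the thread inside its tile.
constexpr int rank_in_tile(Dim3 idx, Dim3 dim, int tile_size) {
    return thread_rank(idx, dim) % tile_size;
}

// With blockDim = (32, 32) and tiles of 32 threads, the tile index equals
// threadIdx.y and the rank inside the tile equals threadIdx.x.
static_assert(tile_index(Dim3{0, 5, 0}, Dim3{32, 32, 1}, 32) == 5, "same row, first thread");
static_assert(tile_index(Dim3{31, 5, 0}, Dim3{32, 32, 1}, 32) == 5, "same row, last thread");
static_assert(tile_index(Dim3{7, 0, 0}, Dim3{32, 32, 1}, 32) !=
              tile_index(Dim3{7, 1, 0}, Dim3{32, 32, 1}, 32), "same column, different tiles");
static_assert(rank_in_tile(Dim3{7, 3, 0}, Dim3{32, 32, 1}, 32) == 7, "rank in tile is threadIdx.x");
```

Under this model, with a (32, 32) block, a tile of 32 threads consists of the threads that share a threadIdx.y, and threads that share a threadIdx.x fall into 32 different tiles. The sketch only covers the thread numbering; it says nothing about how the kernel arranges its data before reducing within a tile.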
-
- 24 Feb, 2021 1 commit
-
-
Reza Yazdani authored
* fix the bias-add precision and indexing, and add the layer-norm-eps as a configurable parameter for transformer * add ACC_HALF config * use defined to check whether ACC_HALF is defined
-
- 18 Feb, 2021 2 commits
-
-
Reza Yazdani authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Conglong Li authored
-
- 26 Jan, 2021 1 commit
-
-
Ying Xiong authored
* fix wrong idx bug in invertible LayerNormBackward1; this index bug causes a wrong scale grad * fix unexpected deletion * fix idx for LayerNormBackward1_fused_add * move pos definition in LayerNormBackward1 kernels * fix format error Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
-
- 13 Jan, 2021 1 commit
-
-
Reza Yazdani authored
* move workspace memory-allocation to PyTorch * refine the code based on the comments * remove unnecessary options * remove bsz from set_seq_len function
-
- 17 Dec, 2020 1 commit
-
-
Reza Yazdani authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 01 Dec, 2020 2 commits
-
-
Reza Yazdani authored
* tracking optimizer step in cpu-adam when loading checkpoint * add warning/error message for updating optimizer step count * resolve build issue * supporting state update from the python side * track step from python in all cases * remove comma
-
Reza Yazdani authored
* supporting different hidden dimensions * add support for larger hidden dimensions (greater than 8K) * remove empty line * add loop unrolling factor for dropout kernels * update different kernels based on the reviews Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 12 Nov, 2020 1 commit
-
-
Jeff Rasley authored
Co-authored-by:
Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by:
Reza Yazdani <reyazda@microsoft.com>
-
- 05 Nov, 2020 1 commit
-
-
Reza Yazdani authored
* fixing cpu-adam * fixing copy with optimizer for data and model parallelism * fixing cpu-adam * fix cpu-adam * fix cpu-adam
-
- 30 Oct, 2020 2 commits
-
-
Reza Yazdani authored
-
Reza Yazdani authored
* add adamW to CPU-ADAM implementation * supporting cpu-adam optimizer for zero-offload on deepspeed side * bump DSE to match cpu-adam updates Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 01 Oct, 2020 1 commit
-
-
Bruno authored
* Towards Windows build * formatting Co-authored-by:
Bruno Cabral <bruno@potelo.com.br> Co-authored-by:
Jeff Rasley <jerasley@microsoft.com> Co-authored-by:
Olatunji Ruwase <olruwase@microsoft.com>
-
- 22 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
Co-authored-by: Conglong Li <conglong.li@gmail.com>
-
- 21 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
-
- 11 Sep, 2020 2 commits
-
-
Jeff Rasley authored
This reverts commit e549be60.
-
RezaYazdaniAminabadi authored
* supporting different intermediate sizes other than 4*hidden_dim * run precommit * uncomment the unit tests Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 10 Sep, 2020 1 commit
-
-
Jeff Rasley authored
* ZeRO-Offload (squash) (#381) Co-authored-by:
Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by:
Reza Yazdani <reyazda@microsoft.com> Co-authored-by:
Jeff Rasley <jerasley@microsoft.com> Co-authored-by:
Jie <37380896+jren73@users.noreply.github.com> Co-authored-by:
Arash Ashari <arashari@microsoft.com> Co-authored-by:
Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by:
Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by:
arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com> Co-authored-by:
RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
-
- 02 Sep, 2020 1 commit
-
-
Jeff Rasley authored
* Sparse attn + ops/runtime refactor + v0.3.0 Co-authored-by:
Arash Ashari <arashari@microsoft.com>
-
- 29 May, 2020 1 commit
-
-
Jeff Rasley authored
* Transformer kernels release Co-authored-by:
Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by:
Elton Zheng <eltonz@microsoft.com> Co-authored-by:
Reza Yazdani <reyazda@microsoft.com> Co-authored-by:
RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by:
Tunji Ruwase <olruwase@microsoft.com> Co-authored-by:
Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by:
Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by:
Jeff Rasley <jerasley@microsoft.com>
-
- 16 Apr, 2020 1 commit
-
-
Jeff Rasley authored
-
- 03 Feb, 2020 2 commits
-
-
Samyam Rajbhandari authored
Lamb CUDA Kernels
-
Jeff Rasley authored
-