Commits · c25a91b60c5192065dfdcabd373b947aa2234fe1 · OpenDAS / deepspeed


29 May, 2023 1 commit
- update v0.9.2 · 5bcc463d
  aiss authored May 29, 2023
  
26 Apr, 2023 1 commit
- delete hip file · 4acf0e01
  aiss authored Apr 26, 2023
  
30 Mar, 2023 1 commit
- push dsv0.8.2 version · 67ea635f
  aiss authored Mar 30, 2023
  
10 Aug, 2022 1 commit
- modify version code · 1b2721ad
  aiss authored Aug 10, 2022
  
25 May, 2022 1 commit
- push Deepspeed 0.6.3 rocm version · 7d1a83a9
  aiss authored May 25, 2022
  
03 Mar, 2021 1 commit

Fixing gelu_checkpointing memory issue (#812) · 8295d7a8

Reza Yazdani authored Mar 03, 2021

* fixing buffers in transformer kernel when gelu-checkpoint is enabled

* fixing the test issue for other memory optimization flags

* fixing a bug for when attn_dropout_checkpoint is enabled


28 Feb, 2021 1 commit

issue with the implementation of column_sum_reduce (#804) · 937c5cee

zmx authored Mar 01, 2021

Hi, I took a look at the code of column_sum_reduce and have two questions:
   1. The goal of column_sum_reduce is to compute the column sums of the inp matrix with shape [rows, width], so the result should have shape [width], right? If so, the condition that checks pos does not seem suitable.
   2. The CUDA kernel is implemented on the assumption that threads with the same threadIdx.y are grouped into one thread_block_tile, with blockDim (32, 32). According to the NVIDIA presentation https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf, a thread block tile is a subset of the threads of a thread block, divided into tiles in row-major order. Doesn't that mean threads with the same threadIdx.x are grouped into a thread_block_tile?
Thanks!
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
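
The tile-grouping question in the message above can be checked with a small host-side model. This is only a sketch, not the kernel from the commit: `tile_of` and `column_sum` are hypothetical helper names, and the model assumes that a thread's rank within its block is `threadIdx.y * blockDim.x + threadIdx.x` (x varies fastest) and that tiles are formed from consecutive ranks, which is how the CUDA cooperative groups documentation describes `thread_rank()` and `tiled_partition`.

```python
def tile_of(thread_idx_x, thread_idx_y, block_dim_x=32, tile_size=32):
    """Return (tile index, rank within tile) for a thread in a 2D block.

    Assumption: linear rank = threadIdx.y * blockDim.x + threadIdx.x,
    and tiles group consecutive ranks (row-major order).
    """
    rank = thread_idx_y * block_dim_x + thread_idx_x
    return rank // tile_size, rank % tile_size


def column_sum(inp):
    """Reference column sum for a non-empty matrix given as a list of rows.

    inp has shape [rows][width]; the result has length width.
    """
    rows, width = len(inp), len(inp[0])
    return [sum(inp[r][c] for r in range(rows)) for c in range(width)]


# With blockDim = (32, 32) and 32-thread tiles, every thread in one tile
# shares the same threadIdx.y, while threadIdx.x runs from 0 to 31.
same_row = {tile_of(x, 5)[0] for x in range(32)}
same_col = {tile_of(3, y)[0] for y in range(32)}
```

Under these assumptions `same_row` holds a single tile index while `same_col` holds 32 different ones, so "row-major order" groups threads that share `threadIdx.y`, not `threadIdx.x`. That only holds when `blockDim.x` equals the tile size.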


24 Feb, 2021 1 commit

Fix the bias-add and add the layer-norm-eps parameter (#791) · e2dfcadf

Reza Yazdani authored Feb 24, 2021

* fix the bias-add precision and indexing and also adding the layer-norm-eps as a configurable parameter for transformer

* add ACC_HALF config

* use defined to check if ACC_Half is defined


18 Feb, 2021 2 commits
- CPU-Adam fix for scalar mode (#735) · ee1ffe2e
  Reza Yazdani authored Feb 18, 2021
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
- Fix transformer kernel CUDA illegal memory access error (#765) · 1fcc5f7a
  Conglong Li authored Feb 17, 2021
  
26 Jan, 2021 1 commit

Fix wrong idx bug in invertible LayerNormBackward1 (#692) · 5221832e

Ying Xiong authored Jan 26, 2021



* fix wrong idx bug in invertible LayerNormBackward1

this index bug causes a wrong scale grad

* fix unexpected deletion

* fix idx for LayerNormBackward1_fused_add

* move pos definition in LayerNormBackward1 kernels

* fix format error
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>


13 Jan, 2021 1 commit

Move workspace memory-allocation to PyTorch (#661) · 981bc7d4

Reza Yazdani authored Jan 12, 2021

* move workspace memory-allocation to PyTorch

* refine the code based on the comments

* remove unnecessary options

* remove bsz from set_seq_len function


17 Dec, 2020 1 commit
- Transformer-kernel - supporting any arbitrary sequence-length (#587) · fd2f970b
  Reza Yazdani authored Dec 17, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
01 Dec, 2020 2 commits

tracking optimizer step in cpu-adam when loading checkpoint (#564) · 9f52a36f

Reza Yazdani authored Dec 01, 2020

* tracking optimizer step in cpu-adam when loading checkpoint

* add warning/error message for updating optimizer step count

* resolve build issue

* supporting state update from the python side

* track step from python in all cases

* remove comma


supporting different hidden dimensions (#559) · c78c29f9

Reza Yazdani authored Dec 01, 2020



* supporting different hidden dimensions

* add support for larger hidden dimensions (greater than 8K)

* remove empty line

* add loop unrolling factor for dropout kernels

* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>


12 Nov, 2020 1 commit

DeepSpeed JIT op + PyPI support (#496) · 31f46fee

Jeff Rasley authored Nov 12, 2020


Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>


05 Nov, 2020 1 commit

Fixing CPU-Adam convergence issue (#503) · 7d4d742b

Reza Yazdani authored Nov 05, 2020

* fixing cpu-adam

* fixing copy with optimizer for data and model parallelism

* fixing cpu-adam

* fix cpu-adam

* fix cpu-adam


30 Oct, 2020 2 commits
- fixing the AVX_256 compatibility (#497) · 4c37d705
  Reza Yazdani authored Oct 30, 2020
  
- Add CPUAdam optimizer for zero-offload in deepspeed engine (#484) · f5aa2547
  Reza Yazdani authored Oct 30, 2020
```
* add adamW to CPU-ADAM implementation

* supporting cpu-adam optimizer for zero-offload on deepspeed side

* bump DSE to match cpu-adam updates
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
01 Oct, 2020 1 commit

Use parentheses around min and max to enable Windows build (#449) · 95575579

Bruno authored Oct 01, 2020



* Towards Windows build

* formatting
Co-authored-by: Bruno Cabral <bruno@potelo.com.br>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>


22 Sep, 2020 1 commit
- support dynamic sequence length in transformer kernels (#424) · f0f2a702
  RezaYazdaniAminabadi authored Sep 21, 2020
```
Co-authored-by: Conglong Li <conglong.li@gmail.com>
```
21 Sep, 2020 1 commit
- Add configurable intermediate size to transformer kernels (#423) · a148bd33
  RezaYazdaniAminabadi authored Sep 21, 2020
  
11 Sep, 2020 2 commits
- Revert "supporting different intermediate sizes other than 4 * hidden_dim (#389)" (#404) · 4ac9bf60
  Jeff Rasley authored Sep 11, 2020
```
This reverts commit e549be60.
```
- supporting different intermediate sizes other than 4 * hidden_dim (#389) · e549be60
  RezaYazdaniAminabadi authored Sep 11, 2020
```
* supporting different intermediate sizes other than 4*hidden_dim

* run precommit

* uncomment the unit tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
10 Sep, 2020 1 commit

ZeRO-Offload release (#391) · 41db1c2f

Jeff Rasley authored Sep 09, 2020



* ZeRO-Offload (squash) (#381)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>


02 Sep, 2020 1 commit

Sparse attn + ops/runtime refactor + v0.3.0 (#343) · e5bbc2e5

Jeff Rasley authored Sep 01, 2020



* Sparse attn + ops/runtime refactor + v0.3.0
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>


29 May, 2020 1 commit

Transformer kernel release (#242) · 734d8991

Jeff Rasley authored May 29, 2020



* Transformer kernels release
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>


16 Apr, 2020 1 commit
- Delete tmp · 675d73e0
  Jeff Rasley authored Apr 15, 2020
  
03 Feb, 2020 2 commits
- Add files via upload · 7e7b0a8d
  Samyam Rajbhandari authored Feb 03, 2020
```
Lamb CUDA Kernels
```
- add tmp file · c04ae78a
  Jeff Rasley authored Feb 03, 2020
  