- 03 Mar, 2021 1 commit
-
-
Reza Yazdani authored
* fixing buffers in transformer kernel when gelu-checkpoint is enabled
* fixing the test issue for other memory optimization flags
* fixing a bug for when attn_dropout_checkpoint is enabled
-
- 28 Feb, 2021 1 commit
-
-
zmx authored
Hi, I took a look at the code of column_sum_reduce and have two questions:
1. The goal of column_sum_reduce is to compute the column sums of the inp matrix with shape [rows, width], so the result should have shape [width], right? The bounds check on pos does not seem suitable for that.
2. The CUDA kernel is implemented on the assumption that threads with the same threadIdx.y are grouped into one thread_block_tile, with blockDim = (32, 32). I read the NVIDIA document https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf, which says a thread block tile is a subset of threads of a thread block, divided into tiles in row-major order. Doesn't that mean threads with the same threadIdx.x are grouped into a thread_block_tile?
Thanks!
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
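The two questions in this commit message can be illustrated with a small host-side sketch. It is written in Python rather than CUDA so it runs anywhere, and it is not the kernel from the repository: `column_sum` and `tile_of` are hypothetical names, and `tile_of` assumes that tiles are cut from the linear thread rank `threadIdx.y * blockDim.x + threadIdx.x`, which is the standard CUDA linearization for a 2D block.

```python
def column_sum(inp, rows, width):
    """Sum a row-major [rows, width] matrix down each column.

    `inp` is a flat list of length rows * width; the result has length `width`.
    """
    out = [0.0] * width
    for r in range(rows):
        for c in range(width):
            out[c] += inp[r * width + c]
    return out


def tile_of(thread_x, thread_y, block_dim_x=32, tile_size=32):
    """Index of the tile a thread falls into (assumption: tiles follow linear rank)."""
    rank = thread_y * block_dim_x + thread_x  # x varies fastest
    return rank // tile_size


if __name__ == "__main__":
    print(column_sum([1, 2, 3, 4, 5, 6], rows=2, width=3))
    # Threads that share thread_y land in the same tile when block_dim_x == tile_size.
    print(tile_of(0, 3), tile_of(31, 3), tile_of(5, 0), tile_of(5, 1))
```

Under that assumption, a (32, 32) block with 32-wide tiles puts all threads with the same `threadIdx.y` in one tile, while threads that share only `threadIdx.x` end up in different tiles.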
-
- 27 Feb, 2021 1 commit
-
-
vfdev authored
-
- 26 Feb, 2021 3 commits
-
-
Stas Bekman authored
-
Reza Yazdani authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
vfdev authored
-
- 24 Feb, 2021 2 commits
-
-
Reza Yazdani authored
* fix the bias-add precision and indexing, and add layer-norm-eps as a configurable parameter for the transformer
* add ACC_HALF config
* use defined to check if ACC_HALF is defined
-
Reza Yazdani authored
-
- 20 Feb, 2021 1 commit
-
-
Stas Bekman authored
Invalid param name Thanks.
-
- 19 Feb, 2021 1 commit
-
-
Jeff Rasley authored
-
- 18 Feb, 2021 2 commits
-
-
Reza Yazdani authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Conglong Li authored
-
- 17 Feb, 2021 2 commits
-
-
Jeff Rasley authored
-
Takuya Makino authored
-
- 16 Feb, 2021 2 commits
-
-
Olatunji Ruwase authored
* Fix docstring
* Make screenshots clickable for easier viewing
* Navigation menu in alphabetical order; more clickable screenshots
* Rename 1Cycle doc
* Tweak naming
-
Cheng Li authored
* check none tensors when splitting buckets
-
- 12 Feb, 2021 4 commits
-
-
Olatunji Ruwase authored
* Activation checkpoint support for non-tensor input/output
* Format fixes
* Address PR comments; add ordering edge case tests
-
Jeff Rasley authored
* add -e/--examples flag to checkout submodules
* bump DSE commit
-
Stas Bekman authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Sean Naren authored
* Use log dist function instead of print
* Expose ranks
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 11 Feb, 2021 4 commits
-
-
Conglong Li authored
* 1-bit adam doc fix
* 1-bit adam doc fix
* 1-bit adam doc fix
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
sdtblck authored
-
Sean Naren authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Cheng Li authored
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* fix trailing ws
* fix names
* remove multistep profiling and update docs
* fix cases where functionals and submodules coexist in a parent module, update readme
* fix typo
* always invoke post hook function
* fix module flops sum and update tests
* update tutorial
-
- 10 Feb, 2021 2 commits
-
-
Olatunji Ruwase authored
* Fix docstring
* Make screenshots clickable for easier viewing
-
Stas Bekman authored
-
- 09 Feb, 2021 1 commit
-
-
TheDudeFromCI authored
-
- 08 Feb, 2021 1 commit
-
-
Jon Eyolfson authored
* Improve starred expressions
  `deepspeed/profiling/flops_profiler/profiler.py` uses starred expressions that are no longer valid with [PEP 617][1]. The new Python parser is in 3.9, and this change allows DeepSpeed to run with the newest Python version. I have not checked all locations that have this issue. However, this change allows me to run simple examples.
  [1]: https://www.python.org/dev/peps/pep-0617/
* Match style for "Improve starred expressions", although readability suffers
  The style guide might need to be updated for this new use case of expressions. Python [Issue 40631][1] includes more discussion on the change.
  [1]: https://bugs.python.org/issue40631
Co-authored-by: Cheng Li <pistasable@gmail.com>
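As a rough illustration of the kind of rewrite this commit describes, the sketch below keeps the star inside a single tuple display, a form that parses under both the old and the new parser. It assumes the problem is the one discussed in Issue 40631 (a starred item wrapped in its own parentheses, such as `(*a), b`). The helper names `prod` and `linear_output_shape` are hypothetical and are not taken from `profiler.py`.

```python
from functools import reduce
import operator


def prod(dims):
    """Multiply all entries of an iterable of ints (1 for an empty iterable)."""
    return reduce(operator.mul, dims, 1)


def linear_output_shape(input_shape, out_features):
    """Replace the last dimension of `input_shape` with `out_features`.

    The star sits inside one parenthesized tuple, not in its own parentheses.
    """
    return (*input_shape[:-1], out_features)


if __name__ == "__main__":
    shape = linear_output_shape((8, 16, 32), 64)
    print(shape, prod(shape))
```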
-
- 04 Feb, 2021 1 commit
-
-
Stas Bekman authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Feb, 2021 1 commit
-
-
Jeff Rasley authored
-
- 01 Feb, 2021 3 commits
-
-
Jeff Rasley authored
-
Jon Eyolfson authored
* Add executable permission to `ds_elastic` and `ds_report` in `bin`.
* Automatic `ds_elastic` formatting
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Jeff Rasley authored
-
- 29 Jan, 2021 2 commits
-
-
Shaden Smith authored
-
Jeff Rasley authored
-
- 27 Jan, 2021 3 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
-
Jeff Rasley authored
-
- 26 Jan, 2021 1 commit
-
-
Ying Xiong authored
* fix wrong idx bug in invertible LayerNormBackward1; this index bug causes a wrong scale grad
* fix unexpected deletion
* fix idx for LayerNormBackward1_fused_add
* move pos definition in LayerNormBackward1 kernels
* fix format error
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
-
- 25 Jan, 2021 1 commit
-
-
sdtblck authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-