- 15 May, 2020 1 commit
rohithkrn authored
- 13 May, 2020 2 commits
- 12 May, 2020 3 commits
Chaitanya Sri Krishna Lolla authored
rohithkrn authored
rohithkrn authored
- 11 May, 2020 1 commit
rohithkrn authored
- 09 May, 2020 1 commit
rohithkrn authored
- 08 May, 2020 2 commits
- 07 May, 2020 3 commits
Chaitanya Sri Krishna Lolla authored
Chaitanya Sri Krishna Lolla authored
Chaitanya Sri Krishna Lolla authored
* fix dropout scaling from p to 1/(1-p) (#816) (see the sketch after this list). Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
* Improvements to apex.mlp (#804)
  * update the fused bias-relu backward kernel
  * add support for not requiring the first layer's dgrad
  * fix bug: wrong layer used in the requires-grad check
  * add infrastructure for optional bias and activation; currently only no bias and no relu are supported
  * make bias and relu optional separately
  * add a sigmoid activation option
* enable wider load/store for multi_tensor_apply kernels (#763)
  * modify MTA axpby for wider load/store
  * make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
* make xentropy softmax load/store vectorized when possible (#725)
  * increase the default ILP so that each thread handles 16 bytes of data per step
  * make each thread load/store the longest vector possible
  * make the unroll case handle adjacent data instead of strided...
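The dropout item above refers to the standard inverted-dropout convention: surviving activations are scaled by 1/(1-p) at training time so that no rescaling is needed at inference. The snippet below is a minimal PyTorch sketch of that convention for illustration only; it is not the fused apex kernel touched by #816.

```python
import torch

def inverted_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Zero each element with probability p and scale the survivors by 1/(1-p),
    so the expected activation matches eval mode (no rescaling at inference)."""
    if not training or p == 0.0:
        return x
    keep_prob = 1.0 - p
    mask = (torch.rand_like(x) < keep_prob).to(x.dtype)
    return x * mask / keep_prob
```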
- 28 Apr, 2020 1 commit
Chaitanya Sri Krishna Lolla authored
* Initial commit to hipify all CUDA code
* enable the multi_tensor_apply extension
* added generatedFileCleaner to handle nested hip files
- 23 Apr, 2020 1 commit
ptrblck authored
* add CUDAGenerator guard
* fix generator_flag
* add guards for gen pointer/ref issue
* change mutex_ to mutex()
* add check_generator

Co-authored-by: pbialecki <pbialecki@nvidia.com>
- 22 Apr, 2020 2 commits
Deyu Fu authored
Vinicius Reis authored
The LARC optimizer wraps an underlying optimizer and then needs to be passed to amp.initialize for mixed precision. Three different crashes could occur in this situation; this change fixes all of them and adds a unit test. It is unclear whether the 'LARC' in sys.modules check ever worked: in my setup the entry in sys.modules is 'apex.parallel.LARC', so checking whether the variable is defined seems more reliable.
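For context, a usage sketch of the wrap-then-initialize pattern this commit exercises, assuming apex is installed with a CUDA build; the model, learning rate, and opt level are placeholders.

```python
import torch
from apex import amp
from apex.parallel.LARC import LARC

model = torch.nn.Linear(1024, 1024).cuda()
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# LARC wraps the underlying optimizer...
optimizer = LARC(base_optimizer)

# ...and the wrapped optimizer is what gets handed to amp.initialize
# for mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```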
- 20 Apr, 2020 2 commits
- 13 Apr, 2020 1 commit
Mannat Singh authored
- 05 Apr, 2020 2 commits
- 03 Apr, 2020 4 commits
- 02 Apr, 2020 1 commit
Kexin Yu authored
- 01 Apr, 2020 2 commits
- 31 Mar, 2020 2 commits
Kexin Yu authored
Jeff Bowles authored
- 25 Mar, 2020 1 commit
msbaines authored
The CUDA kernel used by fused-adam was launched on the default stream of the default device, but it needs to run on the same device as the parameter tensor. Fixed by using a context manager to set the correct default device. For the use_mt case an error is now raised; alternatively, that case could launch one kernel per CUDA device. The non-contrib version will also need to be fixed. Co-authored-by: Mandeep Singh Baines <msb@fb.com>
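A minimal sketch of the device-guard pattern described above. `launch_fused_adam_kernel` is a hypothetical stand-in for the actual kernel launch; the point is only the `torch.cuda.device` context manager selecting the parameter's device before the launch.

```python
import torch

def launch_on_param_device(param: torch.Tensor, launch_fused_adam_kernel):
    # Select the device that owns the parameter so the kernel (and the
    # default stream it uses) runs there, not on device 0.
    with torch.cuda.device(param.device):
        launch_fused_adam_kernel(param)
```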
- 23 Mar, 2020 2 commits
- 21 Mar, 2020 2 commits
- 20 Mar, 2020 3 commits
- 17 Mar, 2020 1 commit
Kexin Yu authored
Add an additional loop over the lists of params when loading state_dict in apex.contrib.optimizers.FP16_Optimizer.
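A hypothetical sketch of the nested-loop copy this change describes; the actual attribute names inside apex.contrib.optimizers.FP16_Optimizer may differ. The fp32 master weights are kept as a list of parameter lists (one inner list per param group), so restoring a checkpoint needs an extra outer loop over groups.

```python
import torch

def load_master_weights(current_groups, saved_groups):
    """Copy saved fp32 master weights (a list of parameter lists, one inner
    list per param group) back into the optimizer's current master weights."""
    for current_group, saved_group in zip(current_groups, saved_groups):
        for current_param, saved_param in zip(current_group, saved_group):
            current_param.data.copy_(saved_param.data)
```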