- 12 May, 2020 2 commits
Thor Johnsen authored
Reversible fused adam with mt support
Thor Johnsen authored
- 08 May, 2020 1 commit
Thor Johnsen authored
- 07 May, 2020 2 commits
Thor Johnsen authored
Thor Johnsen authored
- 06 May, 2020 3 commits
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
- 05 May, 2020 1 commit
Thor Johnsen authored
- 04 May, 2020 1 commit
Thor Johnsen authored
- 01 May, 2020 1 commit
Deyu Fu authored
* Changes to make xentropysoftmax load/store vectorized when possible:
  - Increase the default ILP so that each thread handles 16 bytes of data in one step
  - Make each thread load/store the longest vector possible
  - Make the unrolled case handle adjacent data instead of strided data, so the element order matches the vectorized case
* Add a shift for the unaligned case. Remove accesses aligned to less than 16 bytes
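As a rough illustration of the alignment handling described above, the sketch below models how a contiguous buffer can be split into a short unaligned prologue, full 16-byte vector steps, and a leftover tail. It is a hypothetical Python model of this general technique, not the actual xentropysoftmax kernel; the function name split_for_vector_access, its parameters, and the VECTOR_BYTES constant are assumptions made for illustration.

```python
VECTOR_BYTES = 16  # assumed vector width: one 16-byte load/store per step


def split_for_vector_access(base_addr: int, n_elems: int, elem_size: int):
    """Hypothetical helper: split a buffer into (prologue, vector_steps, tail).

    prologue:     leading elements handled one at a time until the address
                  reaches a 16-byte boundary (the "shift" for the unaligned case)
    vector_steps: number of full 16-byte vectors after the prologue
    tail:         trailing elements that do not fill a whole vector
    """
    if VECTOR_BYTES % elem_size != 0:
        raise ValueError("element size must divide the vector width")
    if base_addr % elem_size != 0:
        raise ValueError("base address must be element-aligned")
    # Elements per 16-byte vector, e.g. 8 for fp16 (2 bytes), 4 for fp32 (4 bytes).
    elems_per_vector = VECTOR_BYTES // elem_size
    # Bytes needed to reach the next 16-byte boundary (0 if already aligned).
    shift_bytes = (-base_addr) % VECTOR_BYTES
    prologue = min(shift_bytes // elem_size, n_elems)
    remaining = n_elems - prologue
    vector_steps = remaining // elems_per_vector
    tail = remaining - vector_steps * elems_per_vector
    return prologue, vector_steps, tail


if __name__ == "__main__":
    # 100 fp16 elements starting 6 bytes past a 16-byte boundary.
    print(split_for_vector_access(0x1006, 100, 2))
```

For example, a 100-element fp16 buffer starting at address 0x1006 splits into 5 prologue elements (10 bytes of shift), 11 full vectors of 8 elements, and a 7-element tail; after the prologue every vector access starts on a 16-byte boundary.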
- 30 Apr, 2020 5 commits
Deyu Fu authored
* Modify MTA axpby for wider load/store
* Make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
Deyu Fu authored
* Update the fused bias-ReLU backward kernel
* Add support for not requiring the first layer's dgrad
* Fix bug: wrong layer in requires grad
* Add infrastructure for optional bias and activation (currently only supports no bias and no ReLU)
* Make bias and ReLU optional separately
* Add a sigmoid activation option
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
Thor Johnsen authored
Thor Johnsen authored
- 29 Apr, 2020 5 commits
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
- 23 Apr, 2020 1 commit
ptrblck authored
* Add CUDAGenerator guard
* Fix generator_flag
* Add guards for gen pointer/ref issue
* Change mutex_ to mutex()
* Add check_generator

Co-authored-by: pbialecki <pbialecki@nvidia.com>
- 22 Apr, 2020 2 commits
Deyu Fu authored
Vinicius Reis authored
The LARC optimizer wraps an underlying optimizer, and the wrapper must then be passed to amp.initialize for mixed precision. Three different crashes occurred in this situation; this commit fixes all of them and adds a unit test. I don't know whether the 'LARC' in sys.modules check ever worked: in my setup, the entry in sys.modules is 'apex.parallel.LARC'. Checking whether the variable is defined seems more reliable.
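The sys.modules observation can be reproduced with the standard library alone: sys.modules is keyed by fully qualified module names, so a submodule imported through its package is registered under its dotted path, not its short name. The sketch below uses xml.dom.minidom as a stand-in for apex.parallel.LARC (apex may not be installed); the helper module_keys_present is hypothetical and only for illustration.

```python
import sys
import xml.dom.minidom  # stand-in for a submodule such as apex.parallel.LARC


def module_keys_present(short_name: str, qualified_name: str) -> tuple:
    """Hypothetical helper: report whether each spelling is a key in sys.modules."""
    return short_name in sys.modules, qualified_name in sys.modules


# Only the dotted path is registered, mirroring 'LARC' vs 'apex.parallel.LARC'.
print(module_keys_present("minidom", "xml.dom.minidom"))  # (False, True)
```

A membership test on the short name therefore never fires for a submodule, which is consistent with the report that the 'LARC' check did not match in practice.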
- 20 Apr, 2020 3 commits
Kexin Yu authored
Add an additional loop over lists of params in FP16_Optimizer's load_state_dict
Thor Johnsen authored
Kexin Yu authored
- 16 Apr, 2020 5 commits
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
Thor Johnsen authored
- 15 Apr, 2020 2 commits
Thor Johnsen authored
Thor Johnsen authored
- 13 Apr, 2020 1 commit
Mannat Singh authored
- 10 Apr, 2020 2 commits
Thor Johnsen authored
Thor Johnsen authored
- 09 Apr, 2020 1 commit
Thor Johnsen authored
- 08 Apr, 2020 1 commit
Thor Johnsen authored
- 07 Apr, 2020 1 commit
Thor Johnsen authored