- 08 Feb, 2019 (2 commits)
  - Michael Carilli authored
  - Michael Carilli authored
- 06 Feb, 2019 (7 commits)
  - ngimel authored: Better FP16 support in the PyTorch fp16 utils.
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
- 05 Feb, 2019 (8 commits)
  - Jerry Ma authored: Add an FP16Model class as a successor to network_to_half. Its benefits (see the sketch after this list):
    - Preservation of single precision for BatchNorm layers. The models generated by network_to_half() convert BatchNorm moment tensors to half precision and then back to single precision, which hurts the accuracy of the moment estimators and occasionally results in NaNs.
    - Support for multi-argument nn.Modules (self-explanatory from the code).
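A minimal sketch of both points, with hypothetical names (bn_to_float, FP16ModelSketch); the actual apex implementation may differ:

```python
import torch.nn as nn

def bn_to_float(module):
    # Keep BatchNorm parameters and running mean/var buffers in FP32 so
    # the moment estimators stay accurate and do not produce NaNs in FP16.
    if isinstance(module, nn.modules.batchnorm._BatchNorm):
        module.float()
    for child in module.children():
        bn_to_float(child)
    return module

class FP16ModelSketch(nn.Module):
    # Hypothetical wrapper: casts every positional input to half, so
    # nn.Modules whose forward() takes multiple arguments are supported.
    def __init__(self, network):
        super().__init__()
        self.network = bn_to_float(network.half())

    def forward(self, *inputs):
        return self.network(*(t.half() for t in inputs))
```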
  - Michael Carilli authored: Remove the patching of loss.backward, which appears to cause memory leaks (reference cycles?) in some models.
  - Michael Carilli authored
  - mcarilli authored: apex.optimizers.FP16_Optimizer: add state_dict() and load_state_dict() (usage sketch below).
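A hedged usage sketch of the new checkpointing methods; the FusedAdam wrapping shown here is an assumption about how the optimizer is constructed, and the exact contents of the state dict are not specified by the commit:

```python
import torch
import torch.nn as nn
from apex.optimizers import FusedAdam, FP16_Optimizer

model = nn.Linear(16, 16).cuda().half()
optimizer = FP16_Optimizer(FusedAdam(model.parameters(), lr=1e-3))

# Save model and optimizer state together, so the optimizer's internal
# state travels with the model across restarts.
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "checkpoint.pt")

# Restore both before resuming training.
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
```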
  - mcarilli authored: Restore fused kernel.
  - Michael Carilli authored
  - Michael Carilli authored
- 04 Feb, 2019 (2 commits)
  - mcarilli authored: Allow syncBN to run with affine=False (usage sketch below).
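A brief usage sketch of the change; the model itself is illustrative, and the distributed setup that sync BN needs is omitted:

```python
import torch.nn as nn
from apex.parallel import SyncBatchNorm

# affine=False: normalization uses cross-process batch statistics only,
# with no learnable gamma/beta parameters.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    SyncBatchNorm(64, affine=False),
    nn.ReLU(inplace=True),
)
```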
  - Michael Carilli authored
- 03 Feb, 2019 (1 commit)
  - Michael Carilli authored
- 01 Feb, 2019 (8 commits)
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - mcarilli authored
  - mcarilli authored
  - jiej authored
- 31 Jan, 2019 (2 commits)
  - Michael Carilli authored
  - Michael Carilli authored
- 30 Jan, 2019 (4 commits)
  - mcarilli authored: Update default dims in word_language_model to be multiples of 8 to enable Tensor Core use.
  - Michael Carilli authored
  - Michael Carilli authored: Updated default sizes to be multiples of 8 to enable Tensor Core use. Added performance guidelines to the README. (The rounding rule is sketched below.)
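The rule of thumb behind both commits, as a hedged sketch; the helper and the example numbers are illustrative, not taken from the commits:

```python
def round_up(n, multiple=8):
    # Round n up to the next multiple of `multiple`. FP16 GEMM dimensions
    # that are multiples of 8 are eligible for Tensor Cores.
    return ((n + multiple - 1) // multiple) * multiple

# e.g. pad a 33278-word vocabulary to 33280 so the embedding and
# output-projection GEMMs have Tensor Core friendly dimensions.
padded_vocab = round_up(33278)  # -> 33280
```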
  - mcarilli authored: Add unit tests for optimizers/fp16_optimizer.
- 29 Jan, 2019 (5 commits)
- 28 Jan, 2019 (1 commit)
  - jiej authored: Test update to resolve https://github.com/NVIDIA/apex/issues/134#issue-403525480. Use an identical learning rate for both DDP with sync BN and single-process BN; the previous configuration left the impression that sync BN requires adjusting the learning rate in the script, which is not true. (Sketch below.)
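A minimal sketch of the point being tested; the model factory is illustrative, and the DDP/SyncBatchNorm wrapping for the second configuration is omitted:

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))

lr = 0.1  # one identical learning rate for both configurations
opt_single = torch.optim.SGD(make_model().parameters(), lr=lr)

# In the DDP + sync BN process the same value is used; switching BN to
# its synchronized variant does not call for rescaling the learning rate.
opt_ddp = torch.optim.SGD(make_model().parameters(), lr=lr)
```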