"...autoencoder/AdditiveGaussianNoiseAutoencoderRunner.py" did not exist on "289a2f99a7df528f6193a5ab3ee284ff3112b731"
- 16 Dec, 2020 1 commit
Min Xu authored
* [doc]: AdaScale example and notes
* formatted the notes correctly, as suggested by Benjamin
* added a feature and a unit test to make sure lr_scheduler works
* updated the example with lr_scheduler
* fixed the docs with "make html"
* addressed Mike's suggestions
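For context, a minimal sketch of the pattern this commit documents: wrapping a torch optimizer in AdaScale and driving a standard LR scheduler off AdaScale's gain. Distributed/DDP setup is omitted, and the gain-based epoch accounting below is illustrative rather than the exact example added to the docs.
```python
# Hedged sketch: AdaScale wrapping SGD, with a torch LR scheduler stepped on
# "effective" epochs derived from AdaScale's gain. Distributed setup is omitted.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR
from fairscale.optim import AdaScale

model = torch.nn.Linear(16, 2)
optimizer = AdaScale(SGD(model.parameters(), lr=0.1))
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 1.0 / (epoch + 1))

steps_per_epoch = 100
effective_steps, last_epoch = 0.0, 0
for _ in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 16)).sum()
    loss.backward()
    # gain() estimates how many single-replica steps this update is worth,
    # so the scheduler advances on effective progress, not raw iterations.
    effective_steps += optimizer.gain()
    optimizer.step()
    epoch = int(effective_steps) // steps_per_epoch
    if epoch > last_epoch:
        scheduler.step()
        last_epoch = epoch
```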
-
- 01 Dec, 2020 1 commit
Benjamin Lefaudeux authored
-
- 21 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* rewrite using autograd and the Variable execution queue to make the reduce automatic
* share buckets with OSS to remove duplication
* some speed is likely still on the table, since the speed vs. bucketing behavior does not match expectations; could be a follow-up
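For orientation, a minimal sketch of how the sharded data-parallel wrapper described here combines with the OSS optimizer. The single-process gloo group exists only to make the snippet runnable stand-alone, and the constructor arguments are assumptions based on fairscale's public API of the time.
```python
# Hedged sketch: OSS (sharded optimizer state) combined with ShardedDataParallel,
# whose autograd hooks reduce each gradient to its owning rank automatically.
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

# Toy single-process "distributed" group so the sketch can run stand-alone.
dist.init_process_group("gloo", init_method="tcp://localhost:29501", rank=0, world_size=1)

model = torch.nn.Linear(16, 4)
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
model = ShardedDataParallel(model, optimizer)

for _ in range(5):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 16)).sum()
    loss.backward()      # gradients are reduced bucket-by-bucket as they become ready
    optimizer.step()     # each rank updates only the shard of parameters it owns
```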
-
- 18 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* adding a shard-aware GradScaler wrapper, credits to Sean Naren for the idea
* adding stubs & explanations in the documentation
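A short sketch of how the shard-aware scaler slots into the usual mixed-precision loop; the import path and the pairing with OSS are assumptions based on fairscale's API, and a CUDA device plus an initialized process group are required.
```python
# Hedged sketch: ShardedGradScaler used like torch.cuda.amp.GradScaler, but its
# inf/nan "found" flags are synchronized across ranks before any shard steps.
import torch
from fairscale.optim.oss import OSS
from fairscale.optim.grad_scaler import ShardedGradScaler

model = torch.nn.Linear(16, 4).cuda()
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
scaler = ShardedGradScaler()

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = model(torch.randn(8, 16, device="cuda")).sum()
scaler.scale(loss).backward()
scaler.step(optimizer)   # skips the step consistently on every rank if overflow is found
scaler.update()
```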
-
- 16 Nov, 2020 1 commit
Benjamin Lefaudeux authored
Add a gradient-clipping util, equivalent to torch's but aware of the sharded state. Add a corresponding unit test.
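A hedged sketch of the intended usage: the gradient norm must account for every rank's shard, so clipping goes through the sharded optimizer rather than torch.nn.utils.clip_grad_norm_. The method name and placement are assumptions based on fairscale's OSS API; model, optimizer, and process-group setup are as in the OSS sketch above.
```python
# Hedged sketch, continuing from the OSS setup above: clip with the shard-aware
# util, which combines the per-rank shard norms before clipping local gradients.
optimizer.zero_grad()
model(torch.randn(8, 16)).sum().backward()
optimizer.clip_grad_norm(max_norm=1.0)   # analogous to torch.nn.utils.clip_grad_norm_
optimizer.step()
```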
-
- 11 Nov, 2020 1 commit
msbaines authored
-
- 10 Nov, 2020 1 commit
Tom Birch authored
Adds support for:
* Reused layers (e.g. for weight sharing)
* Lazily-constructed layers
* Single-process control via PipeRPCWrapper
* PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive

Also added examples for multi-process and PipeRPCWrapper.
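The RPC and async features above need a multi-process launch, so here is only a hedged sketch of the baseline single-process Pipe construction they build on; the balance, devices, and chunks values are illustrative.
```python
# Hedged sketch: baseline single-process Pipe usage that PipeRPCWrapper and
# AsyncSchedule extend. Values for balance/devices/chunks are illustrative.
import torch
from fairscale.nn import Pipe

model = torch.nn.Sequential(
    torch.nn.Linear(32, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 32),
    torch.nn.Linear(32, 10),
)
# Split the 4 layers into two partitions and run micro-batches through them.
pipe = Pipe(model, balance=[2, 2], devices=[torch.device("cpu")] * 2, chunks=4)
out = pipe(torch.randn(8, 32))
```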
-
- 28 Oct, 2020 1 commit
msbaines authored
-
- 23 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* small refactor, getting rid of the while loop
-
- 21 Oct, 2020 1 commit
Min Xu authored
- Aurick noticed this bug and I ran into it yesterday
- after the fix, our cifar training shows the same gain values from different replicas now:
```
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3512124098087777
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3512124098087777
20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000600 fwd 0:00:00.003678 loss 0:00:00.000086 bwd 0:00:00.314158 update 0:00:00.002132 rest 0:00:00.000399
20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000643 fwd 0:00:00.003460 loss 0:00:00.000084 bwd 0:00:00.314678 update 0:00:00.002001 rest 0:00:00.000408
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3514997779980324
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3514997779980324
20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000732 fwd 0:00:00.003689 loss 0:00:00.000086 bwd 0:00:00.314176 update 0:00:00.002146 rest 0:00:00.000397
20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000646 fwd 0:00:00.003542 loss 0:00:00.000089 bwd 0:00:00.314549 update 0:00:00.001956 rest 0:00:00.000392
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.352149646693932
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.352149646693932
```
-
- 20 Oct, 2020 2 commits
- 14 Oct, 2020 1 commit
msbaines authored
-
- 02 Oct, 2020 1 commit
msbaines authored
-
- 17 Sep, 2020 1 commit
Tom Birch authored
Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines):
* Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
* Added support for lazy construction of modules (see lazy_construction for an example)
* Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
* Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess
-
- 16 Sep, 2020 1 commit
msbaines authored
-
- 03 Sep, 2020 1 commit
Jun Ru Anderson authored
Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require it in order to converge. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
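Since fairscale's GradScaler subclasses the PyTorch one, it follows the standard mixed-precision pattern; a minimal sketch (the import path is an assumption, and a CUDA device is required):
```python
# Hedged sketch: the canonical autocast + GradScaler loop the benchmark follows.
import torch
from fairscale.optim.grad_scaler import GradScaler  # subclass of torch.cuda.amp.GradScaler

model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(8, 16, device="cuda")).sum()
    scaler.scale(loss).backward()   # scale the loss so fp16 gradients don't underflow
    scaler.step(optimizer)          # unscales, then skips the step if infs/nans appeared
    scaler.update()                 # adjusts the scale factor for the next iteration
```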
-
- 27 Aug, 2020 1 commit
msbaines authored
Work around a PyTorch bug that casts the optimizer state (pytorch/pytorch#43706). Copied from https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/optim/fp16_optimizer.py#L251-L268
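Roughly, the workaround re-installs the original (uncast) state tensors after load_state_dict has run; a hedged sketch of the idea, adapted from the linked fairseq code rather than copied from this commit:
```python
# Hedged sketch: load_state_dict() casts saved fp32 optimizer state to the params'
# dtype (fp16), so copy the original state tensors back in afterwards by matching
# the saved parameter ids to the live parameters.
from itertools import chain

def load_optim_state_without_cast(optimizer, state_dict):
    optimizer.load_state_dict(state_dict)
    id_map = {
        saved_id: param
        for saved_id, param in zip(
            chain.from_iterable(g["params"] for g in state_dict["param_groups"]),
            chain.from_iterable(g["params"] for g in optimizer.param_groups),
        )
    }
    for saved_id, saved_state in state_dict["state"].items():
        if saved_id in id_map:
            optimizer.state[id_map[saved_id]] = saved_state
```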
-
- 14 Aug, 2020 2 commits
msbaines authored
Authored-by: Mandeep Singh Baines <msb@fb.com>
-
msbaines authored
-
- 31 Jul, 2020 3 commits
msbaines authored
-
Tom Birch authored
-
Jun Ru Anderson authored
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
- 08 Jul, 2020 1 commit
Mandeep Singh Baines authored
-