- 23 Apr, 2021 2 commits
-
-
Min Xu authored
- this function is being removed in pytorch
- we only need to call it in case we are working with older pytorch
Co-authored-by: Min Xu <min.xu@acm.org>
-
shuyingsunshine21 authored
* relax checking root condition
* formatting
* add unittest
* add unittest to ci test list
* isort for import of unittest
* format black .
* move test to list 1
* add skip no cuda
* black and isort
-
- 22 Apr, 2021 3 commits
-
-
Min Xu authored
* [fix] mypy and flaky test
  - CI didn't seem to catch this or maybe I merged incorrectly yesterday
  - this should fix the mypy error on master
  - also updated a test that seems to be flaky due to tcp port conflict
* another flaky test, hopefully more determinism helps
* CR
* skip 1.6
* fix
* minor
Co-authored-by: Min Xu <min.xu@acm.org>
-
Benjamin Lefaudeux authored
-
girifb authored
* Changing FSDP init to bypass pg validation for freshly minted pgs inside of init.
* Addressing Min's review comments.
* Changing logging in init to debug from info
* Changing logging in init to debug from info
Co-authored-by: Giri Anantharaman <giriman@devfair0439.h2.fair>
-
- 21 Apr, 2021 2 commits
-
-
girifb authored
* Changing FSDP init to bypass pg validation for freshly minted pgs inside of init.
* Addressing Min's review comments.
Co-authored-by: Giri Anantharaman <giriman@devfair0439.h2.fair>
-
Benjamin Lefaudeux authored
-
- 20 Apr, 2021 1 commit
-
-
Sam Shleifer authored
-
- 19 Apr, 2021 2 commits
-
-
Min Xu authored
* [chore] 0.3.5 release
* address comment
Co-authored-by: Min Xu <min.xu@acm.org>
-
Min Xu authored
* FSDP: fixing training with freezing weights
  - an assert is changed to catch this case correctly
  - unit test added (based on Quentin's test code) for this case and compare DDP and FSDP
  fixes: #610
* added test file to list 1
* Use better and simpler code as suggested by Myle
* testing both methods of freezing as well
Co-authored-by: Min Xu <min.xu@acm.org>
-
- 15 Apr, 2021 3 commits
-
-
Benjamin Lefaudeux authored
-
anj-s authored
[fix] Revert change that removed the option to run OffloadModel without activation checkpointing. (#608)
* revert change made
* add tests and revert sync shard changes
* add tests
* remove file checked in by error
* inline var
* fix lint errors
* add checkpoint activation
* fix mypy
* use a bigger model
* modify tests for now
* resolve conflicts
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
anj-s authored
* modify doc string
* add offload docs
* add tutorial
* remove print
* remove print statement
* modify import
* modify constants
* modify README and add Offload symbol
* fix lint
* smaller mods
* lint errors
* Update README.md
  added the references at the bottom of the readme
* address comments
* doc changes
* add blank line
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
- 14 Apr, 2021 1 commit
-
-
Myle Ott authored
-
- 13 Apr, 2021 4 commits
-
-
Benjamin Lefaudeux authored
-
Sam Shleifer authored
-
Mehdi Mirzazadeh authored
Replacing the multi-process pipe implementation with a more flexible one.
Initial implementation of proposal pytorch/pytorch#55256
-
Benjamin Lefaudeux authored
* Adding a unit test which checks for multiple FW passes on the same block
* Adding an embedding table, but still no problem to show for it
-
- 09 Apr, 2021 1 commit
-
-
msbaines authored
-
- 08 Apr, 2021 1 commit
-
-
Sam Shleifer authored
-
- 07 Apr, 2021 3 commits
-
-
Benjamin Lefaudeux authored
* Properly handle .train() and .eval() modes
* showing that the unit test works, now fixed
* code review
-
anj-s authored
* debugging
* debugging activation issue
* fix activation loading
* remove changes used for testing
* remove comment
-
Myle Ott authored
-
- 06 Apr, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 05 Apr, 2021 3 commits
-
-
anj-s authored
* add model
* add offload regression benchmarks
* add golden data
* remove mp pipe benchmark
* fix lint
* remove rank
* add check for model type
* lint errors
-
Benjamin Lefaudeux authored
* making APIs more private
* linting
-
Benjamin Lefaudeux authored
* fixing given torchvision's change
-
- 04 Apr, 2021 3 commits
-
-
Sam Shleifer authored
-
msbaines authored
This test is flaky for torch >= 1.8.0.
-
Benjamin Lefaudeux authored
-
- 03 Apr, 2021 1 commit
-
-
Shruti Bhosale authored
-
- 02 Apr, 2021 6 commits
-
-
msbaines authored
NCCL all_to_all is now supported in PyTorch (since v1.8.0)
Fixes: #548
-
Min Xu authored
- releasing 0.3.3
- I need it in vissl for the auto_wrap_bn change
-
anj-s authored
-
Anjali Sridhar authored
-
Anjali Sridhar authored
-
anj-s authored
* add record_function support
* add more record_function cutpoints
* add more record_function cutpoints
* lint errors
* make string ids more specific
-
- 01 Apr, 2021 1 commit
-
-
msbaines authored
-
- 31 Mar, 2021 2 commits
-
-
Siddharth Goyal authored
-
msbaines authored
-