- 30 May, 2025 1 commit
-
-
limm authored
-
- 18 Nov, 2021 3 commits
-
-
Min Xu authored
* [chore] 0.4.3 release * update setup.py Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix]: fix eval for shared weight FSDP * fixing optim state saving * add changelog * reformat with newer local isort * update test * avoid computing reference state unless we are testing training * added optim_state test * make mypy happy * move tests; maybe CUDA memory-related tests need to go first in the lists Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Anupam Bhatnagar authored
* adding a manual workflow * add push * fix syntax * adding a new workflow * renaming file * cleanup yaml files * [skip ci] removing pyproject edits
-
- 17 Nov, 2021 2 commits
-
-
anj-s authored
* fixed lint issues * remove unused print statements * add changelog entry * [skip ci] fix lint errors
-
Anupam Bhatnagar authored
* update changelog * [skip ci] removed requirements-test.txt * [skip ci] updating changelog * [skip ci] add PR numbers * replacing requirements-test.txt by requirements-dev.txt * [skip ci] changing requirements-test to requirements-dev in pre-commit and requirements-benchmarks * [skip ci] mark manual static analysis checks as deprecated * empty commit to trigger ci * [skip ci] updating changelog * [skip ci] addressing comments * addressing more comments
-
- 15 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* first commit * sharded scaler hitting nan assertions * adding test for sharded grad scaler without cpu offload * ddp grad scaler and fsdp sharded grad scaler test failing * removing test_output * fix no cpu offload test * changing optimizer from OSS to SGD * all tests passing, code cleanup pending * code cleanup * fix pyproject.toml * removing .isort.cfg * running isort linter * resolving isort issues * resolving black linter issue * resolving mypy issues * fix import statement * fix mypy error * modifying import statement * adding pytorch version requirement * fixing pytest skip test decorator * apply version guard for ShardedGradScaler * removing test_fsdp_grad_scaler * increasing num_epochs for ShardedGradScaler so that updates are not skipped * adding support for torch 1.8 * minor edit * [skip ci] more torch 1.8 changes * parametrizing the tests * cleanup code with linters * [skip ci] update doc string * [skip ci] addressing some more comments
-
- 12 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* adding pre-commit files * applying pre-commit to all files * adding no-strict-optional argument to mypy in circle ci config * fix typo * updating python versions * [skip ci] remove extra args * adding python 3.9 * [skip ci] set pre-commit version in requirements-dev.txt * set CACHE_VERSION * move linters from circleci to github actions * update python version * update python version in benchmarks_2 * moving to python 3.9.7
-
- 09 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* CI config changes * changing params for failing tests * [skip ci] minor edit
-
- 08 Nov, 2021 3 commits
-
-
Anupam Bhatnagar authored
* [chore] 0.4.2 release * updating torch version * [skip ci] updating readme and requirements.txt
-
anj-s authored
* update release notes * initial commit * lint cleanup etc. * helper functions; lint errors * lint errors * lint errors * add back the boolean for named_parameters * address comments and fix lint * remove unused functions and class * remove unused state
-
Benjamin Lefaudeux authored
Add SlowMo Distributed Data Parallel for clusters with slow interconnects Co-authored-by: Vinayak Tantia <tantia.vinayak1@gmail.com>
-
- 05 Nov, 2021 1 commit
-
-
Min Xu authored
* [feat] MEVO kernel - initial import from min/softmax and min/testing branches - need to rename and further cleanup * only test with newer pytorch * renamed and added comments and code cleanup * rename and reduce test memory * testing * minor fixing * fixing * more fix * changelog * more 1.7 and 1.8 paper cuts * remove dead code * addressed Benjamin's comments * addressed more comments Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 03 Nov, 2021 1 commit
-
-
Vinayak Tantia authored
-
- 02 Nov, 2021 2 commits
-
-
anj-s authored
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 01 Nov, 2021 2 commits
-
-
Min Xu authored
* added a new test, passing without shared weights * tested weight sharing * added the test to test list file * extended to world_size = 2 * fixed test * [feat]: add limited and experimental support for shared parameters * fixed tests * simplify to work with layers with at least 1 non-shared param and add code to pick up linked_param field for sharding the shared param * fixed the case where linked param is not in separate FSDP * changelog and remove old code Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
* add doc strings * add lower level SSD APIs and tests * add the test to the list to be run * remove unused imports * more doc string changes * fix lint errors
-
- 28 Oct, 2021 1 commit
-
-
Min Xu authored
* [fix] fix test on main * [fix] fix test on main Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 27 Oct, 2021 6 commits
-
-
anj-s authored
* remove offload dependency on fp16 * update python version for CPU tests * run CPU tests with updated PyTorch version * split changes * revert tests config * fix lint errors * update nightly and test PyTorch versions * skip failing multiprocess pipe test * always skip test * always skip test * always skip test * lint error * skip unsupported versions * improve skip message * lint errors * modify docs * add tests * fix test failures * modify comments * fix lint errors * fix lint errors
-
Min Xu authored
* checkpoint + nonflat + mixed_precision * make tests pass with expected errors * addressed comments * add a comment Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
* skip creating cpu grads and pinning memory * added additional comment * pin docutils to fix circleCI
-
Min Xu authored
* added the failing test * fixed the bug * fine-tune the condition * typo * typo * changelog and added test to test files Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
-
Eugen Hotaj authored
Fixes #827. Co-authored-by: Eugen Hotaj <ehotaj@fb.com>
-
- 24 Oct, 2021 1 commit
-
-
anj-s authored
* relax speed constraints * relax the regressions constraints
-
- 22 Oct, 2021 2 commits
-
-
anj-s authored
-
Eugen Hotaj authored
auto_shard.py currently uses torch.fx to create a symbolic DAG of operations and linearizes that DAG into an nn.Sequential so it can later be used for model offloading. This works in most cases but runs into issues with certain eager-mode features, such as dynamic conditionals and shape-dependent computation. This PR extends auto_shard.py to first run a preprocessing step that wraps any nn.Module that cannot be traced through. It adds a test for dynamic conditionals and updates existing failing test code. Some immediate extensions to this approach are marked as TODO in the code.
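The linearization step described above can be illustrated without PyTorch. The sketch below is not the auto_shard.py implementation and does not use torch.fx; it only shows, under simplifying assumptions, how a DAG of operations can be flattened into a sequential order in which every operation appears after its inputs. The graph, the op names, and the `linearize` helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operation graph: each op maps to the set of ops it consumes.
# The op names are made up for illustration; they do not come from auto_shard.py.
op_inputs = {
    "linear1": {"x"},
    "relu": {"linear1"},
    "linear2": {"x"},
    "add": {"relu", "linear2"},
    "out": {"add"},
}


def linearize(op_inputs):
    """Return the ops in an order where each op comes after all of its inputs."""
    return list(TopologicalSorter(op_inputs).static_order())


order = linearize(op_inputs)
print(order)
```

Several valid orders can exist for the same graph (here `linear2` may come before or after `relu`), so a real linearizer would also have to choose among them, for example to keep offloaded shards balanced.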
-
- 21 Oct, 2021 2 commits
-
-
anj-s authored
* update pytorch version for benchmarks * reduce golden data precision check
-
anj-s authored
* update python version for CPU tests * run CPU tests with updated PyTorch version * update nightly and test PyTorch versions * skip failing multiprocess pipe test * always skip test * always skip test * always skip test * lint error * skip unsupported versions * improve skip message * lint errors
-
- 20 Oct, 2021 3 commits
-
-
anj-s authored
* add log for new memory tracker features * add log for new memory tracker features
-
Quentin Duval authored
* [feat] layer memory tracking * [feat] layer memory tracking (add tests in CI) * [feat] layer memory tracking: doc typos * [feat] layer memory tracking: mypy fixes * [feat] layer memory tracking: fixes for FSDP all gather tracking on pytorch 1.9 and above * [feat] layer memory tracking: lint * [feat] layer memory tracking: mypy Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
-
anj-s authored
-
- 19 Oct, 2021 1 commit
-
-
Rohan Varma authored
* fix * remove dup file
-
- 28 Sep, 2021 1 commit
-
-
Anjali Sridhar authored
-
- 24 Sep, 2021 1 commit
-
-
Anjali Sridhar authored
-
- 22 Sep, 2021 1 commit
-
-
tmarkstrum authored
* update master branch to main * added FAQ about updating the branch from master to main * fixed some false-positive corrections * added what is new section * fixed the quoted code area * added release what is new section * added a step in release.md * fixed a word
-
- 21 Sep, 2021 1 commit
-
-
anj-s authored
-
- 20 Sep, 2021 1 commit
-
-
tmarkstrum authored
* [chore] 0.4.1 release * put more details in one change log
-
- 17 Sep, 2021 1 commit
-
-
tmarkstrum authored
* add a toggle to disable use of the nccl base collectives * added todo to remove the toggle when the issue is resolved.
-