    Allow sharded grad scaler to cpu offload with FSDP (#831) · ba5785f7
    Anupam Bhatnagar authored
    * first commit
    
    * sharded scaler hitting NaN assertions
    
    * adding test for sharded grad scaler without cpu offload
    
    * DDP grad scaler and FSDP sharded grad scaler tests failing
    
    * removing test_output
    
    * fix no cpu offload test
    
    * changing optimizer from OSS to SGD
    
    * all tests passing, code cleanup pending
    
    * code cleanup
    
    * fixing pyproject.toml
    
    * removing .isort.cfg
    
    * running isort linter
    
    * resolving isort issues
    
    * resolving black linter issue
    
    * resolving mypy issues
    
    * fix import statement
    
    * fix mypy error
    
    * modifying import statement
    
    * adding pytorch version requirement
    
    * fixing pytest skip test decorator
    
    * apply version guard for ShardedGradScaler
    
    * removing test_fsdp_grad_scaler
    
    * increasing num_epochs for ShardedGradScaler so that updates are not skipped
    
    * adding support for torch 1.8
    
    * minor edit
    
    * [skip ci] more torch 1.8 changes
    
    * parametrizing the tests
    
    * cleanup code with linters
    
    * [skip ci] updating docstring
    
    * [skip ci] addressing some more comments