    [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed... · a0458b98
    Min Xu authored
    [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)
    
    * [fix] disable single rank process group for auto_wrap_bn
    
    - beefed up unit test with regnet-like model
    - found that the single-rank process group is causing problems
    - disabled it to enable convergence tests on the vissl side
    - use `raise e from None` to get a better assertion output
      in testing.py.
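The `raise e from None` idiom suppresses the chained "During handling of the above exception, another exception occurred" section of a traceback, so the assertion output shows only the final error. A minimal sketch of the effect, assuming hypothetical `compare` and `compare_clean` helpers (these are illustrations, not the actual code in testing.py):

```python
import traceback


def compare(a, b):
    """Hypothetical helper: the AssertionError is implicitly chained to the KeyError."""
    try:
        return a["x"] == b["x"]
    except KeyError:
        raise AssertionError("missing key 'x'")


def compare_clean(a, b):
    """Hypothetical wrapper: re-raise the same error with the chain suppressed."""
    try:
        return compare(a, b)
    except AssertionError as e:
        # `from None` sets __suppress_context__, so the KeyError is not printed.
        raise e from None


def format_failure(fn):
    """Run fn on inputs that are known to fail and return the formatted traceback."""
    try:
        fn({}, {})
    except AssertionError as err:
        return "".join(traceback.format_exception(type(err), err, err.__traceback__))
    return ""


noisy_report = format_failure(compare)
clean_report = format_failure(compare_clean)
```

Here `noisy_report` contains both the `KeyError` and the `AssertionError`, while `clean_report` contains only the `AssertionError`.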
    
    * [test] fix regnet test for ddp+mixed_precision
    
    - need AMP context in FSDP
    - work around a difference between DDP and FSDP when bias=True
    - fixed a bug in input data generation that caused different ranks to
      have the same data with the wrong iteration count.
    - added a TODO about needing a better loss and grad_scaler, and reduced
      iters so there is no NaN.
    - added (disabled) debugging code
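The input-data fix above concerns a general class of bug: every rank drawing the same inputs, or producing a different number of batches than the test expects. A minimal, framework-free sketch of per-rank generation, assuming a hypothetical `make_inputs` helper and a made-up batch shape (this is not the code in test_fsdp_regnet.py):

```python
import random


def make_inputs(rank, world_size, iters, seed=0):
    """Hypothetical generator: `iters` batches for one rank, distinct across ranks."""
    # Derive the stream from a shared base seed plus the rank, so reruns are
    # reproducible but no two ranks draw the same numbers.
    rng = random.Random(seed * world_size + rank)
    # One batch per iteration; 4 values per batch is an arbitrary choice.
    return [[rng.random() for _ in range(4)] for _ in range(iters)]


# Simulate two ranks in one process for illustration.
per_rank = [make_inputs(rank, world_size=2, iters=3) for rank in range(2)]
```

Each entry of `per_rank` has exactly `iters` batches, and the two ranks receive different data.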
    
    * lint
    
    * lint
    
    * add scaler
    
    * lint
    
    * scaler
    
    * add a real loss
    
    * seeding in the ranks
    
    * balance tests
    
    * run AMP DDP==FSDP test only on cuda version 11 and up
    
    * add relu inplace and comment
    
    * make wrap_bn cover more cases in full precision mode
test_fsdp_regnet.py 11.7 KB