1. 08 Aug, 2022 1 commit
  2. 12 Sep, 2021 1 commit
      [fix] FSDP intra-backwards gradient accumulation. (#784) · 4fa2ab9b
      Darryl Barnhart authored
      * [fix] FSDP intra-backwards gradient accumulation.
      
      Ensure gradient reduction accumulates into the unsharded gradient tensor
      within a backwards pass. This matters when an FSDP module is called
      multiple times within a forward pass, and reduction is _not_ deferred
      using activation checkpoint forward counters, bucketing or some other
      mechanism.
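
The difference between overwriting and accumulating a reduced gradient can be sketched with a toy simulation. This is not fairscale's implementation: plain floats stand in for gradient tensors, `all_reduce_mean` is a hypothetical stand-in for a collective reduction, and the two-rank setup is assumed purely for illustration.

```python
# Toy simulation only. Floats stand in for gradient tensors and
# all_reduce_mean is a hypothetical stand-in for a collective op.

def all_reduce_mean(per_rank_values):
    """Average one value per simulated rank."""
    return sum(per_rank_values) / len(per_rank_values)


def simulate_backward(per_rank_inputs, accumulate):
    """Simulate backward for y = w*x1 + w*x2, where w is used twice.

    Each use of w yields its own local gradient (x1 or x2), and a
    reduction runs once per use instead of being deferred.
    """
    num_calls = len(per_rank_inputs[0])
    reduced_grad = 0.0
    # Backward visits the forward calls in reverse order.
    for call in reversed(range(num_calls)):
        local_grads = [inputs[call] for inputs in per_rank_inputs]
        reduced = all_reduce_mean(local_grads)
        if accumulate:
            reduced_grad += reduced  # keep earlier contributions
        else:
            reduced_grad = reduced   # drops earlier contributions
    return reduced_grad


# rank 0 sees (x1, x2) = (1, 2); rank 1 sees (3, 4)
inputs = [(1.0, 2.0), (3.0, 4.0)]

# Reference: average over ranks of the full gradient x1 + x2.
expected = all_reduce_mean([sum(x) for x in inputs])

accumulated = simulate_backward(inputs, accumulate=True)
overwritten = simulate_backward(inputs, accumulate=False)
```

In this sketch only the accumulating variant matches the reference gradient; the overwriting variant keeps just the contribution from the call that backward visited last.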
      
      Closes #780
      
      * [refactor] Remove forward counters. Comments.
      
      Removed forward counters from the activation checkpointing utility, now
      that FSDP does not require them for correct operation. Added a more
      detailed comment about memory usage behaviour with gradient reduction.
      
      * [refactor] Delete deprecated forward counter usage.
      
      * [refactor] Add state assertion at the end of the pre-backward hook.
  3. 27 May, 2021 1 commit
  4. 14 May, 2021 1 commit
  5. 07 May, 2021 1 commit
      [feat] experimental.nn.SyncBatchNorm: initial commit (#662) · f0a40046
      msbaines authored
      * [feat] experimental.nn.SyncBatchNorm: initial commit
      
      Fast/simple re-implementation of SyncBatchNorm.
      
      When profiling SSL Vision, I was seeing a majority of cycles spent in
      SyncBatchNorm. With this change, I see a 10% to 20% speedup on the
      model I was profiling.
      
      When running benchmarks/experimental/sync_batchnorm.py on 8 x V100,
      I get a 6x speedup:
      
      <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
      Elapsed time is  0.08709120750427246
      Elapsed time is  0.12632274627685547
      Elapsed time is  0.14095258712768555
      Elapsed time is  0.16529417037963867
      Elapsed time is  0.1419970989227295
      Elapsed time is  0.15166854858398438
      Elapsed time is  0.12000870704650879
      Elapsed time is  0.17534875869750977
      <class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
      Elapsed time is  2.5087168216705322
      Elapsed time is  2.497001886367798
      Elapsed time is  2.5204885005950928
      Elapsed time is  2.526789903640747
      Elapsed time is  2.5080230236053467
      Elapsed time is  2.524489641189575
      Elapsed time is  2.513214588165283
      Elapsed time is  2.5359973907470703
      <class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
      Elapsed time is  0.4126114845275879
      Elapsed time is  0.39051294326782227
      Elapsed time is  0.40685415267944336
      Elapsed time is  0.4159870147705078
      Elapsed time is  0.42383885383605957
      Elapsed time is  0.4080159664154053
      Elapsed time is  0.41202712059020996
      Elapsed time is  0.42400121688842773
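
The quoted speedup can be checked against the timings above. The snippet below is a small sketch that averages the eight reported values for each SyncBatchNorm implementation and takes the ratio; the variable names are illustrative, and treating each line as one timing sample is an assumption.

```python
from statistics import mean

# Timings copied from the benchmark output above.
torch_sync_bn = [
    2.5087168216705322, 2.497001886367798, 2.5204885005950928,
    2.526789903640747, 2.5080230236053467, 2.524489641189575,
    2.513214588165283, 2.5359973907470703,
]
fairscale_sync_bn = [
    0.4126114845275879, 0.39051294326782227, 0.40685415267944336,
    0.4159870147705078, 0.42383885383605957, 0.4080159664154053,
    0.41202712059020996, 0.42400121688842773,
]

# Ratio of mean elapsed times: torch SyncBatchNorm over the fairscale one.
speedup = mean(torch_sync_bn) / mean(fairscale_sync_bn)
print(f"speedup: {speedup:.2f}x")
```

The ratio comes out slightly above 6, consistent with the 6x figure in the commit message.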