1. 13 Sep, 2021 1 commit
  2. 07 Jul, 2021 1 commit
  3. 21 Jun, 2021 1 commit
    • [feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 2: extend FPW to support multiple flat param groups
      - FSDP still uses only one group
      - unit tests exercise the new code paths
      - updated the changelog
      
      * first cut, mypy passed
      
      * test_flatten_params_wrapper.py::TestFlattenParams tests pass
      
      * added two more test cases and fixed a case in the code
      
      * fixed one bug with param_path_infos
      
      * fixed two more tests with hardcoded flat_param names
      
      * Update CHANGELOG.md
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  4. 08 Jun, 2021 1 commit
  5. 12 May, 2021 1 commit
    • [chore] Rename and move checkpoint_activations from misc folder. (#654) · 72c6bab2
      anj-s authored
      * rename files
      
      * add newly renamed file
      
      * rename and move checkpoint activations related files
      
      * add test files to ci list
      
      * fix lint errors
      
      * modify docs
      
      * add changelog
      
      * retain old path for now
      
      * fix lint errors
      
      * add another import test case
      
      * fix merge conflict
      
      * add missing test file
  6. 19 Mar, 2021 1 commit
  7. 18 Mar, 2021 1 commit
  8. 04 Mar, 2021 1 commit
    • [feat]: checkpoint and normalization (#457) · 5e64d6a7
      Min Xu authored
      * [feat]: checkpoint and normalization
      
      - added special handling of BN for track_running_stats and checkpointing
      - we test BN/LN and checkpointing
      - we test them with mixed precision
  9. 02 Mar, 2021 1 commit
  10. 26 Feb, 2021 1 commit
  11. 25 Feb, 2021 1 commit
  12. 23 Feb, 2021 2 commits
    • [test]: add peak mem in checkpoint test (#415) · 4b5b4d3d
      Min Xu authored
      * [test]: add peak mem in checkpoint test
      
      * more debugging
      
      * new test
      
      * more fix
      
      * better collection of debug in case of future failures
      
      * update the comment
      
      * typo
      
      * comment
      
      * clarify
      
      * better wording
    • [bug]: not all CUDA memory is freed when model is deleted (#412) · e3035933
      Min Xu authored
      * [bug]: not all CUDA memory is freed when model is deleted
      
      * fixed memory leak
      
      - without this, peak memory will be high when more than one model
        is trained (i.e. the first model leaves stuff around, pushing up
        the peak memory when the second model runs)
      
      * addressed comments
      
      * fix
      
      * changelog
  13. 10 Feb, 2021 1 commit
  14. 21 Jan, 2021 3 commits