1. 23 Sep, 2022 2 commits
    • [fix] better handling non-flatten in FSDP (#1072) · 429f3d31
      Min Xu authored
      
      
      * [fix] better handling non-flatten in FSDP
      
      - see the detailed comment about that backward firing case
      - also minor debugging help in FSDP
      - also minor fix in FPW's state dict
      
      * [feat] disallow reset_parameters by default
      
      * [feat] adding fsdp_instances API - useful for checking wrapping in user code
      
      * [fix] one line fix but more than a day of debugging
      
      * fixed the case of loading a combined checkpoint with empty fsdp instances

      * fixed another state-loading bug in root/non-root module full-param caching, caused by not resharding after forward
      
      * [feat] support .half and .float better
      
      * fixed a bug where gathering optim state loses extra keys from the original state_dict
      
      * fixed a test failure in mixed precision
      
      * fixed another bug affecting no_sync grad acc
      
      * fixed a bug and a test in fsdp optim state
      
      * fixed another corner case
      
      * added a comment
      
      * skip ssd offload tests
      
      * skip fsdp one for ssd offload
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
    • [fix] don't import ProcessGroup eagerly (#1074) · 47ce21ac
      Min Xu authored
      
      
      * [fix] don't import ProcessGroup eagerly
      
      - move the import into typing since it is only used for type checking
      - fixes #1057
      
      * more fixes
      
      * one more
      
      * tested at least
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
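The "move the import into typing" change above refers to a common Python pattern, sketched below under stated assumptions. The function `describe_group` is hypothetical and only illustrates the idea; it is not code from the commit. The import under `TYPE_CHECKING` is seen by static type checkers but never executed at runtime, so the module loads even where the distributed package is unavailable.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime,
    # so importing this module does not import torch.distributed.
    from torch.distributed import ProcessGroup


def describe_group(group: Optional["ProcessGroup"] = None) -> str:
    """Hypothetical helper: the annotation is a string, so the name
    ProcessGroup is never looked up when the function is defined."""
    return "default group" if group is None else f"group {group!r}"
```

The annotation has to be quoted (or postponed some other way) because the name does not exist at runtime.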
  2. 13 Sep, 2022 3 commits
  3. 10 Sep, 2022 1 commit
  4. 07 Sep, 2022 4 commits
  5. 26 Aug, 2022 1 commit
  6. 25 Aug, 2022 1 commit
    • [chore] update nightly version (#1064) · 15d4cf15
      Min Xu authored
      
      
      * update nightly version
      
      * update wgit to use numpy for load/store
      
      - this was introduced with the new nightly torch version, which made torch.save() stop
        producing deterministic bytes
      - this does a tensor<->numpy conversion and then does the save/load with numpy to avoid that issue.
      
      * fixed tests
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
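The idea of routing save/load through numpy to get stable bytes can be sketched as follows. This is a minimal illustration, not the wgit implementation: the helper names `array_to_bytes` and `bytes_to_array` are hypothetical, and it assumes the tensor has already been converted to an array (for a CPU torch tensor, via `tensor.numpy()`).

```python
import io

import numpy as np


def array_to_bytes(arr: np.ndarray) -> bytes:
    """Serialize an array in .npy format into an in-memory buffer."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getvalue()


def bytes_to_array(data: bytes) -> np.ndarray:
    """Inverse of array_to_bytes."""
    return np.load(io.BytesIO(data), allow_pickle=False)
```

Because the .npy header depends only on dtype, memory order, and shape, equal arrays serialize to equal bytes, which is what a content hash needs.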
  7. 11 Aug, 2022 1 commit
  8. 08 Aug, 2022 2 commits
  9. 03 Aug, 2022 1 commit
  10. 31 Jul, 2022 1 commit
    • Implementation of dense_sst_to_dst and sst_dst_to_dense (#1048) · c1dada48
      Riyasat Ohib authored
      [Feat] Implements dense_sst_to_dst and sst_dst_to_dense methods and adds tests
      
      1. Implements the dense_sst_to_dst and sst_dst_to_dense method.
      2. Adds tests for perfect reconstruction with all top-k across different dims.
      3. Adds tests for the two new methods.
  11. 29 Jul, 2022 1 commit
  12. 28 Jul, 2022 1 commit
  13. 27 Jul, 2022 1 commit
    • [Feat] dense to sst implementation (#1034) · 608492af
      Riyasat Ohib authored
      * [Feat] dense to sst implementation
      1. Implementation of dense_to_sst function.
      2. calculating the threshold for both the cases of top-k-element and top-k-percentage (fraction)
      3. assertions to verify that the top_k_elements is smaller than the numel along the same dim
      4. top_k_percent to top-k conversion
      5. When calculating SST, now the real part of the complex dense_freq is used instead of the magnitudes.
      
      * [Feat, Tests] transform method addition, handling of top_k_element None case
      1. Addition of a transform method
      2. Adds code to handle the dim=None case for top_k_element
      
      * [Feat, Refactor] Reorganizations, new assertions and fixes.
      1. XOR validation: exactly one of top-k percent and top-k element must be set (not both, and not neither).
      2. Distills top-k and percent both to top-k using a unified helper function.
      3. Adds a scatter top-k values function to scatter values for SST and, in the future, DST.
      4. Validates the percentage range and ensures k is never 0.
      5. Uses config validation; adds config validation for top_k_element > 0 if not None.
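The validation rules listed in this commit can be sketched roughly as below. The function `resolve_top_k` is hypothetical, and treating the percentage as a value in (0, 100] is an assumption; the commit does not state the accepted range or whether k may equal numel.

```python
from typing import Optional


def resolve_top_k(
    numel: int,
    top_k_element: Optional[int] = None,
    top_k_percent: Optional[float] = None,
) -> int:
    """Hypothetical helper: reduce either setting to a single k."""
    # XOR check: exactly one of the two options must be provided.
    if (top_k_element is None) == (top_k_percent is None):
        raise ValueError("set exactly one of top_k_element and top_k_percent")
    if top_k_element is not None:
        # k must be positive and must not exceed the element count.
        if not 0 < top_k_element <= numel:
            raise ValueError("top_k_element must be in (0, numel]")
        return top_k_element
    # Assumed range for the percentage: (0, 100].
    if not 0.0 < top_k_percent <= 100.0:
        raise ValueError("top_k_percent must be in (0, 100]")
    # Clamp so that a small percentage never yields k == 0.
    return max(1, int(numel * top_k_percent / 100.0))
```

Comparing the two `is None` results for equality rejects both the "both set" and the "neither set" case in one test.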
  14. 26 Jul, 2022 6 commits
  15. 22 Jul, 2022 1 commit
  16. 21 Jul, 2022 1 commit
  17. 19 Jul, 2022 1 commit
    • [feat]: add per-tensor add to repo (#1033) · 4d58a294
      Min Xu authored
      
      
      * formatting change, no logical change
      
      * formatting and name change, no logical change
      
      * [refactor] sha1_store's path arg
      
      - make sha1_store's path arg directly the path, not its parent
      - this is because sha1_store is not like a .git or a .wgit dir, which is
        nested inside another "working" dir. It is simply a store, which
        is using a given dir.
      - updated repo and tests as well.
      
      * remove a test warning due to deprecated API from torch
      
      * [refactor] change how dot_wgit_dir_path is used
      
      - it should only be assigned in __init__.
      - we use it for error checking in the rest of the APIs.
      
      * simplify the init a bit
      
      * refactor the sanity check
      
      * moved some functions, no code change
      
      * [feat] added per-tensor add to the repo
      
      * enabled gzip compression on add
      
      * fix a unit test
      
      * add a note
      
      * make sha1 store work on general dict
      
      * handle general state_dict from a model, not just a module's one-level OrderedDict
      
      * formatting
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
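Two of the points above, a store that uses the given directory directly and gzip compression on add, can be illustrated with a small content-addressed store. `TinySha1Store` is a hypothetical sketch using only the standard library; it does not reproduce the sha1_store API or its on-disk layout.

```python
import gzip
import hashlib
from pathlib import Path


class TinySha1Store:
    """Hypothetical store keyed by the SHA-1 of the uncompressed content."""

    def __init__(self, path: Path) -> None:
        # The path is the store directory itself, not a parent directory.
        self.path = Path(path)
        self.path.mkdir(parents=True, exist_ok=True)

    def add(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        target = self.path / f"{digest}.gz"
        if not target.exists():  # identical content is stored only once
            with gzip.open(target, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest: str) -> bytes:
        with gzip.open(self.path / f"{digest}.gz", "rb") as f:
            return f.read()
```

Hashing the uncompressed bytes keeps the key independent of the compression settings.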
  18. 18 Jul, 2022 1 commit
  19. 15 Jul, 2022 2 commits
  20. 14 Jul, 2022 2 commits
  21. 12 Jul, 2022 2 commits
  22. 05 Jul, 2022 1 commit
    • weigit status and checking for file modification and tracking (#1021) · 5b5db28d
      Riyasat Ohib authored
      * [Fix] Restructure for wgit availability as a package
      
      * Preliminary implementation of wgit status
      
      * [Feat] Addition of wgit status
      1. Functionalities to check the status of the repo.
      2. Checks whether a file has been modified, whether the changes have been added, and whether added changes have been committed.
      
      * [test] Addition of tests for weigit status
      1. Some minor refactors and docstring changes
      
      * [Fix] Changes in repo status test
      
      * [test] status test fix
      1. made the test status printing order independent
      
      * [refactor] Metadata dirs mirroring chkpt paths, changes in wgit status
      1. Metadata files are now created within wgit with directory structure mirroring the relative paths of the checkpoint/files they track.
      2. Changes in status: 3 statuses now.
      3. Changes in tests.
      4. Some code refactoring.
      
      * [cleanup] minor changes in comments and cleanup
  23. 29 Jun, 2022 2 commits
  24. 27 Jun, 2022 1 commit