1. 22 May, 2023 1 commit
  2. 19 May, 2023 7 commits
  3. 18 May, 2023 5 commits
    • [plugin] torch ddp plugin supports sharded model checkpoint (#3775) · 5452df63
      Hongxin Liu authored
      * [plugin] torch ddp plugin add save sharded model
      
      * [test] fix torch ddp ckpt io test
      
      * [test] fix low level zero plugin test
      
      * [test] add debug info
      
      * [test] fix low level zero plugin test
      
      * [test] remove debug info
    • [amp] Add naive amp demo (#3774) · 2703a37a
      jiangmingyan authored
      * [mixed_precision] add naive amp demo
    • [doc] update hybrid parallelism doc (#3770) · 48bd0567
      jiangmingyan authored
    • [auto] fix install cmd (#3772) · 15024e40
      binmakeswell authored
    • [doc] update booster tutorials (#3718) · d449525a
      jiangmingyan authored
      * [booster] update booster tutorials#3717
      
      * [booster] update booster tutorials#3717, fix
      
      * [booster] update booster tutorials#3717, update setup doc
      
      * [booster] update booster tutorials#3717, rename colossalai booster.md
      
      * [booster] update booster tutorials#3717, fix
      
      * [booster] update tutorials#3717, update booster api doc
      
      * [booster] update tutorials#3717, modify file
      
      * [booster] update tutorials#3717, fix reference link
      
      * [booster] update tutorials#3713
      
      * [booster] update tutorials#3713, modify file
  4. 17 May, 2023 4 commits
  5. 16 May, 2023 1 commit
  6. 15 May, 2023 4 commits
  7. 11 May, 2023 2 commits
  8. 10 May, 2023 3 commits
  9. 09 May, 2023 1 commit
    • [booster] fix no_sync method (#3709) · 6552cbf8
      Hongxin Liu authored
      * [booster] fix no_sync method
      
      * [booster] add test for ddp no_sync
      
      * [booster] fix merge
      
      * [booster] update unit test
  10. 08 May, 2023 2 commits
  11. 06 May, 2023 4 commits
  12. 05 May, 2023 6 commits
    • [booster] refactor all dp fashion plugins (#3684) · d0915f54
      Hongxin Liu authored
      * [booster] add dp plugin base
      
      * [booster] inherit dp plugin base
      
      * [booster] refactor unit tests
    • [CI] Update test_sharded_optim_with_sync_bn.py (#3688) · b49020c1
      digger-yu authored
      fix spelling error in line 23
      change "cudnn_determinstic=True" to "cudnn_deterministic=True"
    • Merge pull request #3680 from digger-yu/digger-yu-patch-2 · b36e67cb
      Tong Li authored
      fix spelling errors in applications/Chat/evaluate/
    • [booster] gemini plugin support shard checkpoint (#3610) · 307894f7
      jiangmingyan authored
      * gemini plugin add shard checkpoint save/load
      
      * gemini plugin support shard checkpoint
      
      * [API Refactoring]gemini plugin support shard checkpoint
      
      ---------
      Co-authored-by: luchen <luchen@luchendeMBP.lan>
      Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
    • [chat] PPO stage3 doc enhancement (#3679) · 0f785cb1
      Camille Zhong authored
      * Add RoBERTa for RLHF Stage 2 & 3 (test)
      
      RoBERTa for RLHF Stage 2 & 3 (still in testing)
      
      Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
      
      This reverts commit 06741d894dcbe958acd4e10d771f22275e20e368.
      
      Add RoBERTa for RLHF stage 2 & 3
      
      1. add roberta folder under model folder
      2. add roberta option in train_reward_model.py
      3. add some tests in testci
      
      Update test_ci.sh
      
      Revert "Update test_ci.sh"
      
      This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
      
      update roberta with coati
      
      chat ci update
      
      Revert "chat ci update"
      
      This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846.
      
      * Update README.md
      
      * update readme
      
      * Update test_ci.sh
      
      * update readme and add a script
      
      modify readme
      
      Update README.md
    • [doc] fix chat spelling error (#3671) · 6650daeb
      digger-yu authored
      * Update README.md
      
      change "huggingaface" to "huggingface"
      
      * Update README.md
      
      change "Colossa-AI" to "Colossal-AI"