1. 12 Oct, 2023 2 commits
  2. 11 Oct, 2023 1 commit
    • [Pipeline Inference] Sync pipeline inference branch to main (#4820) · 08a9f76b
      Bin Jia authored
      * [pipeline inference] pipeline inference (#4492)
      
      * add pp stage manager as circle stage
      
      * fix a bug when create process group
      
      * add ppinfer basic framework
      
      * add micro batch manager and support kvcache-pp gpt2 fwd
      
      * add generate schedule
      
      * use mb size to control mb number
      
      * support generate with kv cache
      
      * add output, remove unused code
      
      * add test
      
      * reuse shardformer to build model
      
      * refactor some code and use the same attribute name of hf
      
      * fix review and add test for generation
      
      * remove unused file
      
      * fix CI
      
      * add cache clear
      
      * fix code error
      
      * fix typo
      
      * [Pipeline inference] Modify to tieweight (#4599)
      
      * modify the way of saving newtokens
      
      * modify to tieweight
      
      * modify test
      
      * remove unused file
      
      * solve review
      
      * add docstring
      
      * [Pipeline inference] support llama pipeline inference (#4647)
      
      * support llama pipeline inference
      
      * remove tie weight operation
      
      * [pipeline inference] Fix the blocking of communication when ppsize is 2 (#4708)
      
      * add benchmark verbose
      
      * fix export tokens
      
      * fix benchmark verbose
      
      * add P2POp style to do p2p communication
      
      * modify schedule as p2p type when ppsize is 2
      
      * remove unused code and add docstring
      
* [Pipeline inference] Refactor code, add docstring, fix bug (#4790)
      
      * add benchmark script
      
      * update argparse
      
      * fix fp16 load
      
      * refactor code style
      
      * add docstring
      
      * polish code
      
      * fix test bug
      
      * [Pipeline inference] Add pipeline inference docs (#4817)
      
      * add readme doc
      
* add an icon
      
      * Add performance
      
      * update table of contents
      
      * refactor code (#4873)
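The micro-batch manager commits above ("use mb size to control mb number") amount to deriving the number of micro-batches from a fixed micro-batch size via ceiling division over the batch. A minimal sketch of that splitting logic (the `split_micro_batches` helper is hypothetical, not the project's actual API):

```python
def split_micro_batches(batch, micro_batch_size):
    """Split a batch (a list of samples) into micro-batches of at most
    micro_batch_size samples each.

    The number of micro-batches follows from the batch size and the
    configured micro-batch size (ceiling division), which is how a fixed
    "mb size" controls the "mb number".
    """
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    return [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]

# A batch of 10 samples with micro-batch size 4 yields 3 micro-batches,
# the last one partially filled.
mbs = split_micro_batches(list(range(10)), 4)
```

Each micro-batch would then be fed through the pipeline stages in turn, with the generate schedule interleaving forward passes and KV-cache reuse across stages.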
  3. 07 Oct, 2023 1 commit
  4. 26 Sep, 2023 2 commits
  5. 20 Sep, 2023 1 commit
  6. 19 Sep, 2023 1 commit
  7. 05 Sep, 2023 2 commits
  8. 04 Sep, 2023 1 commit
  9. 01 Sep, 2023 2 commits
  10. 31 Aug, 2023 2 commits
  11. 25 Aug, 2023 1 commit
  12. 24 Aug, 2023 1 commit
      [gemini] improve compatibility and add static placement policy (#4479) · 27061426
      Hongxin Liu authored
      * [gemini] remove distributed-related part from colotensor (#4379)
      
      * [gemini] remove process group dependency
      
      * [gemini] remove tp part from colo tensor
      
      * [gemini] patch inplace op
      
      * [gemini] fix param op hook and update tests
      
      * [test] remove useless tests
      
      * [misc] fix requirements
      
      * [test] fix model zoo
      
      * [misc] update requirements
      
      * [gemini] refactor gemini optimizer and gemini ddp (#4398)
      
      * [gemini] update optimizer interface
      
      * [gemini] renaming gemini optimizer
      
      * [gemini] refactor gemini ddp class
      
      * [example] update gemini related example
      
      * [plugin] fix gemini plugin args
      
      * [test] update gemini ckpt tests
      
      * [gemini] fix checkpoint io
      
      * [example] fix opt example requirements
      
      * [example] fix opt example
      
      * [gemini] add static placement policy (#4443)
      
      * [gemini] add static placement policy
      
      * [gemini] fix param offload
      
      * [test] update gemini tests
      
      * [plugin] update gemini plugin
      
      * [plugin] update gemini plugin docstr
      
      * [misc] fix flash attn requirement
      
      * [test] fix gemini checkpoint io test
      
      * [example] update resnet example result (#4457)
      
      * [example] update bert example result (#4458)
      
      * [doc] update gemini doc (#4468)
      
      * [example] update gemini related examples (#4473)
      
      * [example] update gpt example
      
      * [example] update dreambooth example
      
      * [example] update vit
      
      * [example] update opt
      
      * [example] update palm
      
      * [example] update vit and opt benchmark
      
      * [hotfix] fix bert in model zoo (#4480)
      
      * [hotfix] fix bert in model zoo
      
      * [test] remove chatglm gemini test
      
      * [test] remove sam gemini test
      
      * [test] remove vit gemini test
      
      * [hotfix] fix opt tutorial example (#4497)
      
      * [hotfix] fix opt tutorial example
      
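The static placement policy introduced in #4443 decides ahead of time which parameters stay on GPU and which are offloaded, rather than migrating chunks dynamically during training. A rough sketch of such a policy under a GPU size-fraction budget (the `static_placement` helper is hypothetical, not Gemini's actual implementation):

```python
def static_placement(param_sizes, gpu_fraction):
    """Statically assign each parameter to 'cuda' or 'cpu'.

    Parameters are placed on GPU in order until the configured fraction
    of the total parameter size is used up; everything after that is
    offloaded to CPU. The assignment is fixed once, before any step runs.
    """
    total = sum(param_sizes)
    budget = total * gpu_fraction
    placement, used = [], 0
    for size in param_sizes:
        if used + size <= budget:
            placement.append("cuda")
            used += size
        else:
            placement.append("cpu")
    return placement
```

The appeal of a static policy over a dynamic one is predictability: memory usage is known up front, and there is no runtime migration overhead, at the cost of not adapting to per-iteration memory pressure.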
  13. 31 Jul, 2023 1 commit
  14. 21 Jul, 2023 1 commit
  15. 07 Jul, 2023 1 commit
  16. 04 Jul, 2023 1 commit
  17. 25 Jun, 2023 1 commit
  18. 16 Jun, 2023 2 commits
  19. 15 Jun, 2023 1 commit
  20. 23 May, 2023 1 commit
  21. 19 May, 2023 1 commit
  22. 18 May, 2023 1 commit
      [plugin] torch ddp plugin supports sharded model checkpoint (#3775) · 5452df63
      Hongxin Liu authored
      * [plugin] torch ddp plugin add save sharded model
      
* [test] fix torch ddp ckpt io test

* [test] fix low level zero plugin test

* [test] add debug info

* [test] remove debug info
  23. 15 May, 2023 1 commit
      [devops] update torch version of CI (#3725) · afb239bb
      Hongxin Liu authored
      * [test] fix flop tensor test
      
      * [test] fix autochunk test
      
      * [test] fix lazyinit test
      
      * [devops] update torch version of CI
      
      * [devops] enable testmon
      
      * [devops] fix ci
      
      * [test] fix checkpoint io test
      
      * [test] fix cluster test
      
      * [test] fix timm test
      
      * [devops] fix ci
      
      * [devops] force sync to test ci
      
      * [test] skip fsdp test
  24. 10 May, 2023 1 commit
  25. 05 May, 2023 1 commit
      [booster] gemini plugin support shard checkpoint (#3610) · 307894f7
      jiangmingyan authored
      
      
* gemini plugin add shard checkpoint save/load

* gemini plugin support shard checkpoint

* [API Refactoring] gemini plugin support shard checkpoint
      
      ---------
Co-authored-by: luchen <luchen@luchendeMBP.lan>
Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
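Sharded checkpoint saving of the kind added here conventionally splits a state dict across multiple files once a per-file size budget is exceeded, so that no single checkpoint file grows unboundedly. A simplified sketch of the grouping logic over tensor sizes (the `shard_state_dict` helper is hypothetical, not the plugin's actual code):

```python
def shard_state_dict(tensor_sizes, max_shard_size):
    """Group named tensors into shards by size.

    A new shard is started whenever adding the next tensor would push the
    current shard past max_shard_size; a single tensor larger than the
    budget still gets a shard of its own. Each shard maps tensor names to
    sizes, standing in for the tensors themselves.
    """
    shards, current, current_size = [], {}, 0
    for name, size in tensor_sizes.items():
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = size
        current_size += size
    if current:
        shards.append(current)
    return shards
```

In a real implementation each shard would be saved to its own file, with an index file recording which tensor lives in which shard so loading can open only the files it needs.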
  26. 06 Apr, 2023 2 commits
  27. 04 Apr, 2023 1 commit
  28. 27 Mar, 2023 1 commit
  29. 23 Mar, 2023 1 commit