1. 24 Aug, 2023 1 commit
    • [gemini] improve compatibility and add static placement policy (#4479) · 27061426
      Hongxin Liu authored
      * [gemini] remove distributed-related part from colotensor (#4379)
      
      * [gemini] remove process group dependency
      
      * [gemini] remove tp part from colo tensor
      
      * [gemini] patch inplace op
      
      * [gemini] fix param op hook and update tests
      
      * [test] remove useless tests
      
      * [test] remove useless tests
      
      * [misc] fix requirements
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [misc] update requirements
      
      * [gemini] refactor gemini optimizer and gemini ddp (#4398)
      
      * [gemini] update optimizer interface
      
      * [gemini] renaming gemini optimizer
      
      * [gemini] refactor gemini ddp class
      
      * [example] update gemini related example
      
      * [example] update gemini related example
      
      * [plugin] fix gemini plugin args
      
      * [test] update gemini ckpt tests
      
      * [gemini] fix checkpoint io
      
      * [example] fix opt example requirements
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [gemini] add static placement policy (#4443)
      
      * [gemini] add static placement policy
      
      * [gemini] fix param offload
      
      * [test] update gemini tests
      
      * [plugin] update gemini plugin
      
      * [plugin] update gemini plugin docstr
      
      * [misc] fix flash attn requirement
      
      * [test] fix gemini checkpoint io test
      
      * [example] update resnet example result (#4457)
      
      * [example] update bert example result (#4458)
      
      * [doc] update gemini doc (#4468)
      
      * [example] update gemini related examples (#4473)
      
      * [example] update gpt example
      
      * [example] update dreambooth example
      
      * [example] update vit
      
      * [example] update opt
      
      * [example] update palm
      
      * [example] update vit and opt benchmark
      
      * [hotfix] fix bert in model zoo (#4480)
      
      * [hotfix] fix bert in model zoo
      
      * [test] remove chatglm gemini test
      
      * [test] remove sam gemini test
      
      * [test] remove vit gemini test
      
      * [hotfix] fix opt tutorial example (#4497)
      
      * [hotfix] fix opt tutorial example
      
      * [hotfix] fix opt tutorial example
  2. 08 Jun, 2023 4 commits
  3. 07 Jun, 2023 2 commits
  4. 06 Jun, 2023 2 commits
  5. 19 Jan, 2023 1 commit
  6. 16 Jan, 2023 1 commit
  7. 06 Jan, 2023 1 commit
  8. 19 Aug, 2022 1 commit
  9. 02 Aug, 2022 1 commit
  10. 26 Apr, 2022 1 commit
  11. 09 Dec, 2021 1 commit
    • Develop/experiments (#59) · da01c234
      Frank Lee authored
      
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      
      * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
      Fix convergence in cifar10, Imagenet1000
      
      * Integrate 1d tensor parallel in Colossal-AI (#39)
      
      * fixed 1D and 2D convergence (#38)
      
      * optimized 2D operations
      
      * fixed 1D ViT convergence problem
      
      * Feature/ddp (#49)
      
      * remove redundancy func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * support torch ddp
      
      * fix loss accumulation
      
      * add log for ddp
      
      * change seed
      
      * modify timing hook
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * Feature/pipeline (#40)
      
      * remove redundancy func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * optimize communication of pipeline parallel
      
      * fix grad clip for pipeline
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * optimized 3d layer to fix slow computation; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)
      
      * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset
      
      * update api for better usability (#58)
      
      update api for better usability
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
      Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
  12. 28 Oct, 2021 1 commit