1. 05 Sep, 2023 3 commits
  2. 04 Sep, 2023 2 commits
    • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) · 0a94fcd3
      flybird11111 authored
      
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * [shardformer] zero1+pp and the corresponding tests (#4517)
      
      * pause
      
      * finish pp+zero1
      
      * Update test_shard_vit.py
      
      * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
      
      * fix overlap bug and support bert, add overlap as an option in shardconfig
      
      * support overlap for chatglm and bloom
      
      * [shardformer] fix emerged bugs after updating transformers (#4526)
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] Add overlap support for gpt2 (#4535)
      
      * add overlap support for gpt2
      
      * remove unused code
      
      * remove unused code
      
      * [shardformer] support pp+tp+zero1 tests (#4531)
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] fix submodule replacement bug when enabling pp (#4544)
      
      * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
      
      * implement sharded optimizer saving
      
      * add more param info
      
      * finish implementation of sharded optimizer saving
      
      * fix bugs in optimizer sharded saving
      
      * add pp+zero test
      
      * param group loading
      
      * greedy loading of optimizer
      
      * fix bug when loading
      
      * implement optimizer sharded saving
      
      * add optimizer test & arrange checkpointIO utils
      
      * fix gemini sharding state_dict
      
      * add verbose option
      
      * add loading of master params
      
      * fix typehint
      
      * fix master/working mapping in fp16 amp
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * rebase feature/shardformer
      
      * update pipeline
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert finetune fix
      
      * [shardformer] add all_reduce operation to loss
      
      add all_reduce operation to loss
      
      * [shardformer] make compatible with pytree.
      
      make compatible with pytree.
      
      * [shardformer] disable tp
      
      disable tp
      
      * [shardformer] add 3d plugin to ci test
      
      * [shardformer] update num_microbatches to None
      
      * [shardformer] update microbatchsize
      
      * [shardformer] update assert
      
      * update scheduler
      
      * update scheduler
      
      ---------
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
    • [doc] add llama2 benchmark (#4604) · 8d7b0229
      binmakeswell authored
      * [doc] add llama2 benchmark
      
      * [doc] add llama2 benchmark
  3. 28 Aug, 2023 1 commit
    • [example] add llama2 example (#4527) · 0b00def8
      Hongxin Liu authored
      * [example] transfer llama-1 example
      
      * [example] fit llama-2
      
      * [example] refactor scripts folder
      
      * [example] fit new gemini plugin
      
      * [cli] fix multinode runner
      
      * [example] fit gemini optim checkpoint
      
      * [example] refactor scripts
      
      * [example] update requirements
      
      * [example] update requirements
      
      * [example] rename llama to llama2
      
      * [example] update readme and pretrain script
      
      * [example] refactor scripts
  4. 24 Aug, 2023 1 commit
    • [gemini] improve compatibility and add static placement policy (#4479) · 27061426
      Hongxin Liu authored
      * [gemini] remove distributed-related part from colotensor (#4379)
      
      * [gemini] remove process group dependency
      
      * [gemini] remove tp part from colo tensor
      
      * [gemini] patch inplace op
      
      * [gemini] fix param op hook and update tests
      
      * [test] remove useless tests
      
      * [test] remove useless tests
      
      * [misc] fix requirements
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [misc] update requirements
      
      * [gemini] refactor gemini optimizer and gemini ddp (#4398)
      
      * [gemini] update optimizer interface
      
      * [gemini] renaming gemini optimizer
      
      * [gemini] refactor gemini ddp class
      
      * [example] update gemini related example
      
      * [example] update gemini related example
      
      * [plugin] fix gemini plugin args
      
      * [test] update gemini ckpt tests
      
      * [gemini] fix checkpoint io
      
      * [example] fix opt example requirements
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [gemini] add static placement policy (#4443)
      
      * [gemini] add static placement policy
      
      * [gemini] fix param offload
      
      * [test] update gemini tests
      
      * [plugin] update gemini plugin
      
      * [plugin] update gemini plugin docstr
      
      * [misc] fix flash attn requirement
      
      * [test] fix gemini checkpoint io test
      
      * [example] update resnet example result (#4457)
      
      * [example] update bert example result (#4458)
      
      * [doc] update gemini doc (#4468)
      
      * [example] update gemini related examples (#4473)
      
      * [example] update gpt example
      
      * [example] update dreambooth example
      
      * [example] update vit
      
      * [example] update opt
      
      * [example] update palm
      
      * [example] update vit and opt benchmark
      
      * [hotfix] fix bert in model zoo (#4480)
      
      * [hotfix] fix bert in model zoo
      
      * [test] remove chatglm gemini test
      
      * [test] remove sam gemini test
      
      * [test] remove vit gemini test
      
      * [hotfix] fix opt tutorial example (#4497)
      
      * [hotfix] fix opt tutorial example
      
      * [hotfix] fix opt tutorial example
  5. 26 Jul, 2023 1 commit
  6. 17 Jul, 2023 1 commit
  7. 28 Jun, 2023 1 commit
  8. 26 Jun, 2023 1 commit
  9. 19 Jun, 2023 1 commit
  10. 12 Jun, 2023 1 commit
  11. 08 Jun, 2023 2 commits
  12. 07 Jun, 2023 2 commits
  13. 30 May, 2023 1 commit
  14. 24 May, 2023 1 commit
    • [docs] change placememt_policy to placement_policy (#3829) · 518b31c0
      digger yu authored
      * fix typo colossalai/autochunk auto_parallel amp
      
      * fix typo colossalai/auto_parallel nn utils etc.
      
      * fix typo colossalai/auto_parallel autochunk fx/passes etc.
      
      * fix typo docs/
      
      * change placememt_policy to placement_policy in docs/ and examples/
  15. 18 May, 2023 1 commit
  16. 26 Apr, 2023 1 commit
    • [doc] Fix typo under colossalai and doc(#3618) · b9a8dff7
      digger-yu authored
      * Fixed several spelling errors under colossalai
      
      * Fix the spelling error in colossalai and docs directory
      
      * Cautiously changed the spelling errors under the example folder
      
      * Update runtime_preparation_pass.py
      
      revert autograft to autograd
      
      * Update search_chunk.py
      
      utile to until
      
      * Update check_installation.py
      
      change misteach to mismatch in line 91
      
      * Update 1D_tensor_parallel.md
      
      revert to perceptron
      
      * Update 2D_tensor_parallel.md
      
      revert to perceptron in line 73
      
      * Update 2p5D_tensor_parallel.md
      
      revert to perceptron in line 71
      
      * Update 3D_tensor_parallel.md
      
      revert to perceptron in line 80
      
      * Update README.md
      
      revert to resnet in line 42
      
      * Update reorder_graph.py
      
      revert to indice in line 7
      
      * Update p2p.py
      
      revert to megatron in line 94
      
      * Update initialize.py
      
      revert to torchrun in line 198
      
      * Update routers.py
      
      change to detailed in line 63
      
      * Update routers.py
      
      change to detailed in line 146
      
      * Update README.md
      
      revert random number in line 402
  17. 14 Apr, 2023 1 commit
  18. 07 Apr, 2023 2 commits
  19. 06 Apr, 2023 1 commit
  20. 04 Apr, 2023 2 commits
    • [example] update examples related to zero/gemini (#3431) · 573af841
      ver217 authored
      * [zero] update legacy import
      
      * [zero] update examples
      
      * [example] fix opt tutorial
      
      * [example] fix opt tutorial
      
      * [example] fix opt tutorial
      
      * [example] fix opt tutorial
      
      * [example] fix import
    • [zero] reorganize zero/gemini folder structure (#3424) · 26b7aac0
      ver217 authored
      * [zero] refactor low-level zero folder structure
      
      * [zero] fix legacy zero import path
      
      * [zero] fix legacy zero import path
      
      * [zero] remove useless import
      
      * [zero] refactor gemini folder structure
      
      * [zero] refactor gemini folder structure
      
      * [zero] refactor legacy zero import path
      
      * [zero] refactor gemini folder structure
      
      * [zero] refactor gemini folder structure
      
      * [zero] refactor gemini folder structure
      
      * [zero] refactor legacy zero import path
      
      * [zero] fix test import path
      
      * [zero] fix test
      
      * [zero] fix circular import
      
      * [zero] update import
  21. 23 Mar, 2023 1 commit
  22. 21 Mar, 2023 1 commit
  23. 09 Mar, 2023 2 commits
  24. 08 Mar, 2023 1 commit
  25. 07 Mar, 2023 1 commit
  26. 27 Feb, 2023 2 commits
  27. 22 Feb, 2023 2 commits
  28. 20 Feb, 2023 1 commit
  29. 15 Feb, 2023 1 commit
  30. 09 Feb, 2023 1 commit