1. 04 Sep, 2023 1 commit
    • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) · 0a94fcd3
      flybird11111 authored
      
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * [shardformer] zero1+pp and the corresponding tests (#4517)
      
      * pause
      
      * finish pp+zero1
      
      * Update test_shard_vit.py
      
      * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
      
      * fix overlap bug and support bert, add overlap as an option in shardconfig
      
      * support overlap for chatglm and bloom
      
      * [shardformer] fix emerged bugs after updating transformers (#4526)
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] Add overlap support for gpt2 (#4535)
      
      * add overlap support for gpt2
      
      * remove unused code
      
      * remove unused code
      
      * [shardformer] support pp+tp+zero1 tests (#4531)
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] fix submodule replacement bug when enabling pp (#4544)
      
      * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
      
      * implement sharded optimizer saving
      
      * add more param info
      
      * finish implementation of sharded optimizer saving
      
      * fix bugs in optimizer sharded saving
      
      * add pp+zero test
      
      * param group loading
      
      * greedy loading of optimizer
      
      * fix bug when loading
      
      * implement optimizer sharded saving
      
      * add optimizer test & arrange checkpointIO utils
      
      * fix gemini sharding state_dict
      
      * add verbose option
      
      * add loading of master params
      
      * fix typehint
      
      * fix master/working mapping in fp16 amp
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * rebase feature/shardformer
      
      * update pipeline
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert finetune fix
      
      * [shardformer] add all_reduce operation to loss
      
      * [shardformer] make compatible with pytree.
      
      * [shardformer] disable tp
      
      * [shardformer] add 3d plugin to ci test
      
      * [shardformer] update num_microbatches to None
      
      * [shardformer] update microbatchsize
      
      * [shardformer] update assert
      
      * update scheduler
      
      * update scheduler
      
      ---------
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
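Several commits above touch the pipeline microbatch settings ("update num_microbatches to None", "update microbatchsize", "update assert"). The usual convention, which HybridParallelPlugin also follows, is that the caller configures exactly one of `num_microbatches` or `microbatch_size` and the other is derived from the global batch size. A minimal pure-Python sketch of that resolution logic (the helper name `resolve_microbatches` is hypothetical, not Colossal-AI's actual code):

```python
def resolve_microbatches(batch_size, num_microbatches=None, microbatch_size=None):
    """Derive the pipeline microbatch split from exactly one of the two knobs.

    Sketch of the common convention: setting both (or neither) parameter is
    rejected, and the unset one is computed from the global batch size.
    """
    if (num_microbatches is None) == (microbatch_size is None):
        raise ValueError("specify exactly one of num_microbatches / microbatch_size")
    if num_microbatches is None:
        if batch_size % microbatch_size != 0:
            raise ValueError("batch_size must be divisible by microbatch_size")
        num_microbatches = batch_size // microbatch_size
    else:
        if batch_size % num_microbatches != 0:
            raise ValueError("batch_size must be divisible by num_microbatches")
        microbatch_size = batch_size // num_microbatches
    return num_microbatches, microbatch_size
```

For example, a global batch of 32 with `microbatch_size=4` yields 8 microbatches per pipeline step.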
  2. 26 Jul, 2023 1 commit
  3. 17 Jul, 2023 1 commit
  4. 28 Jun, 2023 1 commit
  5. 26 Jun, 2023 1 commit
  6. 19 Jun, 2023 1 commit
  7. 12 Jun, 2023 1 commit
  8. 08 Jun, 2023 2 commits
  9. 07 Jun, 2023 2 commits
  10. 30 May, 2023 1 commit
  11. 24 May, 2023 1 commit
    • [docs] change placememt_policy to placement_policy (#3829) · 518b31c0
      digger yu authored
      * fix typo colossalai/autochunk auto_parallel amp
      
      * fix typo colossalai/auto_parallel nn utils etc.
      
      * fix typo colossalai/auto_parallel autochunk fx/passes  etc.
      
      * fix typo docs/
      
      * change placememt_policy to placement_policy in docs/ and examples/
  12. 18 May, 2023 1 commit
  13. 26 Apr, 2023 1 commit
    • [doc] Fix typo under colossalai and doc (#3618) · b9a8dff7
      digger-yu authored
      * Fixed several spelling errors under colossalai
      
      * Fix the spelling error in colossalai and docs directory
      
      * Carefully changed the spelling errors under the example folder
      
      * Update runtime_preparation_pass.py
      
      revert autograft to autograd
      
      * Update search_chunk.py
      
      utile to until
      
      * Update check_installation.py
      
      change misteach to mismatch in line 91
      
      * Update 1D_tensor_parallel.md
      
      revert to perceptron
      
      * Update 2D_tensor_parallel.md
      
      revert to perceptron in line 73
      
      * Update 2p5D_tensor_parallel.md
      
      revert to perceptron in line 71
      
      * Update 3D_tensor_parallel.md
      
      revert to perceptron in line 80
      
      * Update README.md
      
      revert to resnet in line 42
      
      * Update reorder_graph.py
      
      revert to indice in line 7
      
      * Update p2p.py
      
      revert to megatron in line 94
      
      * Update initialize.py
      
      revert to torchrun in line 198
      
      * Update routers.py
      
      change to detailed in line 63
      
      * Update routers.py
      
      change to detailed in line 146
      
      * Update README.md
      
      revert random number in line 402
  14. 14 Apr, 2023 1 commit
  15. 07 Apr, 2023 2 commits
  16. 06 Apr, 2023 1 commit
  17. 04 Apr, 2023 2 commits
    • [example] update examples related to zero/gemini (#3431) · 573af841
      ver217 authored
      * [zero] update legacy import
      
      * [zero] update examples
      
      * [example] fix opt tutorial
      
      * [example] fix import
    • [zero] reorganize zero/gemini folder structure (#3424) · 26b7aac0
      ver217 authored
      * [zero] refactor low-level zero folder structure
      
      * [zero] fix legacy zero import path
      
      * [zero] fix legacy zero import path
      
      * [zero] remove useless import
      
      * [zero] refactor gemini folder structure

      * [zero] refactor legacy zero import path
      
      * [zero] fix test import path
      
      * [zero] fix test
      
      * [zero] fix circular import
      
      * [zero] update import
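This entry moves modules into a new zero/gemini layout while repeatedly fixing the "legacy zero import path". A standard way to keep old import paths working after such a restructure is to alias the legacy module name to the relocated module in `sys.modules`. A self-contained sketch of that technique, using an invented `mylib` package and attribute (this is not Colossal-AI's actual shim code):

```python
import importlib
import sys
import types

# Stand-in for a module that moved to a new location, e.g. mylib.zero.gemini.
new_mod = types.ModuleType("mylib.zero.gemini")
new_mod.PLACEMENT_POLICIES = ("static", "auto")  # hypothetical attribute

# Register the relocated module under its new dotted name, then alias the
# legacy name to the very same module object. Any code still doing
# importlib.import_module("mylib.gemini") keeps working after the move.
sys.modules["mylib.zero.gemini"] = new_mod
sys.modules["mylib.gemini"] = sys.modules["mylib.zero.gemini"]

# The legacy path resolves to the relocated module, not a copy.
legacy = importlib.import_module("mylib.gemini")
```

Because both names map to one module object, attributes patched through either path stay in sync, which is what makes this safer than re-exporting symbols file by file.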
  18. 23 Mar, 2023 1 commit
  19. 21 Mar, 2023 1 commit
  20. 09 Mar, 2023 2 commits
  21. 08 Mar, 2023 1 commit
  22. 07 Mar, 2023 1 commit
  23. 27 Feb, 2023 2 commits
  24. 22 Feb, 2023 2 commits
  25. 20 Feb, 2023 1 commit
  26. 15 Feb, 2023 1 commit
  27. 09 Feb, 2023 1 commit
  28. 31 Jan, 2023 1 commit
  29. 30 Jan, 2023 1 commit
  30. 28 Jan, 2023 1 commit
  31. 20 Jan, 2023 1 commit
  32. 18 Jan, 2023 2 commits