1. 04 Sep, 2023 2 commits
    • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) · 0a94fcd3
      flybird11111 authored
      
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * [shardformer] zero1+pp and the corresponding tests (#4517)
      
      * pause
      
      * finish pp+zero1
      
      * Update test_shard_vit.py
      
      * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
      
      * fix overlap bug and support bert, add overlap as an option in shardconfig
      
      * support overlap for chatglm and bloom
      
      * [shardformer] fix emerged bugs after updating transformers (#4526)
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] Add overlap support for gpt2 (#4535)
      
      * add overlap support for gpt2
      
      * remove unused code
      
      * remove unused code
      
      * [shardformer] support pp+tp+zero1 tests (#4531)
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] fix submodule replacement bug when enabling pp (#4544)
      
      * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
      
      * implement sharded optimizer saving
      
      * add more param info
      
      * finish implementation of sharded optimizer saving
      
      * fix bugs in optimizer sharded saving
      
      * add pp+zero test
      
      * param group loading
      
      * greedy loading of optimizer
      
      * fix bug when loading
      
      * implement optimizer sharded saving
      
      * add optimizer test & arrange checkpointIO utils
      
      * fix gemini sharding state_dict
      
      * add verbose option
      
      * add loading of master params
      
      * fix typehint
      
      * fix master/working mapping in fp16 amp
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * rebase feature/shardformer
      
      * update pipeline
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert finetune fix
      
      * [shardformer] add all_reduce operation to loss
      
      add all_reduce operation to loss
      
      * [shardformer] make compatible with pytree.
      
      make compatible with pytree.
      
      * [shardformer] disable tp
      
      disable tp
      
      * [shardformer] add 3d plugin to ci test
      
      * [shardformer] update num_microbatches to None
      
      * [shardformer] update microbatchsize
      
      * [shardformer] update assert
      
      * update scheduler
      
      * update scheduler
      
      ---------
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
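The last few commits in this entry adjust how the pipeline schedule derives its microbatch count ("update num_microbatches to None", "update microbatchsize"). The relationship between the two knobs can be sketched in plain Python; this is an illustrative helper with a hypothetical name, not Colossal-AI's actual implementation:

```python
def split_into_microbatches(batch, num_microbatches=None, microbatch_size=None):
    """Split a batch (a list of samples) into microbatches for pipeline scheduling.

    Exactly one of num_microbatches / microbatch_size should be given; when
    num_microbatches is None it is derived from microbatch_size.
    """
    if (num_microbatches is None) == (microbatch_size is None):
        raise ValueError("specify exactly one of num_microbatches or microbatch_size")
    if num_microbatches is None:
        if len(batch) % microbatch_size != 0:
            raise ValueError("batch size must be divisible by microbatch_size")
        num_microbatches = len(batch) // microbatch_size
    if len(batch) % num_microbatches != 0:
        raise ValueError("batch size must be divisible by num_microbatches")
    size = len(batch) // num_microbatches
    return [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]
```

Either parameter fully determines the other once the batch size is fixed, which is why the example could switch from passing one to passing the other.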
    • [shardformer] Pytree fix (#4533) · 24c07687
      Jianghai authored
      * pytree test
      
      * test bert
      
      * test bert
      
      * test bert
      
      * revise
      
      * add register
      
      * add register
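The "add register" steps above concern registering model output classes with pytree utilities so the pipeline runtime can flatten outputs into leaves and rebuild them. The mechanism can be sketched in plain Python — the registry and function names here are illustrative, not torch's actual internals:

```python
# Minimal pytree-style registry: map a type to (flatten, unflatten) functions.
_REGISTRY = {}

def register_pytree_node(cls, flatten_fn, unflatten_fn):
    _REGISTRY[cls] = (flatten_fn, unflatten_fn)

def tree_flatten(obj):
    """Return (leaves, spec) where spec is enough to rebuild obj."""
    if type(obj) in _REGISTRY:
        flatten_fn, _ = _REGISTRY[type(obj)]
        children, context = flatten_fn(obj)
        leaves, specs = [], []
        for child in children:
            sub_leaves, sub_spec = tree_flatten(child)
            leaves.extend(sub_leaves)
            specs.append(sub_spec)
        return leaves, (type(obj), context, specs)
    return [obj], None  # unregistered types are leaves

def _count_leaves(spec):
    if spec is None:
        return 1
    _, _, specs = spec
    return sum(_count_leaves(s) for s in specs)

def tree_unflatten(leaves, spec):
    """Rebuild the original container from flat leaves plus the spec."""
    if spec is None:
        return leaves[0]
    cls, context, specs = spec
    _, unflatten_fn = _REGISTRY[cls]
    children, i = [], 0
    for sub_spec in specs:
        n = _count_leaves(sub_spec)
        children.append(tree_unflatten(leaves[i:i + n], sub_spec))
        i += n
    return unflatten_fn(children, context)
```

Without such a registration, a custom output class is treated as a single opaque leaf, which is the kind of incompatibility the commit title points at.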
  2. 01 Sep, 2023 2 commits
  3. 31 Aug, 2023 2 commits
  4. 30 Aug, 2023 2 commits
    • [shardformer] support pp+tp+zero1 tests (#4531) · ec18fc73
      flybird11111 authored
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
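pp+tp+zero1 composes pipeline, tensor, and ZeRO-1 data parallelism, so the process count must factor across the three groups. The bookkeeping the tests exercise can be sketched with a hypothetical helper (not the plugin's real code):

```python
def derive_dp_size(world_size, pp_size, tp_size):
    """Data-parallel group size implied by the pipeline and tensor sizes.

    ZeRO stage 1 shards optimizer states across this data-parallel group,
    while pp and tp partition the model itself.
    """
    if world_size % (pp_size * tp_size) != 0:
        raise ValueError(
            f"world_size {world_size} not divisible by pp*tp = {pp_size * tp_size}"
        )
    return world_size // (pp_size * tp_size)
```

For example, 8 processes with pp=2 and tp=2 leave a data-parallel group of 2 for ZeRO-1 to shard over.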
    • [shardformer] fix opt test hanging (#4521) · d367b887
      flybird11111 authored
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
  5. 29 Aug, 2023 2 commits
  6. 28 Aug, 2023 2 commits
  7. 25 Aug, 2023 2 commits
    • [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) · 44eab2b2
      Baizhou Zhang authored
      * add APIs
      
      * implement save_sharded_model
      
      * add test for hybrid checkpointio
      
      * implement naive loading for sharded model
      
      * implement efficient sharded model loading
      
      * open a new file for hybrid checkpoint_io
      
      * small fix
      
      * fix circular importing
      
      * fix docstring
      
      * arrange arguments and apis
      
      * small fix
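Sharded checkpointing as described above splits a state dict across multiple shards and keeps an index mapping each parameter name to the shard holding it; "efficient sharded model loading" then touches only the shards it needs. A minimal in-memory sketch with hypothetical names (a real implementation writes shard files plus an index file):

```python
def save_sharded(state_dict, max_shard_entries=2):
    """Split state_dict into shards; build a param-name -> shard-number index."""
    shards, index, current = [], {}, {}
    for name, tensor in state_dict.items():
        if len(current) >= max_shard_entries:
            shards.append(current)
            current = {}
        current[name] = tensor
        index[name] = len(shards)  # shard this param will land in
    if current:
        shards.append(current)
    return shards, index

def load_param(shards, index, name):
    """Efficient loading: consult the index and open only the owning shard."""
    return shards[index[name]][name]
```

The same index idea extends to optimizer states, which is what the later sharded-optimizer checkpoint work (#4540) builds on.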
    • [shardformer] opt fix. (#4514) · de8a65ba
      flybird11111 authored
      * [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      * fix
      
      fix
      
      fix
      
      fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * activate checks
      
      * [Test] test ci
      
      * test ci
      
      * test ci
      
      * test ci
      
      * test ci
      
      * test ci
      
      * test ci
      
      * fix
  8. 24 Aug, 2023 1 commit
    • [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) · 3353e55c
      flybird11111 authored
      * [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      [shardformer] chatglm support sequence parallel
      
      * fix
      
      fix
      
      fix
      
      fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * [shardformer] jit fused fix
      
      * activate checks
  9. 23 Aug, 2023 1 commit
  10. 22 Aug, 2023 3 commits
  11. 21 Aug, 2023 1 commit
  12. 18 Aug, 2023 4 commits
    • [shardformer] Pipeline/whisper (#4456) · 8739aa7f
      Jianghai authored
      * add some base tests and policies
      
      * finish whisper base model
      
      * add conditional generation
      
      * finish basic tests
      
      * whisper
      
      * finish whisper
      
      * finish whisper
      
      * del useless whisper test
      
      * fix
      
      * add argmin to replace
      
      * finish revision
    • [shardformer] bert support sequence parallel. (#4455) · a27e0bb4
      flybird11111 authored
      * [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      * [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      [shardformer] bert support sequence parallel
      
      * [shardformer] bert support sequence parallel
    • [shardformer] bloom support sequence parallel (#4465) · 0ecd71e0
      flybird11111 authored
      [shardformer] bloom support sequence parallel
    • [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) · 7c8be770
      Bin Jia authored
      * support gpt2 seq parallel with pp/dp/tp
      
      * fix a bug when waiting for stream done
      
      * delete unused gpt2_seq file
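The sequence-parallel commits in this group (bert, bloom, gpt2) partition activations along the sequence dimension across tensor-parallel ranks, gathering the full sequence back wherever an operation needs every position. The data movement can be sketched with plain lists — purely illustrative, not the actual communication code:

```python
def split_sequence(tokens, tp_size):
    """Give each tensor-parallel rank a contiguous slice of the sequence."""
    if len(tokens) % tp_size != 0:
        raise ValueError("sequence length must be divisible by tp_size")
    chunk = len(tokens) // tp_size
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(tp_size)]

def all_gather_sequence(chunks):
    """Reassemble the full sequence before ops that need global context."""
    return [tok for chunk in chunks for tok in chunk]
```

Per-token work (layer norms, MLPs) can run on each rank's slice independently; attention needs the gather, which is also where the overlap-with-compute optimizations mentioned in #4516 and #4535 apply.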
  13. 16 Aug, 2023 5 commits
  14. 15 Aug, 2023 11 commits