1. 04 Sep, 2023 1 commit
    • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) · 0a94fcd3
      flybird11111 authored
      
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * [shardformer] zero1+pp and the corresponding tests (#4517)
      
      * pause
      
      * finish pp+zero1
      
      * Update test_shard_vit.py
      
      * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
      
      * fix overlap bug and support bert, add overlap as an option in shardconfig
      
      * support overlap for chatglm and bloom
      
      * [shardformer] fix emerged bugs after updating transformers (#4526)
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] Add overlap support for gpt2 (#4535)
      
      * add overlap support for gpt2
      
      * remove unused code
      
      * remove unused code
      
      * [shardformer] support pp+tp+zero1 tests (#4531)
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] fix submodule replacement bug when enabling pp (#4544)
      
      * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
      
      * implement sharded optimizer saving
      
      * add more param info
      
      * finish implementation of sharded optimizer saving
      
      * fix bugs in optimizer sharded saving
      
      * add pp+zero test
      
      * param group loading
      
      * greedy loading of optimizer
      
      * fix bug when loading
      
      * implement optimizer sharded saving
      
      * add optimizer test & arrange checkpointIO utils
      
      * fix gemini sharding state_dict
      
      * add verbose option
      
      * add loading of master params
      
      * fix typehint
      
      * fix master/working mapping in fp16 amp
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * rebase feature/shardformer
      
      * update pipeline
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert finetune fix
      
      * [shardformer] add all_reduce operation to loss
      
      add all_reduce operation to loss
      
      * [shardformer] make compatible with pytree.
      
      make compatible with pytree.
      
      * [shardformer] disable tp
      
      disable tp
      
      * [shardformer] add 3d plugin to ci test
      
      * [shardformer] update num_microbatches to None
      
      * [shardformer] update microbatchsize
      
      * [shardformer] update assert
      
      * update scheduler
      
      * update scheduler
      
      ---------
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
  2. 07 Jun, 2023 1 commit
  3. 20 Feb, 2023 1 commit