1. 15 Aug, 2023 40 commits
    • [shardformer] update bloom/llama/vit/chatglm tests (#4420) · 328a791d
      flybird11111 authored
      [shardformer] update bloom/llama/vit/chatglm tests
      
      [shardformer] update opt tests
      
      [shardformer] update opt tests
      
      [shardformer] update bloom/llama/vit/chatglm tests
      
      [shardformer] update bloom/llama/vit/chatglm tests
      
      [shardformer] update bloom/llama/vit/chatglm tests
    • [shardformer] update t5 tests for using all optimizations. (#4407) · 108e54a0
      flybird11111 authored
      * [shardformer] gpt2 tests fix
      
      [shardformer] test all optimizations (#4399)
      
      [shardformer] test all optimizations
      
      [shardformer] test all optimizations
      
      [shardformer] test all optimizations
      
      [shardformer] gpt2 tests fix
      
      * [shardformer]update t5 to use all optimizations
    • [shardformer] update tests for all optimization (#4413) · 1edc9b5f
      flybird11111 authored
      [shardformer] update tests for all optimization
    • [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) · 7711bd52
      Baizhou Zhang authored
      * rewrite opt tests
      
      * rewrite llama tests
      
      * rewrite bloom & vit tests
      
      * rewrite chatglm tests
      
      * fix LinearCol for classifiers
      
      * add judge for other tp layers, fix lazy init in util
    • [shardformer] fix, test gpt2 for AMP+TP (#4403) · 21e0a42f
      flybird11111 authored
      * [shardformer] gpt2 tests fix
      
      [shardformer] test all optimizations (#4399)
      
      [shardformer] test all optimizations
      
      [shardformer] test all optimizations
      
      [shardformer] test all optimizations
      
      [shardformer] gpt2 tests fix
      
      * [shardformer] gpt2 tests fix
    • [pipeline] rewrite bert tests and fix some bugs (#4409) · 7596e9ae
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the imediate output to default dict
      
      * change the default output of get_shared_params
      
      * rewrite bert test
      
      * rewrite bert test
      
      * fix some bugs
      
      * del pipeline tests
      
      * del pipeline tests
      
      * del useless print
      
      * del useless print
      
      * rewrite data repeats
    • [shardformer] test all optimizations (#4399) · d2cd48e0
      flybird1111 authored
      [shardformer] test all optimizations
      
      [shardformer] test all optimizations
      
      [shardformer] test all optimizations
    • [shardformer] update shardformer to use flash attention 2 (#4392) · 7a3dfd0c
      flybird1111 authored
      * cherry-pick flash attention 2
      
      cherry-pick flash attention 2
      
      * [shardformer] update shardformer to use flash attention 2
      
      [shardformer] update shardformer to use flash attention 2, fix
      
      [shardformer] update shardformer to use flash attention 2, fix
      
      [shardformer] update shardformer to use flash attention 2, fix
    • [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) · ed4c4484
      Baizhou Zhang authored
      * fix remaining t5 bugs/rewrite t5 tests
      
      * fix multi-tensor communication in pipeline
      
      * rearrange test_config
      
      * fix keyerror in sync_shared_params
      
      * fix get_held_layers & Randomnizer, complete t5 tests
      
      * erase printing
      
      * fix get_held_layers through modifying _release_unheld_layers
      
      * fix _get_recursive_held_layers bug
    • [Shardformer] Merge flash attention branch to pipeline branch (#4362) · 906426cb
      flybird1111 authored
      
      
      * [shardformer] supported flash attention test dependency (#4158)
      
      * [shardformer] fix flash attention utils test (#4180)
      
      * [shardformer] opt support flash attention (#4163)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] add performance benchmark of shardformer (#4175)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] benchmark fix
      
      * [shardformer] benchmark fix
      
      * [shardformer] llama support flash attention (#4185)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] llama support flash attention
      
      * [shardformer] llama support flash attention
      
      * [shardformer] Move the import statement for xformer outside the forward function.
      
      * [shardformer] gpt2 support flash attention. (#4191)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] gpt2 support flash attention
      
      * [shardformer] gpt2 support flash attention
      
      * [shardformer] gpt2 support flash attention
      
      * [shardformer] bloom support flash attention (#4188)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] bloom support flash attention
      
      * [shardformer] add assert to sequence length
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert support flash attention. (#4206)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] bert support flash attention
      
      * [shardformer] t5 support flash attention. (#4216)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] t5 support flash attention
      
      * [shardformer] t5 support flash attention
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * [shardformer] support 'paddedcausal'  type of attention mask in Coloattention. (#4215)
      
      * added padded causal attn mask type for ColoAttention
      
      * [shardformer]t5 flash attention fix (#4239)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] t5 flash attention fix
      
      * [shardformer] update gpt2 to use coloattention. (#4234)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2
      
      * [shardformer] update opt and llama to use coloattention. (#4226)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt
      
      * [shardformer] shardformer support jit fused operator. (#4236)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] bloom support jit fused operator
      
      * [shardformer] bloom support jit fused operator
      
      * [shardformer] bloom support jit fused operator
      
      * [shardformer] t5 support jit fused operator
      
      * [shardformer] t5 support jit fused operator
      
      * [shardformer] t5 support jit fused operator
      
      * [shardformer] add roadmap of flash attention
      
      * [shardformer] add roadmap of flash attention
      
      * [shardformer] add roadmap of flash attention
      
      * [shardformer] add type hint to 'self' param of forward
      
      * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      
      * [shardformer] whisper support flash attention (#4301)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] whisper support flash attention
      
      * [shardformer] whisper support flash attention
      
      * [shardformer]whisper support jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      
      * [shardformer] sam support flash attention (#4316)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] sam support flash attention
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      
      * [shardformer] merge blip2/chatglm  (#4321)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_pugin
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [shardformer] blip2 support flash attention and jit operator (#4325)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_pugin
      
      * [shardformer] blip2 support flash attention and jit operator
      
      * [shardformer] blip2 support flash attention and jit operator
      
      * [shardformer] blip2 support flash attention and jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [shardformer] chatglm support flash attention and jit operator (#4330)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_pugin
      
      * [shardformer] chatglm support flash attention and jit operator
      
      * [shardformer] chatglm support flash attention and jit operator
      
      * [shardformer] chatglm support flash attention and jit operator
      
      * [shardformer] chatglm support flash attention and jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [shardformer] vit support flash attention and jit operator (#4334)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_pugin
      
      * [shardformer] vit support flash attention and jit operator
      
      * [shardformer] vit support flash attention and jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [pipeline] merge flash attention branch
      
      * [pipeline] merge flash attention branch
      
      * [pipeline] merge flash attention branch
      
      * [pipeline] fix conflict
      
      * [pipeline] fix conflict
      
      * Merge branch 'feature/pipeline' into feature/pipeline
      
      * Merge branch 'feature/pipeline' into feature/pipeline
      
      * Merge branch 'feature/pipeline' into feature/pipeline
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * fix flash attention tests
      
      * gemini ignore whisper
      
      * fix vit
      
      * fix xformers import handle
      
      ---------
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
    • [pipeline] add chatglm (#4363) · a88e9225
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the imediate output to default dict
      
      * change the default output of get_shared_params
      
      * add chatglm
      
      * add
      
      * chatglm
      
      * chatglm
      
      * finish chatglm
      
      * deletes
      
      * fix rmsnorm
      
      * chatglm
      
      * fix chatglm shard
      
      * init
    • [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) · b1feeced
      Baizhou Zhang authored
      * add util functions for shardformer tests & rewrite gpt2 test
      
      * fix shared_params & embedding/merging
      
      * fix precision
      b1feeced
    • Bin Jia's avatar
      [test] Hotfix/fix some model test and refactor check util api (#4369) · 5c6f1831
      Bin Jia authored
      * fix llama test
      
      * fix test bug of bert, blip2, bloom, gpt2
      
      * fix llama test
      
      * fix opt test
      
      * fix sam test
      
      * fix sam test
      
      * fix t5 test
      
      * fix vit test
      
      * fix whisper test
      
      * fix whisper test
      
      * polish code
      
      * adjust allclose parameter
      
      * Add mistakenly deleted code
      
      * adjust allclose
      
      * change loss function for some base model
      5c6f1831
    • FoolPlayer's avatar
      [test] skip some not compatible models · c3ca53cf
      FoolPlayer authored
      c3ca53cf
    • FoolPlayer's avatar
      update some module with new api version · 726541af
      FoolPlayer authored
      726541af
    • FoolPlayer's avatar
      [shardformer] support Blip2 (#4243) · 879301d0
      FoolPlayer authored
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_plugin
      879301d0
    • klhhhhh's avatar
      [shardformer] pre-commit check files · 4da05052
      klhhhhh authored
      4da05052
    • klhhhhh's avatar
      [shardformer] register without auto policy · 91850fe9
      klhhhhh authored
      91850fe9
    • klhhhhh's avatar
      [shardformer] ChatGLM support layernorm sharding · f155ae89
      klhhhhh authored
      f155ae89
    • klhhhhh's avatar
      [shardformer] delete some file · 00f6ef15
      klhhhhh authored
      00f6ef15
    • klhhhhh's avatar
      [shardformer] support chatglm without layernorm · dad00c42
      klhhhhh authored
      dad00c42
    • klhhhhh's avatar
      [shardformer] polish code · cbb54d32
      klhhhhh authored
      cbb54d32
    • klhhhhh's avatar
      [shardformer] polish chatglm code · 1a29e8fc
      klhhhhh authored
      1a29e8fc
    • klhhhhh's avatar
      8620009d
    • klhhhhh's avatar
      6ee4c9ee
    • klhhhhh's avatar
      import chatglm · 7377be7a
      klhhhhh authored
      7377be7a
    • klhhhhh's avatar
      [shardformer] vit test finish and support · c4928698
      klhhhhh authored
      c4928698
    • klhhhhh's avatar
      [shardformer] added tests · f60162b2
      klhhhhh authored
      f60162b2
    • Kun Lin's avatar
      Feature/chatglm (#4240) · ed34bb13
      Kun Lin authored
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      ed34bb13
    • FoolPlayer's avatar
      [shardformer] support whisper (#4212) · 9ee4ebea
      FoolPlayer authored
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      9ee4ebea
    • FoolPlayer's avatar
      [shardformer] support SAM (#4231) · dd2bf026
      FoolPlayer authored
      * 1. support sam 2. add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      dd2bf026
    • Kun Lin's avatar
      Feature/vit support (#4182) · c59d7aca
      Kun Lin authored
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      c59d7aca
    • Baizhou Zhang's avatar
      [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) · 0ceec8f9
      Baizhou Zhang authored
      * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
      
      * merge tests of PP/DP/TP combinations into one test file
      
      * fix bug when sync grad for dp in HybridPlugin
      
      * update supported precisions for 3DPlugin/fix bug when shifting tp_degree
      
      * improve the passing of lazy_init
      
      * modify lazy_init/use sync_shared_params
      0ceec8f9
    • Jianghai's avatar
      [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) · f13954cd
      Jianghai authored
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cut down launch process
      
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cut down launch process
      f13954cd
    • LuGY's avatar
      [pipeline] add unit test for 1f1b (#4303) · d3c6cd66
      LuGY authored
      * add unit test for 1f1b
      
      * polish code
      
      * polish code and update ut version
      
      * fix
      d3c6cd66
    • Hongxin Liu's avatar
      [hotfix] fix gemini and zero test (#4333) · 411cf1d2
      Hongxin Liu authored
      * [hotfix] fix gemini and zero test
      
      * [hotfix] fix lazy init test
      
      * [hotfix] fix lazy init test
      411cf1d2
    • Hongxin Liu's avatar
      [plugin] add 3d parallel plugin (#4295) · 261eab02
      Hongxin Liu authored
      * [amp] add mixed precision optimizer
      
      * [plugin] add 3d parallel plugin
      
      * [booster] support pipeline
      
      * [plugin] 3d parallel plugin support clip grad norm
      
      * [shardformer] fix sharder and add plugin test
      
      * [plugin] rename 3d parallel plugin
      
      * [ci] support testmon core pkg change detection (#4305)
      
      * [hotfix] debug testmon
      
      * [hotfix] fix llama
      
      * [hotfix] fix p2p bugs
      
      * [hotfix] fix requirements
      261eab02
    • FoolPlayer's avatar
      [shardformer] support pipeline base vit model (#4284) · b3f5d7a3
      FoolPlayer authored
      
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * support base vit pipeline
      
      * support vit downstream model
      
      * fix vit shard test
      
      * modify hidden states return type
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      b3f5d7a3