    [Shardformer] Merge flash attention branch to pipeline branch (#4362) · 906426cb
    flybird1111 authored
    
    
    * [shardformer] supported flash attention test dependency (#4158)
    
    * [shardformer] fix flash attention utils test (#4180)
    
    * [shardformer] opt support flash attention (#4163)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] add performance benchmark of shardformer (#4175)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] benchmark fix
    
    * [shardformer] benchmark fix
    
    * [shardformer] llama support flash attention (#4185)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] llama support flash attention
    
    * [shardformer] llama support flash attention
    
    * [shardformer] Move the import statement for xformer outside the forward function.
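Hoisting an optional heavy dependency out of the forward pass is a common pattern: resolve it once at module import time, then branch on a cached flag. A minimal sketch of the idea (the flag name `HAS_XFORMERS` and helper `forward_attention` are illustrative, not Shardformer's actual API):

```python
# Guarded module-level import: probe for the optional dependency once,
# instead of attempting an import on every forward() call.
import importlib.util

HAS_XFORMERS = importlib.util.find_spec("xformers") is not None

if HAS_XFORMERS:
    import xformers.ops  # noqa: F401


def forward_attention(use_memory_efficient: bool = True) -> str:
    # The hot path only reads the precomputed flag; no import cost here.
    if use_memory_efficient and HAS_XFORMERS:
        return "xformers"
    return "eager"
```

Besides avoiding repeated import machinery in the hot path, this makes the fallback decision deterministic for the whole process lifetime.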
    
    * [shardformer] gpt2 support flash attention. (#4191)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] gpt2 support flash attention
    
    * [shardformer] gpt2 support flash attention
    
    * [shardformer] gpt2 support flash attention
    
    * [shardformer] bloom support flash attention (#4188)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] bloom support flash attention
    
    * [shardformer] add assert to sequence length
    
    * [shardformer] fix
    
    * [shardformer] fix
    
    * [shardformer] fix
    
    * [shardformer] bert support flash attention. (#4206)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] bert support flash attention
    
    * [shardformer] t5 support flash attention. (#4216)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] t5 support flash attention
    
    * [shardformer] t5 support flash attention
    
    * fix typo
    
    * fix typo
    
    * fix typo
    
    * fix typo
    
    * fix typo
    
    * fix typo
    
    * [shardformer] support 'paddedcausal' type of attention mask in ColoAttention. (#4215)
    
    * added padded causal attn mask type for ColoAttention
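A "padded causal" mask combines the usual lower-triangular causal constraint with a per-sequence padding mask. A dependency-free sketch of the idea (plain lists standing in for tensors; the function name is illustrative, not ColoAttention's API):

```python
def padded_causal_mask(seq_len: int, valid_len: int) -> list:
    """True = key position may be attended to.

    Combines a causal mask (query i sees keys j <= i) with a padding
    mask (keys at or beyond valid_len are always masked out).
    """
    return [
        [j <= i and j < valid_len for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = padded_causal_mask(seq_len=4, valid_len=3)
# The last query row sees keys 0..2 but never the padded key 3.
```

In real kernels the same mask is usually expressed as additive `-inf` bias rather than booleans, but the attend/ignore pattern is identical.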
    
    * [shardformer]t5 flash attention fix (#4239)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] t5 flash attention fix
    
    * [shardformer] update gpt2 to use ColoAttention. (#4234)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] update gpt2 to use ColoAttention

    * [shardformer] update gpt2 to use ColoAttention

    * [shardformer] update gpt2 to use ColoAttention

    * [shardformer] update gpt2 to use ColoAttention

    * [shardformer] update gpt2
    
    * [shardformer] update opt and llama to use ColoAttention. (#4226)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * update opt to use ColoAttention

    * [shardformer] update opt to use ColoAttention

    * [shardformer] update opt to use ColoAttention

    * [shardformer] update opt to use ColoAttention

    * [shardformer] update opt to use ColoAttention

    * [shardformer] update opt to use ColoAttention

    * [shardformer] update opt to use ColoAttention

    * [shardformer] update opt
    
    * [shardformer] shardformer support jit fused operator. (#4236)
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] opt support flash attention
    
    * [shardformer] move to modeling
    
    * [shardformer] move to modeling
    
    * [shardformer] bloom support jit fused operator
    
    * [shardformer] bloom support jit fused operator
    
    * [shardformer] bloom support jit fused operator
    
    * [shardformer] t5 support jit fused operator
    
    * [shardformer] t5 support jit fused operator
    
    * [shardformer] t5 support jit fused operator
    
    * [shardformer] add roadmap of flash attention
    
    * [shardformer] add roadmap of flash attention
    
    * [shardformer] add roadmap of flash attention
    
    * [shardformer] add type hint to 'self' param of forward
    
    * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
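Fusing the q/k/v projections means stacking the three weight matrices side by side and running a single matmul, then splitting the output back into q, k, and v. A toy sketch with plain Python lists (names and shapes are illustrative, not the actual Shardformer helper):

```python
def matmul(x, w):
    # x: (n, d_in), w: (d_in, d_out), both as nested lists of floats.
    return [
        [sum(xi[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for xi in x
    ]


def fused_qkv(x, wq, wk, wv):
    # Concatenate the three (d, d) weights column-wise into one (d, 3d)
    # weight, do a single matmul, then split the result into q, k, v.
    d = len(wq[0])
    w_fused = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    out = matmul(x, w_fused)
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v
```

In practice this is what replacing three `nn.Linear` layers with one three-times-wider linear achieves: one GEMM launch instead of three, which is cheaper on GPU.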
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    
    * [shardformer] whisper support flash attention (#4301)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    * [shardformer] whisper support flash attention
    
    * [shardformer] whisper support flash attention
    
    * [shardformer] whisper support jit operator
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    
    * [shardformer] sam support flash attention (#4316)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    * [shardformer] sam support flash attention
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    
    * [shardformer] merge blip2/chatglm  (#4321)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
    
    * [shardformer] support Blip2 (#4243)
    
    * support base blip2
    
    * add support for downstream blip2 model
    
    * update readme
    
    * add forward injection
    
    * skip tests for incompatible models

    * fix test for gemini and low_level_zero_plugin
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: default avatarklhhhhh <1412841649@qq.com>
    
    * [shardformer] blip2 support flash attention and jit operator (#4325)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
    
    * [shardformer] support Blip2 (#4243)
    
    * support base blip2
    
    * add support for downstream blip2 model
    
    * update readme
    
    * add forward injection
    
    * skip tests for incompatible models

    * fix test for gemini and low_level_zero_plugin
    
    * [shardformer] blip2 support flash attention and jit operator
    
    * [shardformer] blip2 support flash attention and jit operator
    
    * [shardformer] blip2 support flash attention and jit operator
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: default avatarklhhhhh <1412841649@qq.com>
    
    * [shardformer] chatglm support flash attention and jit operator (#4330)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
    
    * [shardformer] support Blip2 (#4243)
    
    * support base blip2
    
    * add support for downstream blip2 model
    
    * update readme
    
    * add forward injection
    
    * skip tests for incompatible models

    * fix test for gemini and low_level_zero_plugin
    
    * [shardformer] chatglm support flash attention and jit operator
    
    * [shardformer] chatglm support flash attention and jit operator
    
    * [shardformer] chatglm support flash attention and jit operator
    
    * [shardformer] chatglm support flash attention and jit operator
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: default avatarklhhhhh <1412841649@qq.com>
    
    * [shardformer] vit support flash attention and jit operator (#4334)
    
    * Feature/vit support (#4182)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * fix attention dropout
    
    * [shardformer] support SAM (#4231)
    
    * 1. support SAM 2. add fused qkv for nn.Linear

    * update utils to support setting an element in a list

    * overwrite SamVisionAttention forward to use DropoutForParallelInput
    
    * remove unused code
    
    * [shardformer] support whisper (#4212)
    
    * support whisper
    
    * fix bug in vocabembedding
    
    * support downstream model of whisper
    
    * update readme
    
    * Feature/chatglm (#4240)
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * [shardformer] chatglm ready
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] chatglm shard without mlp sharding
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] fix chatglm configuration with pre-commit
    
    * [shardformer] added tests
    
    * [shardformer] vit test finish and support
    
    * import chatglm
    
    * [shardformer] add test kit in model zoo for chatglm
    
    * [shardformer] add first version of policy of chatglm
    
    * [shardformer] polish chatglm code
    
    * [shardformer] polish code
    
    * [shardformer] support chatglm without layernorm
    
    * [shardformer] delete some file
    
    * [shardformer] ChatGLM support layernorm sharding
    
    * [shardformer] register without auto policy
    
    * [shardformer] pre-commit check files
    
    * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
    
    * [shardformer] support Blip2 (#4243)
    
    * support base blip2
    
    * add support for downstream blip2 model
    
    * update readme
    
    * add forward injection
    
    * skip tests for incompatible models

    * fix test for gemini and low_level_zero_plugin
    
    * [shardformer] vit support flash attention and jit operator
    
    * [shardformer] vit support flash attention and jit operator
    
    ---------
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: default avatarklhhhhh <1412841649@qq.com>
    
    * [pipeline] merge flash attention branch
    
    * [pipeline] merge flash attention branch
    
    * [pipeline] merge flash attention branch
    
    * [pipeline] fix conflict
    
    * [pipeline] fix conflict
    
    * Merge branch 'feature/pipeline' into feature/pipeline
    
    * Merge branch 'feature/pipeline' into feature/pipeline
    
    * Merge branch 'feature/pipeline' into feature/pipeline
    
    * activate checks
    
    * activate checks
    
    * activate checks
    
    * activate checks
    
    * activate checks
    
    * activate checks
    
    * activate checks
    
    * activate checks
    
    * fix flash attention tests
    
    * gemini ignore whisper
    
    * fix vit
    
    * fix xformers import handle
    
    ---------
    Co-authored-by: default avatarFrank Lee <somerlee.9@gmail.com>
    Co-authored-by: default avatarKun Lin <81014421+klhhhhh@users.noreply.github.com>
    Co-authored-by: default avatarFoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: default avatarklhhhhh <1412841649@qq.com>