1. 16 Nov, 2023 1 commit
  2. 10 Nov, 2023 1 commit
    • [gemini] gemini support tensor parallelism. (#4942) · 576a2f7b
      flybird11111 authored
      * [colossalai]fix typo
      
      * [inference] Add smoothquant for llama (#4904)
      
      * [inference] add int8 rotary embedding kernel for smoothquant (#4843)
      
      * [inference] add smoothquant llama attention (#4850)
      
      * add smoothquant llama attention
      
      * remove useless code
      
      * remove useless code
      
      * fix import error
      
      * rename file name
      
      * [inference] add silu linear fusion for smoothquant llama mlp  (#4853)
      
      * add silu linear
      
      * update skip condition
      
      * catch smoothquant cuda lib exception
      
      * process exception for tests
      
      * [inference] add llama mlp for smoothquant (#4854)
      
      * add llama mlp for smoothquant
      
      * fix down out scale
      
      * remove duplicate lines
      
      * add llama mlp check
      
      * delete useless code
      
      * [inference] add smoothquant llama (#4861)
      
      * add smoothquant llama
      
      * fix attention accuracy
      
      * fix accuracy
      
      * add kv cache and save pretrained
      
      * refactor example
      
      * delete smooth
      
      * refactor code
      
      * [inference] add smooth function and delete useless code for smoothquant (#4895)
      
      * add smooth function and delete useless code
      
      * update datasets
      
      * remove duplicate import
      
      * delete useless file
      
      * refactor codes (#4902)
      
      * refactor code
      
      * add license
      
      * add torch-int and smoothquant license
      
      * Update flash_attention_patch.py
      
      Keep compatibility with the recent change in the Transformers library that added a new argument, 'padding_mask', to the forward function of the attention layer.
      https://github.com/huggingface/transformers/pull/25598
      
      
      
      * [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921)
      
      * [kernel] support pure fp16 for cpu adam (#4896)
      
      * [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919)
      
      * [kernel] fix cpu adam
      
      * [test] update gemini optim test
      
      * [format] applied code formatting on changed files in pull request 4908 (#4918)
      Co-authored-by: github-actions <github-actions@github.com>
      
      * [gemini] support gradient accumulation (#4869)
      
      * add test
      
      * fix no_sync bug in low level zero plugin
      
      * fix test
      
      * add argument for grad accum
      
      * add grad accum in backward hook for gemini
      
      * finish implementation, rewrite tests
      
      * fix test
      
      * skip stuck model in low level zero test
      
      * update doc
      
      * optimize communication & fix gradient checkpoint
      
      * modify doc
      
      * cleaning codes
      
      * update cpu adam fp16 case
      
      * [hotfix] fix torch 2.0 compatibility (#4936)
      
      * [hotfix] fix launch
      
      * [test] fix test gemini optim
      
      * [shardformer] fix vit
      
      * [test] add no master test for low level zero plugin (#4934)
      
      * [format] applied code formatting on changed files in pull request 4820 (#4886)
      Co-authored-by: github-actions <github-actions@github.com>
      
      * [nfc] fix some typo with colossalai/ docs/ etc. (#4920)
      
      * [Refactor] Integrated some lightllm kernels into token-attention  (#4946)
      
      * add some req for inference
      
      * clean codes
      
      * add codes
      
      * add some lightllm deps
      
      * clean codes
      
      * hello
      
      * delete rms files
      
      * add some comments
      
      * add comments
      
      * add doc
      
      * add lightllm deps
      
      * add lightllm chatglm2 kernels
      
      * add lightllm chatglm2 kernels
      
      * replace rotary embedding with lightllm kernel
      
      * add some comments
      
      * add some comments
      
      * add some comments
      
      * add
      
      * replace fwd kernel att1
      
      * fix an arg
      
      * add
      
      * add
      
      * fix token attention
      
      * add some comments
      
      * clean codes
      
      * modify comments
      
      * fix readme
      
      * fix bug
      
      * fix bug
      
      ---------
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      
      * [test] merge old components to test to model zoo (#4945)
      
      * [test] add custom models in model zoo
      
      * [test] update legacy test
      
      * [test] update model zoo
      
      * [test] update gemini test
      
      * [test] remove components to test
      
      * [inference] add reference and fix some bugs (#4937)
      
      * add reference and fix some bugs
      
      * update gptq init
      
      ---------
      Co-authored-by: Xu Kai <xukai16@foxamil.com>
      
      * [Inference]ADD Bench Chatglm2 script (#4963)
      
      * add bench chatglm
      
      * fix bug and make utils
      
      ---------
      
      Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
      
      * [Pipeline inference] Combine kvcache with pipeline inference (#4938)
      
      * merge kvcache with pipeline inference and refactor the code structure
      
      * support ppsize > 2
      
      * refactor pipeline code
      
      * do pre-commit
      
      * modify benchmark
      
      * fix benchmark
      
      * polish code
      
      * add docstring and update readme
      
      * refactor the code
      
      * fix some logic bug of ppinfer
      
      * polish readme
      
      * fix typo
      
      * skip infer test
      
      * updated c++17 compiler flags (#4983)
      
      * [Inference] Dynamic Batching Inference, online and offline (#4953)
      
      * [inference] Dynamic Batching for Single and Multiple GPUs (#4831)
      
      * finish batch manager
      
      * 1
      
      * first
      
      * fix
      
      * fix dynamic batching
      
      * llama infer
      
      * finish test
      
      * support different lengths generating
      
      * del prints
      
      * del prints
      
      * fix
      
      * fix bug
      
      ---------
      
      Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
      
      * [inference] Async dynamic batching  (#4894)
      
      * finish input and output logic
      
      * add generate
      
      * test forward
      
      * 1
      
      * [inference]Re push async dynamic batching (#4901)
      
      * adapt to ray server
      
      * finish async
      
      * finish test
      
      * del test
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      
      * Revert "[inference]Re push async dynamic batching (#4901)" (#4905)
      
      This reverts commit fbf3c09e673794ed18c91d4bab1a7dfea052e95a.
      
      * Revert "[inference] Async dynamic batching  (#4894)"
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Revert "[inference] Async dynamic batching  (#4894)" (#4909)
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Add Ray Distributed Environment Init Scripts
      
      * support DynamicBatchManager base function
      
      * revert _set_tokenizer version
      
      * add driver async generate
      
      * add async test
      
      * fix bugs in test_ray_dist.py
      
      * add get_tokenizer.py
      
      * fix code style
      
      * fix bugs about No module named 'pydantic' in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * [infer]Add Ray Distributed Environment Init Scripts (#4911)
      
      * Revert "[inference] Async dynamic batching  (#4894)"
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Add Ray Distributed Environment Init Scripts
      
      * support DynamicBatchManager base function
      
      * revert _set_tokenizer version
      
      * add driver async generate
      
      * add async test
      
      * fix bugs in test_ray_dist.py
      
      * add get_tokenizer.py
      
      * fix code style
      
      * fix bugs about No module named 'pydantic' in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * support dynamic batch for bloom model and is_running function
      
      * [Inference]Test for new Async engine (#4935)
      
      * infer engine
      
      * infer engine
      
      * test engine
      
      * test engine
      
      * new manager
      
      * change step
      
      * add
      
      * test
      
      * fix
      
      * fix
      
      * finish test
      
      * finish test
      
      * finish test
      
      * finish test
      
      * add license
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      
      * add assertion for config (#4947)
      
      * [Inference] Finish dynamic batching offline test (#4948)
      
      * test
      
      * fix test
      
      * fix quant
      
      * add default
      
      * fix
      
      * fix some bugs
      
      * fix some bugs
      
      * fix
      
      * fix bug
      
      * fix bugs
      
      * reset param
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
      
      * [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention  (#4965)
      
      * adding flash-decoding
      
      * clean
      
      * adding kernel
      
      * adding flash-decoding
      
      * add integration
      
      * add
      
      * adding kernel
      
      * adding kernel
      
      * adding triton 2.1.0 features for inference
      
      * update bloom triton kernel
      
      * remove useless vllm kernels
      
      * clean codes
      
      * fix
      
      * adding files
      
      * fix readme
      
      * update llama flash-decoding
      
      ---------
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      
      * fix ColossalEval (#4992)
      Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
      
      * [doc]Update doc for colossal-inference (#4989)
      
      * update doc
      
      * Update README.md
      
      ---------
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      
      * [hotfix] Fix the bug where process groups were not being properly released. (#4940)
      
      * Fix the bug where process groups were not being properly released.
      
      * test
      
      * Revert "test"
      
      This reverts commit 479900c1398637310abf92eefa3cd168038ea02f.
      
      * [hotfix] fix the bug of repeatedly storing param group (#4951)
      
      * [doc] add supported feature diagram for hybrid parallel plugin (#4996)
      
      * [Pipeline Inference] Merge pp with tp (#4993)
      
      * refactor pipeline into new CaiInferEngine
      
      * update llama modeling forward
      
      * merge tp with pp
      
      * update docstring
      
      * optimize test workflow and example
      
      * fix typo
      
      * add assert and todo
      
      * [release] update version (#4995)
      
      * [release] update version
      
      * [hotfix] fix ci
      
      * [gemini] gemini support tp
      
      * fix
      
      * update checkpointIO
      
      * support fused layernorm
      
      * update fusedlayernorm
      
      * add sequence parallel to gemini
      
      * fix
      
      * fix comments
      
      * fix
      
      * fix t5
      
      * clear cache
      
      * fix
      
      * activate ci
      
      * activate ci
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * revert
      
      * modify tp gather method
      
      * fix test
      
      ---------
      Co-authored-by: Xu Kai <xukai16@foxmail.com>
      Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
      Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
      Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
      Co-authored-by: github-actions <github-actions@github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
      Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
      Co-authored-by: digger yu <digger-yu@outlook.com>
      Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      Co-authored-by: Xu Kai <xukai16@foxamil.com>
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
      Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
      Co-authored-by: littsk <1214689160@qq.com>
      Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
  3. 19 Sep, 2023 1 commit
  4. 11 Sep, 2023 1 commit
  5. 05 Sep, 2023 1 commit
    • [legacy] move engine to legacy (#4560) · 8accecd5
      Hongxin Liu authored
      * [legacy] move engine to legacy
      
      * [example] fix seq parallel example
      
      * [example] fix seq parallel example
      
      * [test] test gemini plugin hang

      * [test] test gemini plugin hang

      * [test] test gemini plugin hang

      * [test] test gemini plugin hang

      * [test] test gemini plugin hang
      
      * [example] update seq parallel requirements
  6. 24 Aug, 2023 1 commit
    • [gemini] improve compatibility and add static placement policy (#4479) · 27061426
      Hongxin Liu authored
      * [gemini] remove distributed-related part from colotensor (#4379)
      
      * [gemini] remove process group dependency
      
      * [gemini] remove tp part from colo tensor
      
      * [gemini] patch inplace op
      
      * [gemini] fix param op hook and update tests
      
      * [test] remove useless tests
      
      * [test] remove useless tests
      
      * [misc] fix requirements
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [misc] update requirements
      
      * [gemini] refactor gemini optimizer and gemini ddp (#4398)
      
      * [gemini] update optimizer interface
      
      * [gemini] renaming gemini optimizer
      
      * [gemini] refactor gemini ddp class
      
      * [example] update gemini related example
      
      * [example] update gemini related example
      
      * [plugin] fix gemini plugin args
      
      * [test] update gemini ckpt tests
      
      * [gemini] fix checkpoint io
      
      * [example] fix opt example requirements
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [gemini] add static placement policy (#4443)
      
      * [gemini] add static placement policy
      
      * [gemini] fix param offload
      
      * [test] update gemini tests
      
      * [plugin] update gemini plugin
      
      * [plugin] update gemini plugin docstr
      
      * [misc] fix flash attn requirement
      
      * [test] fix gemini checkpoint io test
      
      * [example] update resnet example result (#4457)
      
      * [example] update bert example result (#4458)
      
      * [doc] update gemini doc (#4468)
      
      * [example] update gemini related examples (#4473)
      
      * [example] update gpt example
      
      * [example] update dreambooth example
      
      * [example] update vit
      
      * [example] update opt
      
      * [example] update palm
      
      * [example] update vit and opt benchmark
      
      * [hotfix] fix bert in model zoo (#4480)
      
      * [hotfix] fix bert in model zoo
      
      * [test] remove chatglm gemini test
      
      * [test] remove sam gemini test
      
      * [test] remove vit gemini test
      
      * [hotfix] fix opt tutorial example (#4497)
      
      * [hotfix] fix opt tutorial example
      
      * [hotfix] fix opt tutorial example
  7. 15 Aug, 2023 3 commits
    • [Shardformer] Merge flash attention branch to pipeline branch (#4362) · 906426cb
      flybird1111 authored
      
      
      * [shardformer] supported flash attention test dependency (#4158)
      
      * [shardformer] fix flash attention utils test (#4180)
      
      * [shardformer] opt support flash attention (#4163)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] add performance benchmark of shardformer (#4175)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] benchmark fix
      
      * [shardformer] benchmark fix
      
      * [shardformer] llama support flash attention (#4185)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] llama support flash attention
      
      * [shardformer] llama support flash attention
      
      * [shardformer] Move the import statement for xformer outside the forward function.
      
      * [shardformer] gpt2 support flash attention. (#4191)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] gpt2 support flash attention
      
      * [shardformer] gpt2 support flash attention
      
      * [shardformer] gpt2 support flash attention
      
      * [shardformer] bloom support flash attention (#4188)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] bloom support flash attention
      
      * [shardformer] add assert to sequence length
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert support flash attention. (#4206)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] bert support flash attention
      
      * [shardformer] t5 support flash attention. (#4216)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] t5 support flash attention
      
      * [shardformer] t5 support flash attention
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * fix typo
      
      * [shardformer] support 'paddedcausal'  type of attention mask in Coloattention. (#4215)
      
      * added padded causal attn mask type for ColoAttention
      
      * [shardformer]t5 flash attention fix (#4239)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] t5 flash attention fix
      
      * [shardformer] update gpt2 to use coloattention. (#4234)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2 to use coloattention
      
      * [shardformer] update gpt2
      
      * [shardformer] update opt and llama to use coloattention. (#4226)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt to use coloattention
      
      * [shardformer]update opt
      
      * [shardformer] shardformer support jit fused operator. (#4236)
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] opt support flash attention
      
      * [shardformer] move to modeling
      
      * [shardformer] move to modeling
      
      * [shardformer] bloom support jit fused operator
      
      * [shardformer] bloom support jit fused operator
      
      * [shardformer] bloom support jit fused operator
      
      * [shardformer] t5 support jit fused operator
      
      * [shardformer] t5 support jit fused operator
      
      * [shardformer] t5 support jit fused operator
      
      * [shardformer] add roadmap of flash attention
      
      * [shardformer] add roadmap of flash attention
      
      * [shardformer] add roadmap of flash attention
      
      * [shardformer] add type hint to 'self' param of forward
      
      * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      
      * [shardformer] whisper support flash attention (#4301)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] whisper support flash attention
      
      * [shardformer] whisper support flash attention
      
      * [shardformer]whisper support jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      
      * [shardformer] sam support flash attention (#4316)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] sam support flash attention
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      
      * [shardformer] merge blip2/chatglm  (#4321)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_plugin
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [shardformer] blip2 support flash attention and jit operator (#4325)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_plugin
      
      * [shardformer] blip2 support flash attention and jit operator
      
      * [shardformer] blip2 support flash attention and jit operator
      
      * [shardformer] blip2 support flash attention and jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [shardformer] chatglm support flash attention and jit operator (#4330)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_plugin
      
      * [shardformer] chatglm support flash attention and jit operator
      
      * [shardformer] chatglm support flash attention and jit operator
      
      * [shardformer] chatglm support flash attention and jit operator
      
      * [shardformer] chatglm support flash attention and jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [shardformer] vit support flash attention and jit operator (#4334)
      
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * [shardformer] support SAM (#4231)
      
      * 1.support sam 2.add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
      
      * [shardformer] support whisper (#4212)
      
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
      
      * Feature/chatglm (#4240)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
      
      * [shardformer] support Blip2 (#4243)
      
      * support base blip2
      
      * add support for downstream blip2 model
      
      * update readme
      
      * add forward injection
      
      * skip not compatible models test
      
      * fix test for gemini and low_level_zero_plugin
      
      * [shardformer] vit support flash attention and jit operator
      
      * [shardformer] vit support flash attention and jit operator
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      
      * [pipeline] merge flash attention branch
      
      * [pipeline] merge flash attention branch
      
      * [pipeline] merge flash attention branch
      
      * [pipeline] fix conflict
      
      * [pipeline] fix conflict
      
      * Merge branch 'feature/pipeline' into feature/pipeline
      
      * Merge branch 'feature/pipeline' into feature/pipeline
      
      * Merge branch 'feature/pipeline' into feature/pipeline
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * activate checks
      
      * fix flash attention tests
      
      * gemini ignore whisper
      
      * fix vit
      
      * fix xformers import handle
      
      ---------
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
      Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: klhhhhh <1412841649@qq.com>
      906426cb
    • FoolPlayer's avatar
      [test] skip some not compatible models · c3ca53cf
      FoolPlayer authored
    • Hongxin Liu's avatar
      [hotfix] fix gemini and zero test (#4333) · 411cf1d2
      Hongxin Liu authored
      * [hotfix] fix gemini and zero test
      
      * [hotfix] fix lazy init test
      
      * [hotfix] fix lazy init test
  8. 04 Jul, 2023 1 commit
  9. 05 Jun, 2023 1 commit
    • Hongxin Liu's avatar
      [lazy] refactor lazy init (#3891) · dbb32692
      Hongxin Liu authored
      * [lazy] remove old lazy init
      
      * [lazy] refactor lazy init folder structure
      
      * [lazy] fix lazy tensor deepcopy
      
      * [test] update lazy init test
  10. 05 May, 2023 1 commit
  11. 26 Apr, 2023 1 commit
    • Hongxin Liu's avatar
      [booster] add low level zero plugin (#3594) · 4b3240cb
      Hongxin Liu authored
      * [booster] add low level zero plugin
      
      * [booster] fix gemini plugin test
      
      * [booster] fix precision
      
      * [booster] add low level zero plugin test
      
      * [test] fix booster plugin test oom
      
      * [test] fix booster plugin test oom
      
      * [test] fix googlenet and inception output trans
      
      * [test] fix diffuser clip vision model
      
      * [test] fix torchaudio_wav2vec2_base
      
      * [test] fix low level zero plugin test
  12. 12 Apr, 2023 1 commit
    • Hongxin Liu's avatar
      [gemini] gemini supports lazy init (#3379) · 152239bb
      Hongxin Liu authored
      * [gemini] fix nvme optimizer init
      
      * [gemini] gemini supports lazy init
      
      * [gemini] add init example
      
      * [gemini] add fool model
      
      * [zero] update gemini ddp
      
      * [zero] update init example
      
      * add chunk method
      
      * add chunk method
      
      * [lazyinit] fix lazy tensor tolist
      
      * [gemini] fix buffer materialization
      
      * [misc] remove useless file
      
      * [booster] update gemini plugin
      
      * [test] update gemini plugin test
      
      * [test] fix gemini plugin test
      
      * [gemini] fix import
      
      * [gemini] fix import
      
      * [lazyinit] use new metatensor
      
      * [lazyinit] use new metatensor
      
      * [lazyinit] fix __set__ method
  13. 06 Apr, 2023 1 commit
  14. 03 Apr, 2023 1 commit
  15. 31 Mar, 2023 1 commit
    • ver217's avatar
      [booster] implement Gemini plugin (#3352) · 5f2e34e6
      ver217 authored
      * [booster] add gemini plugin
      
      * [booster] update docstr
      
      * [booster] gemini plugin add coloparam convertor
      
      * [booster] fix coloparam convertor
      
      * [booster] fix gemini plugin device
      
      * [booster] add gemini plugin test
      
      * [booster] gemini plugin ignore sync bn
      
      * [booster] skip some model
      
      * [booster] skip some model
      
      * [booster] modify test world size
      
      * [booster] modify test world size
      
      * [booster] skip test