[gemini] gemini support tensor parallelism. (#4942) · 576a2f7b
    flybird11111 authored
    * [colossalai] fix typo
    
    * [inference] Add smoothquant for llama (#4904)
    
    * [inference] add int8 rotary embedding kernel for smoothquant (#4843)
    
    * [inference] add smoothquant llama attention (#4850)
    
    * add smoothquant llama attention
    
    * remove useless code
    
    * fix import error
    
    * rename file name
    
    * [inference] add silu linear fusion for smoothquant llama mlp (#4853); a sketch of the fused op follows this list
    
    * add silu linear
    
    * update skip condition
    
    * catch smoothquant cuda lib exception
    
    * process exceptions for tests
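
    A minimal sketch of the fused SiLU-linear pattern in the llama mlp, in plain torch (illustrative only; the PR's actual fused CUDA kernel and int8 scales are not shown):

        import torch
        import torch.nn.functional as F

        def llama_mlp(x, w_gate, w_up, w_down):
            # LLaMA-style gated MLP: the "silu linear" fusion computes
            # silu(x @ w_gate^T) * (x @ w_up^T) in a single kernel pass.
            gated = F.silu(x @ w_gate.T) * (x @ w_up.T)
            return gated @ w_down.T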
    
    * [inference] add llama mlp for smoothquant (#4854)
    
    * add llama mlp for smoothquant
    
    * fix down_proj output scale
    
    * remove duplicate lines
    
    * add llama mlp check
    
    * delete useless code
    
    * [inference] add smoothquant llama (#4861)
    
    * add smoothquant llama
    
    * fix attention accuracy
    
    * fix accuracy
    
    * add kv cache and save pretrained
    
    * refactor example
    
    * delete smooth
    
    * refactor code
    
    * [inference] add smooth function and delete useless code for smoothquant (#4895)
    
    * add smooth function and delete useless code (see the sketch after this list)
    
    * update datasets
    
    * remove duplicate import
    
    * delete useless file
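
    A hedged sketch of the SmoothQuant-style smoothing step (migrating quantization difficulty from activations to weights via per-channel scales; function and argument names are illustrative, not the actual colossalai API):

        import torch

        def smooth_fc(act_max: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
            # act_max: per-input-channel max of |activations|, shape [in_features]
            # weight:  linear weight, shape [out_features, in_features]
            weight_max = weight.abs().max(dim=0).values.clamp(min=1e-5)
            # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
            scales = (act_max.clamp(min=1e-5).pow(alpha) / weight_max.pow(1 - alpha)).clamp(min=1e-5)
            # W' = W * diag(s); activations are divided by s at runtime,
            # typically by folding 1/s into the preceding LayerNorm weights.
            return weight * scales, scales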
    
    * refactor codes (#4902)
    
    * refactor code
    
    * add license
    
    * add torch-int and smoothquant license
    
    * Update flash_attention_patch.py
    
    To stay compatible with a recent change in the Transformers library, which added a new 'padding_mask' argument to the forward function of the attention layer:
    https://github.com/huggingface/transformers/pull/25598
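
    For context, a hedged sketch of one way a patch can absorb the new keyword (illustrative only; not necessarily how flash_attention_patch.py does it):

        from functools import wraps

        def tolerate_padding_mask(forward):
            # Wrap an attention forward so it accepts and ignores the
            # `padding_mask` kwarg introduced by transformers PR 25598.
            @wraps(forward)
            def wrapper(*args, padding_mask=None, **kwargs):
                return forward(*args, **kwargs)
            return wrapper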
    
    
    
    * [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921)
    
    * [kernel] support pure fp16 for cpu adam (#4896)
    
    * [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919)
    
    * [kernel] fix cpu adam
    
    * [test] update gemini optim test
    
    * [format] applied code formatting on changed files in pull request 4908 (#4918)
    Co-authored-by: github-actions <github-actions@github.com>
    
    * [gemini] support gradient accumulation (#4869)
    
    * add test
    
    * fix no_sync bug in low level zero plugin
    
    * fix test
    
    * add argument for grad accum
    
    * add grad accum in backward hook for gemini (see the sketch after this list)
    
    * finish implementation, rewrite tests
    
    * fix test
    
    * skip stuck model in low level zero test
    
    * update doc
    
    * optimize communication & fix gradient checkpoint
    
    * modify doc
    
    * cleaning codes
    
    * update cpu adam fp16 case
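
    A minimal sketch of per-parameter gradient scaling in a backward hook, in plain PyTorch (the Gemini plugin's real hook also manages chunked parameters; names here are illustrative):

        import torch

        class GradAccumulator:
            # Scale each grad by 1/accumulation_steps as it is produced, so
            # that `accumulation_steps` backward passes sum to an average.
            def __init__(self, model: torch.nn.Module, accumulation_steps: int):
                self.accumulation_steps = accumulation_steps
                for p in model.parameters():
                    if p.requires_grad:
                        p.register_hook(self._scale)

            def _scale(self, grad: torch.Tensor) -> torch.Tensor:
                return grad / self.accumulation_steps

    optimizer.step() and optimizer.zero_grad() then run once every accumulation_steps micro-batches.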
    
    * [hotfix] fix torch 2.0 compatibility (#4936)
    
    * [hotfix] fix launch
    
    * [test] fix test gemini optim
    
    * [shardformer] fix vit
    
    * [test] add no master test for low level zero plugin (#4934)
    
    * [format] applied code formatting on changed files in pull request 4820 (#4886)
    Co-authored-by: github-actions <github-actions@github.com>
    
    * [nfc] fix some typos in colossalai/ docs/ etc. (#4920)
    
    * [Refactor] Integrated some lightllm kernels into token-attention (#4946)
    
    * add some req for inference
    
    * clean codes
    
    * add codes
    
    * add some lightllm deps
    
    * clean codes
    
    * hello
    
    * delete rms files
    
    * add some comments
    
    * add comments
    
    * add doc
    
    * add lightllm deps
    
    * add lightllm chatglm2 kernels
    
    * replace rotary embedding with lightllm kernel
    
    * add some comments
    
    * add
    
    * replace fwd kernel att1
    
    * fix an arg
    
    * add
    
    * add
    
    * fix token attention
    
    * add some comments
    
    * clean codes
    
    * modify comments
    
    * fix readme
    
    * fix bug
    
    * fix bug
    
    ---------
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    
    * [test] merge old components-to-test into model zoo (#4945)
    
    * [test] add custom models in model zoo
    
    * [test] update legacy test
    
    * [test] update model zoo
    
    * [test] update gemini test
    
    * [test] remove components to test
    
    * [inference] add reference and fix some bugs (#4937)
    
    * add reference and fix some bugs
    
    * update gptq init
    
    ---------
    Co-authored-by: Xu Kai <xukai16@foxmail.com>
    
    * [Inference] Add Bench Chatglm2 script (#4963)
    
    * add bench chatglm
    
    * fix bug and make utils
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    
    * [Pipeline inference] Combine kvcache with pipeline inference (#4938)
    
    * merge kvcache with pipeline inference and refactor the code structure
    
    * support ppsize > 2
    
    * refactor pipeline code
    
    * do pre-commit
    
    * modify benchmark
    
    * fix benchmark
    
    * polish code
    
    * add docstring and update readme
    
    * refactor the code
    
    * fix some logic bugs of ppinfer
    
    * polish readme
    
    * fix typo
    
    * skip infer test
    
    * updated C++17 compiler flags (#4983)
    
    * [Inference] Dynamic Batching Inference, online and offline (#4953)
    
    * [inference] Dynamic Batching for Single and Multiple GPUs (#4831)
    
    * finish batch manager (a toy sketch follows this commit's list)
    
    * 1
    
    * first
    
    * fix
    
    * fix dynamic batching
    
    * llama infer
    
    * finish test
    
    * support generating sequences of different lengths
    
    * del prints
    
    * del prints
    
    * fix
    
    * fix bug
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
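
    A toy sketch of the dynamic (continuous) batching idea: admit waiting requests under a token budget and retire finished ones between decoding steps. All names are illustrative, not the actual colossalai classes:

        from collections import deque
        from dataclasses import dataclass

        @dataclass
        class Request:
            req_id: int
            prompt_len: int
            max_new_tokens: int
            generated: int = 0

        class DynamicBatchManager:
            def __init__(self, max_batch_tokens: int):
                self.max_batch_tokens = max_batch_tokens
                self.waiting = deque()
                self.running = []

            def add_request(self, req: Request) -> None:
                self.waiting.append(req)

            def _batch_tokens(self) -> int:
                return sum(r.prompt_len + r.generated for r in self.running)

            def step(self) -> list:
                # Admit waiting requests while the token budget allows, so
                # new requests can join the batch between decoding steps.
                while self.waiting and self._batch_tokens() + self.waiting[0].prompt_len <= self.max_batch_tokens:
                    self.running.append(self.waiting.popleft())
                # One decoding step for the whole batch (model call omitted),
                # then retire finished requests to free budget.
                finished = []
                for r in self.running:
                    r.generated += 1
                    if r.generated >= r.max_new_tokens:
                        finished.append(r.req_id)
                self.running = [r for r in self.running if r.generated < r.max_new_tokens]
                return finished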
    
    * [inference] Async dynamic batching  (#4894)
    
    * finish input and output logic
    
    * add generate
    
    * test forward
    
    * 1
    
    * [inference]Re push async dynamic batching (#4901)
    
    * adapt to ray server
    
    * finish async
    
    * finish test
    
    * del test
    
    ---------
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    
    * Revert "[inference]Re push async dynamic batching (#4901)" (#4905)
    
    This reverts commit fbf3c09e673794ed18c91d4bab1a7dfea052e95a.
    
    * Revert "[inference] Async dynamic batching  (#4894)"
    
    This reverts commit fced14025043e29ce816b315f440601188f7f79f.
    
    * Revert "[inference] Async dynamic batching  (#4894)" (#4909)
    
    This reverts commit fced14025043e29ce816b315f440601188f7f79f.
    
    * Add Ray Distributed Environment Init Scripts
    
    * support DynamicBatchManager base function
    
    * revert _set_tokenizer version
    
    * add driver async generate
    
    * add async test
    
    * fix bugs in test_ray_dist.py
    
    * add get_tokenizer.py
    
    * fix code style
    
    * fix "No module named 'pydantic'" errors in CI tests
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * [infer] Add Ray Distributed Environment Init Scripts (#4911)
    
    * Revert "[inference] Async dynamic batching  (#4894)"
    
    This reverts commit fced14025043e29ce816b315f440601188f7f79f.
    
    * Add Ray Distributed Environment Init Scripts
    
    * support DynamicBatchManager base function
    
    * revert _set_tokenizer version
    
    * add driver async generate
    
    * add async test
    
    * fix bugs in test_ray_dist.py
    
    * add get_tokenizer.py
    
    * fix code style
    
    * fix "No module named 'pydantic'" errors in CI tests
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * support dynamic batch for bloom model and is_running function
    
    * [Inference] Test for new Async engine (#4935)
    
    * infer engine
    
    * infer engine
    
    * test engine
    
    * test engine
    
    * new manager
    
    * change step
    
    * add
    
    * test
    
    * fix
    
    * fix
    
    * finish test
    
    * finish test
    
    * finish test
    
    * finish test
    
    * add license
    
    ---------
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    
    * add assertion for config (#4947)
    
    * [Inference] Finish dynamic batching offline test (#4948)
    
    * test
    
    * fix test
    
    * fix quant
    
    * add default
    
    * fix
    
    * fix some bugs
    
    * fix some bugs
    
    * fix
    
    * fix bug
    
    * fix bugs
    
    * reset param
    
    ---------
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    
    * [Kernels] Updated Triton kernels to 2.1.0 and added flash-decoding for llama token attention (#4965)
    
    * adding flash-decoding (the split-KV softmax merge is sketched after this list)
    
    * clean
    
    * adding kernel
    
    * adding flash-decoding
    
    * add integration
    
    * add
    
    * adding kernel
    
    * adding kernel
    
    * adding triton 2.1.0 features for inference
    
    * update bloom triton kernel
    
    * remove useless vllm kernels
    
    * clean codes
    
    * fix
    
    * adding files
    
    * fix readme
    
    * update llama flash-decoding
    
    ---------
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
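
    A plain-torch sketch of the flash-decoding idea for a single query token: the KV cache is processed in chunks (in parallel in the real Triton kernels; sequentially here) and the partial softmaxes are merged with a numerically stable running max:

        import torch

        def flash_decoding_attention(q, k, v, chunk: int = 256):
            # q: [d], k/v: [seq, d]; returns softmax(k @ q / sqrt(d)) @ v,
            # computed chunk-by-chunk with an online log-sum-exp merge.
            scale = q.shape[-1] ** -0.5
            acc = torch.zeros_like(q)
            m = torch.tensor(float("-inf"))  # running max of logits
            l = torch.tensor(0.0)            # running sum of exp(logit - m)
            for start in range(0, k.shape[0], chunk):
                kc, vc = k[start:start + chunk], v[start:start + chunk]
                s = (kc @ q) * scale         # [chunk] attention logits
                m_new = torch.maximum(m, s.max())
                correction = torch.exp(m - m_new)
                p = torch.exp(s - m_new)
                acc = acc * correction + p @ vc
                l = l * correction + p.sum()
                m = m_new
            return acc / l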
    
    * fix ColossalEval (#4992)
    Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
    
    * [doc] Update doc for colossal-inference (#4989)
    
    * update doc
    
    * Update README.md
    
    ---------
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
    
    * [hotfix] Fix the bug where process groups were not being properly released. (#4940)
    
    * Fix the bug where process groups were not being properly released.
    
    * test
    
    * Revert "test"
    
    This reverts commit 479900c1398637310abf92eefa3cd168038ea02f.
    
    * [hotfix] fix the bug of repeatedly storing param group (#4951)
    
    * [doc] add supported feature diagram for hybrid parallel plugin (#4996)
    
    * [Pipeline Inference] Merge pp with tp (#4993)
    
    * refactor pipeline into new CaiInferEngine
    
    * update llama modeling forward
    
    * merge tp with pp
    
    * update docstring
    
    * optimize test workflow and example
    
    * fix typo
    
    * add assert and todo
    
    * [release] update version (#4995)
    
    * [release] update version
    
    * [hotfix] fix ci
    
    * [gemini] gemini support tp
    
    * fix
    
    * update checkpointIO
    
    * support fused layernorm
    
    * update fused layernorm
    
    * add sequence parallel to gemini
    
    * fix
    
    * fix comments
    
    * fix
    
    * fix t5
    
    * clear cache
    
    * fix
    
    * activate ci
    
    * activate ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * revert
    
    * modify tp gather method (see the sketch after this list)
    
    * fix test
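
    An illustrative gather of a tensor-parallel-sharded parameter into a full tensor (e.g. for checkpoint saving); the function name and signature are assumptions, not the actual gemini API:

        import torch
        import torch.distributed as dist

        def gather_tp_param(local_shard: torch.Tensor, dim: int, group=None) -> torch.Tensor:
            # All-gather each rank's shard, then concatenate along the
            # dimension the parameter was partitioned on.
            world_size = dist.get_world_size(group)
            shards = [torch.empty_like(local_shard) for _ in range(world_size)]
            dist.all_gather(shards, local_shard.contiguous(), group=group)
            return torch.cat(shards, dim=dim)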
    
    ---------
    Co-authored-by: Xu Kai <xukai16@foxmail.com>
    Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
    Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: github-actions <github-actions@github.com>
    Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
    Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
    Co-authored-by: digger yu <digger-yu@outlook.com>
    Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
    Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
    Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
    Co-authored-by: littsk <1214689160@qq.com>
    Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>