- 15 Jan, 2024 1 commit

Wenhao Chen authored
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check

- 08 Jan, 2024 1 commit

Elsa Granger authored
* A more general _communicate
* feat: finish tree_flatten version p2p
* fix: update p2p api calls
Co-authored-by: Wenhao Chen <cwher@outlook.com>

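The tree_flatten version of `_communicate` rests on a simple idea: any nested structure of tensors can be split into a flat list of leaves plus a tree spec, so only the leaves need to cross the p2p channel while the spec travels once as metadata. A minimal pure-Python sketch of that idea (the real code uses `torch.utils._pytree` and `torch.distributed`; the helper names here are illustrative):

```python
# Sketch of the tree_flatten p2p idea: flatten a nested structure into
# (leaves, spec), "send" only the leaves, and rebuild on the receiver side.
# Helper names are hypothetical; the real code uses torch.utils._pytree.

def tree_flatten(obj):
    """Flatten nested dicts/lists/tuples into a leaf list plus a spec."""
    if isinstance(obj, dict):
        keys = sorted(obj)
        leaves, specs = [], []
        for k in keys:
            sub_leaves, sub_spec = tree_flatten(obj[k])
            leaves += sub_leaves
            specs.append(sub_spec)
        return leaves, ("dict", keys, specs)
    if isinstance(obj, (list, tuple)):
        leaves, specs = [], []
        for item in obj:
            sub_leaves, sub_spec = tree_flatten(item)
            leaves += sub_leaves
            specs.append(sub_spec)
        return leaves, (type(obj).__name__, None, specs)
    return [obj], ("leaf", None, None)

def tree_unflatten(leaves, spec):
    """Rebuild the original structure from leaves and a spec."""
    kind, keys, specs = spec
    if kind == "leaf":
        return leaves.pop(0)
    children = [tree_unflatten(leaves, s) for s in specs]
    if kind == "dict":
        return dict(zip(keys, children))
    return tuple(children) if kind == "tuple" else children

# The sender flattens once; only the flat leaf list crosses the p2p channel.
payload = {"hidden_states": [1, 2], "attention_mask": (3,)}
leaves, spec = tree_flatten(payload)
rebuilt = tree_unflatten(list(leaves), spec)
assert rebuilt == payload
```

This makes `_communicate` agnostic to whether a stage passes a single tensor, a tuple, or a dict of outputs, which is what makes the API "more general".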
- 03 Jan, 2024 1 commit

Wenhao Chen authored
* fix: add fallback order option and update 1f1b
* fix: fix deadlock comm in interleaved pp
* test: modify p2p test

- 22 Dec, 2023 1 commit

Wenhao Chen authored
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list needs to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin

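`NCCL_BUFFSIZE` is a standard NCCL environment variable that sets the size in bytes of the buffers NCCL uses to stage data between pairs of GPUs, and NCCL reads it only when communicators are created. A hedged sketch of raising it before distributed initialization (the 128 MiB value is illustrative, not necessarily what HybridParallelPlugin sets):

```python
import os

# NCCL reads NCCL_BUFFSIZE (in bytes) when communicators are created, so it
# must be exported before torch.distributed initializes the process group.
# 128 MiB here is an illustrative value, not the plugin's actual choice.
if "NCCL_BUFFSIZE" not in os.environ:
    os.environ["NCCL_BUFFSIZE"] = str(128 * 1024 * 1024)

assert int(os.environ["NCCL_BUFFSIZE"]) > 0
```

A larger buffer can help batched p2p sends in interleaved pipelines at the cost of extra GPU memory per communicator.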
- 28 Nov, 2023 1 commit

Wenhao Chen authored
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883)
* [shardformer] fix interleaved pipeline for bert model (#5048)
* [hotfix] disable seq parallel for gptj and falcon, and polish code (#5093)
* Add Mistral support for Shardformer (#5103)
* [shardformer] add tests to mistral (#5105)
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>

- 19 Sep, 2023 1 commit

Hongxin Liu authored
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format

- 18 Sep, 2023 1 commit

Hongxin Liu authored
* [legacy] remove outdated codes of pipeline (#4692)
* [legacy] remove cli of benchmark and update optim (#4690)
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694)
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696)
* [legacy] clean up utils (#4700)
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742)
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci

- 11 Sep, 2023 1 commit

Hongxin Liu authored
* [legacy] move communication to legacy (#4640)
* [legacy] refactor logger and clean up legacy codes (#4654)
* [legacy] make logger independent to gpc
* [legacy] make optim independent to registry
* [legacy] move test engine to legacy
* [legacy] move nn to legacy (#4656)
* [legacy] move nn to legacy
* [checkpointio] fix save hf config
* [test] remove useless rpc pp test
* [legacy] fix nn init
* [example] skip tutorial hybrid parallel example
* [devops] test doc check
* [devops] test doc check

- 07 Sep, 2023 1 commit

Baizhou Zhang authored
* set optimizer to optional in execute_pipeline
* arrange device and mixed precision in booster init
* fix execute_pipeline in booster.py

- 05 Sep, 2023 1 commit

Hongxin Liu authored
* [legacy] move trainer to legacy
* [doc] update docs related to trainer
* [test] ignore legacy test

- 01 Sep, 2023 1 commit

Hongxin Liu authored

- 24 Aug, 2023 1 commit

Hongxin Liu authored
* [gemini] remove distributed-related part from colotensor (#4379)
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398)
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443)
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457)
* [example] update bert example result (#4458)
* [doc] update gemini doc (#4468)
* [example] update gemini related examples (#4473)
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480)
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497)
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example

- 18 Aug, 2023 1 commit

Jianghai authored
* add some base tests and policies
* finish whisper base model
* add conditional generation
* finish basic tests
* whisper
* finish whisper
* finish whisper
* del useless whisper test
* fix
* add argmin to replace
* finish revision

- 16 Aug, 2023 2 commits

LuGY authored
* support interleaved pipeline
* fix unit test
* remove virtual stage test in stage mgr
* add dropped type hint and updated bwd

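In an interleaved pipeline, each rank hosts several model chunks rather than one contiguous block, so chunk `c` on rank `r` behaves as virtual stage `c * pp_size + r`; this shrinks the pipeline bubble at the cost of more p2p traffic. A small sketch of that placement rule (function name is illustrative, not ColossalAI's API):

```python
# Sketch of interleaved pipeline layer placement: each rank owns
# num_chunks model chunks, and chunk c on rank r acts as virtual
# stage c * pp_size + r. Names are illustrative.

def virtual_stages(rank: int, pp_size: int, num_chunks: int):
    """Return the virtual stage indices hosted by this pipeline rank."""
    return [c * pp_size + rank for c in range(num_chunks)]

# With 4 ranks and 2 chunks there are 8 virtual stages; rank 0 hosts 0 and 4.
assert virtual_stages(0, 4, 2) == [0, 4]
assert virtual_stages(3, 4, 2) == [3, 7]
# Every virtual stage is hosted exactly once across the ranks.
all_stages = sorted(s for r in range(4) for s in virtual_stages(r, 4, 2))
assert all_stages == list(range(8))
```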
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>

- 15 Aug, 2023 15 commits

Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict
* add bloom model and policy, revise the base class of policy
* revise
* revision
* add bert_for_pretraining
* add bert_for_pretraining forward and policy
* fix typos
* cancel warning
* change the immediate output to default dict
* change the default output of get_shared_params
* add chatglm
* add
* chatglm
* chatglm
* finish chatglm
* deletes
* fix rmsnorm
* chatglm
* fix chatglm shard
* init

Jianghai authored
* refactor tests
* refactor bloom model
* finish policy tests
* refactor tests
* fix test pure pipeline
* remove test pipeline and cut down launch process

LuGY authored
* add unit test for 1f1b
* polish code
* polish code and update ut version
* fix

Baizhou Zhang authored
* modify t5 policy & add test
* pipeline stage distribution for t5
* complete t5 base policy
* t5 stack: halfway
* modify gpt2 pipeline test
* complete pipeline forward for T5Stack/T5EncoderModel
* fix docstring
* move t5 util tests to test_pipeline

Jianghai authored
* opt forward and test
* pause
* finish opt model pipeline
* finish opt pipeline
* fix opt
* set transformers version
* refactor the test pipeline

Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy" (reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0; this policy should be reverted and copied to feature/bloom)
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
* add pure pipeline test
* finish some bert models
* finish all bert models
* finish bert tests
* fix bugs
* fix bugs
* fix test pipeline
* fix data gen for qa
* update the set pipeline forward
* shared params
* fix bugs

Jianghai authored
* move bert related pipeline components to shardformer
* fix bugs
* revision
* fix bert model tests
* fix bert_lm_head model tests
* fix tests
* fix tests
* done checks
* skip bloom

Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict
* add bloom model and policy, revise the base class of policy
* revise
* revision
* add bert_for_pretraining
* add bert_for_pretraining forward and policy
* fix typos
* cancel warning
* change the immediate output to default dict
* change the default output of get_shared_params

Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict
* add bloom model and policy, revise the base class of policy
* revise
* revision
* add bert_for_pretraining

Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict

Hongxin Liu authored
* [pipeline] add stage manager
* [test] add pipeline stage manager test
* [pipeline] add docstring for stage manager

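A pipeline stage manager's core job is bookkeeping: which stage a rank owns and whether it sits at either end of the pipeline, since the first stage has no upstream to receive activations from and the last stage has no downstream to send to. A minimal sketch (field and method names are illustrative, not the actual PipelineStageManager API):

```python
# Sketch of what a pipeline stage manager tracks: each rank's stage index
# and its neighbors. Names are illustrative, not ColossalAI's actual API.

class StageManager:
    def __init__(self, rank: int, num_stages: int):
        assert 0 <= rank < num_stages
        self.stage = rank
        self.num_stages = num_stages

    def is_first_stage(self) -> bool:
        # The first stage feeds raw inputs; it never receives activations.
        return self.stage == 0

    def is_last_stage(self) -> bool:
        # The last stage computes the loss; it never sends activations on.
        return self.stage == self.num_stages - 1

    def prev_rank(self) -> int:
        return (self.stage - 1) % self.num_stages

    def next_rank(self) -> int:
        return (self.stage + 1) % self.num_stages

mgr = StageManager(rank=0, num_stages=4)
assert mgr.is_first_stage() and not mgr.is_last_stage()
assert mgr.next_rank() == 1 and mgr.prev_rank() == 3
```

Schedules and model policies then query these predicates instead of re-deriving rank arithmetic, which is what the `is_xx_stage fn` commits above refer to.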
Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict

Hongxin Liu authored
* [api] update optimizer wrapper to fit pipeline
* [pipeline] add base schedule
* [pipeline] add 1f1b schedule
* [test] add pipeline schedule utils test
* [pipeline] fix import

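The 1F1B schedule limits activation memory by capping in-flight microbatches: stage `i` of `p` stages runs `p - i - 1` warmup forwards, alternates one forward with one backward in the steady state, then drains the remaining backwards. A sketch of that phase split (this is the standard 1F1B shape, not ColossalAI's actual scheduler code):

```python
# Sketch of the 1F1B phase split per pipeline stage. Standard 1F1B shape,
# not the actual OneForwardOneBackwardSchedule implementation.

def one_f_one_b(stage: int, num_stages: int, num_microbatches: int):
    """Return this stage's op sequence: warmup F's, F/B steady state, cooldown B's."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    steady = num_microbatches - warmup
    ops = ["F"] * warmup            # warmup: fill the pipeline
    ops += ["F", "B"] * steady      # steady state: one forward, one backward
    ops += ["B"] * warmup           # cooldown: drain remaining backwards
    return ops

# The last stage has no warmup: it strictly alternates F and B.
assert one_f_one_b(3, 4, 4) == ["F", "B"] * 4
# The first stage front-loads 3 forwards before its backwards begin.
assert one_f_one_b(0, 4, 4) == ["F"] * 4 + ["B"] * 4
```

Because at most `p - i` microbatches are in flight at stage `i`, peak activation memory is bounded by pipeline depth rather than by the microbatch count, which is 1F1B's advantage over the naive all-forward-then-all-backward (GPipe-style) schedule.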
Hongxin Liu authored
* [pipeline] add p2p communication
* [test] add p2p communication test
* [test] add rerun decorator
* [test] rename to avoid conflict

Hongxin Liu authored
* [pipeline] add stage manager
* [test] add pipeline stage manager test
* [pipeline] add docstring for stage manager

- 06 Apr, 2023 1 commit

Frank Lee authored
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code

- 12 Dec, 2022 1 commit

Ziyue Jiang authored
* add bwd and step for PP middleware
* pre-commit
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>

- 08 Dec, 2022 1 commit

Ziyue Jiang authored
* add DAG test case
* fix data race by adjusting the position of lock
* polish code
* fix pytest for middleware
* remove test
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>

- 05 Dec, 2022 1 commit

Ziyue Jiang authored
* adapt scheduler for Topo
* remove comment
* fix set input
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>

- 29 Nov, 2022 1 commit

Ziyue Jiang authored
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
* add DAG middleware in scheduler
* add test case for scheduler
* remove break
* recover old lifecycle
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>

- 18 Oct, 2022 1 commit

Super Daniel authored
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
* [fx] move meta registration
* [fx] fix tests.
* [fx] fix test.
* [fx] fix.
* [meta] refactor meta registration.py.
* [fx] add compatibility descriptions.
* [fx] polish import.
* [fx] add a decorator.
* [fx] fix tests.
* [fx] remove print.
* [fx] edit raise error.
* [fx] edit raise error.
* [fx] add type hint.
* [fx] fix import in experimental.
* [rpc] remove color debug.
* [meta] fix naming.

- 29 Sep, 2022 1 commit

Kirigaya Kazuto authored
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward

- 20 Sep, 2022 1 commit

Kirigaya Kazuto authored
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing

- 19 Sep, 2022 1 commit

Kirigaya Kazuto authored
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

- 07 Sep, 2022 1 commit

Kirigaya Kazuto authored