1. 05 Sep, 2023 1 commit
  2. 01 Sep, 2023 1 commit
  3. 24 Aug, 2023 1 commit
    • [gemini] improve compatibility and add static placement policy (#4479) · 27061426
      Hongxin Liu authored
      * [gemini] remove distributed-related part from colotensor (#4379)
      
      * [gemini] remove process group dependency
      
      * [gemini] remove tp part from colo tensor
      
      * [gemini] patch inplace op
      
      * [gemini] fix param op hook and update tests
      
      * [test] remove useless tests
      
      * [test] remove useless tests
      
      * [misc] fix requirements
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [misc] update requirements
      
      * [gemini] refactor gemini optimizer and gemini ddp (#4398)
      
      * [gemini] update optimizer interface
      
      * [gemini] renaming gemini optimizer
      
      * [gemini] refactor gemini ddp class
      
      * [example] update gemini related example
      
      * [example] update gemini related example
      
      * [plugin] fix gemini plugin args
      
      * [test] update gemini ckpt tests
      
      * [gemini] fix checkpoint io
      
      * [example] fix opt example requirements
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [gemini] add static placement policy (#4443)
      
      * [gemini] add static placement policy
      
      * [gemini] fix param offload
      
      * [test] update gemini tests
      
      * [plugin] update gemini plugin
      
      * [plugin] update gemini plugin docstr
      
      * [misc] fix flash attn requirement
      
      * [test] fix gemini checkpoint io test
      
      * [example] update resnet example result (#4457)
      
      * [example] update bert example result (#4458)
      
      * [doc] update gemini doc (#4468)
      
      * [example] update gemini related examples (#4473)
      
      * [example] update gpt example
      
      * [example] update dreambooth example
      
      * [example] update vit
      
      * [example] update opt
      
      * [example] update palm
      
      * [example] update vit and opt benchmark
      
      * [hotfix] fix bert in model zoo (#4480)
      
      * [hotfix] fix bert in model zoo
      
      * [test] remove chatglm gemini test
      
      * [test] remove sam gemini test
      
      * [test] remove vit gemini test
      
      * [hotfix] fix opt tutorial example (#4497)
      
      * [hotfix] fix opt tutorial example
      
      * [hotfix] fix opt tutorial example
  4. 18 Aug, 2023 1 commit
    • [shardformer] Pipeline/whisper (#4456) · 8739aa7f
      Jianghai authored
      * add some base tests and policies
      
      * finish whisper base model
      
      * add conditional generation
      
      * finish basic tests
      
      * whisper
      
      * finish whisper
      
      * finish whisper
      
      * del useless whisper test
      
      * fix
      
      * add argmin to replace
      
      * finish revision
  5. 16 Aug, 2023 2 commits
  6. 15 Aug, 2023 15 commits
    • [pipeline] add chatglm (#4363) · a88e9225
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the imediate output to default dict
      
      * change the default output of get_shared_params
      
      * add chatglm
      
      * add
      
      * chatglm
      
      * chatglm
      
      * finish chatglm
      
      * deletes
      
      * fix rmsnorm
      
      * chatglm
      
      * fix chatglm shard
      
      * init
    • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) · f13954cd
      Jianghai authored
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cut down launch process
      
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cut down launch process
    • [pipeline] add unit test for 1f1b (#4303) · d3c6cd66
      LuGY authored
      * add unit test for 1f1b
      
      * polish code
      
      * polish code and update ut version
      
      * fix
    • [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2
      Baizhou Zhang authored
      * modify t5 policy & add test
      
      * pipeline stage distribution for t5
      
      * complete t5 base policy
      
      * t5 stack: halfway
      
      * modify gpt2 pipeline test
      
      * complete pipeline forward for T5Stack/T5EncoderModel
      
      * fix docstring
      
      * move t5 util tests to test_pipeline
    • [pipeline] OPT model pipeline (#4258) · d8408d18
      Jianghai authored
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * fix opt
      
      * set transformers version
      
      * refactor the test pipeline
    • [pipeline] All bert models (#4233) · e7cc62d7
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * finish some bert models
      
      * finish all bert models
      
      * finish bert tests
      
      * fix bugs
      
      * fix bugs
      
      * fix test pipeline
      
      * fix data gen for qa
      
      * update the set pipeline forward
      
      * shared params
      
      * fix bugs
    • [pipeline] move bert related pipeline components to shardformer (#4187) · f3bcc292
      Jianghai authored
      * move bert related pipeline components to shardformer
      
      * fix bugs
      
      * revision
      
      * fix bert model tests
      
      * fix bert_lm_head model tests
      
      * fix tests
      
      * fix tests
      
      * done checks
      
      * skip bloom
    • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) · c5ea7280
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the imediate output to default dict
      
      * change the default output of get_shared_params
    • [pipeline] build bloom model and policy, revise the base class of policy (#4161) · 90a65ea6
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
    • [pipeline] add pipeline policy and bert forward (#4130) · c552cefa
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] add stage manager (#4093) · 5c897ddb
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
    • [pipeline] add pipeline policy and bert forward (#4130) · e8e7e492
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] refactor 1f1b schedule (#4115) · f51ce1bc
      Hongxin Liu authored
      * [api] update optimizer wrapper to fit pipeline
      
      * [pipeline] add base schedule
      
      * [pipeline] add 1f1b schedule
      
      * [test] add pipeline schedule utils test
      
      * [pipeline] fix import
    • [pipeline] implement p2p communication (#4100) · 45fdc9b4
      Hongxin Liu authored
      * [pipeline] add p2p communication
      
      * [test] add p2p communication test
      
      * [test] add rerun decorator
      
      * [test] rename to avoid conflict
    • [pipeline] add stage manager (#4093) · 42254422
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
  7. 06 Apr, 2023 1 commit
  8. 12 Dec, 2022 1 commit
  9. 08 Dec, 2022 1 commit
  10. 05 Dec, 2022 1 commit
  11. 29 Nov, 2022 1 commit
  12. 18 Oct, 2022 1 commit
    • [fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) · 393f5940
      Super Daniel authored
      
      * [fx] move meta registration
      
      * [fx] fix tests.
      
      * [fx] fix test.
      
      * [fx] fix.
      
      * [meta] refactor meta registration.py.
      
      * [fx] add compatibility descriptions.
      
      * [fx] polish import.
      
      * [fx] add a decorator.
      
      * [fx] fix tests.
      
      * [fx] remove print.
      
      * [fx] edit raise error.
      
      * [fx] edit raise error.
      
      * [fx] add type hint.
      
      * [fx] fix import in experimental.
      
      * [rpc] remove color debug.
      
      * [meta] fix naming.
  13. 29 Sep, 2022 1 commit
    • [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642) · 9708638d
      Kirigaya Kazuto authored
      
      * [pipeline/tuning] improve dispatch performance both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
      
      * [pipeline/chimera] test chimera | fix bug of initializing
      
      * [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
  14. 20 Sep, 2022 1 commit
    • [pipeline/chimera] test chimera | fix bug of initializing (#1615) · 170fa810
      Kirigaya Kazuto authored
      * [pipeline/tuning] improve dispatch performance both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
      
      * [pipeline/chimera] test chimera | fix bug of initializing
  15. 19 Sep, 2022 1 commit
    • [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) · edc9e419
      Kirigaya Kazuto authored
      
      * [pipeline/tuning] improve dispatch performance both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
  16. 07 Sep, 2022 1 commit
  17. 01 Sep, 2022 1 commit
    • [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508) · f1e18362
      Kirigaya Kazuto authored
      
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP
      
      * [pipeline/pipleline_process_group] remove comment
      
      * [pipeline/pipleline_process_group] remove comment
      
      * [pipeline/pipleline_process_group] skip process group test
      
      * [pipeline/pipleline_process_group] remove test named function
  18. 26 Aug, 2022 1 commit
    • [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) · 5a6fd71f
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
  19. 25 Aug, 2022 1 commit
    • [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) · 9145aef2
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
  20. 24 Aug, 2022 1 commit
    • [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483) · a6c87491
      Kirigaya Kazuto authored
      
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
  21. 22 Aug, 2022 1 commit
    • [pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) · bb5f5289
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * Delete p2p_v2.py
      
      * Delete _pipeline_schedule_v2.py
      
      * Delete test_object_list_p2p_v2.py
      
      * Delete test_boardcast_send_recv_v2.py
      
      * Delete test_cifar_with_data_pipeline_tensor_v2.py
  22. 10 Jun, 2022 1 commit