1. 15 Jan, 2024 1 commit
  2. 08 Jan, 2024 1 commit
  3. 03 Jan, 2024 1 commit
  4. 22 Dec, 2023 1 commit
    • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) · 4fa689fc
      Wenhao Chen authored
      * test: add more p2p tests
      
      * fix: remove send_forward_recv_forward as p2p op list need to use the same group
      
      * fix: make send and receive atomic
      
      * feat: update P2PComm fn
      
      * feat: add metadata cache in 1f1b
      
      * feat: add metadata cache in interleaved pp
      
      * feat: modify is_xx_stage fn
      
      * revert: add _broadcast_object_list
      
      * feat: add interleaved pp in llama policy
      
      * feat: set NCCL_BUFFSIZE in HybridParallelPlugin
  5. 28 Nov, 2023 1 commit
  6. 19 Sep, 2023 1 commit
  7. 18 Sep, 2023 1 commit
    • [legacy] clean up legacy code (#4743) · b5f9e37c
      Hongxin Liu authored
      * [legacy] remove outdated codes of pipeline (#4692)
      
      * [legacy] remove cli of benchmark and update optim (#4690)
      
      * [legacy] remove cli of benchmark and update optim
      
      * [doc] fix cli doc test
      
      * [legacy] fix engine clip grad norm
      
      * [legacy] remove outdated colo tensor (#4694)
      
      * [legacy] remove outdated colo tensor
      
      * [test] fix test import
      
      * [legacy] move outdated zero to legacy (#4696)
      
      * [legacy] clean up utils (#4700)
      
      * [legacy] clean up utils
      
      * [example] update examples
      
      * [legacy] clean up amp
      
      * [legacy] fix amp module
      
      * [legacy] clean up gpc (#4742)
      
      * [legacy] clean up context
      
      * [legacy] clean core, constants and global vars
      
      * [legacy] refactor initialize
      
      * [example] fix examples ci
      
      * [example] fix examples ci
      
      * [legacy] fix tests
      
      * [example] fix gpt example
      
      * [example] fix examples ci
      
      * [devops] fix ci installation
      
      * [example] fix examples ci
  8. 11 Sep, 2023 1 commit
    • [legacy] move communication and nn to legacy and refactor logger (#4671) · 554aa959
      Hongxin Liu authored
      * [legacy] move communication to legacy (#4640)
      
      * [legacy] refactor logger and clean up legacy codes (#4654)
      
      * [legacy] make logger independent to gpc
      
      * [legacy] make optim independent to registry
      
      * [legacy] move test engine to legacy
      
      * [legacy] move nn to legacy (#4656)
      
      * [legacy] move nn to legacy
      
      * [checkpointio] fix save hf config
      
      * [test] remove useless rpc pp test
      
      * [legacy] fix nn init
      
      * [example] skip tutorial hybrid parallel example
      
      * [devops] test doc check
      
      * [devops] test doc check
  9. 07 Sep, 2023 1 commit
  10. 05 Sep, 2023 1 commit
  11. 01 Sep, 2023 1 commit
  12. 24 Aug, 2023 1 commit
    • [gemini] improve compatibility and add static placement policy (#4479) · 27061426
      Hongxin Liu authored
      * [gemini] remove distributed-related part from colotensor (#4379)
      
      * [gemini] remove process group dependency
      
      * [gemini] remove tp part from colo tensor
      
      * [gemini] patch inplace op
      
      * [gemini] fix param op hook and update tests
      
      * [test] remove useless tests
      
      * [test] remove useless tests
      
      * [misc] fix requirements
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [test] fix model zoo
      
      * [misc] update requirements
      
      * [gemini] refactor gemini optimizer and gemini ddp (#4398)
      
      * [gemini] update optimizer interface
      
      * [gemini] renaming gemini optimizer
      
      * [gemini] refactor gemini ddp class
      
      * [example] update gemini related example
      
      * [example] update gemini related example
      
      * [plugin] fix gemini plugin args
      
      * [test] update gemini ckpt tests
      
      * [gemini] fix checkpoint io
      
      * [example] fix opt example requirements
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [example] fix opt example
      
      * [gemini] add static placement policy (#4443)
      
      * [gemini] add static placement policy
      
      * [gemini] fix param offload
      
      * [test] update gemini tests
      
      * [plugin] update gemini plugin
      
      * [plugin] update gemini plugin docstr
      
      * [misc] fix flash attn requirement
      
      * [test] fix gemini checkpoint io test
      
      * [example] update resnet example result (#4457)
      
      * [example] update bert example result (#4458)
      
      * [doc] update gemini doc (#4468)
      
      * [example] update gemini related examples (#4473)
      
      * [example] update gpt example
      
      * [example] update dreambooth example
      
      * [example] update vit
      
      * [example] update opt
      
      * [example] update palm
      
      * [example] update vit and opt benchmark
      
      * [hotfix] fix bert in model zoo (#4480)
      
      * [hotfix] fix bert in model zoo
      
      * [test] remove chatglm gemini test
      
      * [test] remove sam gemini test
      
      * [test] remove vit gemini test
      
      * [hotfix] fix opt tutorial example (#4497)
      
      * [hotfix] fix opt tutorial example
      
      * [hotfix] fix opt tutorial example
  13. 18 Aug, 2023 1 commit
    • [shardformer] Pipeline/whisper (#4456) · 8739aa7f
      Jianghai authored
      * add some base tests and policies
      
      * finish whisper base model
      
      * add conditional generation
      
      * finish basic tests
      
      * whisper
      
      * finish whisper
      
      * finish whisper
      
      * del useless whisper test
      
      * fix
      
      * add argmin to replace
      
      * finish revision
  14. 16 Aug, 2023 2 commits
  15. 15 Aug, 2023 15 commits
    • [pipeline] add chatglm (#4363) · a88e9225
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the intermediate output to default dict
      
      * change the default output of get_shared_params
      
      * add chatglm
      
      * add
      
      * chatglm
      
      * chatglm
      
      * finish chatglm
      
      * deletes
      
      * fix rmsnorm
      
      * chatglm
      
      * fix chatglm shard
      
      * init
    • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) · f13954cd
      Jianghai authored
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cutdown launch process
      
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cutdown launch process
    • [pipeline] add unit test for 1f1b (#4303) · d3c6cd66
      LuGY authored
      * add unit test for 1f1b
      
      * polish code
      
      * polish code and update ut version
      
      * fix
    • [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2
      Baizhou Zhang authored
      * modify t5 policy & add test
      
      * pipeline stage distribution for t5
      
      * complete t5 base policy
      
      * t5 stack: halfway
      
      * modify gpt2 pipeline test
      
      * complete pipeline forward for T5Stack/T5EncoderModel
      
      * fix docstring
      
      * move t5 util tests to test_pipeline
    • [pipeline] OPT model pipeline (#4258) · d8408d18
      Jianghai authored
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * fix opt
      
      * set transformers version
      
      * refactor the test pipeline
    • [pipeline] All bert models (#4233) · e7cc62d7
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * finish some bert models
      
      * finish all bert models
      
      * finish bert tests
      
      * fix bugs
      
      * fix bugs
      
      * fix test pipeline
      
      * fix data gen for qa
      
      * update the set pipeline forward
      
      * shared params
      
      * fix bugs
    • [pipeline] move bert related pipeline components to shardformer (#4187) · f3bcc292
      Jianghai authored
      * move bert related pipeline components to shardformer
      
      * fix bugs
      
      * revision
      
      * fix bert model tests
      
      * fix bert_lm_head model tests
      
      * fix tests
      
      * fix tests
      
      * done checks
      
      * skip bloom
    • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) · c5ea7280
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the intermediate output to default dict
      
      * change the default output of get_shared_params
    • [pipeline] build bloom model and policy, revise the base class of policy (#4161) · 90a65ea6
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
    • [pipeline] add pipeline policy and bert forward (#4130) · c552cefa
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] add stage manager (#4093) · 5c897ddb
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
    • [pipeline] add pipeline policy and bert forward (#4130) · e8e7e492
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] refactor 1f1b schedule (#4115) · f51ce1bc
      Hongxin Liu authored
      * [api] update optimizer wrapper to fit pipeline
      
      * [pipeline] add base schedule
      
      * [pipeline] add 1f1b schedule
      
      * [test] add pipeline schedule utils test
      
      * [pipeline] fix import
    • [pipeline] implement p2p communication (#4100) · 45fdc9b4
      Hongxin Liu authored
      * [pipeline] add p2p communication
      
      * [test] add p2p communication test
      
      * [test] add rerun decorator
      
      * [test] rename to avoid conflict
    • [pipeline] add stage manager (#4093) · 42254422
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
  16. 06 Apr, 2023 1 commit
  17. 12 Dec, 2022 1 commit
  18. 08 Dec, 2022 1 commit
  19. 05 Dec, 2022 1 commit
  20. 29 Nov, 2022 1 commit
  21. 18 Oct, 2022 1 commit
    • [fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) · 393f5940
      Super Daniel authored
      * [fx] move meta registration
      
      * [fx] fix tests.
      
      * [fx] fix test.
      
      * [fx] fix.
      
      * [meta] refactor meta registration.py.
      
      * [fx] add compatibility descriptions.
      
      * [fx] polish import.
      
      * [fx] add a decorator.
      
      * [fx] fix tests.
      
      * [fx] remove print.
      
      * [fx] edit raise error.
      
      * [fx] edit raise error.
      
      * [fx] add type hint.
      
      * [fx] fix import in experimental.
      
      * [rpc] remove color debug.
      
      * [meta] fix naming.
  22. 29 Sep, 2022 1 commit
    • [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642) · 9708638d
      Kirigaya Kazuto authored
      * [pipeline/tuning] improve dispatch performance both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
      
      * [pipeline/chimera] test chimera | fix bug of initializing
      
      * [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
  23. 20 Sep, 2022 1 commit
    • [pipeline/chimera] test chimera | fix bug of initializing (#1615) · 170fa810
      Kirigaya Kazuto authored
      * [pipeline/tuning] improve dispatch performance both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
      
      * [pipeline/chimera] test chimera | fix bug of initializing
  24. 19 Sep, 2022 1 commit
    • [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) · edc9e419
      Kirigaya Kazuto authored
      * [pipeline/tuning] improve dispatch performance both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
  25. 07 Sep, 2022 1 commit