15 Aug, 2023 · 40 commits
    • [hotfix] fix gemini and zero test (#4333) · 411cf1d2
      Hongxin Liu authored
      * [hotfix] fix gemini and zero test
      
      * [hotfix] fix lazy init test
      
      * [hotfix] fix lazy init test
    • [plugin] add 3d parallel plugin (#4295) · 261eab02
      Hongxin Liu authored
      * [amp] add mixed precision optimizer
      
      * [plugin] add 3d parallel plugin
      
      * [booster] support pipeline
      
      * [plugin] 3d parallel plugin support clip grad norm
      
      * [shardformer] fix sharder and add plugin test
      
      * [plugin] rename 3d parallel plugin
      
      * [ci] support testmon core pkg change detection (#4305)
      
      * [hotfix] debug testmon
      
      * [hotfix] fix llama
      
      * [hotfix] fix p2p bugs
      
      * [hotfix] fix requirements
    • [shardformer] support pipeline base vit model (#4284) · b3f5d7a3
      FoolPlayer authored
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * support base vit pipeline
      
      * support vit downstream model
      
      * fix vit shard test
      
      * modify hidden states return type
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
    • [pipeline] add pipeline support for all T5 models (#4310) · 083d7da3
      Baizhou Zhang authored
      * complete policy for T5Model & T5ForConditionalGeneration
      
      * modify function signature in forwards
      
      * add forward for T5model
      
      * add forward for T5ForConditionalGeneration
      
      * fix a bug
      
      * fix hidden_states transporting in decoder
      
      * fix the passing of encoder_outputs
    • [pipeline] test pure pipeline process using llama (#4218) · d0807122
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * fixed version
      
      * fixed version
      
      * pure pipeline
    • [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2
      Baizhou Zhang authored
      * modify t5 policy & add test
      
      * pipeline stage distribution for t5
      
      * complete t5 base policy
      
      * t5 stack: halfway
      
      * modify gpt2 pipeline test
      
      * complete pipeline forward for T5Stack/T5EncoderModel
      
      * fix docstring
      
      * move t5 util tests to test_pipeline
    • [pipeline] reformat for unified design (#4283) · 18ebcf40
      Jianghai authored
      * bert_reformat
      
      * reformat
      
      * reformat
      
      * fix a typo
      
      * format
      
      * format
      
      * fix bug
    • [hotfix] fix opt pipeline (#4293) · 0a8f3c85
      Jianghai authored
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * fix opt
      
      * set transformers version
      
      * refactor the test pipeline
      
      * fix bug
    • [pipeline] OPT model pipeline (#4258) · d8408d18
      Jianghai authored
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * fix opt
      
      * set transformers version
      
      * refactor the test pipeline
    • [pipeline] refactor gpt2 pipeline forwards (#4287) · b774d5ea
      Baizhou Zhang authored
      * move gpt2 pipeline forwards to modeling folder
      
      * check pipeline status when adding replacing policy
      
      * fix typehint
      
      * fix arguments processing in gpt2_model_forward
    • [shardformer] support inplace sharding (#4251) · d921ce83
      Hongxin Liu authored
      * [shardformer] embedding support inplace sharding
      
      * [shardformer] linear support inplace sharding
      
      * [shardformer] layernorm support inplace sharding
      
      * [shardformer] qkv support inplace sharding
      
      * [test] update shardformer layer test
      
      * [shardformer] fix shared param sharding
      
      * [shardformer] fix bert policy
      
      * [shardformer] fix bloom policy
      
      * [shardformer] fix llama policy
      
      * [shardformer] fix opt policy
      
      * [shardformer] fix t5 policy
      
      * [shardformer] fix fused qkv linear
      
      * [shardformer] fix bugs
      
      * force sync
      
      * [test] fix bugs
      
      * [test] fix transformer version
    • [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) · 2a2eacfa
      Baizhou Zhang authored
      * change for transformers loggers
      
      * add forward for GPT2ForQuestionAnswering
      
      * fix assert
      
      * fix torchrec test
    • [bugs] hot fix some testing bugs for new models (#4268) · d9be0472
      Jianghai authored
      * hot fix
      
      * hot fix tracer
    • [pipeline] finish bloom models pipeline and tests (#4223) · 34f0e34a
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * finish bloom model
      
      * test shard gpt2
      
      * clear cache
      
      * support all bloom models
      
      * add bloom models policies
      
      * finish bloom pipeline and tests
      
      * add set pipeline
      
      * finish bloom
    • [pipeline] All bert models (#4233) · e7cc62d7
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * finish some bert models
      
      * finish all bert models
      
      * finish bert tests
      
      * fix bugs
      
      * fix bugs
      
      * fix test pipeline
      
      * fix data gen for qa
      
      * update the set pipeline forward
      
      * shared params
      
      * fix bugs
    • [pipeline] add pipeline forward for variants of gpt2 (#4238) · a14d3520
      Baizhou Zhang authored
      * add forward for GPTLMHeadModel
      
      * add test for gpt_lm
      
      * arranging get_held_layers method
      
      * arrange forward replacement
      
      * add forward for GPT2ForTokenClassification
      
      * add forward for GPT2ForSequenceClassification
      
      * fix test_shard_gpt2.py
      
      * add GPT2DoubleHeadsModel & fix bugs
      
      * add id checking in get_shared_params
    • [shardformer] fix base policy (#4229) · 7e4de520
      Hongxin Liu authored
    • [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) · 208ac8f2
      Baizhou Zhang authored
      * fix typehint & docstring in sharder.py
      
      * update pipeline forward for GPT2Model
      
      * add test for pipeline forward of GPT2Model
      
      * add cache cleaning in gpt2 test
      
      * change assert to raise command
    • [pipeline] add bloom model pipeline (#4210) · 37d22f68
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * finish bloom model
      
      * test shard gpt2
      
      * clear cache
    • [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) · 31bcf867
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
    • [pipeline] Llama pipeline (#4205) · 16220310
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
    • [pipeline] Bert pipeline for shardformer and its tests (#4197) · 1094e0f0
      Jianghai authored
      * add pipeline forward
      
      * complete pipeline forward check
      
      * fix bert forward without pipeline
      
      * fix comments
      
      * discard useless line
      
      * add todo
      
      * clean prints
      
      * fix distribute layers
    • [shardformer] support lazy init (#4202) · 890774b2
      Hongxin Liu authored
      * [shardformer] support lazy init
      
      * [shardformer] linear support lazy init
      
      * [shardformer] embedding support lazy init
      
      * [shardformer] norm support lazy init
      
      * [shardformer] fused linear support lazy init
      
      * [test] update shardformer test layer
      
      * [test] shardformer with lazy init fit ddp
      
      * [lazy] hotfix deepcopy of param
      
      * [shardformer] fix bert policy and update test
      
      * [shardformer] fix bloom policy and update test
      
      * [shardformer] fix opt policy and update test
      
      * [shardformer] fix t5 policy and update test
      
      * [shardformer] fix gpt2 policy and update test
      
      * [shardformer] fix llama policy and update test
    • [pipeline] move bert related pipeline components to shardformer (#4187) · f3bcc292
      Jianghai authored
      * move bert related pipeline components to shardformer
      
      * fix bugs
      
      * revision
      
      * fix bert model tests
      
      * fix bert_lm_head model tests
      
      * fix tests
      
      * fix tests
      
      * done checks
      
      * skip bloom
    • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) · c5ea7280
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the intermediate output to default dict
      
      * change the default output of get_shared_params
    • [shardformer] fix type hint · d35bd7d0
      ver217 authored
    • [shardformer] rename policy file name · 1ed3f8a2
      ver217 authored
    • [test] add shard util tests · 5fc60a3a
      ver217 authored
    • [test] update shardformer tests · 2d6cc07f
      ver217 authored
    • [pipeline] update shardformer docstring · b0b8ad28
      ver217 authored
    • [pipeline] update shardformer policy · 59f6f573
      ver217 authored
    • [pipeline] build bloom model and policy, revise the base class of policy (#4161) · 90a65ea6
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
    • [pipeline] add pipeline policy and bert forward (#4130) · c552cefa
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] add stage manager (#4093) · 5c897ddb
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
    • [pipeline] add pipeline policy and bert forward (#4130) · e8e7e492
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] refactor 1f1b schedule (#4115) · f51ce1bc
      Hongxin Liu authored
      * [api] update optimizer wrapper to fit pipeline
      
      * [pipeline] add base schedule
      
      * [pipeline] add 1f1b schedule
      
      * [test] add pipeline schedule utils test
      
      * [pipeline] fix import
    • [pipeline] implement p2p communication (#4100) · 45fdc9b4
      Hongxin Liu authored
      * [pipeline] add p2p communication
      
      * [test] add p2p communication test
      
      * [test] add rerun decorator
      
      * [test] rename to avoid conflict
    • [pipeline] add stage manager (#4093) · 42254422
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
    • [cluster] add process group mesh (#4039) · 5e1a9d48
      Hongxin Liu authored
      * [cluster] add process group mesh
      
      * [test] add process group mesh test
      
      * force sync