1. 15 Aug, 2023 25 commits
    • [pipeline] All bert models (#4233) · e7cc62d7
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * finish some bert models
      
      * finish all bert models
      
      * finish bert tests
      
      * fix bugs
      
      * fix bugs
      
      * fix test pipeline
      
      * fix data gen for qa
      
      * update the set pipeline forward
      
      * shared params
      
      * fix bugs
    • [pipeline] add pipeline forward for variants of gpt2 (#4238) · a14d3520
      Baizhou Zhang authored
      * add forward for GPT2LMHeadModel
      
      * add test for gpt_lm
      
      * arranging get_held_layers method
      
      * arrange forward replacement
      
      * add forward for GPT2ForTokenClassification
      
      * add forward for GPT2ForSequenceClassification
      
      * fix test_shard_gpt2.py
      
      * add GPT2DoubleHeadsModel & fix bugs
      
      * add id checking in get_shared_params
    • [shardformer] fix base policy (#4229) · 7e4de520
      Hongxin Liu authored
    • [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) · 208ac8f2
      Baizhou Zhang authored
      * fix typehint & docstring in sharder.py
      
      * update pipeline forward for GPT2Model
      
      * add test for pipeline forward of GPT2Model
      
      * add cache cleaning in gpt2 test
      
      * change assert to raise command
    • [pipeline] add bloom model pipeline (#4210) · 37d22f68
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * finish bloom model
      
      * test shard gpt2
      
      * clear cache
    • [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) · 31bcf867
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
    • [pipeline] Llama pipeline (#4205) · 16220310
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
    • [pipeline] Bert pipeline for shardformer and its tests (#4197) · 1094e0f0
      Jianghai authored
      * add pipeline forward
      
      * complete pipeline forward check
      
      * fix bert forward without pipeline
      
      * fix comments
      
      * discard useless line
      
      * add todo
      
      * clean prints
      
      * fix distribute layers
    • [shardformer] support lazy init (#4202) · 890774b2
      Hongxin Liu authored
      * [shardformer] support lazy init
      
      * [shardformer] linear support lazy init
      
      * [shardformer] embedding support lazy init
      
      * [shardformer] norm support lazy init
      
      * [shardformer] fused linear support lazy init
      
      * [test] update shardformer test layer
      
      * [test] shardformer with lazy init fit ddp
      
      * [lazy] hotfix deepcopy of param
      
      * [shardformer] fix bert policy and update test
      
      * [shardformer] fix bloom policy and update test
      
      * [shardformer] fix opt policy and update test
      
      * [shardformer] fix t5 policy and update test
      
      * [shardformer] fix gpt2 policy and update test
      
      * [shardformer] fix llama policy and update test
    • [pipeline] move bert related pipeline components to shardformer (#4187) · f3bcc292
      Jianghai authored
      * move bert related pipeline components to shardformer
      
      * fix bugs
      
      * revision
      
      * fix bert model tests
      
      * fix bert_lm_head model tests
      
      * fix tests
      
      * fix tests
      
      * done checks
      
      * skip bloom
    • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) · c5ea7280
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the intermediate output to default dict
      
      * change the default output of get_shared_params
    • [shardformer] fix type hint · d35bd7d0
      ver217 authored
    • [shardformer] rename policy file name · 1ed3f8a2
      ver217 authored
    • [test] add shard util tests · 5fc60a3a
      ver217 authored
    • [test] update shardformer tests · 2d6cc07f
      ver217 authored
    • [pipeline] update shardformer docstring · b0b8ad28
      ver217 authored
    • [pipeline] update shardformer policy · 59f6f573
      ver217 authored
    • [pipeline] build bloom model and policy, revise the base class of policy (#4161) · 90a65ea6
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
    • [pipeline] add pipeline policy and bert forward (#4130) · c552cefa
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] add stage manager (#4093) · 5c897ddb
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
    • [pipeline] add pipeline policy and bert forward (#4130) · e8e7e492
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
    • [pipeline] refactor 1f1b schedule (#4115) · f51ce1bc
      Hongxin Liu authored
      * [api] update optimizer wrapper to fit pipeline
      
      * [pipeline] add base schedule
      
      * [pipeline] add 1f1b schedule
      
      * [test] add pipeline schedule utils test
      
      * [pipeline] fix import
    • [pipeline] implement p2p communication (#4100) · 45fdc9b4
      Hongxin Liu authored
      * [pipeline] add p2p communication
      
      * [test] add p2p communication test
      
      * [test] add rerun decorator
      
      * [test] rename to avoid conflict
    • [pipeline] add stage manager (#4093) · 42254422
      Hongxin Liu authored
      * [pipeline] add stage manager
      
      * [test] add pipeline stage manager test
      
      * [pipeline] add docstring for stage manager
    • [cluster] add process group mesh (#4039) · 5e1a9d48
      Hongxin Liu authored
      * [cluster] add process group mesh
      
      * [test] add process group mesh test
      
      * force sync
  2. 14 Aug, 2023 2 commits
  3. 11 Aug, 2023 1 commit
  4. 10 Aug, 2023 1 commit
  5. 09 Aug, 2023 1 commit
  6. 04 Aug, 2023 4 commits
  7. 02 Aug, 2023 1 commit
    • [chat] fix bugs and add unit tests (#4213) · da4f7b85
      Wenhao Chen authored
      * style: rename replay buffer
      
      Experience replay is typically used for off-policy algorithms.
      Using this name in PPO may be misleading.
      
      * fix: fix wrong zero2 default arg
      
      * test: update experience tests
      
      * style: rename zero_pad fn
      
      * fix: defer init in CycledDataLoader
      
      * test: add benchmark test
      
      * style: rename internal fn of generation
      
      * style: rename internal fn of lora
      
      * fix: remove unused loss fn
      
      * fix: remove unused utils fn
      
      * refactor: remove generate_with_actor fn
      
      * fix: fix type annotation
      
      * test: add models tests
      
      * fix: skip llama due to long execution time
      
      * style: modify dataset
      
      * style: apply formatter
      
      * perf: update reward dataset
      
      * fix: fix wrong IGNORE_INDEX in sft dataset
      
      * fix: remove DataCollatorForSupervisedDataset
      
      * test: add dataset tests
      
      * style: apply formatter
      
      * style: rename test_ci to test_train
      
      * feat: add llama in inference
      
      * test: add inference tests
      
      * test: change test scripts directory
      
      * fix: update ci
      
      * fix: fix typo
      
      * fix: skip llama due to oom
      
      * fix: fix file mod
      
      * style: apply formatter
      
      * refactor: remove duplicated llama_gptq
      
      * style: apply formatter
      
      * to: update rm test
      
      * feat: add tokenizer arg
      
      * feat: add download model script
      
      * test: update train tests
      
      * fix: modify gemini load and save pretrained
      
      * test: update checkpoint io test
      
      * to: modify nproc_per_node
      
      * fix: do not remove existing dir
      
      * fix: modify save path
      
      * test: add random choice
      
      * fix: fix sft path
      
      * fix: enlarge nproc_per_node to avoid oom
      
      * fix: add num_retry
      
      * fix: make lora config of rm and critic consistent
      
      * fix: add warning about lora weights
      
      * fix: skip some gpt2 tests
      
      * fix: remove grad ckpt in rm and critic due to errors
      
      * refactor: directly use Actor in train_sft
      
      * test: add more arguments
      
      * fix: disable grad ckpt when using lora
      
      * fix: fix save_pretrained and related tests
      
      * test: enable zero2 tests
      
      * revert: remove useless fn
      
      * style: polish code
      
      * test: modify test args
  8. 01 Aug, 2023 5 commits