1. 04 Jul, 2023 12 commits
    • [shardformer] Add dropout layer in shard model and refactor policy api (#3949) · 45927d55
      FoolPlayer authored
      * add dist dropout in model
      
      * update docstring and bert policy with dropout
      
      * refactor basepolicy and sharded, update bert
      
      * update format
      
      * update gpt2 policy
      
      * update bert policy
      
      * remove unused code
      
      * update readme for new policy usage
    • [shardformer] Unit test (#3928) · a7313048
      FoolPlayer authored
      * fix bug in slicer, add slicer unit test
      
      * add dropout test
      
      * use pid as dropout seed
      
    * update dropout test with local pattern
      
    * add todo
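The unit-test commits above seed dropout from the process id, so that parallel worker processes draw different masks while each process stays deterministic. A minimal pure-Python sketch of that idea (the function name and inverted-dropout scaling are illustrative, not ColossalAI's actual API):

```python
import os
import random

def dropout_mask(n, p, seed=None):
    # Seed from the process id by default, so each worker process
    # gets its own (but per-process reproducible) mask stream.
    rng = random.Random(os.getpid() if seed is None else seed)
    # Inverted dropout: kept units are scaled by 1/(1-p).
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]
```

With an explicit seed the mask is reproducible, which is what makes the dropout behavior testable across runs.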
    • [shardformer] Align bert value (#3907) · f1cb5ac6
      FoolPlayer authored
      * add bert align test, fix dist loss bug
      
      * forward and backward align
      
      * add ignore index
      
      * add shardformer CI
      
    * add optional gather_output for the user in shard config
    
    * update readme with optional gather_output
      
      * add dist crossentropy loss test, remove unused files
      
      * remove unused file
      
      * remove unused file
      
      * rename the file
      
      * polish code
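The distributed cross-entropy loss tested above computes a softmax over a vocabulary that is sharded across ranks, with an ignore_index for padding targets. A single-process toy sketch (plain Python; the max/sum over the shard list stands in for all-reduce MAX/SUM, and the function name is illustrative):

```python
import math

def dist_cross_entropy(logit_shards, target, ignore_index=-100):
    # Targets equal to ignore_index contribute no loss.
    if target == ignore_index:
        return 0.0
    # Global max for numerical stability (stands in for all-reduce MAX).
    m = max(max(shard) for shard in logit_shards)
    # Global sum of exponentials (stands in for all-reduce SUM).
    denom = sum(math.exp(x - m) for shard in logit_shards for x in shard)
    # The rank owning the target logit supplies it; flattening emulates that.
    flat = [x for shard in logit_shards for x in shard]
    # -log softmax(target) = m + log(denom) - logit[target]
    return m + math.log(denom) - flat[target]
```

The align test in the commit amounts to checking that this sharded computation matches the loss over the unsharded logits.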
    • [shardformer] add gpt2 policy and modify shard and slicer to support (#3883) · 79f8d5d5
      FoolPlayer authored
      * add gpt2 policy and modify shard and slicer to support
      
      * remove unused code
      
      * polish code
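The slicer the commits above extend splits a weight either by rows (row-parallel) or by columns (column-parallel). A toy sketch over nested lists standing in for tensors (assumes the dimension divides evenly by the world size; names are illustrative):

```python
def slice_weight(weight, rank, world_size, dim):
    # dim=0: row-parallel split (each rank keeps a contiguous block of rows).
    # dim=1: column-parallel split (each rank keeps a block of columns).
    if dim == 0:
        step = len(weight) // world_size
        return weight[rank * step:(rank + 1) * step]
    step = len(weight[0]) // world_size
    return [row[rank * step:(rank + 1) * step] for row in weight]
```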
    • update README (#3909) · 70173e31
      FoolPlayer authored
    • [shardformer] add Dropout layer support different dropout pattern (#3856) · ab8a47f8
      FoolPlayer authored
      * add dropout layer, add dropout test
      
      * modify seed manager as context manager
      
      * add a copy of col_nn.layer
      
      * add dist_crossentropy loss; separate module test
      
      * polish the code
      
      * fix dist crossentropy loss
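The commit above reworks the seed manager into a context manager, so a block of code runs under a controlled RNG state and the outer state is restored afterwards. A minimal sketch of that pattern using Python's global `random` state (the name `dropout_seed` is illustrative):

```python
import contextlib
import random

@contextlib.contextmanager
def dropout_seed(seed):
    # Save the current global RNG state, run the block under a fixed
    # seed, then restore the outer state on exit (even on exceptions).
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)
```

This is what lets one model interleave different dropout patterns (e.g. replicated vs. per-rank masks) without the patterns disturbing each other's random streams.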
    • [shardformer] update readme with modules implement doc (#3834) · c594dc2f
      FoolPlayer authored
      * update readme with modules content
      
      * remove img
    • [shardformer] refactored the user api (#3828) · 4972e1f4
      Frank Lee authored
      * [shardformer] refactored the user api
      
      * polish code
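A hypothetical sketch of what a policy-based user API of this kind can look like (class names, registry, and layer names here are illustrative, not ColossalAI's actual interface): each model family maps to a policy object describing how its layers should be sharded.

```python
class Policy:
    """Base policy: subclasses describe how to shard one model family."""
    def replacements(self):
        raise NotImplementedError

class BertPolicy(Policy):
    def replacements(self):
        # layer name -> slicing direction (both sides illustrative)
        return {"attention.self.query": "col", "attention.output.dense": "row"}

POLICY_REGISTRY = {"BertModel": BertPolicy}

def get_policy(model_cls_name):
    if model_cls_name not in POLICY_REGISTRY:
        raise ValueError(f"no shard policy for {model_cls_name}")
    return POLICY_REGISTRY[model_cls_name]()
```

Keeping the per-model knowledge in policy objects is what lets the sharder stay generic while new architectures (BERT, GPT-2, ...) are added as new policies.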
    • [shardformer] updated readme (#3827) · 235792f1
      Frank Lee authored
    • [shardformer]: Feature/shardformer, add some docstring and readme (#3816) · 8cc11235
      FoolPlayer authored
      * init shardformer code structure
      
    * add implementation of sharder (inject and replace)
    
    * add implementation of replacing layers with Colossal layers
    
    * separate policies for different layers, add some notes
    
    * implement 1D and 2D slicer that can tell column from row
    
    * fix bugs when slicing and injecting the model
    
    * fix some bugs; add inference test example
      
      * add share weight and train example
      
      * add train
      
      * add docstring and readme
      
      * add docstring for other files
      
      * pre-commit
    • [shardformer] init shardformer code structure (#3731) · 8d68de76
      FoolPlayer authored
      * init shardformer code structure
      
    * add implementation of sharder (inject and replace)
    
    * add implementation of replacing layers with Colossal layers
    
    * separate policies for different layers, add some notes
    
    * implement 1D and 2D slicer that can tell column from row
    
    * fix bugs when slicing and injecting the model
    
    * fix some bugs; add inference test example
    • [chat] use official transformers and fix some issues (#4117) · 3d8d5d0d
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * feat: remove NaiveStrategy
      
      * test: update train_prompts tests
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * test: add lora arg
      
    * feat: remove roberta support in train_prompts due to runtime errors
      
      * feat: remove deberta & roberta in rm as not used
      
      * test: remove deberta and roberta tests
      
      * feat: remove deberta and roberta models as not used
      
      * fix: remove calls to roberta
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * chore: update transformers version
      
      * docs: update transformers version
      
      * fix: fix actor inference
      
      * fix: fix ci
      
      * feat: change llama pad token to unk
      
      * revert: revert ddp setup_distributed
      
      * fix: change llama pad token to unk
      
      * revert: undo unnecessary changes
      
      * fix: use pip to install transformers
  2. 03 Jul, 2023 3 commits
  3. 30 Jun, 2023 1 commit
  4. 29 Jun, 2023 2 commits
    • [chat] remove naive strategy and split colossalai strategy (#4094) · edd75a59
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * to: remove the use of NaiveStrategy
      
      * test: remove NaiveStrategy tests
      
      * feat: remove NaiveStrategy
      
      * style: modify comments and params
      
      * feat: split ColossalAIStrategy into LowLevelZeroStrategy and GeminiStrategy
      
      * fix: remove naive
      
      * fix: align with modified colossal strategy
      
      * fix: fix ddp _try_init_dist arg
    • [chat] refactor trainer class (#4080) · b03d64d0
      Wenhao Chen authored
      * to: add SLTrainer
      
      * refactor: refactor RMTrainer and SFTTrainer
      
      * fix: fix init file
      
      * feat: remove on_learn_epoch fn as not used
      
      * fix: align with modified gemini arguments
      
      * to: add OnPolicyTrainer
      
      * revert: add _on_learn_epoch fn
      
      * refactor: refactor PPOTrainer
      
      * style: rename PPOTrainer argument
      
      * fix: align with modified PPO arguments
      
      * test: align with modified train_prompts arguments
      
      * chore: modify train_prompts
      
      * docs: align with modified arguments
      
      * fix: remove unnecessary output
      
      * fix: move dataloader to fit fn of SLTrainer
      
      * fix: move dataloader to fit fn of OnPolicyTrainer
      
      * fix: modify usage of prompt and pretrain dataloader
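Several commits above move the dataloader out of the trainer constructor and into fit(). A minimal sketch of that shape (the class name SLTrainer follows the commit messages; the body and the toy subclass are illustrative):

```python
class SLTrainer:
    """Supervised-learning trainer base whose fit() takes the dataloader."""
    def __init__(self, model, max_epochs=1):
        self.model = model
        self.max_epochs = max_epochs

    def _train_step(self, batch):
        raise NotImplementedError

    def fit(self, dataloader):
        # The dataloader is supplied here rather than at construction,
        # so one configured trainer can be reused across datasets.
        losses = []
        for _ in range(self.max_epochs):
            for batch in dataloader:
                losses.append(self._train_step(batch))
        return losses

class ToyTrainer(SLTrainer):
    def _train_step(self, batch):
        return sum(batch)  # stand-in "loss" for the sketch
```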
  5. 28 Jun, 2023 4 commits
  6. 27 Jun, 2023 2 commits
  7. 26 Jun, 2023 2 commits
  8. 25 Jun, 2023 3 commits
    • Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching · 2c8ae37f
      Baizhou Zhang authored
      [gemini] Rename arguments in chunk configuration searching
    • [chat] refactor strategy class with booster api (#3987) · 153b957a
      Wenhao Chen authored
      * refactor: adapt boost API in base and naive strategies
      
      * fix: initialize plugin after setup_distributed
      
      * fix: fix save_pretrained fn
      
      * refactor: adapt boost API in DDPStrategy
      
      * to: add _post_init check
      
      * to: fix ddp backward, modify ddp dataloader and unwrap
      
      * feat: adapt boost API in ColossalAIStrategy
      
      * fix: call setup_distributed before use get_current_device
      
      * fix: fix save_model and save_optimizer
      
      * test: remove save_sharded_optimizer test
      
      * style: apply formatter
      
      * fix: fix stage check and add comments
      
      * feat: allow dict type arg in strategy.prepare
      
      * to: temporarily remove lr_scheduler for testing
      
      * style: simplify init of ColossalAIStrategy
      
      * fix: fix lr_scheduler in sft and rm
      
      * style: modify comments
      
      * test: add train_prompts tests
      
      * fix: fix inference only case and use in train_prompts
      
      * test: skip failed tests in ci
      
      * style: fix CodeFactor check
      
      * fix: do not use model.to('cpu') with GeminiPlugin
      
      * test: enable colossalai_gemini tests
      
      * test: set CUDA_VISIBLE_DEVICES in ci
      
      * docs: add note
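One of the commits above allows dict-type arguments in strategy.prepare. A hypothetical sketch of such a signature (the "wrapping" here is a placeholder for the real boost step, and the function is standalone rather than a strategy method): positional arguments are wrapped directly, while dicts are wrapped value-wise with their keys preserved.

```python
def prepare(*objs):
    def wrap(obj):
        # Placeholder for the real boost/wrap of a model or optimizer.
        return ("wrapped", obj)

    out = []
    for obj in objs:
        if isinstance(obj, dict):
            # Dict args are wrapped value-wise, keys preserved, so callers
            # can pass named groups like {"actor": ..., "critic": ...}.
            out.append({k: wrap(v) for k, v in obj.items()})
        else:
            out.append(wrap(obj))
    return out[0] if len(out) == 1 else tuple(out)
```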
  9. 22 Jun, 2023 1 commit
  10. 19 Jun, 2023 5 commits
  11. 16 Jun, 2023 2 commits
  12. 15 Jun, 2023 3 commits