1. 04 Jul, 2023 12 commits
    • [shardformer] Add dropout layer in shard model and refactor policy api (#3949) · 45927d55
      FoolPlayer authored
      * add dist dropout in model
      
      * update docstring and bert policy with dropout
      
      * refactor basepolicy and sharded, update bert
      
      * update format
      
      * update gpt2 policy
      
      * update bert policy
      
      * remove unused code
      
      * update readme for new policy usage
    • [shardformer] Unit test (#3928) · a7313048
      FoolPlayer authored
      * fix bug in slicer, add slicer unit test
      
      * add dropout test
      
      * use pid as dropout seed
      
    * update dropout test with local pattern
      
    * add todo
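The unit-test commits above seed dropout from the process id, so that parallel worker processes draw different masks while each process stays deterministic. A minimal pure-Python sketch of that idea (the function name and inverted-dropout scaling are illustrative, not ColossalAI's actual API):

```python
import os
import random

def dropout_mask(n, p, seed=None):
    # Seed from the process id by default, so each worker process
    # gets its own (but per-process reproducible) mask stream.
    rng = random.Random(os.getpid() if seed is None else seed)
    # Inverted dropout: kept units are scaled by 1/(1-p).
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]
```

With an explicit seed the mask is reproducible, which is what makes the dropout behavior testable across runs.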
    • [shardformer] Align bert value (#3907) · f1cb5ac6
      FoolPlayer authored
      * add bert align test, fix dist loss bug
      
      * forward and backward align
      
      * add ignore index
      
      * add shardformer CI
      
    * add optional gather_output for the user in shard config
    
    * update readme with optional gather_output
      
      * add dist crossentropy loss test, remove unused files
      
      * remove unused file
      
      * remove unused file
      
      * rename the file
      
      * polish code
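The distributed cross-entropy loss tested above computes a softmax over a vocabulary that is sharded across ranks, with an ignore_index for padding targets. A single-process toy sketch (plain Python; the max/sum over the shard list stands in for all-reduce MAX/SUM, and the function name is illustrative):

```python
import math

def dist_cross_entropy(logit_shards, target, ignore_index=-100):
    # Targets equal to ignore_index contribute no loss.
    if target == ignore_index:
        return 0.0
    # Global max for numerical stability (stands in for all-reduce MAX).
    m = max(max(shard) for shard in logit_shards)
    # Global sum of exponentials (stands in for all-reduce SUM).
    denom = sum(math.exp(x - m) for shard in logit_shards for x in shard)
    # The rank owning the target logit supplies it; flattening emulates that.
    flat = [x for shard in logit_shards for x in shard]
    # -log softmax(target) = m + log(denom) - logit[target]
    return m + math.log(denom) - flat[target]
```

The align test in the commit amounts to checking that this sharded computation matches the loss over the unsharded logits.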
    • [shardformer] add gpt2 policy and modify shard and slicer to support (#3883) · 79f8d5d5
      FoolPlayer authored
      * add gpt2 policy and modify shard and slicer to support
      
      * remove unused code
      
      * polish code
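The slicer the commits above extend splits a weight either by rows (row-parallel) or by columns (column-parallel). A toy sketch over nested lists standing in for tensors (assumes the dimension divides evenly by the world size; names are illustrative):

```python
def slice_weight(weight, rank, world_size, dim):
    # dim=0: row-parallel split (each rank keeps a contiguous block of rows).
    # dim=1: column-parallel split (each rank keeps a block of columns).
    if dim == 0:
        step = len(weight) // world_size
        return weight[rank * step:(rank + 1) * step]
    step = len(weight[0]) // world_size
    return [row[rank * step:(rank + 1) * step] for row in weight]
```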
    • update README (#3909) · 70173e31
      FoolPlayer authored
    • [shardformer] add Dropout layer support different dropout pattern (#3856) · ab8a47f8
      FoolPlayer authored
      * add dropout layer, add dropout test
      
      * modify seed manager as context manager
      
      * add a copy of col_nn.layer
      
      * add dist_crossentropy loss; separate module test
      
      * polish the code
      
      * fix dist crossentropy loss
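The commit above reworks the seed manager into a context manager, so a block of code runs under a controlled RNG state and the outer state is restored afterwards. A minimal sketch of that pattern using Python's global `random` state (the name `dropout_seed` is illustrative):

```python
import contextlib
import random

@contextlib.contextmanager
def dropout_seed(seed):
    # Save the current global RNG state, run the block under a fixed
    # seed, then restore the outer state on exit (even on exceptions).
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)
```

This is what lets one model interleave different dropout patterns (e.g. replicated vs. per-rank masks) without the patterns disturbing each other's random streams.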
    • [shardformer] update readme with modules implement doc (#3834) · c594dc2f
      FoolPlayer authored
      * update readme with modules content
      
      * remove img
    • [shardformer] refactored the user api (#3828) · 4972e1f4
      Frank Lee authored
      * [shardformer] refactored the user api
      
      * polish code
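A hypothetical sketch of what a policy-based user API of this kind can look like (class names, registry, and layer names here are illustrative, not ColossalAI's actual interface): each model family maps to a policy object describing how its layers should be sharded.

```python
class Policy:
    """Base policy: subclasses describe how to shard one model family."""
    def replacements(self):
        raise NotImplementedError

class BertPolicy(Policy):
    def replacements(self):
        # layer name -> slicing direction (both sides illustrative)
        return {"attention.self.query": "col", "attention.output.dense": "row"}

POLICY_REGISTRY = {"BertModel": BertPolicy}

def get_policy(model_cls_name):
    if model_cls_name not in POLICY_REGISTRY:
        raise ValueError(f"no shard policy for {model_cls_name}")
    return POLICY_REGISTRY[model_cls_name]()
```

Keeping the per-model knowledge in policy objects is what lets the sharder stay generic while new architectures (BERT, GPT-2, ...) are added as new policies.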
    • [shardformer] updated readme (#3827) · 235792f1
      Frank Lee authored
    • [shardformer]: Feature/shardformer, add some docstring and readme (#3816) · 8cc11235
      FoolPlayer authored
      * init shardformer code structure
      
    * add implementation of sharder (inject and replace)
    
    * add implementation of replacing layers with Colossal layers
    
    * separate policies for different layers, add some notes
    
    * implement 1D and 2D slicer that can tell column from row
    
    * fix bugs when slicing and injecting the model
    
    * fix some bugs; add inference test example
      
      * add share weight and train example
      
      * add train
      
      * add docstring and readme
      
      * add docstring for other files
      
      * pre-commit
    • [shardformer] init shardformer code structure (#3731) · 8d68de76
      FoolPlayer authored
      * init shardformer code structure
      
    * add implementation of sharder (inject and replace)
    
    * add implementation of replacing layers with Colossal layers
    
    * separate policies for different layers, add some notes
    
    * implement 1D and 2D slicer that can tell column from row
    
    * fix bugs when slicing and injecting the model
    
    * fix some bugs; add inference test example
    • [chat] use official transformers and fix some issues (#4117) · 3d8d5d0d
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * feat: remove NaiveStrategy
      
      * test: update train_prompts tests
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * test: add lora arg
      
    * feat: remove roberta support in train_prompts due to runtime errors
      
      * feat: remove deberta & roberta in rm as not used
      
      * test: remove deberta and roberta tests
      
      * feat: remove deberta and roberta models as not used
      
      * fix: remove calls to roberta
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * chore: update transformers version
      
      * docs: update transformers version
      
      * fix: fix actor inference
      
      * fix: fix ci
      
      * feat: change llama pad token to unk
      
      * revert: revert ddp setup_distributed
      
      * fix: change llama pad token to unk
      
      * revert: undo unnecessary changes
      
      * fix: use pip to install transformers
  2. 03 Jul, 2023 3 commits
  3. 30 Jun, 2023 1 commit
  4. 29 Jun, 2023 2 commits
    • [chat] remove naive strategy and split colossalai strategy (#4094) · edd75a59
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * to: remove the use of NaiveStrategy
      
      * test: remove NaiveStrategy tests
      
      * feat: remove NaiveStrategy
      
      * style: modify comments and params
      
      * feat: split ColossalAIStrategy into LowLevelZeroStrategy and GeminiStrategy
      
      * fix: remove naive
      
      * fix: align with modified colossal strategy
      
      * fix: fix ddp _try_init_dist arg
    • [chat] refactor trainer class (#4080) · b03d64d0
      Wenhao Chen authored
      * to: add SLTrainer
      
      * refactor: refactor RMTrainer and SFTTrainer
      
      * fix: fix init file
      
      * feat: remove on_learn_epoch fn as not used
      
      * fix: align with modified gemini arguments
      
      * to: add OnPolicyTrainer
      
      * revert: add _on_learn_epoch fn
      
      * refactor: refactor PPOTrainer
      
      * style: rename PPOTrainer argument
      
      * fix: align with modified PPO arguments
      
      * test: align with modified train_prompts arguments
      
      * chore: modify train_prompts
      
      * docs: align with modified arguments
      
      * fix: remove unnecessary output
      
      * fix: move dataloader to fit fn of SLTrainer
      
      * fix: move dataloader to fit fn of OnPolicyTrainer
      
      * fix: modify usage of prompt and pretrain dataloader
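Several commits above move the dataloader out of the trainer constructor and into fit(). A minimal sketch of that shape (the class name SLTrainer follows the commit messages; the body and the toy subclass are illustrative):

```python
class SLTrainer:
    """Supervised-learning trainer base whose fit() takes the dataloader."""
    def __init__(self, model, max_epochs=1):
        self.model = model
        self.max_epochs = max_epochs

    def _train_step(self, batch):
        raise NotImplementedError

    def fit(self, dataloader):
        # The dataloader is supplied here rather than at construction,
        # so one configured trainer can be reused across datasets.
        losses = []
        for _ in range(self.max_epochs):
            for batch in dataloader:
                losses.append(self._train_step(batch))
        return losses

class ToyTrainer(SLTrainer):
    def _train_step(self, batch):
        return sum(batch)  # stand-in "loss" for the sketch
```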
  5. 28 Jun, 2023 4 commits
  6. 27 Jun, 2023 2 commits
  7. 26 Jun, 2023 2 commits
  8. 25 Jun, 2023 3 commits
    • Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching · 2c8ae37f
      Baizhou Zhang authored
      [gemini] Rename arguments in chunk configuration searching
    • [chat] refactor strategy class with booster api (#3987) · 153b957a
      Wenhao Chen authored
      * refactor: adapt boost API in base and naive strategies
      
      * fix: initialize plugin after setup_distributed
      
      * fix: fix save_pretrained fn
      
      * refactor: adapt boost API in DDPStrategy
      
      * to: add _post_init check
      
      * to: fix ddp backward, modify ddp dataloader and unwrap
      
      * feat: adapt boost API in ColossalAIStrategy
      
      * fix: call setup_distributed before use get_current_device
      
      * fix: fix save_model and save_optimizer
      
      * test: remove save_sharded_optimizer test
      
      * style: apply formatter
      
      * fix: fix stage check and add comments
      
      * feat: allow dict type arg in strategy.prepare
      
      * to: temporarily remove lr_scheduler for testing
      
      * style: simplify init of ColossalAIStrategy
      
      * fix: fix lr_scheduler in sft and rm
      
      * style: modify comments
      
      * test: add train_prompts tests
      
      * fix: fix inference only case and use in train_prompts
      
      * test: skip failed tests in ci
      
      * style: fix CodeFactor check
      
      * fix: do not use model.to('cpu') with GeminiPlugin
      
      * test: enable colossalai_gemini tests
      
      * test: set CUDA_VISIBLE_DEVICES in ci
      
      * docs: add note
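One of the commits above allows dict-type arguments in strategy.prepare. A hypothetical sketch of such a signature (the "wrapping" here is a placeholder for the real boost step, and the function is standalone rather than a strategy method): positional arguments are wrapped directly, while dicts are wrapped value-wise with their keys preserved.

```python
def prepare(*objs):
    def wrap(obj):
        # Placeholder for the real boost/wrap of a model or optimizer.
        return ("wrapped", obj)

    out = []
    for obj in objs:
        if isinstance(obj, dict):
            # Dict args are wrapped value-wise, keys preserved, so callers
            # can pass named groups like {"actor": ..., "critic": ...}.
            out.append({k: wrap(v) for k, v in obj.items()})
        else:
            out.append(wrap(obj))
    return out[0] if len(out) == 1 else tuple(out)
```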
  9. 22 Jun, 2023 1 commit
  10. 19 Jun, 2023 5 commits
  11. 16 Jun, 2023 2 commits
  12. 15 Jun, 2023 3 commits