Commits · 92f6791095491e44c5712e14f00f2e19b52dc9f6 · OpenDAS / ColossalAI

04 Jul, 2023 36 commits

[shardformer] Add layernorm (#4072) · 92f67910

FoolPlayer authored Jun 23, 2023

* add layernorm to bert

* add layernorm test

* add layernorm test with load state dict

* add use_mixedfusedLN in shard config

* refactor policy to support fused_layernorm

92f67910

[shardformer] supported fused qkv checkpoint (#4073) · 70c58cfd
Frank Lee authored Jun 23, 2023

70c58cfd
[shardformer] add linearconv1d test (#4067) · 0803a614
FoolPlayer authored Jun 22, 2023
```
* add linearconv1d test

* add linearconv1d test
```
0803a614
[shardformer] support module saving and loading (#4062) · 8eb09a4c
Frank Lee authored Jun 22, 2023
```
* [shardformer] support module saving and loading

* polish code
```
8eb09a4c
support kit use for bert/gpt test (#4055) · 7740c55c
FoolPlayer authored Jun 22, 2023
```
* support kit use for bert test

* support kit test for gpt2
```
7740c55c
[shardformer] refactored the shardformer layer structure (#4053) · f22ddace
Frank Lee authored Jun 21, 2023

f22ddace
[shardformer] adapted T5 and LLaMa test to use kit (#4049) · 58df7205
Frank Lee authored Jun 21, 2023
```
* [shardformer] adapted T5 and LLaMa test to use kit

* polish code
```
58df7205
[shardformer] add gpt2 test and layer class refactor (#4041) · 4021b9a8
FoolPlayer authored Jun 20, 2023
```
* add gpt2 test and layer class refactor

* add dropout in gpt2 policy
```
4021b9a8
[shardformer] supported T5 and its variants (#4045) · d857f3db
Frank Lee authored Jun 19, 2023

d857f3db
[shardformer] adapted llama to the new API (#4036) · c1d5453e
Frank Lee authored Jun 19, 2023

c1d5453e
[shardformer] fix bert and gpt downstream with new api (#4024) · 74d176c8
FoolPlayer authored Jun 19, 2023
```
* fix bert downstream with new api

* remove comment line
```
74d176c8
[shardformer] updated doc (#4016) · e253a070
Frank Lee authored Jun 16, 2023

e253a070
support bert with new api · df018fc3
FoolPlayer authored Jun 16, 2023

df018fc3
add vocabembedding layer · 507c0ad3
FoolPlayer authored Jun 16, 2023

507c0ad3
[shardformer] removed inplace tensor sharding (#4018) · 45d93843
Frank Lee authored Jun 16, 2023

45d93843
[shardformer] refactored embedding and dropout to parallel module (#4013) · 3893fa1a
Frank Lee authored Jun 16, 2023
```
* [shardformer] refactored embedding and dropout to parallel module

* polish code
```
3893fa1a
integrate with dist layer (#4011) · dfca9678
FoolPlayer authored Jun 16, 2023

dfca9678
[shardformer] integrated linear 1D with dtensor (#3996) · 015af592
Frank Lee authored Jun 15, 2023
```
* [shardformer] integrated linear 1D with dtensor

* polish code
```
015af592

[shardformer] Refactor shardformer api (#4001) · d3bc5308

FoolPlayer authored Jun 15, 2023

* fix an error in readme

* simplify code

* refactor shardformer

* add todo

* remove slicer

* resolve code review

d3bc5308

[device] support init device mesh from process group (#3990) · 61197124
Frank Lee authored Jun 15, 2023

61197124
[shardformer] fix an error in readme (#3988) · a2f9af81
FoolPlayer authored Jun 15, 2023
```
* fix an error in readme

* simplify code
```
a2f9af81

[Shardformer] Downstream bert (#3979) · f7774ec0

FoolPlayer authored Jun 15, 2023

* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage

* add downstream model of bert

* remove unused code

f7774ec0

[shardformer] shardformer support t5 model (#3994) · c1c672d0
wukong1992 authored Jun 15, 2023
```
test t5
```
c1c672d0
[shardformer] support llama model using shardformer (#3969) · 6b30dfb7
wukong1992 authored Jun 13, 2023
```
adjust layer attr
```
6b30dfb7

[shardformer] Add dropout layer in shard model and refactor policy api (#3949) · 45927d55

FoolPlayer authored Jun 12, 2023

* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage

45927d55

[shardformer] Unit test (#3928) · a7313048

FoolPlayer authored Jun 12, 2023

* fix bug in slicer, add slicer unit test

* add dropout test

* use pid as dropout seed

* updata dropout test with local pattern

* ad todo

a7313048

[shardformer] Align bert value (#3907) · f1cb5ac6

FoolPlayer authored Jun 09, 2023

* add bert align test, fix dist loss bug

* forward and backward align

* add ignore index

* add shardformer CI

* add gather_output optional for user in shardconfig

* update readme with optional gather_ouput

* add dist crossentropy loss test, remove unused files

* remove unused file

* remove unused file

* rename the file

* polish code

f1cb5ac6

[shardformer] add gpt2 policy and modify shard and slicer to support (#3883) · 79f8d5d5
FoolPlayer authored Jun 07, 2023
```
* add gpt2 policy and modify shard and slicer to support

* remove unused code

* polish code
```
79f8d5d5
update README (#3909) · 70173e31
FoolPlayer authored Jun 06, 2023

70173e31

[shardformer] add Dropout layer support different dropout pattern (#3856) · ab8a47f8

FoolPlayer authored Jun 01, 2023

* add dropout layer, add dropout test

* modify seed manager as context manager

* add a copy of col_nn.layer

* add dist_crossentropy loss; separate module test

* polish the code

* fix dist crossentropy loss

ab8a47f8

[shardformer] update readme with modules implement doc (#3834) · c594dc2f
FoolPlayer authored May 24, 2023
```
* update readme with modules content

* remove img
```
c594dc2f
[shardformer] refactored the user api (#3828) · 4972e1f4
Frank Lee authored May 24, 2023
```
* [shardformer] refactored the user api

* polish code
```
4972e1f4
[shardformer] updated readme (#3827) · 235792f1
Frank Lee authored May 24, 2023

235792f1

[shardformer]: Feature/shardformer, add some docstring and readme (#3816) · 8cc11235

FoolPlayer authored May 24, 2023

* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example

* add share weight and train example

* add train

* add docstring and readme

* add docstring for other files

* pre-commit

8cc11235

[shardformer] init shardformer code structure (#3731) · 8d68de76

FoolPlayer authored May 22, 2023

* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example

8d68de76

[chat] use official transformers and fix some issues (#4117) · 3d8d5d0d

Wenhao Chen authored Jul 04, 2023

* feat: remove on_learn_epoch fn as not used

* revert: add _on_learn_epoch fn

* feat: remove NaiveStrategy

* test: update train_prompts tests

* fix: remove prepare_llama_tokenizer_and_embedding

* test: add lora arg

* feat: remove roberta support in train_prompts due to runtime errs

* feat: remove deberta & roberta in rm as not used

* test: remove deberta and roberta tests

* feat: remove deberta and roberta models as not used

* fix: remove calls to roberta

* fix: remove prepare_llama_tokenizer_and_embedding

* chore: update transformers version

* docs: update transformers version

* fix: fix actor inference

* fix: fix ci

* feat: change llama pad token to unk

* revert: revert ddp setup_distributed

* fix: change llama pad token to unk

* revert: undo unnecessary changes

* fix: use pip to install transformers

3d8d5d0d

03 Jul, 2023 3 commits
- [hotfix] fix import bug in checkpoint_io (#4142) · 1350ece4
  Baizhou Zhang authored Jul 03, 2023
  
  1350ece4
- fix Tensor is not defined (#4129) · 8abc8779
  digger yu authored Jul 03, 2023
  
  8abc8779
- fix CheckpointIndexFile is not defined (#4109) · 7e46bc87
  digger yu authored Jul 03, 2023
  
  7e46bc87
30 Jun, 2023 1 commit
- [nfc]fix ColossalaiOptimizer is not defined (#4122) · 09fe9dc7
  digger yu authored Jun 30, 2023
  
  09fe9dc7