Commits · a88e92251df546dc71f2ec3cd351487319a53577 · OpenDAS / ColossalAI

15 Aug, 2023 40 commits

[pipeline] add chatglm (#4363) · a88e9225

Jianghai authored Aug 04, 2023

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the intermediate output to default dict

* change the default output of get_shared_params

* add chatglm

* add

* chatglm

* chatglm

* finish chatglm

* deletes

* fix rmsnorm

* chatglm

* fix chatglm shard

* init

a88e9225

[shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) · b1feeced

Baizhou Zhang authored Aug 03, 2023

* add util functions for shardformer tests & rewrite gpt2 test

* fix shared_params & embedding/merging

* fix precision

b1feeced

[test] Hotfix/fix some model test and refactor check util api (#4369) · 5c6f1831

Bin Jia authored Aug 03, 2023

* fix llama test

* fix test bug of bert, blip2, bloom, gpt2

* fix llama test

* fix opt test

* fix sam test

* fix sam test

* fix t5 test

* fix vit test

* fix whisper test

* fix whisper test

* polish code

* adjust allclose parameter

* Add mistakenly deleted code

* adjust allclose

* change loss function for some base model

5c6f1831

[test] skip some not compatible models · c3ca53cf

FoolPlayer authored Aug 02, 2023

c3ca53cf

update some module with new api version · 726541af

FoolPlayer authored Aug 01, 2023

726541af

[shardformer] support Blip2 (#4243) · 879301d0

FoolPlayer authored Jul 25, 2023

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_plugin

879301d0

[shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit · 8120eca0

klhhhhh authored Jul 20, 2023

8120eca0

[shardformer] pre-commit check files · 4da05052

klhhhhh authored Jul 19, 2023

4da05052

[shardformer] register without auto policy · 91850fe9

klhhhhh authored Jul 18, 2023

91850fe9

[shardformer] ChatGLM support layernorm sharding · f155ae89

klhhhhh authored Jul 17, 2023

f155ae89

[shardformer] delete some file · 00f6ef15

klhhhhh authored Jul 17, 2023

00f6ef15

[shardformer] support chatglm without layernorm · dad00c42

klhhhhh authored Jul 14, 2023

dad00c42

[shardformer] polish code · cbb54d32

klhhhhh authored Jul 13, 2023

cbb54d32

[shardformer] polish chatglm code · 1a29e8fc

klhhhhh authored Jul 12, 2023

1a29e8fc

[shardformer] add first version of policy of chatglm · 8620009d

klhhhhh authored Jul 10, 2023

8620009d

[shardformer] add test kit in model zoo for chatglm · 6ee4c9ee

klhhhhh authored Jul 07, 2023

6ee4c9ee

import chatglm · 7377be7a

klhhhhh authored Jul 07, 2023

7377be7a

[shardformer] vit test finish and support · c4928698

klhhhhh authored Jul 06, 2023

c4928698

[shardformer] added tests · f60162b2

klhhhhh authored Jul 04, 2023

f60162b2

Feature/chatglm (#4240) · ed34bb13

Kun Lin authored Jul 20, 2023

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [shardformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

ed34bb13

[shardformer] support whisper (#4212) · 9ee4ebea

FoolPlayer authored Jul 17, 2023

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

9ee4ebea

[shardformer] support SAM (#4231) · dd2bf026

FoolPlayer authored Jul 14, 2023

* 1. support sam 2. add fused qkv for nn.Linear

* update utils support set element in list

* overwrite SamVisionAttention forward to use DropoutForParallelInput

* remove unused code

dd2bf026

Feature/vit support (#4182) · c59d7aca

Kun Lin authored Jul 07, 2023

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

c59d7aca

[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) · 0ceec8f9

Baizhou Zhang authored Aug 01, 2023

* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test

* merge tests of PP/DP/TP combinations into one test file

* fix bug when sync grad for dp in HybridPlugin

* update supported precisions for 3DPlugin/fix bug when shifting tp_degree

* improve the passing of lazy_init

* modify lazy_init/use sync_shared_params

0ceec8f9

[pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) · f13954cd

Jianghai authored Aug 01, 2023

* refactor tests

* refactor bloom model

* finish policy tests

* refactor tests

* fix test pure pipeline

* remove test pipeline and cut down launch process

* refactor tests

* refactor bloom model

* finish policy tests

* refactor tests

* fix test pure pipeline

* remove test pipeline and cut down launch process

f13954cd

[pipeline] add unit test for 1f1b (#4303) · d3c6cd66

LuGY authored Jul 31, 2023

* add unit test for 1f1b

* polish code

* polish code and update ut version

* fix

d3c6cd66

[pipeline] fix return_dict/fix pure_pipeline_test (#4331) · da3cef27

Baizhou Zhang authored Jul 27, 2023

da3cef27

[hotfix] fix gemini and zero test (#4333) · 411cf1d2

Hongxin Liu authored Jul 27, 2023

* [hotfix] fix gemini and zero test

* [hotfix] fix lazy init test

* [hotfix] fix lazy init test

411cf1d2

[plugin] add 3d parallel plugin (#4295) · 261eab02

Hongxin Liu authored Jul 26, 2023

* [amp] add mixed precision optimizer

* [plugin] add 3d parallel plugin

* [booster] support pipeline

* [plugin] 3d parallel plugin support clip grad norm

* [shardformer] fix sharder and add plugin test

* [plugin] rename 3d parallel plugin

* [ci] support testmon core pkg change detection (#4305)

* [hotfix] debug testmon

* [hotfix] fix llama

* [hotfix] fix p2p bugs

* [hotfix] fix requirements

261eab02

[shardformer] support pipeline base vit model (#4284) · b3f5d7a3

FoolPlayer authored Jul 25, 2023

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* support base vit pipeline

* support vit downstream model

* fix vit shard test

* modify hidden states return type

---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>

b3f5d7a3

[pipeline] add pipeline support for all T5 models (#4310) · 083d7da3

Baizhou Zhang authored Jul 25, 2023

* complete policy for T5Model & T5ForConditionalGeneration

* modify function signature in forwards

* add forward for T5model

* add forward for T5ForConditionalGeneration

* fix a bug

* fix hidden_states transporting in decoder

* fix the passing of encoder_outputs

083d7da3

[pipeline] test pure pipeline process using llama (#4218) · d0807122

Jianghai authored Jul 25, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* fixed version

* fixed version

* pure pipeline

d0807122

[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2

Baizhou Zhang authored Jul 21, 2023

* modify t5 policy & add test

* pipeline stage distribution for t5

* complete t5 base policy

* t5 stack: halfway

* modify gpt2 pipeline test

* complete pipeline forward for T5Stack/T5EncoderModel

* fix docstring

* move t5 util tests to test_pipeline

36e546b2

[pipeline] reformat for unified design (#4283) · 18ebcf40

Jianghai authored Jul 21, 2023

* bert_reformat

* reformat

* reformat

* fix a typo

* format

* format

* fix bug

18ebcf40

[hotfix] fix opt pipeline (#4293) · 0a8f3c85

Jianghai authored Jul 20, 2023

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* fix opt

* set transformers version

* refactor the test pipeline

* fix bug

0a8f3c85

[pipeline] OPT model pipeline (#4258) · d8408d18

Jianghai authored Jul 20, 2023

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* fix opt

* set transformers version

* refactor the test pipeline

d8408d18

[pipeline] refactor gpt2 pipeline forwards (#4287) · b774d5ea

Baizhou Zhang authored Jul 20, 2023

* move gpt2 pipeline forwards to modeling folder

* check pipeline status when adding replacing policy

* fix typehint

* fix arguments processing in gpt2_model_forward

b774d5ea

[shardformer] support inplace sharding (#4251) · d921ce83

Hongxin Liu authored Jul 20, 2023

* [shardformer] embedding support inplace sharding

* [shardformer] linear support inplace sharding

* [shardformer] layernorm support inplace sharding

* [shardformer] qkv support inplace sharding

* [test] update shardformer layer test

* [shardformer] fix shared param sharding

* [shardformer] fix bert policy

* [shardformer] fix bloom policy

* [shardformer] fix llama policy

* [shardformer] fix opt policy

* [shardformer] fix t5 policy

* [shardformer] fix fused qkv linear

* [shardformer] fix bugs

* force sync

* [test] fix bugs

* [test] fix transformer version

d921ce83

[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) · 2a2eacfa

Baizhou Zhang authored Jul 19, 2023

* change for transformers loggers

* add forward for GPT2ForQuestionAnswering

* fix assert

* fix torchrec test

2a2eacfa

[bugs] hot fix some testing bugs for new models (#4268) · d9be0472

Jianghai authored Jul 18, 2023

* hot fix

* hot fx tracer

d9be0472