Commits · da3cef27adcb71847ca59519324ac96b55f7abec · OpenDAS / ColossalAI

15 Aug, 2023 40 commits

[pipeline] fix return_dict/fix pure_pipeline_test (#4331) · da3cef27

Baizhou Zhang authored Jul 27, 2023

da3cef27

[hotfix] fix gemini and zero test (#4333) · 411cf1d2

Hongxin Liu authored Jul 27, 2023

* [hotfix] fix gemini and zero test

* [hotfix] fix lazy init test

* [hotfix] fix lazy init test

411cf1d2

[plugin] add 3d parallel plugin (#4295) · 261eab02

Hongxin Liu authored Jul 26, 2023

* [amp] add mixed precision optimizer

* [plugin] add 3d parallel plugin

* [booster] support pipeline

* [plugin] 3d parallel plugin support clip grad norm

* [shardformer] fix sharder and add plugin test

* [plugin] rename 3d parallel plugin

* [ci] support testmon core pkg change detection (#4305)

* [hotfix] debug testmon

* [hotfix] fix llama

* [hotfix] fix p2p bugs

* [hotfix] fix requirements

261eab02

[shardformer] support pipeline base vit model (#4284) · b3f5d7a3

FoolPlayer authored Jul 25, 2023



* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* support base vit pipeline

* support vit downstream model

* fix vit shard test

* modify hidden states return type

---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>

b3f5d7a3

[pipeline] add pipeline support for all T5 models (#4310) · 083d7da3

Baizhou Zhang authored Jul 25, 2023

* complete policy for T5Model & T5ForConditionalGeneration

* modify function signature in forwards

* add forward for T5model

* add forward for T5ForConditionalGeneration

* fix a bug

* fix hidden_states transporting in decoder

* fix the passing of encoder_outputs

083d7da3

[pipeline] test pure pipeline process using llama (#4218) · d0807122

Jianghai authored Jul 25, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* fixed version

* fixed version

* pure pipeline

d0807122

[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2

Baizhou Zhang authored Jul 21, 2023

* modify t5 policy & add test

* pipeline stage distribution for t5

* complete t5 base policy

* t5 stack: halfway

* modify gpt2 pipeline test

* complete pipeline forward for T5Stack/T5EncoderModel

* fix docstring

* move t5 util tests to test_pipeline

36e546b2

[pipeline] reformat for unified design (#4283) · 18ebcf40

Jianghai authored Jul 21, 2023

* bert_reformat

* reformat

* reformat

* fix a typo

* format

* format

* fix bug

18ebcf40

[hotfix] fix opt pipeline (#4293) · 0a8f3c85

Jianghai authored Jul 20, 2023

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* fix opt

* set transformers version

* refactor the test pipeline

* fix bug

0a8f3c85

[pipeline] OPT model pipeline (#4258) · d8408d18

Jianghai authored Jul 20, 2023

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* fix opt

* set transformers version

* refactor the test pipeline

d8408d18

[pipeline] refactor gpt2 pipeline forwards (#4287) · b774d5ea

Baizhou Zhang authored Jul 20, 2023

* move gpt2 pipeline forwards to modeling folder

* check pipeline status when adding replacing policy

* fix typehint

* fix arguments processing in gpt2_model_forward

b774d5ea

[shardformer] support inplace sharding (#4251) · d921ce83

Hongxin Liu authored Jul 20, 2023

* [shardformer] embedding support inplace sharding

* [shardformer] linear support inplace sharding

* [shardformer] layernorm support inplace sharding

* [shardformer] qkv support inplace sharding

* [test] update shardformer layer test

* [shardformer] fix shared param sharding

* [shardformer] fix bert policy

* [shardformer] fix bloom policy

* [shardformer] fix llama policy

* [shardformer] fix opt policy

* [shardformer] fix t5 policy

* [shardformer] fix fused qkv linear

* [shardformer] fix bugs

* force sync

* [test] fix bugs

* [test] fix transformer version

d921ce83

[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) · 2a2eacfa

Baizhou Zhang authored Jul 19, 2023

* change for transformers loggers

* add forward for GPT2ForQuestionAnswering

* fix assert

* fix torchrec test

2a2eacfa

[bugs] hot fix some testing bugs for new models (#4268) · d9be0472

Jianghai authored Jul 18, 2023

* hot fix

* hot fx tracer

d9be0472

[pipeline] finish bloom models pipeline and tests (#4223) · 34f0e34a

Jianghai authored Jul 17, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache

* support all bloom models

* add bloom models policies

* finish bloom pipeline and tests

* add set pipeline

* finish bloom

34f0e34a

[pipeline] All bert models (#4233) · e7cc62d7

Jianghai authored Jul 17, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* finish some bert models

* finish all bert models

* finish bert tests

* fix bugs

* fix bugs

* fix test pipeline

* fix data gen for qa

* update the set pipeline forward

* shared params

* fix bugs

e7cc62d7

[pipeline] add pipeline forward for variants of gpt2 (#4238) · a14d3520

Baizhou Zhang authored Jul 17, 2023

* add forward for GPTLMHeadModel

* add test for gpt_lm

* arranging get_held_layers method

* arrange forward replacement

* add forward for GPT2ForTokenClassification

* add forward for GPT2ForSequenceClassification

* fix test_shard_gpt2.py

* add GPT2DoubleHeadsmodel & fix bugs

* add id checking in get_shared_params

a14d3520

[shardformer] fix base policy (#4229) · 7e4de520

Hongxin Liu authored Jul 14, 2023

7e4de520

[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) · 208ac8f2

Baizhou Zhang authored Jul 13, 2023

* fix typehint & docstring in sharder.py

* update pipeline forward for GPT2Model

* add test for pipeline forward of GPT2Model

* add cache cleaning in gpt2 test

* change assert to raise command

208ac8f2

[pipeline] add bloom model pipeline (#4210) · 37d22f68

Jianghai authored Jul 13, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache

37d22f68

[pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) · 31bcf867

Jianghai authored Jul 11, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

31bcf867

[pipeline] Llama pipeline (#4205) · 16220310

Jianghai authored Jul 11, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

16220310

[pipeline] Bert pipeline for shardformer and its tests (#4197) · 1094e0f0

Jianghai authored Jul 10, 2023

* add pipeline forward

* complete pipeline forward check

* fix bert forward without pipeline

* fix comments

* discard useless line

* add todo

* clean prints

* fix distribute layers

1094e0f0

[shardformer] support lazy init (#4202) · 890774b2

Hongxin Liu authored Jul 10, 2023

* [shardformer] support lazy init

* [shardformer] linear support lazy init

* [shardformer] embedding support lazy init

* [shardformer] norm support lazy init

* [shardformer] fused linear support lazy init

* [test] update shardformer test layer

* [test] shardformer with lazy init fit ddp

* [lazy] hotfix deepcopy of param

* [shardformer] fix bert policy and update test

* [shardformer] fix bloom policy and update test

* [shardformer] fix opt policy and update test

* [shardformer] fix t5 policy and update test

* [shardformer] fix gpt2 policy and update test

* [shardformer] fix llama policy and update test

890774b2

[pipeline] move bert related pipeline components to shardformer (#4187) · f3bcc292

Jianghai authored Jul 07, 2023

* move bert related pipeline components to shardformer

* fix bugs

* revision

* fix bert model tests

* fix bert_lm_head model tests

* fix tests

* fix tests

* done checks

* skip bloom

f3bcc292

[pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) · c5ea7280

Jianghai authored Jul 06, 2023

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the immediate output to default dict

* change the default output of get_shared_params

c5ea7280

[shardformer] fix type hint · d35bd7d0

ver217 authored Jul 05, 2023

d35bd7d0

[shardformer] rename policy file name · 1ed3f8a2

ver217 authored Jul 05, 2023

1ed3f8a2

[test] add shard util tests · 5fc60a3a

ver217 authored Jul 05, 2023

5fc60a3a

[test] update shardformer tests · 2d6cc07f

ver217 authored Jul 05, 2023

2d6cc07f

[pipeline] update shardformer docstring · b0b8ad28

ver217 authored Jul 05, 2023

b0b8ad28

[pipeline] update shardformer policy · 59f6f573

ver217 authored Jul 05, 2023

59f6f573

[pipeline] build bloom model and policy, revise the base class of policy (#4161) · 90a65ea6

Jianghai authored Jul 05, 2023

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining

90a65ea6

[pipeline] add pipeline policy and bert forward (#4130) · c552cefa

Jianghai authored Jul 04, 2023

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

c552cefa

[pipeline] add stage manager (#4093) · 5c897ddb

Hongxin Liu authored Jun 27, 2023

* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager

5c897ddb

[pipeline] add pipeline policy and bert forward (#4130) · e8e7e492

Jianghai authored Jul 04, 2023

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

e8e7e492

[pipeline] refactor 1f1b schedule (#4115) · f51ce1bc

Hongxin Liu authored Jun 29, 2023

* [api] update optimizer wrapper to fit pipeline

* [pipeline] add base schedule

* [pipeline] add 1f1b schedule

* [test] add pipeline schedule utils test

* [pipeline] fix import

f51ce1bc

[pipeline] implement p2p communication (#4100) · 45fdc9b4

Hongxin Liu authored Jun 28, 2023

* [pipeline] add p2p communication

* [test] add p2p communication test

* [test] add rerun decorator

* [test] rename to avoid conflict

45fdc9b4

[pipeline] add stage manager (#4093) · 42254422

Hongxin Liu authored Jun 27, 2023

* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager

42254422

[cluster] add process group mesh (#4039) · 5e1a9d48

Hongxin Liu authored Jun 20, 2023

* [cluster] add process group mesh

* [test] add process group mesh test

* force sync

5e1a9d48