15 Aug, 2023 · 40 commits
    • [shardformer] polish code · cbb54d32
      klhhhhh authored
    • [shardformer] polish chatglm code · 1a29e8fc
      klhhhhh authored
    • 8620009d
    • 6ee4c9ee
    • import chatglm · 7377be7a
      klhhhhh authored
    • [shardformer] vit test finish and support · c4928698
      klhhhhh authored
    • [shardformer] added tests · f60162b2
      klhhhhh authored
    • Feature/chatglm (#4240) · ed34bb13
      Kun Lin authored
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * [shardformer] chatglm ready
      
      * import chatglm
      
      * [shardformer] add test kit in model zoo for chatglm
      
      * [shardformer] add first version of policy of chatglm
      
      * [shardformer] polish chatglm code
      
      * [shardformer] polish code
      
      * [shardformer] support chatglm without layernorm
      
      * [shardformer] chatglm shard without mlp sharding
      
      * [shardformer] delete some file
      
      * [shardformer] ChatGLM support layernorm sharding
      
      * [shardformer] register without auto policy
      
      * [shardformer] pre-commit check files
      
      * [shardformer] fix chatglm configuration with pre-commit
    • [shardformer] support whisper (#4212) · 9ee4ebea
      FoolPlayer authored
      * support whisper
      
      * fix bug in vocabembedding
      
      * support downstream model of whisper
      
      * update readme
    • [shardformer] support SAM (#4231) · dd2bf026
      FoolPlayer authored
      * 1. support SAM 2. add fused qkv for nn.Linear
      
      * update utils support set element in list
      
      * overwrite SamVisionAttention forward to use DropoutForParallelInput
      
      * remove unused code
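The "add fused qkv for nn.Linear" item above refers to stacking the Q, K and V projection weights into one matrix so a single matmul produces all three. When such a fused weight is column-sharded for tensor parallelism, each of the three blocks has to be split per rank separately and re-stacked. A minimal plain-Python sketch of that splitting logic (the function name and even-split strategy are assumptions, not ColossalAI's API):

```python
# Hypothetical sketch: a fused QKV weight stores q/k/v blocks stacked along
# the output dimension; sharding must split each block across ranks and
# re-stack so every rank keeps a contiguous [q_r; k_r; v_r] slice.

def shard_fused_qkv_rows(fused_rows, n_ranks):
    """Split each of the q/k/v row blocks of a fused weight across ranks."""
    block = len(fused_rows) // 3
    q = fused_rows[0:block]
    k = fused_rows[block:2 * block]
    v = fused_rows[2 * block:3 * block]
    per = block // n_ranks
    return [
        q[r * per:(r + 1) * per]
        + k[r * per:(r + 1) * per]
        + v[r * per:(r + 1) * per]
        for r in range(n_ranks)
    ]

# 12 weight rows = 4 per projection, sharded over 2 ranks
shards = shard_fused_qkv_rows(list(range(12)), 2)
```

Note that naively slicing the fused matrix in half would hand rank 0 all of Q plus half of K, which is why the per-block split matters.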
    • Feature/vit support (#4182) · c59d7aca
      Kun Lin authored
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
    • [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) · 0ceec8f9
      Baizhou Zhang authored
      
      * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
      
      * merge tests of PP/DP/TP combinations into one test file
      
      * fix bug when sync grad for dp in HybridPlugin
      
      * update supported precisions for 3DPlugin/fix bug when shifting tp_degree
      
      * improve the passing of lazy_init
      
      * modify lazy_init/use sync_shared_params
    • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) · f13954cd
      Jianghai authored
      * refactor tests
      
      * refactor bloom model
      
      * finish policy tests
      
      * refactor tests
      
      * fix test pure pipeline
      
      * remove test pipeline and cutdown launch process
      
    • [pipeline] add unit test for 1f1b (#4303) · d3c6cd66
      LuGY authored
      * add unit test for 1f1b
      
      * polish code
      
      * polish code and update ut version
      
      * fix
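The 1F1B schedule that the unit test above exercises interleaves forward and backward passes per pipeline stage: a warm-up of forwards, a steady phase alternating one forward with one backward, then a cool-down draining the remaining backwards. A toy sketch of the step sequence (this function is an illustration, not the library's scheduler):

```python
# Hypothetical sketch of the 1F1B step sequence for a single pipeline stage.
# Earlier stages warm up with more forwards; the last stage alternates
# immediately. Total F and B steps both equal the number of microbatches.

def one_f_one_b_steps(num_microbatches, num_stages, stage_id):
    warmup = min(num_stages - stage_id - 1, num_microbatches)
    steady = num_microbatches - warmup
    steps = ['F'] * warmup          # warm-up: forwards only
    for _ in range(steady):
        steps += ['F', 'B']         # steady phase: one forward, one backward
    steps += ['B'] * warmup         # cool-down: drain pending backwards
    return steps
```

For example, the last of four stages with four microbatches alternates `F, B` from the start, while stage 0 runs three warm-up forwards first; either way each stage executes exactly one forward and one backward per microbatch.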
    • [hotfix] fix gemini and zero test (#4333) · 411cf1d2
      Hongxin Liu authored
      * [hotfix] fix gemini and zero test
      
      * [hotfix] fix lazy init test
      
      * [hotfix] fix lazy init test
    • [plugin] add 3d parallel plugin (#4295) · 261eab02
      Hongxin Liu authored
      * [amp] add mixed precision optimizer
      
      * [plugin] add 3d parallel plugin
      
      * [booster] support pipeline
      
      * [plugin] 3d parallel plugin support clip grad norm
      
      * [shardformer] fix sharder and add plugin test
      
      * [plugin] rename 3d parallel plugin
      
      * [ci] support testmon core pkg change detection (#4305)
      
      * [hotfix] debug testmon
      
      * [hotfix] fix llama
      
      * [hotfix] fix p2p bugs
      
      * [hotfix] fix requirements
    • [shardformer] support pipeline base vit model (#4284) · b3f5d7a3
      FoolPlayer authored
      * Feature/vit support (#4182)
      
      * [shardformer] added tests
      
      * [shardformer] vit test finish and support
      
      * fix attention dropout
      
      * support base vit pipeline
      
      * support vit downstream model
      
      * fix vit shard test
      
      * modify hidden states return type
      
      ---------
      Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
    • [pipeline] add pipeline support for all T5 models (#4310) · 083d7da3
      Baizhou Zhang authored
      * complete policy for T5Model & T5ForConditionalGeneration
      
      * modify function signature in forwards
      
      * add forward for T5model
      
      * add forward for T5ForConditionalGeneration
      
      * fix a bug
      
      * fix hidden_states transporting in decoder
      
      * fix the passing of encoder_outputs
    • [pipeline] test pure pipeline process using llama (#4218) · d0807122
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * fixed version
      
      * fixed version
      
      * pure pipeline
    • [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2
      Baizhou Zhang authored
      * modify t5 policy & add test
      
      * pipeline stage distribution for t5
      
      * complete t5 base policy
      
      * t5 stack: halfway
      
      * modify gpt2 pipeline test
      
      * complete pipeline forward for T5Stack/T5EncoderModel
      
      * fix docstring
      
      * move t5 util tests to test_pipeline
    • [pipeline] reformat for unified design (#4283) · 18ebcf40
      Jianghai authored
      * bert_reformat
      
      * reformat
      
      * reformat
      
      * fix a typo
      
      * format
      
      * format
      
      * fix bug
    • [hotfix] fix opt pipeline (#4293) · 0a8f3c85
      Jianghai authored
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * fix opt
      
      * set transformers version
      
      * refactor the test pipeline
      
      * fix bug
    • [pipeline] OPT model pipeline (#4258) · d8408d18
      Jianghai authored
      * opt forward and test
      
      * pause
      
      * finish opt model pipeline
      
      * finish opt pipeline
      
      * fix opt
      
      * set transformers version
      
      * refactor the test pipeline
    • [pipeline] refactor gpt2 pipeline forwards (#4287) · b774d5ea
      Baizhou Zhang authored
      * move gpt2 pipeline forwards to modeling folder
      
      * check pipeline status when adding replacing policy
      
      * fix typehint
      
      * fix arguments processing in gpt2_model_forward
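The pipeline forwards being moved into the modeling folder above all follow one pattern: only the first stage consumes the real model inputs, later stages consume the previous stage's hidden states, and every non-final stage returns a dict of intermediate outputs for the next stage. A minimal stand-in over plain values (names and signature are illustrative assumptions, not the repository's functions):

```python
# Sketch of the stage-wise pipeline forward pattern: first stage takes real
# inputs, intermediate stages take {'hidden_states': ...} from the previous
# stage, and non-final stages emit the same dict for the next stage.

def stage_forward(layers, inputs, is_first_stage, is_last_stage):
    hidden = inputs if is_first_stage else inputs['hidden_states']
    for layer in layers:
        hidden = layer(hidden)
    return hidden if is_last_stage else {'hidden_states': hidden}

# two-stage toy pipeline over integers instead of tensors
mid = stage_forward([lambda x: x + 1, lambda x: x * 2], 3, True, False)
out = stage_forward([lambda x: x - 1], mid, False, True)
```

The dict wrapper is what lets the scheduler pass one stage's output straight in as the next stage's keyword arguments.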
    • [shardformer] support inplace sharding (#4251) · d921ce83
      Hongxin Liu authored
      * [shardformer] embedding support inplace sharding
      
      * [shardformer] linear support inplace sharding
      
      * [shardformer] layernorm support inplace sharding
      
      * [shardformer] qkv support inplace sharding
      
      * [test] update shardformer layer test
      
      * [shardformer] fix shared param sharding
      
      * [shardformer] fix bert policy
      
      * [shardformer] fix bloom policy
      
      * [shardformer] fix llama policy
      
      * [shardformer] fix opt policy
      
      * [shardformer] fix t5 policy
      
      * [shardformer] fix fused qkv linear
      
      * [shardformer] fix bugs
      
      * force sync
      
      * [test] fix bugs
      
      * [test] fix transformer version
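Why "inplace" sharding matters for the shared-param and policy fixes listed above: if a parameter's storage is replaced on the same object instead of a new module being constructed, every module that shares that parameter (for example a tied embedding and LM head) automatically sees the sharded version and stays tied. A plain-Python stand-in (the `Param` class and function name here are assumptions, not the actual implementation):

```python
# Sketch: shard a parameter along dim 0 *in place* by keeping the Param
# object and replacing only its .data, so shared references remain shared.

class Param:
    def __init__(self, data):
        self.data = data

def shard_dim0_inplace(param, rank, world_size):
    """Keep the Param object; swap .data for this rank's contiguous slice."""
    n = len(param.data) // world_size
    param.data = param.data[rank * n:(rank + 1) * n]

embedding_weight = Param(list(range(8)))
lm_head_weight = embedding_weight   # tied weights: one shared object
shard_dim0_inplace(embedding_weight, rank=0, world_size=2)
```

Building a fresh sharded module instead would leave `lm_head_weight` pointing at the old full-size parameter, which is exactly the shared-param bug class the commit addresses.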
    • [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) · 2a2eacfa
      Baizhou Zhang authored
      
      * change for transformers loggers
      
      * add forward for GPT2ForQuestionAnswering
      
      * fix assert
      
      * fix torchrec test
    • [bugs] hot fix some testing bugs for new models (#4268) · d9be0472
      Jianghai authored
      * hot fix
      
      * hot fx tracer
    • [pipeline] finish bloom models pipeline and tests (#4223) · 34f0e34a
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * finish bloom model
      
      * test shard gpt2
      
      * clear cache
      
      * support all bloom models
      
      * add bloom models policies
      
      * finish bloom pipeline and tests
      
      * add set pipeline
      
      * finish bloom
    • [pipeline] All bert models (#4233) · e7cc62d7
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
      
      * add pure pipeline test
      
      * finish some bert models
      
      * finish all bert models
      
      * finish bert tests
      
      * fix bugs
      
      * fix bugs
      
      * fix test pipeline
      
      * fix data gen for qa
      
      * update the set pipeline forward
      
      * shared params
      
      * fix bugs
    • [pipeline] add pipeline forward for variants of gpt2 (#4238) · a14d3520
      Baizhou Zhang authored
      * add forward for GPTLMHeadModel
      
      * add test for gpt_lm
      
      * arranging get_held_layers method
      
      * arrange forward replacement
      
      * add forward for GPT2ForTokenClassification
      
      * add forward for GPT2ForSequenceClassification
      
      * fix test_shard_gpt2.py
      
      * add GPT2DoubleHeadsmodel & fix bugs
      
      * add id checking in get_shared_params
    • [shardformer] fix base policy (#4229) · 7e4de520
      Hongxin Liu authored
    • [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) · 208ac8f2
      Baizhou Zhang authored
      * fix typehint & docstring in sharder.py

      * update pipeline forward for GPT2Model

      * add test for pipeline forward of GPT2Model

      * add cache cleaning in gpt2 test

      * change assert to raise command
    • [pipeline] add bloom model pipeline (#4210) · 37d22f68
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * finish bloom model
      
      * test shard gpt2
      
      * clear cache
    • [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) · 31bcf867
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
      
      * finish llama
      
      * causal lm and sequence classification
      
      * revision
    • [pipeline] Llama pipeline (#4205) · 16220310
      Jianghai authored
      * bloom policy
      
      * llama pipeline forward and tests
      
      * fix the output and attention_mask
      
      * fix name
      
      * bind argument to policy
      
      * Revert "bloom policy"
      
      This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.
      
      This policy should be reverted and copied to feature/bloom
      
      * revert the bloom changes
      
      * cancel unneeded inputs
      
      * gpt
    • [pipeline] Bert pipeline for shardformer and its tests (#4197) · 1094e0f0
      Jianghai authored
      * add pipeline forward
      
      * complete pipeline forward check
      
      * fix bert forward without pipeline
      
      * fix comments
      
      * discard useless line
      
      * add todo
      
      * clean prints
      
      * fix distribute layers
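The "fix distribute layers" item above concerns how transformer layers are assigned to pipeline stages. A common even-split strategy, sketched here under the assumption it matches the policies' intent (the function name is hypothetical), gives each stage floor(L / S) layers and spreads the remainder over the first stages:

```python
# Sketch: evenly distribute num_layers transformer layers over num_stages
# pipeline stages; earlier stages absorb the remainder one layer each.

def distribute_layers(num_layers, num_stages):
    base, rem = divmod(num_layers, num_stages)
    return [base + (1 if stage < rem else 0) for stage in range(num_stages)]
```

A policy's `get_held_layers`-style helper can then take a prefix sum of these sizes to find the layer index range each stage holds.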
    • [shardformer] support lazy init (#4202) · 890774b2
      Hongxin Liu authored
      * [shardformer] support lazy init
      
      * [shardformer] linear support lazy init
      
      * [shardformer] embedding support lazy init
      
      * [shardformer] norm support lazy init
      
      * [shardformer] fused linear support lazy init
      
      * [test] update shardformer test layer
      
      * [test] shardformer with lazy init fit ddp
      
      * [lazy] hotfix deepcopy of param
      
      * [shardformer] fix bert policy and update test
      
      * [shardformer] fix bloom policy and update test
      
      * [shardformer] fix opt policy and update test
      
      * [shardformer] fix t5 policy and update test
      
      * [shardformer] fix gpt2 policy and update test
      
      * [shardformer] fix llama policy and update test
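The lazy-init support above defers parameter allocation until after the sharding plan is known, so the full unsharded model never has to fit on one device. A tiny stand-in for the idea (a hypothetical class, not the actual LazyTensor implementation): record a factory instead of the data, and materialize on first access.

```python
# Sketch of lazy initialization: store how to build the parameter, allocate
# nothing up front, and materialize only when first needed (e.g. after the
# layer has been sharded or moved to its target device).

class LazyParam:
    def __init__(self, factory):
        self._factory = factory
        self._value = None          # nothing allocated yet

    @property
    def materialized(self):
        return self._value is not None

    def materialize(self):
        """Build the value on first use; later calls return the same object."""
        if self._value is None:
            self._value = self._factory()
        return self._value

p = LazyParam(lambda: [0.0] * 4)    # no buffer exists until materialize()
```

The deepcopy hotfix in the commit list hints at the tricky part of this pattern: copying a lazy parameter must copy the recipe, not force materialization.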
    • [pipeline] move bert related pipeline components to shardformer (#4187) · f3bcc292
      Jianghai authored
      * move bert related pipeline components to shardformer
      
      * fix bugs
      
      * revision
      
      * fix bert model tests
      
      * fix bert_lm_head model tests
      
      * fix tests
      
      * fix tests
      
      * done checks
      
      * skip bloom
    • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) · c5ea7280
      Jianghai authored
      * add pipeline policy and bert forward to be done
      
      * add bertmodel pipeline forward and make tests
      
      * add Bert_Policy and test for policy
      
      * update formatting
      
      * update formatting
      
      * update the code
      
      * fix bugs
      
      * fix name conflict
      
      * add bloom model and policy, revise the base class of policy
      
      * revise
      
      * revision
      
      * add bert_for_pretraining
      
      * add bert_for_pretraining forward and policy
      
      * fix typos
      
      * cancel warning
      
      * change the immediate output to default dict
      
      * change the default output of get_shared_params