- 15 Aug, 2023 15 commits
-
-
Hongxin Liu authored
* [shardformer] embedding support inplace sharding
* [shardformer] linear support inplace sharding
* [shardformer] layernorm support inplace sharding
* [shardformer] qkv support inplace sharding
* [test] update shardformer layer test
* [shardformer] fix shared param sharding
* [shardformer] fix bert policy
* [shardformer] fix bloom policy
* [shardformer] fix llama policy
* [shardformer] fix opt policy
* [shardformer] fix t5 policy
* [shardformer] fix fused qkv linear
* [shardformer] fix bugs
* force sync
* [test] fix bugs
* [test] fix transformer version
-
Baizhou Zhang authored
[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245)
* change for transformers loggers
* add forward for GPT2ForQuestionAnswering
* fix assert
* fix torchrec test
-
Jianghai authored
* hot fix
* hot fx tracer
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* finish bloom model
* test shard gpt2
* clear cache
* support all bloom models
* add bloom models policies
* finish bloom pipeline and tests
* add set pipeline
* finish bloom
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
  This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0. This policy should be reverted and copied to feature/bloom.
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
* add pure pipeline test
* finish some bert models
* finish all bert models
* finish bert tests
* fix bugs
* fix bugs
* fix test pipeline
* fix data gen for qa
* update the set pipeline forward
* shared params
* fix bugs
-
Baizhou Zhang authored
* add forward for GPTLMHeadModel
* add test for gpt_lm
* arranging get_held_layers method
* arrange forward replacement
* add forward for GPT2ForTokenClassification
* add forward for GPT2ForSequenceClassification
* fix test_shard_gpt2.py
* add GPT2DoubleHeadsModel & fix bugs
* add id checking in get_shared_params
-
Baizhou Zhang authored
* * fix typehint & docstring in sharder.py * * update pipeline forward for GPT2Model * * add test for pipeline forward of GPT2Model * * add cache cleaning in gpt2 test * * change assert to raise command
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* finish bloom model
* test shard gpt2
* clear cache
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
  This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0. This policy should be reverted and copied to feature/bloom.
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
  This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0. This policy should be reverted and copied to feature/bloom.
* revert the bloom changes
* cancel unneeded inputs
* gpt
-
Jianghai authored
* add pipeline forward
* complete pipeline forward check
* fix bert forward without pipeline
* fix comments
* discard useless line
* add todo
* clean prints
* fix distribute layers
-
Hongxin Liu authored
* [shardformer] support lazy init
* [shardformer] linear support lazy init
* [shardformer] embedding support lazy init
* [shardformer] norm support lazy init
* [shardformer] fused linear support lazy init
* [test] update shardformer test layer
* [test] shardformer with lazy init fit ddp
* [lazy] hotfix deepcopy of param
* [shardformer] fix bert policy and update test
* [shardformer] fix bloom policy and update test
* [shardformer] fix opt policy and update test
* [shardformer] fix t5 policy and update test
* [shardformer] fix gpt2 policy and update test
* [shardformer] fix llama policy and update test
-
Jianghai authored
* move bert related pipeline components to shardformer
* fix bugs
* revision
* fix bert model tests
* fix bert_lm_head model tests
* fix tests
* fix tests
* done checks
* skip bloom
-
ver217 authored
-
ver217 authored
-
- 04 Jul, 2023 25 commits
-
-
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>
-
Frank Lee authored
* [shardformer] made tensor parallelism configurable
* polish code
-
Frank Lee authored
* [shardformer] refactored some doc and api
* polish code
-
Frank Lee authored
-
Frank Lee authored
-
Frank Lee authored
-
Frank Lee authored
-
Kun Lin authored
* first version of vit shardformer
* keep vit
* update
* vit shard: add ViTAttention and ViTLayer
* update num head shard param
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
-
jiangmingyan authored
* [shardformer] shardformer support opt models
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
-
Frank Lee authored
-
FoolPlayer authored
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm
-
Frank Lee authored
-
FoolPlayer authored
* add linearconv1d test
* add linearconv1d test
-
Frank Lee authored
* [shardformer] support module saving and loading
* polish code
-
FoolPlayer authored
* support kit use for bert test
* support kit test for gpt2
-
Frank Lee authored
-
Frank Lee authored
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code
-
FoolPlayer authored
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy
-
Frank Lee authored
-
Frank Lee authored
-
FoolPlayer authored
* fix bert downstream with new api
* remove comment line
-
FoolPlayer authored
-
Frank Lee authored
* [shardformer] refactored embedding and dropout to parallel module
* polish code
-
FoolPlayer authored
-
Frank Lee authored
* [shardformer] integrated linear 1D with dtensor
* polish code
-