Commits · 5c6f183192a203a19e7d1dadbefe3e814c7f05d1 · OpenDAS / ColossalAI

15 Aug, 2023 21 commits

[test] Hotfix/fix some model test and refactor check util api (#4369) · 5c6f1831

Bin Jia authored Aug 03, 2023

* fix llama test

* fix test bug of bert, blip2, bloom, gpt2

* fix llama test

* fix opt test

* fix sam test

* fix sam test

* fix t5 test

* fix vit test

* fix whisper test

* fix whisper test

* polish code

* adjust allclose parameter

* Add mistakenly deleted code

* adjust allclose

* change loss function for some base model


[shardformer] support Blip2 (#4243) · 879301d0

FoolPlayer authored Jul 25, 2023

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_plugin


[shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit · 8120eca0
klhhhhh authored Jul 20, 2023

[shardformer] pre-commit check files · 4da05052
klhhhhh authored Jul 19, 2023

[shardformer] ChatGLM support layernorm sharding · f155ae89
klhhhhh authored Jul 17, 2023

[shardformer] delete some file · 00f6ef15
klhhhhh authored Jul 17, 2023

[shardformer] support chatglm without layernorm · dad00c42
klhhhhh authored Jul 14, 2023

[shardformer] polish code · cbb54d32
klhhhhh authored Jul 13, 2023

[shardformer] add test kit in model zoo for chatglm · 6ee4c9ee
klhhhhh authored Jul 07, 2023

import chatglm · 7377be7a
klhhhhh authored Jul 07, 2023


Feature/chatglm (#4240) · ed34bb13

Kun Lin authored Jul 20, 2023

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [shardformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit


[shardformer] support whisper (#4212) · 9ee4ebea

FoolPlayer authored Jul 17, 2023

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme


[shardformer] support SAM (#4231) · dd2bf026

FoolPlayer authored Jul 14, 2023

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overwrite SamVisionAttention forward to use DropoutForParallelInput

* remove unused code


[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) · 0ceec8f9

Baizhou Zhang authored Aug 01, 2023

* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test

* merge tests of PP/DP/TP combinations into one test file

* fix bug when sync grad for dp in HybridPlugin

* update supported precisions for 3DPlugin/fix bug when shifting tp_degree

* improve the passing of lazy_init

* modify lazy_init/use sync_shared_params


[shardformer] support pipeline base vit model (#4284) · b3f5d7a3

FoolPlayer authored Jul 25, 2023



* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* support base vit pipeline

* support vit downstream model

* fix vit shard test

* modify hidden states return type

---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>


[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) · 36e546b2

Baizhou Zhang authored Jul 21, 2023

* modify t5 policy & add test

* pipeline stage distribution for t5

* complete t5 base policy

* t5 stack: halfway

* modify gpt2 pipeline test

* complete pipeline forward for T5Stack/T5EncoderModel

* fix docstring

* move t5 util tests to test_pipeline


[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) · 2a2eacfa

Baizhou Zhang authored Jul 19, 2023

* change for transformers loggers

* add forward for GPT2ForQuestionAnswering

* fix assert

* fix torchrec test


[pipeline] All bert models (#4233) · e7cc62d7

Jianghai authored Jul 17, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* finish some bert models

* finish all bert models

* finish bert tests

* fix bugs

* fix bugs

* fix test pipeline

* fix data gen for qa

* update the set pipeline forward

* shared params

* fix bugs


[pipeline] add bloom model pipeline (#4210) · 37d22f68

Jianghai authored Jul 13, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache


[pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) · 31bcf867

Jianghai authored Jul 11, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision


[pipeline] Llama pipeline (#4205) · 16220310

Jianghai authored Jul 11, 2023

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt


04 Jul, 2023 5 commits
- [shardformer] added embedding gradient check (#4124) · ae035d30
  Frank Lee authored Jun 30, 2023
  
- [shardformer] supported bloom model (#4098) · b1c29015
  Frank Lee authored Jun 28, 2023
  
- [shardformer] shardformer support opt models (#4091) · ac809371
  jiangmingyan authored Jun 27, 2023
```
* [shardformer] shardformer support opt models

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix
```
- support kit use for bert/gpt test (#4055) · 7740c55c
  FoolPlayer authored Jun 22, 2023
```
* support kit use for bert test

* support kit test for gpt2
```
- [shardformer] adapted T5 and LLaMa test to use kit (#4049) · 58df7205
  Frank Lee authored Jun 21, 2023
```
* [shardformer] adapted T5 and LLaMa test to use kit

* polish code
```
09 Jun, 2023 1 commit
- fix typo tests/ (#3936) · e61ffc77
  digger yu authored Jun 09, 2023
  
26 Apr, 2023 1 commit

[booster] add low level zero plugin (#3594) · 4b3240cb

Hongxin Liu authored Apr 26, 2023

* [booster] add low level zero plugin

* [booster] fix gemini plugin test

* [booster] fix precision

* [booster] add low level zero plugin test

* [test] fix booster plugin test oom

* [test] fix booster plugin test oom

* [test] fix googlenet and inception output trans

* [test] fix diffuser clip vision model

* [test] fix torchaudio_wav2vec2_base

* [test] fix low level zero plugin test


22 Mar, 2023 1 commit

[FX] refactor experimental tracer and adapt it with hf models (#3157) · f57d3495

YuliangLiu0306 authored Mar 22, 2023

* pass gpt trace and meta_prop

* pass t5 trace and meta_prop

* [FX] refactor experimental tracer and adapt it with hf models

* pass all mainstream model zoo

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* skip tests

* fix CI

* using packaging version

* polish


20 Mar, 2023 2 commits

[test] fixed torchrec registration in model zoo (#3177) · 085e7f4e
Frank Lee authored Mar 20, 2023
```
* [test] fixed torchrec registration in model zoo

* polish code

* polish code

* polish code
```

[test] fixed torchrec model test (#3167) · 1ad3a636

Frank Lee authored Mar 20, 2023

* [test] fixed torchrec model test

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code


17 Mar, 2023 2 commits

[lazyinit] add correctness verification (#3147) · 6ae8ed04

ver217 authored Mar 17, 2023

* [lazyinit] fix shared module

* [tests] add lazy init test utils

* [tests] add torchvision for lazy init

* [lazyinit] fix pre op fn

* [lazyinit] handle legacy constructor

* [tests] refactor lazy init test models

* [tests] refactor lazy init test utils

* [lazyinit] fix ops don't support meta

* [tests] lazy init test timm models

* [lazyinit] fix set data

* [lazyinit] handle apex layers

* [tests] lazy init test transformers models

* [tests] lazy init test torchaudio models

* [lazyinit] fix import path

* [tests] lazy init test torchrec models

* [tests] update torch version in CI

* [tests] revert torch version in CI

* [tests] skip lazy init test


[booster] implemented mixed precision class (#3151) · ed192905
Frank Lee authored Mar 17, 2023
```
* [booster] implemented mixed precision class

* polish code
```

15 Mar, 2023 4 commits
- [test] add torchrec models to test model zoo (#3139) · ecd643f1
  YuliangLiu0306 authored Mar 15, 2023
  
- [tests] model zoo add torchaudio models (#3138) · 14a11500
  ver217 authored Mar 15, 2023
```
* [tests] model zoo add torchaudio models

* [tests] refactor torchaudio wavernn

* [tests] refactor fx torchaudio tests
```
- [test] added transformers models to test model zoo (#3135) · 6d48eb05
  Frank Lee authored Mar 15, 2023
  
- [test] added torchvision models to test model zoo (#3132) · a674c633
  Frank Lee authored Mar 15, 2023
```
* [test] added torchvision models to test model zoo

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
```
14 Mar, 2023 2 commits

[tests] diffuser models in model zoo (#3136) · 1216d1e7

HELSON authored Mar 14, 2023

* [tests] diffuser models in model zoo

* remove useless code

* [tests] add diffusers to requirement-test


[test] added timm models to test model zoo (#3129) · 86ac782d

Frank Lee authored Mar 14, 2023

* [test] added timm models to test model zoo

* polish code

* polish code

* polish code

* polish code

* polish code
