- 11 Mar, 2022 23 commits
  - jiaruifang authored
  - jiaruifang authored
  - Jiarui Fang authored
  - Frank Lee authored
  - Jiarui Fang authored
    * add zero init context
    * add more flags for zero init context; fix bug of repeatedly converting a param to ShardedParamV2
    * polish code
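
A ZeRO-style init context converts parameters to sharded wrappers while the model is being built; the "repeatedly converting" fix above is the kind of thing a simple guard handles. A minimal illustrative sketch, assuming a simplified ShardedParam stand-in rather than the library's actual ShardedParamV2 API:

```python
import torch
import torch.nn as nn


class ShardedParam:
    """Illustrative stand-in: keeps only this rank's slice of a parameter."""

    def __init__(self, param: torch.Tensor, rank: int, world_size: int):
        chunks = torch.chunk(param.data.flatten(), world_size)
        self.payload = chunks[rank].clone()
        self.origin_shape = param.shape


class ZeroInitContext:
    """Shard the parameters of modules built inside the context."""

    def __init__(self, rank: int = 0, world_size: int = 1):
        self.rank, self.world_size = rank, world_size

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def convert(self, module: nn.Module):
        for param in module.parameters():
            # Guard against converting the same parameter twice
            # (e.g. weights shared between modules, repeated calls).
            if getattr(param, "_sharded", None) is not None:
                continue
            param._sharded = ShardedParam(param, self.rank, self.world_size)
            param.data = torch.empty(0)  # drop the full payload on this rank


with ZeroInitContext(rank=0, world_size=4) as ctx:
    model = nn.Linear(16, 16)
    ctx.convert(model)
```
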
  - 1SAA authored
    Fixed bug for learning rate scheduler
  - LuGY authored
    * Added CPU Adam
    * finished the CPU Adam
    * updated the license
    * delete useless parameters, removed resnet
    * modified the method of the CPU Adam unittest
    * deleted some useless code
    * removed useless code
    Co-authored-by: ver217 <lhx0217@gmail.com>
    Co-authored-by: Frank Lee <somerlee.9@gmail.com>
    Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
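
"CPU Adam" here refers to keeping the Adam moments and an fp32 master copy of the weights in host memory, so the update runs on the CPU and only the parameters live on the GPU. A rough single-parameter sketch of that idea in plain PyTorch (not the fused kernel the commit adds):

```python
import torch


def cpu_adam_step(gpu_param, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Update one GPU parameter with Adam while keeping all state on CPU."""
    grad = gpu_param.grad.detach().to("cpu", dtype=torch.float32)

    m, v = state["exp_avg"], state["exp_avg_sq"]   # CPU fp32 moments
    master = state["master_param"]                 # CPU fp32 weights
    state["step"] += 1
    t = state["step"]

    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])

    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    master.add_(m_hat / (v_hat.sqrt() + eps), alpha=-lr)

    # Copy the updated fp32 master weights back to the (possibly fp16) GPU param.
    gpu_param.data.copy_(master.to(gpu_param.device, dtype=gpu_param.dtype))


# Usage: optimizer state is initialised once per parameter, on CPU.
p = torch.nn.Parameter(torch.randn(4, 4))
p.grad = torch.randn(4, 4)
state = {
    "step": 0,
    "master_param": p.detach().to("cpu", dtype=torch.float32).clone(),
    "exp_avg": torch.zeros(4, 4),
    "exp_avg_sq": torch.zeros(4, 4),
}
cpu_adam_step(p, state)
```
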
  - Jiarui Fang authored
  - Jiarui Fang authored
    * init shard param from shape tuple
    * add more unit tests for shard param
    * add set_payload method for ShardedParam
    * [zero] add sharded tensor class
    * polish code
    * add shard strategy
    * move shard and gather logic from the sharded tensor to the shard strategy
    * polish code
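
The last three items separate what a sharded tensor stores from how it is sharded: the tensor keeps a payload, while a strategy object owns the shard/gather logic so it can be swapped out. A toy single-process sketch of that split, with illustrative class names rather than the exact API:

```python
import torch


class ShardedTensor:
    """Holds a payload plus the original shape; knows nothing about sharding."""

    def __init__(self, tensor: torch.Tensor):
        self.payload = tensor
        self.origin_shape = tensor.shape
        self.is_sharded = False


class TensorShardStrategy:
    """Owns the shard/gather logic, so ShardedTensor stays policy-free."""

    def __init__(self, rank: int, world_size: int):
        self.rank, self.world_size = rank, world_size

    def shard(self, t: ShardedTensor):
        if t.is_sharded:
            return
        chunks = torch.chunk(t.payload.flatten(), self.world_size)
        # In the real distributed setting each rank keeps only its chunk;
        # here we simply simulate rank-local storage.
        t.payload = chunks[self.rank].clone()
        t.is_sharded = True

    def gather(self, t: ShardedTensor, all_chunks):
        if not t.is_sharded:
            return
        # Distributed code would all-gather the chunks; we concatenate directly.
        full = torch.cat([c.flatten() for c in all_chunks])
        t.payload = full.reshape(t.origin_shape)
        t.is_sharded = False
```
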
  - ver217 authored
  - ver217 authored
  - Jiarui Fang authored
  - Jiarui Fang authored
    * init shard param from shape tuple
    * add more unit tests for shard param
    * add set_payload method for ShardedParam
    * [zero] add sharded tensor class
    * polish code
  - Jie Zhu authored
  - Jiarui Fang authored
    * init shard param from shape tuple
    * add more unit tests for shard param
    * add more unit tests for sharded param
  - ver217 authored
  - Frank Lee authored
    * added unit test for sharded optimizer
    * refactor for elegance
  - Frank Lee authored
  - Jiarui Fang authored
  - Jiarui Fang authored
    * add zero1 (#209)
    * add zero1
    * add test zero1
    * update zero stage 1 develop (#212)
    * Implement naive zero3 (#240)
    * naive zero3 works well
    * add zero3 param manager
    * add TODOs in comments
    * add gather full param ctx
    * fix sub module streams
    * add offload
    * fix bugs of hook and add unit tests
    * fix bugs of hook and add unit tests (#252)
    * add gather full param ctx
    * fix sub module streams
    * add offload
    * fix bugs of hook and add unit tests
    * polish code and add state dict hook
    * fix bug
    * update unit test
    * refactor reconstructed zero code
    * clip_grad support zero3 and add unit test
    * add unit test for Zero3ParameterManager
    * [WIP] initialize the shard param class
    * [WIP] Yet another sharded model implementation (#274)
    * [WIP] initialize the shard param class
    * [WIP] Yet another implementation of shardModel, using a better hook method
    * torch.concat -> torch.cat
    * fix test_zero_level_1.py::test_zero_level_1 unittest
    * remove deepspeed implementation and refactor for the reconstructed zero module
    * polish zero dp unittests
    Co-authored-by: ver217 <lhx0217@gmail.com>
    Co-authored-by: Frank Lee <somerlee.9@gmail.com>
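
Several items above ("add gather full param ctx", offload, the state dict hook) revolve around temporarily rebuilding a full tensor from its shards and releasing it afterwards. A simplified, single-process sketch of that pattern; in real ZeRO-3 code the chunks would come from an all-gather over the process group:

```python
import contextlib
import torch


@contextlib.contextmanager
def gather_full_params(shards, origin_shape):
    """Temporarily materialise a full tensor from per-rank shards.

    `shards` is a list with one flattened chunk per rank; a distributed
    implementation would obtain them via an all-gather.
    """
    full = torch.cat(shards).reshape(origin_shape)
    try:
        yield full     # e.g. copy into a state dict, run an eval step, ...
    finally:
        del full       # drop the unsharded copy as soon as the block ends


# Usage: two fake "ranks", each holding half of a 4x4 weight.
weight = torch.randn(4, 4)
shards = list(torch.chunk(weight.flatten(), 2))

with gather_full_params(shards, weight.shape) as w:
    state_dict_entry = w.clone()   # what a state-dict hook might capture
```
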
  - 1SAA authored
  - 1SAA authored
    Decreased moe tests; Added FFNExperts and ViTMoE model
  - zbian authored
    Fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial
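
For context on the padding fix: a vocab-parallel embedding gives each rank a contiguous slice of the vocabulary, so padding indices and token ids outside the local slice must be masked before the lookup and zeroed afterwards. A single-rank sketch of that masking (simplified, not the library's layer):

```python
import torch
import torch.nn.functional as F


def vocab_parallel_embedding_lookup(token_ids, local_weight, vocab_start, vocab_end):
    """Look up embeddings for the vocab slice [vocab_start, vocab_end) owned by this rank.

    Tokens outside the slice (including a padding index owned by another rank)
    contribute zeros; an all-reduce across ranks would sum the partial results.
    """
    mask = (token_ids < vocab_start) | (token_ids >= vocab_end)
    local_ids = (token_ids - vocab_start).clamp(min=0)
    local_ids[mask] = 0                # any valid local index; output is zeroed below
    out = F.embedding(local_ids, local_weight)
    out[mask] = 0.0
    return out                         # partial result; summing over ranks gives the full lookup


# Example: this rank owns vocab ids [8, 16) of a 16-token vocabulary.
weight_slice = torch.randn(8, 4)
tokens = torch.tensor([[1, 9, 15, 3]])
partial = vocab_parallel_embedding_lookup(tokens, weight_slice, 8, 16)
```
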
- 15 Feb, 2022 1 commit
  - アマデウス authored
    Added branch context; added vocab parallel layers; moved split_batch from load_batch to tensor parallel embedding layers; updated gpt model; updated unit test cases; fixed a few collective communicator bugs
- 25 Jan, 2022 1 commit
  - Jiarui Fang authored
    * add pytorch hooks; fix #175
    * remove licenses in src code
    * add gpu memory tracer
    * replace print with logger in ophooks
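
The "gpu memory tracer" and "ophooks" items amount to hooking each module's forward and backward pass and logging memory use. A bare-bones version using plain PyTorch module hooks (logger name and message format are illustrative):

```python
import logging

import torch
import torch.nn as nn

logger = logging.getLogger("mem_tracer")


def attach_memory_tracer(model: nn.Module):
    """Log CUDA memory allocated after each module's forward and backward pass."""

    def make_hook(tag):
        def hook(module, *_):
            if torch.cuda.is_available():
                mem = torch.cuda.memory_allocated() / 1024 ** 2
                logger.info("%s %s: %.1f MB allocated", tag, module.__class__.__name__, mem)
        return hook

    for mod in model.modules():
        mod.register_forward_hook(make_hook("after fwd"))
        mod.register_full_backward_hook(make_hook("after bwd"))


# Usage
logging.basicConfig(level=logging.INFO)
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
attach_memory_tracer(net)
net(torch.randn(2, 8)).sum().backward()
```
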
- 20 Jan, 2022 1 commit
  - Frank Lee authored
- 17 Jan, 2022 1 commit
  - ver217 authored
- 30 Dec, 2021 1 commit
  - ver217 authored
    * add pipeline shared module wrapper and update load batch
    * added model parallel process group for amp and clip grad (#86)
    * added model parallel process group for amp and clip grad
    * update amp and clip with model parallel process group
    * remove pipeline_prev/next group (#88)
    * micro batch offload
    * optimize pipeline gpu memory usage
    * pipeline can receive tensor shape (#93)
    * optimize pipeline gpu memory usage
    * fix grad accumulation step counter
    * rename classes and functions
    Co-authored-by: Frank Lee <somerlee.9@gmail.com>
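
"pipeline can receive tensor shape" refers to exchanging shape metadata before the activation tensor itself, so the receiving stage can allocate a correctly sized buffer instead of assuming a fixed shape. A sketch of that handshake, assuming an already initialised torch.distributed process group (helper names are made up here):

```python
import torch
import torch.distributed as dist


def send_with_shape(tensor: torch.Tensor, dst: int):
    """Send the number of dims, then the shape, then the tensor itself."""
    shape = torch.tensor(tensor.shape, dtype=torch.long)
    dist.send(torch.tensor([shape.numel()], dtype=torch.long), dst=dst)
    dist.send(shape, dst=dst)
    dist.send(tensor.contiguous(), dst=dst)


def recv_with_shape(src: int, dtype=torch.float32) -> torch.Tensor:
    """Receive shape metadata first, allocate a buffer, then receive the data."""
    ndim = torch.empty(1, dtype=torch.long)
    dist.recv(ndim, src=src)
    shape = torch.empty(int(ndim), dtype=torch.long)
    dist.recv(shape, src=src)
    buf = torch.empty(tuple(shape.tolist()), dtype=dtype)
    dist.recv(buf, src=src)
    return buf
```
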
- 29 Dec, 2021 1 commit
  - アマデウス authored
    * optimized 1d layer apis; reorganized nn.layer modules; fixed tests
    * fixed 2.5d runtime issue
    * reworked split batch, now called in trainer.schedule.load_batch
    Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
- 27 Dec, 2021 1 commit
  - アマデウス authored
    * integrated parallel layers for ease of building models
    * integrated 2.5d layers
    * cleaned codes and unit tests
    * added log metric by step hook; updated imagenet benchmark; fixed some bugs
    * reworked initialization; cleaned codes
    Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
- 20 Dec, 2021 2 commits
- 16 Dec, 2021 1 commit
  - Frank Lee authored
- 09 Dec, 2021 1 commit
  - Frank Lee authored
    * Add gradient accumulation, fix lr scheduler
    * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
    * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
    * fixed trainer
    * Revert "fixed trainer". This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
    * improved consistency between trainer, engine and schedule (#23)
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    * Split conv2d, class token, positional embedding in 2d; fix random number in ddp; fix convergence in cifar10, Imagenet1000
    * Integrate 1d tensor parallel in Colossal-AI (#39)
    * fixed 1D and 2D convergence (#38)
    * optimized 2D operations
    * fixed 1D ViT convergence problem
    * Feature/ddp (#49)
    * remove redundancy func in setup (#19) (#20)
    * use env to control the language of doc (#24) (#25)
    * Support TP-compatible Torch AMP and Update trainer API (#27)
    * Add gradient accumulation, fix lr scheduler
    * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
    * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
    * fixed trainer
    * Revert "fixed trainer". This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
    * improved consistency between trainer, engine and schedule (#23)
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: ver217 <lhx0217@gmail.com>
    * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
    * add explanation for ViT example (#35) (#36)
    * support torch ddp
    * fix loss accumulation
    * add log for ddp
    * change seed
    * modify timing hook
    Co-authored-by: Frank Lee <somerlee.9@gmail.com>
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: binmakeswell <binmakeswell@gmail.com>
    * Feature/pipeline (#40)
    * remove redundancy func in setup (#19) (#20)
    * use env to control the language of doc (#24) (#25)
    * Support TP-compatible Torch AMP and Update trainer API (#27)
    * Add gradient accumulation, fix lr scheduler
    * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
    * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
    * fixed trainer
    * Revert "fixed trainer". This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
    * improved consistency between trainer, engine and schedule (#23)
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: ver217 <lhx0217@gmail.com>
    * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
    * add explanation for ViT example (#35) (#36)
    * optimize communication of pipeline parallel
    * fix grad clip for pipeline
    Co-authored-by: Frank Lee <somerlee.9@gmail.com>
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: binmakeswell <binmakeswell@gmail.com>
    * optimized 3d layer to fix slow computation; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)
    * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset
    * update api for better usability (#58)
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: ver217 <lhx0217@gmail.com>
    Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
    Co-authored-by: binmakeswell <binmakeswell@gmail.com>
    Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
    Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
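
The recurring "Add gradient accumulation" and TP-compatible Torch AMP items boil down to a loop like the one below: the loss is divided by the number of accumulation steps and the optimizer only steps every N micro-batches, with torch.cuda.amp handling the scaling. This is a generic PyTorch sketch of the pattern, not the trainer/engine code itself:

```python
import torch
import torch.nn as nn

ACCUM_STEPS = 4
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(32, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
criterion = nn.CrossEntropyLoss()

data = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in range(8)]

for step, (x, y) in enumerate(data):
    x, y = x.to(device), y.to(device)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = criterion(model(x), y) / ACCUM_STEPS   # scale so gradients average correctly
    scaler.scale(loss).backward()                      # gradients accumulate across micro-batches

    if (step + 1) % ACCUM_STEPS == 0:
        scaler.step(optimizer)                         # unscales gradients and applies the update
        scaler.update()
        optimizer.zero_grad()
```
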
- 18 Nov, 2021 1 commit
  - Frank Lee authored
    * Add gradient accumulation, fix lr scheduler
    * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
    * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
    * fixed trainer
    * Revert "fixed trainer". This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
    * improved consistency between trainer, engine and schedule (#23)
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: 1SAA <c2h214748@gmail.com>
    Co-authored-by: ver217 <lhx0217@gmail.com>
- 28 Oct, 2021 2 commits