Commits · cb34cd384d7235f3906e8b13a73612955c6413f2 · OpenDAS / ColossalAI

"vscode:/vscode.git/clone" did not exist on "4d7537ba254dc0b82aaad735d6760065feefe1df"

11 Mar, 2022 34 commits
- [test] polish zero related unitest (#351) · cb34cd38
  Jiarui Fang authored Mar 10, 2022
  
  cb34cd38
- add test sharded optim with cpu adam (#347) · 532ae79c
  ver217 authored Mar 09, 2022
  
  532ae79c
- Added Profiler Context to manage all profilers (#340) · 425bb0df
  HELSON authored Mar 09, 2022
  
  425bb0df
- [zero] update sharded optim v2 (#334) · d0ae0f22
  ver217 authored Mar 09, 2022
  
  d0ae0f22
- skip bert in test engine · 2b8cddd4
  ver217 authored Mar 09, 2022
  
  2b8cddd4
- fix bert unit test · f5f0ad26
  ver217 authored Mar 09, 2022
  
  f5f0ad26
- polish engine unitest · d271f259
  jiaruifang authored Mar 09, 2022
  
  d271f259
- polish code · 354c0f90
  jiaruifang authored Mar 09, 2022
  
  354c0f90
- adapting bert unitest interface · 4d94cd51
  jiaruifang authored Mar 09, 2022
  
  4d94cd51
- add bert for unitest and sharded model is not able to pass the bert case · 7977422a
  jiaruifang authored Mar 09, 2022
  
  7977422a
- [zero] Update sharded model v2 using sharded param v2 (#323) · 13886716
  ver217 authored Mar 08, 2022
  
  13886716
- using pytest parametrize · 799d105b
  jiaruifang authored Mar 08, 2022
  
  799d105b
- show pytest parameterize · dec24561
  jiaruifang authored Mar 08, 2022
  
  dec24561
- [zero] update zero context init with the updated test utils (#327) · 11bddb6e
  Jiarui Fang authored Mar 08, 2022
  
  11bddb6e
- [test] refactored testing components (#324) · 6268446b
  Frank Lee authored Mar 08, 2022
  
  6268446b
- [zero] zero init context (#321) · de0468c7
  Jiarui Fang authored Mar 07, 2022
```
* add zero init context

* add more flags for zero init context
fix bug of repeated converting param to ShardedParamV2

* polish code
```
  de0468c7
- Added profiler communication operations · 73bff112
  1SAA authored Mar 04, 2022
```
Fixed bug for learning rate scheduler
```
  73bff112
- [zero] cpu adam kernel (#288) · a3269de5
  LuGY authored Mar 04, 2022
```
* Added CPU Adam

* finished the cpu adam

* updated the license

* delete useless parameters, removed resnet

* modified the method off cpu adam unittest

* deleted some useless codes

* removed useless codes
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
```
  a3269de5
- [zero] yet an improved sharded param (#311) · 90d3aef6
  Jiarui Fang authored Mar 04, 2022
  
  90d3aef6
- [zero] polish shard strategy (#310) · c9e7d958
  Jiarui Fang authored Mar 04, 2022
```
* init shard param from shape tuple

* add more unitest for shard param

* add set_payload method for ShardedParam

* [zero] add shareded tensor class

* polish code

* add shard stratgy

* move shard and gather logic to shard strategy from shard tensor.

* polish code
```
  c9e7d958
- fix sharded param hook and unit test · 36f9a74a
  ver217 authored Mar 04, 2022
  
  36f9a74a
- impl shard optim v2 and add unit test · 001ca624
  ver217 authored Mar 04, 2022
  
  001ca624
- [zero] a shard strategy in granularity of tensor (#307) · 74f77e31
  Jiarui Fang authored Mar 04, 2022
  
  74f77e31
- [zero] sharded tensor (#305) · 80364c76
  Jiarui Fang authored Mar 04, 2022
```
* init shard param from shape tuple

* add more unitest for shard param

* add set_payload method for ShardedParam

* [zero] add shareded tensor class

* polish code
```
  80364c76
- [profiler] primary memory tracer · d3446892
  Jie Zhu authored Mar 04, 2022
  
  d3446892
- Polish sharded parameter (#297) · e17e92c5
  Jiarui Fang authored Mar 03, 2022
```
* init shard param from shape tuple

* add more unitest for shard param

* add more unittests to shareded param
```
  e17e92c5
- [zero] add sharded grad and refactor grad hooks for ShardedModel (#287) · 7aef75ca
  ver217 authored Mar 02, 2022
  
  7aef75ca
- added unit test for sharded optimizer (#293) · 27155b85
  Frank Lee authored Mar 02, 2022
```
* added unit test for sharded optimizer

* refactor for elegance
```
  27155b85
- added buffer sync to naive amp model wrapper (#291) · e17e54e3
  Frank Lee authored Mar 02, 2022
  
  e17e54e3
- add a common util for hooks registered on parameter. (#292) · 8d653af4
  Jiarui Fang authored Mar 02, 2022
  
  8d653af4
- Feature/zero (#279) · 5a560a06
  Jiarui Fang authored Mar 01, 2022
```
* add zero1 (#209)

* add zero1

* add test zero1

* update zero stage 1 develop (#212)

* Implement naive zero3 (#240)

* naive zero3 works well

* add zero3 param manager

* add TODOs in comments

* add gather full param ctx

* fix sub module streams

* add offload

* fix bugs of hook and add unit tests

* fix bugs of hook and add unit tests (#252)

* add gather full param ctx

* fix sub module streams

* add offload

* fix bugs of hook and add unit tests

* polish code and add state dict hook

* fix bug

* update unit test

* refactor reconstructed zero code

* clip_grad support zero3 and add unit test

* add unit test for Zero3ParameterManager

* [WIP] initialize the shard param class

* [WIP] Yet another sharded model implementation (#274)

* [WIP] initialize the shard param class

* [WIP] Yes another implementation of shardModel. Using a better hook method.

* torch.concat -> torch.cat

* fix test_zero_level_1.py::test_zero_level_1 unitest

* remove deepspeed implementation and refactor for the reconstructed zero module

* polish zero dp unittests
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
```
  5a560a06
- Added TPExpert for special situation · 82023779
  1SAA authored Feb 27, 2022
  
  82023779
- Optimized MoE layer and fixed some bugs; · 219df6e6
  1SAA authored Feb 18, 2022
```
Decreased moe tests;

Added FFNExperts and ViTMoE model
```
  219df6e6
- fixed padding index issue for vocab parallel embedding layers; updated 3D... · 3dba0705
  zbian authored Feb 17, 2022
```
fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial
```
  3dba0705
15 Feb, 2022 1 commit

moved env variables to global variables; (#215) · 9ee197d0

アマデウス authored Feb 14, 2022

added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed few collective communicator bugs

9ee197d0

25 Jan, 2022 1 commit

add pytorch hooks (#179) · 569357fe

Jiarui Fang authored Jan 25, 2022

* add pytorch hooks
fix #175

* remove licenses in src code

* add gpu memory tracer

* replacing print with logger in ophooks.

569357fe

20 Jan, 2022 1 commit
- adapted for sequence parallel (#163) · e2089c5c
  Frank Lee authored Jan 20, 2022
  
  e2089c5c
17 Jan, 2022 1 commit
- pipeline last stage supports multi output (#151) · 7bf1e98b
  ver217 authored Jan 17, 2022
  
  7bf1e98b
30 Dec, 2021 1 commit

Optimize pipeline schedule (#94) · 96780e6e

ver217 authored Dec 30, 2021



* add pipeline shared module wrapper and update load batch

* added model parallel process group for amp and clip grad (#86)

* added model parallel process group for amp and clip grad

* update amp and clip with model parallel process group

* remove pipeline_prev/next group (#88)

* micro batch offload

* optimize pipeline gpu memory usage

* pipeline can receive tensor shape (#93)

* optimize pipeline gpu memory usage

* fix grad accumulation step counter

* rename classes and functions
Co-authored-by: Frank Lee <somerlee.9@gmail.com>

96780e6e

29 Dec, 2021 1 commit

Hotfix/Colossalai layers (#92) · 01a80cd8

アマデウス authored Dec 29, 2021



* optimized 1d layer apis; reorganized nn.layer modules; fixed tests

* fixed 2.5d runtime issue

* reworked split batch, now called in trainer.schedule.load_batch
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>

01a80cd8