1. 11 Mar, 2022 4 commits
      Feature/zero (#279) · 5a560a06
      Jiarui Fang authored
      
      
      * add zero1 (#209)
      
      * add zero1
      
      * add test zero1
      
      * update zero stage 1 develop (#212)
      
      * Implement naive zero3 (#240)
      
      * naive zero3 works well
      
      * add zero3 param manager
      
      * add TODOs in comments
      
      * add gather full param ctx
      
      * fix sub module streams
      
      * add offload
      
      * fix bugs of hook and add unit tests
      
      * fix bugs of hook and add unit tests (#252)
      
      * add gather full param ctx
      
      * fix sub module streams
      
      * add offload
      
      * fix bugs of hook and add unit tests
      
      * polish code and add state dict hook
      
      * fix bug
      
      * update unit test
      
      * refactor reconstructed zero code
      
      * clip_grad support zero3 and add unit test
      
      * add unit test for Zero3ParameterManager
      
      * [WIP] initialize the shard param class
      
      * [WIP] Yet another sharded model implementation (#274)
      
      * [WIP] initialize the shard param class
      
      * [WIP] Yet another implementation of shardModel, using a better hook method.
      
      * torch.concat -> torch.cat
      
      * fix test_zero_level_1.py::test_zero_level_1 unit test
      
      * remove deepspeed implementation and refactor for the reconstructed zero module
      
      * polish zero dp unit tests
      Co-authored-by: ver217 <lhx0217@gmail.com>
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
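      The "gather full param ctx" and shard-param work in this commit follow the
      ZeRO-3 pattern: each rank keeps only a slice of every parameter and
      temporarily all-gathers the full tensor around forward/backward. A minimal
      sketch of that idea, assuming torch.distributed is initialized; the class
      and method names here are hypothetical, not Colossal-AI's actual API:

          from contextlib import contextmanager
          import torch
          import torch.distributed as dist

          class ShardedParam:
              """Keep only this rank's 1/world_size slice of a flattened parameter."""
              def __init__(self, full: torch.Tensor):
                  self.world_size = dist.get_world_size()
                  self.rank = dist.get_rank()
                  flat = full.flatten()
                  pad = (-flat.numel()) % self.world_size  # pad so it divides evenly
                  flat = torch.nn.functional.pad(flat, (0, pad))
                  self.orig_shape, self.orig_numel = full.shape, full.numel()
                  self.shard = flat.chunk(self.world_size)[self.rank].clone()

              @contextmanager
              def gather_full_param(self):
                  """Temporarily materialize the full tensor (cf. 'gather full param ctx')."""
                  parts = [torch.empty_like(self.shard) for _ in range(self.world_size)]
                  dist.all_gather(parts, self.shard)
                  full = torch.cat(parts)[: self.orig_numel].view(self.orig_shape)
                  try:
                      yield full
                  finally:
                      del full  # only the local shard persists after the context

      Inside the context a module can run its forward on the full weight; on exit
      the gathered copy is freed, which is the memory saving ZeRO-3 targets.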
      Added TPExpert for special situations · 82023779
      1SAA authored
      Optimized MoE layer and fixed some bugs; · 219df6e6
      1SAA authored
      Reduced MoE tests;
      
      Added FFNExperts and ViTMoE model
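      "FFNExperts" suggests a bank of per-expert feed-forward blocks behind a
      gating layer. A rough top-1-routing sketch of such a MoE layer; the
      structure is assumed for illustration, not taken from this commit:

          import torch
          import torch.nn as nn

          class NaiveMoE(nn.Module):
              def __init__(self, d_model: int, d_ff: int, num_experts: int):
                  super().__init__()
                  self.gate = nn.Linear(d_model, num_experts)
                  self.experts = nn.ModuleList(
                      nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                    nn.Linear(d_ff, d_model))
                      for _ in range(num_experts))

              def forward(self, x: torch.Tensor) -> torch.Tensor:
                  # x: (tokens, d_model); route each token to its best expert.
                  weight, idx = self.gate(x).softmax(dim=-1).max(dim=-1)
                  out = torch.zeros_like(x)
                  for e, expert in enumerate(self.experts):
                      mask = idx == e
                      if mask.any():
                          out[mask] = weight[mask, None] * expert(x[mask])
                  return out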
      fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial · 3dba0705
      zbian authored
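      The padding-index issue arises because a vocab-parallel embedding gives
      each rank only a slice of the vocabulary, so out-of-range ids (including
      padding) must be masked around the lookup. A hedged sketch of the usual
      Megatron-style scheme, not zbian's exact fix:

          import torch
          import torch.nn as nn
          import torch.distributed as dist

          class VocabParallelEmbedding(nn.Module):
              def __init__(self, vocab_size: int, d_model: int):
                  super().__init__()
                  part = vocab_size // dist.get_world_size()  # assume it divides evenly
                  self.start = dist.get_rank() * part
                  self.end = self.start + part
                  self.weight = nn.Parameter(torch.randn(part, d_model) * 0.02)

              def forward(self, ids: torch.Tensor) -> torch.Tensor:
                  # Clamp ids outside this rank's slice to 0, zero their rows
                  # after lookup, then sum the partial embeddings across ranks.
                  mask = (ids < self.start) | (ids >= self.end)
                  local = (ids - self.start).masked_fill(mask, 0)
                  out = nn.functional.embedding(local, self.weight)
                  out = out.masked_fill(mask.unsqueeze(-1), 0.0)
                  dist.all_reduce(out)
                  return out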
  2. 15 Feb, 2022 1 commit
      moved env variables to global variables (#215) · 9ee197d0
      アマデウス authored
      added branch context;
      added vocab parallel layers;
      moved split_batch from load_batch to tensor parallel embedding layers;
      updated gpt model;
      updated unit test cases;
      fixed a few collective communicator bugs
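      Moving split_batch into the tensor-parallel embedding layers means each
      rank takes its slice of the batch at the model's entry point rather than
      in load_batch. A minimal sketch of the splitting itself (the real layers
      split within a parallel group, not necessarily the whole world):

          import torch
          import torch.distributed as dist

          def split_batch(x: torch.Tensor) -> torch.Tensor:
              # This rank's contiguous slice along the batch dimension.
              return x.chunk(dist.get_world_size(), dim=0)[dist.get_rank()]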
  3. 25 Jan, 2022 1 commit
      add pytorch hooks (#179) · 569357fe
      Jiarui Fang authored
      * add pytorch hooks
      fix #175
      
      * remove licenses in src code
      
      * add gpu memory tracer
      
      * replace print with logger in ophooks
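      The GPU memory tracer pairs naturally with the hooks: register a forward
      hook per module and sample the CUDA allocator as activations are created.
      A small sketch; the tracer class name is made up for illustration:

          import torch

          class MemTracer:
              def __init__(self):
                  self.records = []  # (module name, bytes allocated)

              def attach(self, model: torch.nn.Module):
                  for name, mod in model.named_modules():
                      mod.register_forward_hook(self._make_hook(name))

              def _make_hook(self, name):
                  def hook(module, inputs, output):
                      if torch.cuda.is_available():
                          self.records.append((name, torch.cuda.memory_allocated()))
                  return hook

      Attach it before one forward pass, then inspect records to see which
      modules drive allocation.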
  4. 20 Jan, 2022 1 commit
  5. 17 Jan, 2022 1 commit
  6. 30 Dec, 2021 1 commit
      Optimize pipeline schedule (#94) · 96780e6e
      ver217 authored
      
      
      * add pipeline shared module wrapper and update load batch
      
      * added model parallel process group for amp and clip grad (#86)
      
      * added model parallel process group for amp and clip grad
      
      * update amp and clip with model parallel process group
      
      * remove pipeline_prev/next group (#88)
      
      * micro batch offload
      
      * optimize pipeline gpu memory usage
      
      * pipeline can receive tensor shape (#93)
      
      * optimize pipeline gpu memory usage
      
      * fix grad accumulation step counter
      
      * rename classes and functions
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
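      "pipeline can receive tensor shape" (#93) means a stage sends the
      activation's metadata ahead of the data, so the receiver can allocate an
      exact-size buffer instead of a fixed preallocated one. A hedged
      point-to-point sketch (real schedules do this once per micro-batch):

          import torch
          import torch.distributed as dist

          def send_with_shape(t: torch.Tensor, dst: int):
              shape = torch.tensor(t.shape, dtype=torch.long)
              dist.send(torch.tensor([shape.numel()], dtype=torch.long), dst)  # ndim first
              dist.send(shape, dst)
              dist.send(t.contiguous(), dst)

          def recv_with_shape(src: int, dtype=torch.float32) -> torch.Tensor:
              ndim = torch.empty(1, dtype=torch.long)
              dist.recv(ndim, src)
              shape = torch.empty(ndim.item(), dtype=torch.long)
              dist.recv(shape, src)
              buf = torch.empty(*shape.tolist(), dtype=dtype)
              dist.recv(buf, src)
              return buf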
  7. 29 Dec, 2021 1 commit
  8. 27 Dec, 2021 1 commit
  9. 20 Dec, 2021 2 commits
  10. 16 Dec, 2021 1 commit
  11. 09 Dec, 2021 1 commit
      Develop/experiments (#59) · da01c234
      Frank Lee authored
      
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      
      * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
      Fix convergence in cifar10, Imagenet1000
      
      * Integrate 1d tensor parallel in Colossal-AI (#39)
      
      * fixed 1D and 2D convergence (#38)
      
      * optimized 2D operations
      
      * fixed 1D ViT convergence problem
      
      * Feature/ddp (#49)
      
      * remove redundant func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * support torch ddp
      
      * fix loss accumulation
      
      * add log for ddp
      
      * change seed
      
      * modify timing hook
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * Feature/pipeline (#40)
      
      * remove redundant func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * optimize communication of pipeline parallel
      
      * fix grad clip for pipeline
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * optimized 3D layers to fix slow computation; tested ImageNet performance with 3D; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified APIs of 3D layers (#51)
      
      * Update 2.5D layer code to get similar accuracy on the ImageNet-1k dataset
      
      * update API for better usability (#58)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
      Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
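      Gradient accumulation, the first item in this commit, boils down to
      scaling each micro-batch loss and deferring optimizer.step(). A generic
      PyTorch sketch, not this commit's exact code (model, loader and
      accum_steps here are placeholders):

          def train_epoch(model, optimizer, loader, loss_fn, accum_steps=4):
              model.train()
              optimizer.zero_grad()
              for i, (x, y) in enumerate(loader):
                  loss = loss_fn(model(x), y) / accum_steps  # average over micro-batches
                  loss.backward()                            # grads sum into .grad
                  if (i + 1) % accum_steps == 0:
                      optimizer.step()
                      optimizer.zero_grad()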
  12. 18 Nov, 2021 1 commit
  13. 28 Oct, 2021 2 commits