"vscode:/vscode.git/clone" did not exist on "02192a632e6c6f965d93ec79937f97e10e121307"
  1. 18 Jan, 2022 (5 commits)
  2. 17 Jan, 2022 (2 commits)
  3. 13 Jan, 2022 (1 commit)
  4. 10 Jan, 2022 (2 commits)
  5. 07 Jan, 2022 (5 commits)
  6. 06 Jan, 2022 (2 commits)
  7. 05 Jan, 2022 (1 commit)
  8. 04 Jan, 2022 (4 commits)
  9. 30 Dec, 2021 (2 commits)
    • Optimize pipeline schedule (#94) · 96780e6e
      ver217 authored
      
      
      * add pipeline shared module wrapper and update load batch
      
      * added model parallel process group for amp and clip grad (#86)
      
      * added model parallel process group for amp and clip grad
      
      * update amp and clip with model parallel process group
      
      * remove pipeline_prev/next group (#88)
      
      * micro batch offload
      
      * optimize pipeline gpu memory usage
      
      * pipeline can receive tensor shape (#93)
      
      * optimize pipeline gpu memory usage
      
      * fix grad accumulation step counter
      
      * rename classes and functions
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
    • added gpt model & benchmark (#95) · e5b9f9a0
      アマデウス authored
  10. 29 Dec, 2021 (1 commit)
  11. 27 Dec, 2021 (1 commit)
  12. 21 Dec, 2021 (2 commits)
  13. 20 Dec, 2021 (2 commits)
  14. 16 Dec, 2021 (2 commits)
  15. 14 Dec, 2021 (1 commit)
  16. 13 Dec, 2021 (1 commit)
  17. 10 Dec, 2021 (2 commits)
  18. 09 Dec, 2021 (1 commit)
    • Develop/experiments (#59) · da01c234
      Frank Lee authored
      
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      
      * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
      Fix convergence in cifar10, Imagenet1000
      
      * Integrate 1d tensor parallel in Colossal-AI (#39)
      
      * fixed 1D and 2D convergence (#38)
      
      * optimized 2D operations
      
      * fixed 1D ViT convergence problem
      
      * Feature/ddp (#49)
      
      * remove redundancy func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * support torch ddp
      
      * fix loss accumulation
      
      * add log for ddp
      
      * change seed
      
      * modify timing hook
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * Feature/pipeline (#40)
      
      * remove redundancy func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * optimize communication of pipeline parallel
      
      * fix grad clip for pipeline
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * optimized 3d layer to fix slow computation; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)
      
      * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset
      
      * update api for better usability (#58)
      
      update api for better usability
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
      Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
  19. 02 Dec, 2021 (3 commits)