Commits · 5a1a095b925a78f00185954bb3783c73086dbade · OpenDAS / ColossalAI

14 Apr, 2022 1 commit
- [test] refactored with the new rerun decorator (#763) · 5a1a095b
  Frank Lee authored Apr 15, 2022
```
* [test] refactored with the new rerun decorator

* polish test case
```
13 Apr, 2022 2 commits
- [zero] add tensor placement policies (#743) · e396bb71
  ver217 authored Apr 13, 2022
```
* add tensor placement policies

* polish comments

* polish comments

* update moe unit tests
```
- [zero] refactor ShardedParamV2 for convenience (#742) · 22c4b88d
  HELSON authored Apr 13, 2022
  
12 Apr, 2022 2 commits
- [utils] correct cpu memory used and capacity in the context of multi-process (#726) · 53cb5848
  Jiarui Fang authored Apr 12, 2022
  
- [moe] add checkpoint for moe zero test (#729) · b9b469ea
  HELSON authored Apr 12, 2022
  
11 Apr, 2022 2 commits
- [refactor] refactor the memory utils (#715) · 193dc8da
  Jiarui Fang authored Apr 11, 2022
  
- [zero] improve adaptability for not-shard parameters (#708) · a9b8300d
  HELSON authored Apr 11, 2022
```
* adapt post grad hooks for not-shard parameters
* adapt optimizer for not-shard parameters
* offload gradients for not-replicated parameters
```
08 Apr, 2022 1 commit
- [zero] adapt zero hooks for unsharded module (#699) · ee112fe1
  HELSON authored Apr 08, 2022
  
07 Apr, 2022 1 commit
- [zero] fix init bugs in zero context (#686) · d7ecaf36
  HELSON authored Apr 07, 2022
```
* adapt model weight initialization for methods in Pytorch nn.init
```
03 Apr, 2022 1 commit
- [hotfix] fix a bug in model data stats tracing (#655) · 0aab5230
  Jiarui Fang authored Apr 03, 2022
  
02 Apr, 2022 2 commits

- [hotfix] fix bugs in testing (#659) · e5d615ae
  HELSON authored Apr 02, 2022
  * remove hybrid adam in test_moe_zero_optim
  * fix activation checkpointing and its unitest
- fix bugs in CPU adam (#633) · b31daed4
  HELSON authored Apr 02, 2022
  * add cpu adam counter for all cpu adam
  * fixed updating error in adam kernel
01 Apr, 2022 1 commit
- [zero] adapt zero for unsharded parameters (Optimizer part) (#601) · 055fbf5b
  HELSON authored Apr 01, 2022
  
31 Mar, 2022 2 commits
- [zero] adapt zero for unsharded parameters (#561) · e6d50ec1
  HELSON authored Mar 31, 2022
```
* support existing sharded and unsharded parameters in zero

* add unitest for moe-zero model init

* polish moe gradient handler
```
- [polish] rename col_attr -> colo_attr (#558) · 7675366f
  Jiarui Fang authored Mar 31, 2022
  
29 Mar, 2022 2 commits
- [zero] add zero context manager to change config during initialization (#546) · 8c90d4df
  HELSON authored Mar 29, 2022
  
- Refactored docstring to google style · ec5086c4
  Liang Bowen authored Mar 25, 2022
  
25 Mar, 2022 1 commit
- [test] fixed rerun_on_exception and adapted test cases (#487) · 3601b2ba
  Frank Lee authored Mar 25, 2022
  
23 Mar, 2022 1 commit
- [polish] polish singleton and global context (#500) · a445e118
  Jiarui Fang authored Mar 23, 2022
  
21 Mar, 2022 2 commits
- [format] polish name format for MOE (#481) · 65c0f380
  Jiarui Fang authored Mar 21, 2022
  
- [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) · 75443471
  HELSON authored Mar 21, 2022
  
18 Mar, 2022 1 commit
- add moe context, moe utilities and refactor gradient handler (#455) · 84fd7c1d
  HELSON authored Mar 18, 2022
  
11 Mar, 2022 2 commits
- Added TPExpert for special situation · 82023779
  1SAA authored Feb 27, 2022
  
- Optimized MoE layer and fixed some bugs; · 219df6e6
  1SAA authored Feb 18, 2022
```
Decreased moe tests;

Added FFNExperts and ViTMoE model
```