- 24 Sep, 2022 (1 commit)
  - HELSON authored
- 23 Sep, 2022 (14 commits)
  - Boyuan Yao authored
    * [autoparallel] annotate pofo sequence
    * [autoparallel] remove unused print
    * [autoparallel] fix some code
  - Super Daniel authored
    * [fx/profiling] provide summary for MetaInfoProp.
    * [fx/profiler] provide a table of summary.
    * [fx] optimize table repr.
  - HELSON authored
  - Jiarui Fang authored
  - HELSON authored
  - YuliangLiu0306 authored
    * [tensor] use communication autograd func
    * change all to all comm spec info
    * rename pattern and distinguish fwd/bwd
    * polish code
  - YuliangLiu0306 authored
    * [autoparallel] add elementwise handler
    * polish code
    * polish code
    * reduce skipped strategies range
    * polish code
  - YuliangLiu0306 authored
  - YuliangLiu0306 authored
  - YuliangLiu0306 authored
  - YuliangLiu0306 authored
  - Boyuan Yao authored
    * [fx] modify offload codegen
    * [fx] remove repeated hook definitions
    * [fx] modify offload test
  - YuliangLiu0306 authored
  - Super Daniel authored
    * [fx] tuned the meta info and rotor solver.
    * [fx] remove import.
    * [fx] remove import.
    * [fx] remove import.
    * [fx] tune the meta calculations.
    * [fx] polish comments.
    * [fx] remove assertions.
    * [fx] modify test cases.
    * [fx] modify test cases.
    * [fx] optimize import.
    * [fx
- 22 Sep, 2022 (2 commits)
  - HELSON authored
    * remove forced FP32 modules
    * correct no_shard-contexts' positions
  - Jiarui Fang authored
- 21 Sep, 2022 (1 commit)
  - Frank Lee authored
- 20 Sep, 2022 (9 commits)
  - Kirigaya Kazuto authored
    * [pipeline/tuning] improve dispatch performance both time and space cost
    * [pipeline/converge] add interface for testing convergence
    * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
    * Update PipelineBase.py
    * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
    * [pipeline/chimera] test chimera | fix bug of initializing
  - Jiarui Fang authored
  - YuliangLiu0306 authored
  - Frank Lee authored
  - YuliangLiu0306 authored
    * [fx] PoC of runtime shape consistency application
    * polish code
  - YuliangLiu0306 authored
  - Frank Lee authored
  - Boyuan Yao authored
    * [fx] add pofo algorithm
    * [fx] Add pofo solver
    * [fx] code refactor
    * [fx] fix test_linearize import
  - github-actions[bot] authored
    Co-authored-by: github-actions <github-actions@github.com>
- 19 Sep, 2022 (2 commits)
  - Frank Lee authored
  - Kirigaya Kazuto authored
    [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
    * [pipeline/tuning] improve dispatch performance both time and space cost
    * [pipeline/converge] add interface for testing convergence
    * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
    * Update PipelineBase.py
    * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
- 16 Sep, 2022 (3 commits)
  - ver217 authored
  - YuliangLiu0306 authored
    * [autoparallel] add bcast op handler
    * polish code
    * add more BCAST FUNC OP
    * polish code
    * add exception handler
    * polish
  - Frank Lee authored
- 15 Sep, 2022 (3 commits)
  - Frank Lee authored
  - Jiarui Fang authored
  - Super Daniel authored
- 14 Sep, 2022 (5 commits)
  - Boyuan Yao authored
    * [fx] add input activation offload to codegen
    * [fx] modify unit test
    * [fx] remove two skips in torch11
    * [fx] use all_input_nodes instead of _input_nodes
  - Super Daniel authored
    * [fx] add some comments and docstrings.
    * [fx] add dataflow analysis for an autograd graph.
    * add interpretation for graph analysis.
    * [fx] before doing save_tensor_hooks.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] a very accurate version on GPT-2.
    * [fx] refactor code.
    * [fx] remove redundant inplace=True.
    * [fx] refactor code.
    * [fx] refactor code.
    * [fx] refactor code.
    * [fx] dive into backward memory.
    * [fx] fix variable names in ckpt_solvers and unskip tests.
    * [fx] commit my changes.
    * [fx] restore skips.
    * [fx] restore skips.
    * [fx] change stage into phase.
    * [fx] change stage into phase.
    * [fx] change stage into phase.
  - YuliangLiu0306 authored
    * [autoparallel] add reshape handler
    * polish code
  - github-actions[bot] authored
    Co-authored-by: github-actions <github-actions@github.com>
  - Super Daniel authored
    * [fx] add some comments and docstrings.
    * [fx] add dataflow analysis for an autograd graph.
    * add interpretation for graph analysis.
    * [fx] before doing save_tensor_hooks.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] a very accurate version on GPT-2.
    * [fx] refactor code.
    * [fx] remove redundant inplace=True.
    * [fx] refactor code.
    * [fx] refactor code.
    * [fx] refactor code.
    * [fx] dive into backward memory.