- 24 Sep, 2022 (1 commit)
  - HELSON authored
- 23 Sep, 2022 (8 commits)
  - HELSON authored
  - HELSON authored
  - YuliangLiu0306 authored
    * [tensor] use communication autograd func
    * change all-to-all comm spec info
    * rename pattern and distinguish fwd/bwd
    * polish code
  - YuliangLiu0306 authored
  - YuliangLiu0306 authored
  - Boyuan Yao authored
    * [fx] modify offload codegen
    * [fx] remove repeated hook definitions
    * [fx] modify offload test
  - YuliangLiu0306 authored
  - Super Daniel authored
    * [fx] tuned the meta info and rotor solver.
    * [fx] remove import.
    * [fx] tune the meta calculations.
    * [fx] polish comments.
    * [fx] remove assertions.
    * [fx] modify test cases.
    * [fx] optimize import.
    * [fx
- 22 Sep, 2022 (2 commits)
  - HELSON authored
    * remove forced FP32 modules
    * correct no_shard contexts' positions
  - Jiarui Fang authored
- 21 Sep, 2022 (1 commit)
  - Frank Lee authored
- 20 Sep, 2022 (5 commits)
  - Kirigaya Kazuto authored
    * [pipeline/tuning] improve dispatch performance in both time and space cost
    * [pipeline/converge] add interface for testing convergence
    * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
    * Update PipelineBase.py
    * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedules | finish Chimera
    * [pipeline/chimera] test chimera | fix bug of initializing
  - Jiarui Fang authored
  - YuliangLiu0306 authored
    * [fx] PoC of runtime shape consistency application
    * polish code
  - YuliangLiu0306 authored
  - Boyuan Yao authored
    * [fx] add pofo algorithm
    * [fx] add pofo solver
    * [fx] code refactor
    * [fx] fix test_linearize import
- 19 Sep, 2022 (1 commit)
  - Kirigaya Kazuto authored
    [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedules | finish Chimera (#1595)
    * [pipeline/tuning] improve dispatch performance in both time and space cost
    * [pipeline/converge] add interface for testing convergence
    * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
    * Update PipelineBase.py
    * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedules | finish Chimera
- 16 Sep, 2022 (1 commit)
  - YuliangLiu0306 authored
    * [autoparallel] add bcast op handler
    * polish code
    * add more BCAST FUNC OP
    * polish code
    * add exception handler
    * polish
- 14 Sep, 2022 (3 commits)
  - Boyuan Yao authored
    * [fx] add input activation offload to codegen
    * [fx] modify unit test
    * [fx] remove two skips in torch11
    * [fx] use all_input_nodes instead of _input_nodes
  - Super Daniel authored
    * [fx] add some comments and docstrings.
    * [fx] add dataflow analysis for an autograd graph.
    * add interpretation for graph analysis.
    * [fx] before doing save_tensor_hooks.
    * [fx] provide an accurate estimation of memory except for GPT-2.
    * [fx] a very accurate version on GPT-2.
    * [fx] refactor code.
    * [fx] remove redundant inplace=True.
    * [fx] dive into backward memory.
    * [fx] fix variable names in ckpt_solvers and unskip tests.
    * [fx] commit my changes.
    * [fx] restore skips.
    * [fx] change stage into phase.
  - YuliangLiu0306 authored
    * [autoparallel] add reshape handler
    * polish code
- 13 Sep, 2022 (5 commits)
  - Frank Lee authored
    * [autoparallel] refactored shape consistency to remove redundancy
    * polish code
  - YuliangLiu0306 authored
  - Frank Lee authored
  - YuliangLiu0306 authored
    * [autoparallel] adapt solver with resnet
    * polish code
  - CsRic authored
- 12 Sep, 2022 (1 commit)
  - Boyuan Yao authored
    * [fx] add nested activation_checkpoint codegen
    * undo algorithms commits
    * solver
    * undo some commits
    * [fx] torch11 add nested activation checkpoint codegen
    * remove some imports
    * [fx] add some comments in activation codegen
    * [fx] codegen instance error fix
- 08 Sep, 2022 (1 commit)
  - アマデウス authored
- 07 Sep, 2022 (4 commits)
  - Kirigaya Kazuto authored
  - Super Daniel authored
    * [fx] compute memory stat and flop count for MetaInfoProp.
    * [fx] modify node attribute.
    * [fx] modify ckpt_chen.
    * [fx] fix compatibility.
    * [fx] fix import error.
    * [fx] skip test for MetaInfoProp.
    * [fx] skip if torch 1.11.0.
    * [fx] recover MetaInfoProp support for PyTorch 1.11.
    * [fx] provide a stable but not accurate enough version of profiler.
    * [fx] fix compatibility in tests.
  - YuliangLiu0306 authored
  - YuliangLiu0306 authored
- 06 Sep, 2022 (1 commit)
  - Jiarui Fang authored
- 05 Sep, 2022 (1 commit)
  - CsRic authored
- 02 Sep, 2022 (1 commit)
  - Boyuan Yao authored
    * [fx] modify solver linearize and add test
    * [fx] add torch11 test of linearize but skip it
    * [fx] remove some unused imports
- 01 Sep, 2022 (4 commits)
  - Super Daniel authored
    * [fx] add test for meta tensor.
    * [fx] fix error.
  - YuliangLiu0306 authored
  - CsRic authored
  - Kirigaya Kazuto authored
    [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508)
    * support p2p communication with any type of object | pass test
    * reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
    * [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
    * [pipeline/rpc] implement a demo for PP with cuda rpc framework
    * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatching data in work_list to ensure steady 1F1B
    * [pipeline/rpc] implement distributed optimizer | test with assert_close
    * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
    * [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP
    * [pipeline/pipleline_process_group] remove comment
    * [pipeline/pipleline_process_group] skip process group test
    * [pipeline/pipleline_process_group] remove test named function