Commits · d3446892746ad7d69a5f69c9389b372e4d6eaaff · OpenDAS / ColossalAI

11 Mar, 2022 26 commits
- [profiler] primary memory tracer · d3446892
  Jie Zhu authored Mar 04, 2022
  
- update unit testing CI rules · dfc3fafe
  FrankLeeeee authored Mar 03, 2022
  
- added compatibility CI and options for release ci · bbbfe9b2
  FrankLeeeee authored Feb 28, 2022
  
- added pypi publication CI and remove formatting CI · 115bcc0b
  FrankLeeeee authored Feb 28, 2022
  
- rename shared adam to sharded optim v2 · b105371a
  ver217 authored Mar 03, 2022
  
- fix master params dtype · 70814dc2
  ver217 authored Mar 03, 2022
  
- add fp32 master params in sharded adam · 795210dd
  ver217 authored Mar 03, 2022
  
- add sharded adam · a109225b
  ver217 authored Mar 03, 2022
  
- polish license (#300) · 8f74fbd9
  Jiarui Fang authored Mar 03, 2022
```
* init shard param from shape tuple

* add more unit tests for shard param
```
- Polish sharded parameter (#297) · e17e92c5
  Jiarui Fang authored Mar 03, 2022
```
* init shard param from shape tuple

* add more unit tests for shard param

* add more unit tests to sharded param
```
- [zero] add sharded grad and refactor grad hooks for ShardedModel (#287) · 7aef75ca
  ver217 authored Mar 02, 2022
  
- fixed typo in ShardParam (#294) · 9afb5c8b
  Frank Lee authored Mar 02, 2022
  
- added unit test for sharded optimizer (#293) · 27155b85
  Frank Lee authored Mar 02, 2022
```
* added unit test for sharded optimizer

* refactor for elegance
```
- added buffer sync to naive amp model wrapper (#291) · e17e54e3
  Frank Lee authored Mar 02, 2022
  
- add a common util for hooks registered on parameter. (#292) · 8d653af4
  Jiarui Fang authored Mar 02, 2022
  
- bug fix: pass hook_list to engine (#273) · f867365a
  Jie Zhu authored Mar 02, 2022
```
* bug fix: pass hook_list to engine

* change parameter name
```
- Feature/zero (#279) · 5a560a06
  Jiarui Fang authored Mar 01, 2022
```
* add zero1 (#209)

* add zero1

* add test zero1

* update zero stage 1 develop (#212)

* Implement naive zero3 (#240)

* naive zero3 works well

* add zero3 param manager

* add TODOs in comments

* add gather full param ctx

* fix sub module streams

* add offload

* fix bugs of hook and add unit tests

* fix bugs of hook and add unit tests (#252)

* add gather full param ctx

* fix sub module streams

* add offload

* fix bugs of hook and add unit tests

* polish code and add state dict hook

* fix bug

* update unit test

* refactor reconstructed zero code

* clip_grad support zero3 and add unit test

* add unit test for Zero3ParameterManager

* [WIP] initialize the shard param class

* [WIP] Yet another sharded model implementation (#274)

* [WIP] initialize the shard param class

* [WIP] Yet another implementation of shardModel, using a better hook method.

* torch.concat -> torch.cat

* fix test_zero_level_1.py::test_zero_level_1 unit test

* remove deepspeed implementation and refactor for the reconstructed zero module

* polish zero dp unittests
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
```
- add community group and update issue template(#271) · 08eccfe6
  binmakeswell authored Feb 28, 2022
  
- update experimental visualization (#253) · 3312d716
  Sze-qq authored Feb 28, 2022
  
- add Chinese README · 753035ed
  binmakeswell authored Feb 18, 2022
  
- Added TPExpert for special situation · 82023779
  1SAA authored Feb 27, 2022
  
- Fixed parameter initialization in FFNExpert (#251) · 36b84772
  HELSON authored Feb 27, 2022
  
- fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) · e13293bb
  アマデウス authored Feb 24, 2022
  
- Optimized MoE layer and fixed some bugs; · 219df6e6
  1SAA authored Feb 18, 2022
```
Decreased moe tests;

Added FFNExperts and ViTMoE model
```
- fixed padding index issue for vocab parallel embedding layers; updated 3D... · 3dba0705
  zbian authored Feb 17, 2022
```
fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial
```
- update setup info (#233) · 24f8583c
  ver217 authored Feb 15, 2022
  
15 Feb, 2022 8 commits
- Automated submodule synchronization · b9f8521f
  github-actions authored Feb 09, 2022
  
- fixed apex import (#227) · f5ca88ec
  Frank Lee authored Feb 14, 2022
  
- updated readme and change log (#224) · eb3fda4c
  Frank Lee authored Feb 14, 2022
  
- update setup and workflow (#222) · 578ea058
  ver217 authored Feb 14, 2022
  
- fixed mkdir conflict and align yapf config with flake (#220) · 3a1a9820
  Frank Lee authored Feb 14, 2022
  
- added flake8 config (#219) · 65e72983
  Frank Lee authored Feb 14, 2022
  
- moved env variables to global variables; (#215) · 9ee197d0
  アマデウス authored Feb 14, 2022
```
added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed few collective communicator bugs
```
- updated github action for develop branch (#214) · b82d60be
  Frank Lee authored Feb 11, 2022
  
04 Feb, 2022 2 commits
- Update github actions (#205) · 7d15ec7f
  BoxiangW authored Feb 04, 2022
  
- Automated submodule synchronization (#203) · 5420809f
  github-actions[bot] authored Feb 04, 2022
```
Co-authored-by: github-actions <github-actions@github.com>
```
03 Feb, 2022 4 commits
- add changelog and contributing doc (#202) · fd570ab2
  Frank Lee authored Feb 03, 2022
  
- add code quality badge (#201) · 02f13fa9
  Frank Lee authored Feb 03, 2022
  
- fixed utils docstring and add example to readme (#200) · 812357d6
  Frank Lee authored Feb 03, 2022
  
- added github action to synchronize submodule commits automatically (#193) · b9a761b9
  Frank Lee authored Feb 03, 2022
  