- 15 Aug, 2023 9 commits

ver217 authored

Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict
* add bloom model and policy, revise the base class of policy
* revise
* revision
* add bert_for_pretraining

Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict

Hongxin Liu authored
* [pipeline] add stage manager
* [test] add pipeline stage manager test
* [pipeline] add docstring for stage manager
Jianghai authored
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict

Hongxin Liu authored
* [api] update optimizer wrapper to fit pipeline
* [pipeline] add base schedule
* [pipeline] add 1f1b schedule
* [test] add pipeline schedule utils test
* [pipeline] fix import
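The 1F1B schedule mentioned above interleaves one forward and one backward pass per microbatch after a stage-dependent warm-up. A minimal sketch of the phase bookkeeping, with hypothetical names and no relation to ColossalAI's actual schedule API:

```python
# Hypothetical sketch of 1F1B (one-forward-one-backward) pipeline
# phase counting. `stage` is 0-based; the last stage has no warm-up.

def one_f_one_b_phases(stage: int, num_stages: int, num_microbatches: int):
    """Return (warmup, steady, cooldown) step counts for one stage.

    Earlier stages run more warm-up forwards before entering the
    steady 1F1B phase; cooldown drains the remaining backwards.
    """
    warmup = min(num_stages - stage - 1, num_microbatches)
    steady = num_microbatches - warmup   # alternating fwd/bwd pairs
    cooldown = warmup                    # outstanding backwards to drain
    return warmup, steady, cooldown

# First stage of a 4-stage pipeline with 8 microbatches:
print(one_f_one_b_phases(stage=0, num_stages=4, num_microbatches=8))  # (3, 5, 3)
# Last stage alternates from the first microbatch:
print(one_f_one_b_phases(stage=3, num_stages=4, num_microbatches=8))  # (0, 8, 0)
```

For every stage, warmup + steady equals the forward count and steady + cooldown the backward count, so all microbatches are processed exactly once in each direction.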
Hongxin Liu authored
* [pipeline] add p2p communication
* [test] add p2p communication test
* [test] add rerun decorator
* [test] rename to avoid conflict

Hongxin Liu authored
* [pipeline] add stage manager
* [test] add pipeline stage manager test
* [pipeline] add docstring for stage manager

Hongxin Liu authored
* [cluster] add process group mesh
* [test] add process group mesh test
* force sync
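A process group mesh of the kind added above maps a flat distributed rank onto coordinates of a parallelism grid (e.g. data-parallel × tensor-parallel axes). A self-contained sketch of that mapping, with illustrative names that are not ColossalAI's implementation:

```python
# Hypothetical rank <-> coordinate mapping on an n-dimensional mesh,
# using row-major order (last axis varies fastest).

class ProcessGroupMesh:
    def __init__(self, *dims: int):
        self.dims = dims

    def coord(self, rank: int) -> tuple:
        """Flat rank -> mesh coordinates."""
        coords = []
        for d in reversed(self.dims):
            coords.append(rank % d)
            rank //= d
        return tuple(reversed(coords))

    def rank(self, coords: tuple) -> int:
        """Mesh coordinates -> flat rank."""
        r = 0
        for c, d in zip(coords, self.dims):
            r = r * d + c
        return r

mesh = ProcessGroupMesh(2, 4)  # e.g. 2-way DP x 4-way TP on 8 ranks
print(mesh.coord(6))           # (1, 2)
print(mesh.rank((1, 2)))       # 6
```

Grouping ranks that share all coordinates except one axis then yields the communication groups for that axis.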
- 11 Aug, 2023 1 commit

LuGY authored
* improve stability of zero
* fix wrong index
* add record stream

- 09 Aug, 2023 1 commit

flybird1111 authored
Updated ColoAttention tests to check outputs and gradients

- 04 Aug, 2023 2 commits

flybird1111 authored
Fixed an import error

flybird1111 authored
Improved the ColoAttention interface to support FlashAttention 2. Fixes #4322

- 01 Aug, 2023 1 commit

Hongxin Liu authored
* [test] remove legacy zero test
* [test] remove lazy distribute test
* [test] remove outdated checkpoint io
- 31 Jul, 2023 5 commits

LuGY authored
* support shard optimizer of zero
* polish code
* support sync grad manually

LuGY authored
* add state dict for zero
* fix unit test
* polish

LuGY authored
* allow passing process group to zero12
* unify tp-zero and normal-zero
* polish code

LuGY authored
* support no sync for zero1 plugin
* polish
* polish

LuGY authored
* refactor low level zero
* fix zero2 and support cpu offload
* avg gradient and modify unit test
* refactor grad store, support layer drop
* refactor bucket store, support grad accumulation
* fix and update unit test of zero and ddp
* compatible with tp, ga and unit test
* fix memory leak and polish
* add zero layer drop unittest
* polish code
* fix import err in unit test
* support different comm dtype, modify docstring style
* polish code
* test padding and fix
* fix unit test of low level zero
* fix pad recording in bucket store
* support some models
* polish
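The bucket store mentioned in the refactor above follows a common ZeRO/DDP pattern: gradients are packed into fixed-size buckets and reduced in one collective call when a bucket fills. A minimal sketch of that bookkeeping, with hypothetical names and sizes, not ColossalAI's implementation:

```python
# Hypothetical gradient bucket store: tensors are added until the
# bucket would overflow, at which point the bucket is flushed
# (in practice, all-reduced or reduce-scattered as one flat buffer).

class BucketStore:
    def __init__(self, bucket_size: int, reduce_fn):
        self.bucket_size = bucket_size
        self.reduce_fn = reduce_fn   # called with the list of packed names
        self._bucket = []
        self._numel = 0

    def add(self, name: str, numel: int):
        if self._numel + numel > self.bucket_size:
            self.flush()
        self._bucket.append(name)
        self._numel += numel

    def flush(self):
        if self._bucket:
            self.reduce_fn(self._bucket)
            self._bucket, self._numel = [], 0

reduced = []
store = BucketStore(bucket_size=100, reduce_fn=reduced.append)
for name, size in [("w1", 60), ("w2", 50), ("w3", 30)]:
    store.add(name, size)
store.flush()
print(reduced)  # [['w1'], ['w2', 'w3']]
```

Batching many small gradients into one collective amortizes communication latency, which is why the refactor ties gradient accumulation to the bucket store.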
- 21 Jul, 2023 1 commit

Baizhou Zhang authored
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gathered is true under GeminiPlugin

- 19 Jul, 2023 1 commit

Hongxin Liu authored
* [lazy] support init on cuda
* [test] update lazy init test
* [test] fix transformer version

- 18 Jul, 2023 1 commit

Cuiqing Li authored
* added softmax kernel
* added qkv_kernel
* added ops
* adding tests
* upload tests
* fix tests
* debugging
* debugging tests
* debugging
* added
* fixed errors
* added softmax kernel
* clean codes
* added tests
* update tests
* update tests
* added attention
* add
* fixed pytest checking
* add cuda check
* fix cuda version
* fix typo

- 07 Jul, 2023 1 commit

Baizhou Zhang authored
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
- 04 Jul, 2023 17 commits

github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>

Frank Lee authored
* [shardformer] made tensor parallelism configurable
* polish code

Frank Lee authored
* [shardformer] refactored some doc and api
* polish code

Frank Lee authored

Frank Lee authored

Frank Lee authored

Frank Lee authored

Kun Lin authored
* first v of vit shardformer
* keep vit
* update
* vit shard add vitattention vitlayer
* update num head shard para
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth

jiangmingyan authored
* [shardformer] shardformer support opt models
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix

Frank Lee authored

Frank Lee authored
* [test] fixed tests failed due to dtensor change
* polish code

FoolPlayer authored
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm

Frank Lee authored

FoolPlayer authored
* add linearconv1d test
* add linearconv1d test

Frank Lee authored
* [shardformer] support module saving and loading
* polish code

FoolPlayer authored
* support kit use for bert test
* support kit test for gpt2

Frank Lee authored