- 31 Jul, 2023 5 commits
-
LuGY authored
* support shard optimizer of zero
* polish code
* support sync grad manually
-
LuGY authored
* add state dict for zero
* fix unit test
* polish
-
LuGY authored
* allow passing process group to zero12
* unify tp-zero and normal-zero
* polish code
-
LuGY authored
* support no sync for zero1 plugin
* polish
* polish
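For context, the gradient-accumulation pattern that such a no-sync hook enables looks like the sketch below; it uses plain PyTorch DDP's `no_sync()` as a stand-in, not the zero1 plugin's own API.

```python
# Illustrative only: skip gradient all-reduce on intermediate micro-batches,
# then let the final backward trigger one deferred synchronization.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step(model: DDP, optimizer, micro_batches, loss_fn):
    optimizer.zero_grad()
    for inputs, targets in micro_batches[:-1]:
        with model.no_sync():                      # no communication here
            loss_fn(model(inputs), targets).backward()
    inputs, targets = micro_batches[-1]
    loss_fn(model(inputs), targets).backward()     # all-reduce happens here
    optimizer.step()
```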
-
LuGY authored
* refactor low level zero
* fix zero2 and support cpu offload
* avg gradient and modify unit test
* refactor grad store, support layer drop
* refactor bucket store, support grad accumulation
* fix and update unit test of zero and ddp
* compatible with tp, ga and unit test
* fix memory leak and polish
* add zero layer drop unittest
* polish code
* fix import err in unit test
* support different comm dtype, modify docstring style
* polish code
* test padding and fix
* fix unit test of low level zero
* fix pad recording in bucket store
* support some models
* polish
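The bucket store mentioned above groups gradients into fixed-size buckets before averaging them across ranks. Below is a conceptual sketch of that pattern with a configurable communication dtype; the bucket size and helper names are hypothetical, not the actual grad/bucket store interfaces.

```python
# Conceptual sketch: pack queued gradients into a flat buffer in the chosen
# communication dtype, average across ranks, then scatter values back.
import torch
import torch.distributed as dist

BUCKET_BYTES = 32 * 1024 * 1024  # flush once ~32 MB of grads are queued

def allreduce_gradients(params, comm_dtype=torch.float16):
    bucket, nbytes = [], 0
    for p in params:
        if p.grad is None:
            continue
        bucket.append(p.grad)
        nbytes += p.grad.numel() * p.grad.element_size()
        if nbytes >= BUCKET_BYTES:
            _flush(bucket, comm_dtype)
            bucket, nbytes = [], 0
    if bucket:
        _flush(bucket, comm_dtype)

def _flush(grads, comm_dtype):
    flat = torch.cat([g.reshape(-1).to(comm_dtype) for g in grads])
    dist.all_reduce(flat)
    flat /= dist.get_world_size()                  # average, not sum
    offset = 0
    for g in grads:
        n = g.numel()
        g.copy_(flat[offset:offset + n].reshape_as(g).to(g.dtype))
        offset += n
```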
-
- 21 Jul, 2023 1 commit
-
Baizhou Zhang authored
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gatherd is true under GeminiPlugin
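Conceptually, a sharded optimizer checkpoint has each rank persist only the optimizer states it owns, so no rank ever materializes the full state. A minimal sketch, assuming a flat per-rank file layout (the naming scheme is hypothetical):

```python
import os
import torch
import torch.distributed as dist

def save_sharded_optimizer(optimizer, checkpoint_dir: str):
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Each rank writes its local shard; peak memory stays flat.
    shard = optimizer.state_dict()
    path = os.path.join(checkpoint_dir, f"optim.rank{dist.get_rank()}.pt")
    torch.save(shard, path)
    dist.barrier()  # ensure every shard is on disk before returning
```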
-
- 19 Jul, 2023 1 commit
-
Hongxin Liu authored
* [lazy] support init on cuda
* [test] update lazy init test
* [test] fix transformer version
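The idea behind init on cuda can be approximated with stock PyTorch's meta device: build the module graph without real allocations, then materialize parameters directly on the GPU. This sketch mirrors the concept only; it is not the LazyInitContext API itself.

```python
import torch
import torch.nn as nn

# Construct on the meta device: shapes are recorded, no memory is allocated.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 10))

# Allocate uninitialized storage on the GPU, then (re)initialize in place.
model = model.to_empty(device="cuda")
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)
```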
-
- 18 Jul, 2023 1 commit
-
Cuiqing Li authored
* added softmax kernel
* added qkv_kernel
* added ops
* adding tests
* upload tests
* fix tests
* debugging
* debugging tests
* debugging
* added
* fixed errors
* added softmax kernel
* clean codes
* added tests
* update tests
* update tests
* added attention
* add
* fixed pytest checking
* add cuda check
* fix cuda version
* fix typo
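A fused softmax kernel like the one above is typically validated against a plain, numerically stable reference. A minimal sketch of such a reference and the comparison a unit test would make:

```python
import torch

def softmax_reference(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    x_max = x.amax(dim=dim, keepdim=True)   # subtract row max: avoids overflow
    exp = torch.exp(x - x_max)
    return exp / exp.sum(dim=dim, keepdim=True)

# Typical check: the custom kernel's output should match the reference.
scores = torch.randn(8, 128, 128)
assert torch.allclose(softmax_reference(scores),
                      torch.softmax(scores, dim=-1), atol=1e-6)
```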
-
- 07 Jul, 2023 1 commit
-
Baizhou Zhang authored
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
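The all_gather route trades memory for simplicity: every rank contributes its shard, and rank 0 stitches them into one unsharded file. A sketch under the assumption that each rank's shard is a single flat tensor:

```python
import torch
import torch.distributed as dist

def save_unsharded(local_shard: torch.Tensor, path: str):
    world = dist.get_world_size()
    gathered = [torch.empty_like(local_shard) for _ in range(world)]
    dist.all_gather(gathered, local_shard)   # every rank receives all shards
    if dist.get_rank() == 0:
        torch.save(torch.cat(gathered), path)
```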
-
- 04 Jul, 2023 31 commits
-
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>
-
Frank Lee authored
* [shardformer] made tensor parallelism configurable
* polish code
-
Frank Lee authored
* [shardformer] refactored some doc and api
* polish code
-
Frank Lee authored
-
Frank Lee authored
-
Frank Lee authored
-
Frank Lee authored
-
Kun Lin authored
* first version of vit shardformer
* keep vit
* update
* add vit attention and vit layer to vit shard
* update num head shard param
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
-
jiangmingyan authored
* [shardformer] shardformer support opt models
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
-
Frank Lee authored
-
Frank Lee authored
* [test] fixed tests that failed due to dtensor change
* polish code
-
FoolPlayer authored
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm
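Swapping nn.LayerNorm for a fused implementation is essentially what such a policy does. A hedged sketch using NVIDIA Apex's FusedLayerNorm; the replace_layernorm helper is hypothetical, not the policy API.

```python
import torch.nn as nn
from apex.normalization import FusedLayerNorm  # requires NVIDIA Apex

def replace_layernorm(model: nn.Module) -> nn.Module:
    for name, child in model.named_children():
        if isinstance(child, nn.LayerNorm):
            fused = FusedLayerNorm(child.normalized_shape, eps=child.eps)
            fused.load_state_dict(child.state_dict())  # keep trained params
            setattr(model, name, fused)
        else:
            replace_layernorm(child)  # recurse into submodules
    return model
```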
-
Frank Lee authored
-
FoolPlayer authored
* add linearconv1d test
-
Frank Lee authored
* [shardformer] support module saving and loading
* polish code
-
FoolPlayer authored
* support kit use for bert test
* support kit test for gpt2
-
Frank Lee authored
-
Frank Lee authored
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code
-
FoolPlayer authored
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy
-
Frank Lee authored
-
Frank Lee authored
-
FoolPlayer authored
* fix bert downstream with new api
* remove comment line
-
FoolPlayer authored
-
Frank Lee authored
* [shardformer] refactored embedding and dropout to parallel module
* polish code
-
FoolPlayer authored
-
Frank Lee authored
* [shardformer] integrated linear 1D with dtensor
* polish code
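A 1D (column-parallel) linear splits the weight along the output dimension, lets each rank compute its slice, and gathers the partial activations. A simplified, forward-only sketch of the idea; the integrated version routes through dtensor and an autograd-aware gather.

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnParallelLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert out_features % world == 0
        # Each rank owns out_features / world output columns.
        self.weight = nn.Parameter(torch.empty(out_features // world, in_features))
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_out = x @ self.weight.t()        # (..., out_features / world)
        parts = [torch.empty_like(local_out) for _ in range(dist.get_world_size())]
        dist.all_gather(parts, local_out)      # note: not autograd-aware
        return torch.cat(parts, dim=-1)        # (..., out_features)
```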
-
Frank Lee authored
-
FoolPlayer authored
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code
-
wukong1992 authored
test t5
-
wukong1992 authored
adjust layer attr
-
FoolPlayer authored
* fix bug in slicer, add slicer unit test
* add dropout test
* use pid as dropout seed
* update dropout test with local pattern
* add todo
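Seeding the RNG per process is what makes dropout masks distinct across ranks by construction. A minimal sketch of the pid-based seeding idea; note that torch.manual_seed seeds the global RNG, not just dropout.

```python
import os
import torch

def seed_dropout_rng(base_seed: int = 0) -> None:
    # Distinct seed per process -> independent dropout masks on each rank.
    torch.manual_seed(base_seed + os.getpid())
```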
-