- 05 Sep, 2023 5 commits
-
-
Hongxin Liu authored
* [legacy] move engine to legacy * [example] fix seq parallel example * [example] fix seq parallel example * [test] test gemini pluging hang * [test] test gemini pluging hang * [test] test gemini pluging hang * [test] test gemini pluging hang * [test] test gemini pluging hang * [example] update seq parallel requirements
-
Hongxin Liu authored
* [legacy] move trainer to legacy * [doc] update docs related to trainer * [test] ignore legacy test
-
Hongxin Liu authored
-
Hongxin Liu authored
* [zero] add method to update master params * [zero] update zero plugin * [plugin] update low level zero plugin
-
Hongxin Liu authored
-
- 04 Sep, 2023 2 commits
-
-
Baizhou Zhang authored
-
Jianghai authored
* pytree test * test bert * test bert * test bert * revise * add register * add register
-
- 01 Sep, 2023 3 commits
-
-
Hongxin Liu authored
-
LuGY authored
* fix zero ckptio with offload * fix load device * saved tensors in ckpt should be on CPU * fix unit test * fix unit test * add clear cache * save memory for CI
-
Baizhou Zhang authored
* hybrid plugin support huggingface from_pretrained * add huggingface compatibility tests * add folder cleaning * fix bugs
-
- 31 Aug, 2023 2 commits
-
-
Baizhou Zhang authored
* implement sharded optimizer saving * add more param info * finish implementation of sharded optimizer saving * fix bugs in optimizer sharded saving * add pp+zero test * param group loading * greedy loading of optimizer * fix bug when loading * implement optimizer sharded saving * add optimizer test & arrange checkpointIO utils * fix gemini sharding state_dict * add verbose option * add loading of master params * fix typehint * fix master/working mapping in fp16 amp
-
Baizhou Zhang authored
-
- 30 Aug, 2023 2 commits
-
-
flybird11111 authored
* [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1
-
flybird11111 authored
* [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix
-
- 29 Aug, 2023 2 commits
-
-
Bin Jia authored
* add overlap support for gpt2 * remove unused code * remove unused code
-
Baizhou Zhang authored
-
- 28 Aug, 2023 2 commits
- 25 Aug, 2023 3 commits
-
-
Baizhou Zhang authored
* add APIs * implement save_sharded_model * add test for hybrid checkpointio * implement naive loading for sharded model * implement efficient sharded model loading * open a new file for hybrid checkpoint_io * small fix * fix circular importing * fix docstring * arrange arguments and apis * small fix
-
flybird11111 authored
* [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * activate checks * [Test] test ci * test ci * test ci * test ci * test ci * test ci * test ci * fix
-
LuGY authored
* support gradient accumulation with zero2 * fix type
-
- 24 Aug, 2023 2 commits
-
-
flybird11111 authored
* [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * activate checks
-
Hongxin Liu authored
* [gemini] remove distributed-related part from colotensor (#4379) * [gemini] remove process group dependency * [gemini] remove tp part from colo tensor * [gemini] patch inplace op * [gemini] fix param op hook and update tests * [test] remove useless tests * [test] remove useless tests * [misc] fix requirements * [test] fix model zoo * [test] fix model zoo * [test] fix model zoo * [test] fix model zoo * [test] fix model zoo * [misc] update requirements * [gemini] refactor gemini optimizer and gemini ddp (#4398) * [gemini] update optimizer interface * [gemini] renaming gemini optimizer * [gemini] refactor gemini ddp class * [example] update gemini related example * [example] update gemini related example * [plugin] fix gemini plugin args * [test] update gemini ckpt tests * [gemini] fix checkpoint io * [example] fix opt example requirements * [example] fix opt example * [example] fix opt example * [example] fix opt example * [gemini] add static placement policy (#4443) * [gemini] add static placement policy * [gemini] fix param offload * [test] update gemini tests * [plugin] update gemini plugin * [plugin] update gemini plugin docstr * [misc] fix flash attn requirement * [test] fix gemini checkpoint io test * [example] update resnet example result (#4457) * [example] update bert example result (#4458) * [doc] update gemini doc (#4468) * [example] update gemini related examples (#4473) * [example] update gpt example * [example] update dreambooth example * [example] update vit * [example] update opt * [example] update palm * [example] update vit and opt benchmark * [hotfix] fix bert in model zoo (#4480) * [hotfix] fix bert in model zoo * [test] remove chatglm gemini test * [test] remove sam gemini test * [test] remove vit gemini test * [hotfix] fix opt tutorial example (#4497) * [hotfix] fix opt tutorial example * [hotfix] fix opt tutorial example
-
- 23 Aug, 2023 1 commit
-
-
Jianghai authored
-
- 22 Aug, 2023 2 commits
-
-
flybird11111 authored
* [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix
-
Jianghai authored
-
- 21 Aug, 2023 1 commit
-
-
Baizhou Zhang authored
* support tp+zero/input type cast for hybridplugin * add tp+zero tests * fix bucket arguments
-
- 18 Aug, 2023 2 commits
-
-
Jianghai authored
* add some base tests and policies * finish whisper base model * add conditional generation * finish basic tests * whisper * finish whisper * finish whisper * del useless whisper test * fix * add argmin to replace * finish revision
-
Bin Jia authored
* support gpt2 seq parallel with pp/dp/tp * fix a bug when waiting for stream done * delete unused gpt2_seq file
-
- 16 Aug, 2023 5 commits
-
-
LuGY authored
* support interleaved pipeline * fix unit test * remove virtual stage test in stage mgr * add droped type hint and updated bwd
-
Hongxin Liu authored
* [test] remove cpu marker * [test] remove gpu marker * [test] update pytest markers * [ci] update unit test ci
-
Baizhou Zhang authored
* support DDP for HybridPlugin/add tp+dp tests * add docstring for HybridParallelPlugin
-
Bin Jia authored
* [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384) * [sequence parallel] add sequence parallel linear col/row support (#4336) * add sequence parallel linear col/row support * add annotation * add annotation * add support for gpt2 fused qkv linear layer * support sequence parallel in GPT2 * add docstring and note * add requirments * remove unused flash-attb * modify flash attn test * modify flash attn setting * modify flash attn code * add assert before divide, rename forward function * [shardformer/test] fix gpt2 test with seq-parallel * [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401) * overlap gather input / grad computing during col backward * modify test for overlap * simplify code * fix code and modify cuda stream synchronize * [shardformer/sequence parallel] polish code
-
github-actions[bot] authored
Co-authored-by:github-actions <github-actions@github.com>
-
- 15 Aug, 2023 6 commits
-
-
Hongxin Liu authored
-
flybird11111 authored
[shardformer] update bloom/llama/vit/chatglm tests [shardformer] update opt tests [shardformer] update opt tests [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update bloom/llama/vit/chatglm tests
-
flybird11111 authored
* [shardformer] gpt2 tests fix [shardformer] test all optimizations (#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer]update t5 to use all optimizations
-
flybird11111 authored
[shardformer] update tests for all optimization
-
Baizhou Zhang authored
* rewrite opt tests * rewrite llama tests * rewrite bloom & vit tests * rewrite chatglm tests * fix LinearCol for classfiers * add judge for other tp layers, fix lazy init in util
-
flybird11111 authored
* [shardformer] gpt2 tests fix [shardformer] test all optimizations (#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer] gpt2 tests fix
-