- 07 Apr, 2022 1 commit
HELSON authored
* adapt model weight initialization for methods in PyTorch nn.init
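Since the message points at the torch.nn.init family, here is a minimal sketch of dispatching initialization by nn.init method name; the helper `init_weights_` and its signature are illustrative assumptions, not the API this commit adds.

```python
import torch.nn as nn


def init_weights_(module: nn.Module, init_method: str = "kaiming_uniform_", **kwargs) -> None:
    """Initialize weights with any torch.nn.init method, looked up by name."""
    initializer = getattr(nn.init, init_method)   # e.g. nn.init.xavier_normal_
    for param in module.parameters():
        if param.dim() > 1:        # weight matrices and conv kernels
            initializer(param, **kwargs)
        else:                      # biases and other 1-D parameters
            nn.init.zeros_(param)


# usage: Xavier-normal init for a small MLP
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
init_weights_(model, "xavier_normal_")
```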
- 03 Apr, 2022 2 commits
Jiarui Fang authored
YuliangLiu0306 authored
- 02 Apr, 2022 2 commits
- 01 Apr, 2022 4 commits
HELSON authored
アマデウス authored
FredHuang99 authored
Jiarui Fang authored
- 31 Mar, 2022 3 commits
HELSON authored
* support existing sharded and unsharded parameters in ZeRO
* add unit test for MoE ZeRO model init
* polish MoE gradient handler
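A hedged sketch of how sharded and unsharded parameters can coexist: during registration, parameters that already carry a sharded marker are skipped rather than sharded twice. The `colo_is_sharded` flag and the `shard_param` stand-in are assumed names for illustration, not this commit's code.

```python
import torch.nn as nn


def shard_param(param: nn.Parameter) -> None:
    """Stand-in for real sharding: split the tensor and keep the local slice."""
    param.colo_is_sharded = True


def register_params(module: nn.Module) -> None:
    for param in module.parameters():
        if getattr(param, "colo_is_sharded", False):
            continue              # already sharded (e.g. a MoE expert): skip
        shard_param(param)        # plain parameter: shard it now


model = nn.Linear(8, 8)
model.weight.colo_is_sharded = True   # pretend this one was sharded elsewhere
register_params(model)                # bias gets sharded, weight is left alone
```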
ver217 authored
Jiarui Fang authored
- 30 Mar, 2022 3 commits
ver217 authored
* hijack p.grad in sharded model
* polish comments
* polish comments
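One way to read "hijack p.grad": a per-parameter backward hook diverts each incoming gradient into a preallocated fp16 buffer the sharded model owns, so `.grad` itself never carries the real values. The `saved_grad` attribute is an assumed name; this sketches the mechanism, not the commit's code.

```python
import torch
import torch.nn as nn


def hijack_grads(model: nn.Module) -> None:
    for param in model.parameters():
        # preallocated fp16 buffer that will hold the "real" gradient
        param.saved_grad = torch.zeros_like(param, dtype=torch.float16)

        def hook(grad: torch.Tensor, p: nn.Parameter = param) -> torch.Tensor:
            p.saved_grad.add_(grad.half())   # divert the gradient into the buffer
            return torch.zeros_like(grad)    # .grad then only accumulates zeros

        param.register_hook(hook)


model = nn.Linear(4, 4)
hijack_grads(model)
model(torch.randn(2, 4)).sum().backward()
print(model.weight.saved_grad.float().abs().sum())  # non-zero: grads landed in the buffer
```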
Jiarui Fang authored
Jiarui Fang authored
- 29 Mar, 2022 5 commits
HELSON authored
Liang Bowen authored
Jiarui Fang authored
ver217 authored
Jiarui Fang authored
- 28 Mar, 2022 4 commits
HELSON authored
* only process the module's own parameters in the ZeRO context
* add ZeRO hooks for all modules that contain parameters
* gather only the parameters belonging to the module itself
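The first and third bullets map naturally onto PyTorch's `named_parameters(recurse=False)`, which yields only a module's direct parameters. A sketch under that assumption follows; `gather` and `release` are placeholders for the real shard logic.

```python
import torch.nn as nn


def gather(params):
    """Placeholder: all-gather the local shards into full tensors."""
    pass


def release(params):
    """Placeholder: drop the gathered full tensors, keep the shards."""
    pass


def register_zero_hooks(model: nn.Module) -> None:
    for module in model.modules():
        # recurse=False: only the parameters this module itself owns,
        # not those of its children
        own_params = [p for _, p in module.named_parameters(recurse=False)]
        if not own_params:
            continue  # hook only modules that directly contain parameters
        module.register_forward_pre_hook(lambda m, inp, ps=own_params: gather(ps))
        module.register_forward_hook(lambda m, inp, out, ps=own_params: release(ps))


register_zero_hooks(nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)))
```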
Jiarui Fang authored
Jiarui Fang authored
Jiarui Fang authored
- 25 Mar, 2022 8 commits
LuGY authored
* [zero] added hybrid Adam, removed the loss scale of Adam
* remove useless code
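"Hybrid Adam" presumably means one optimizer that updates CPU-offloaded and GPU-resident parameters alike. A toy sketch of that routing idea, built from two stock torch.optim.Adam instances rather than the fused kernels a real implementation would use:

```python
import torch
from torch.optim import Adam


class HybridAdam:
    """Route each parameter to an Adam instance living on its own device."""

    def __init__(self, params, lr: float = 1e-3):
        params = list(params)
        cpu_params = [p for p in params if p.device.type == "cpu"]
        gpu_params = [p for p in params if p.device.type != "cpu"]
        self.opts = []
        if cpu_params:
            self.opts.append(Adam(cpu_params, lr=lr))  # states stay on CPU
        if gpu_params:
            self.opts.append(Adam(gpu_params, lr=lr))  # states stay on GPU

    def step(self):
        for opt in self.opts:
            opt.step()

    def zero_grad(self):
        for opt in self.opts:
            opt.zero_grad()
```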
Jiarui Fang authored
Frank Lee authored
Jiarui Fang authored
LuGY authored
Jiarui Fang authored
Jiarui Fang authored
Jiarui Fang authored
- 24 Mar, 2022 2 commits
Jiarui Fang authored
Jiarui Fang authored
- 23 Mar, 2022 2 commits
Jiarui Fang authored
ver217 authored
* sharded model supports reusing the fp16 shard
* rename variable
* polish code
* polish code
* polish code
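A small sketch of what reusing the fp16 shard can mean: keep one fp16 buffer alive across steps and refresh it in place from the fp32 master copy instead of re-allocating it every iteration. The names `update_fp16_shard` and `fp32_master` are illustrative assumptions.

```python
import torch


def update_fp16_shard(fp16_shard: torch.Tensor, fp32_master: torch.Tensor) -> None:
    with torch.no_grad():
        fp16_shard.copy_(fp32_master)  # in-place cast + copy, no new allocation


fp32_master = torch.randn(1024, dtype=torch.float32)
fp16_shard = fp32_master.half()        # allocated once, then reused
fp32_master.add_(0.01)                 # pretend the optimizer stepped
update_fp16_shard(fp16_shard, fp32_master)
```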
- 22 Mar, 2022 2 commits
ver217 authored
* sharded optim supports hybrid CPU Adam
* update unit test
* polish docstring
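One plausible wiring for a sharded optimizer driving a hybrid CPU Adam, sketched under assumptions: before the inner step, each gradient shard is moved to whatever device holds the matching master weight and its Adam state. The helper name is hypothetical.

```python
from typing import List

import torch


def move_grads_to_state_device(master_params: List[torch.Tensor]) -> None:
    """Ensure each grad shard sits on the same device as its master weight."""
    for master in master_params:
        if master.grad is not None and master.grad.device != master.device:
            # e.g. a GPU-computed grad moved next to a CPU-offloaded Adam state
            master.grad = master.grad.to(master.device)


w = torch.zeros(10, requires_grad=True)   # CPU master weight
w.grad = torch.ones(10)                   # pretend a grad shard arrived
move_grads_to_state_device([w])           # no-op here: devices already match
```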
Jiarui Fang authored
* [zero] polish sharded param name
* polish code
* polish
* polish code
* polish
* polish
* polish
- 21 Mar, 2022 2 commits
Jiarui Fang authored
HELSON authored