OpenDAS
FastMoE
Commits
8f1f2ca54edf88bb4b0d1a4260967a8a8b3eef8e
fastmoe
fmoe
05 Feb, 2021
5 commits
update instructions for megatron
· 59b27103
Rick Ho
authored
Feb 05, 2021
59b27103
sync in the whole world instead of mp world in megatron
· d6e7a429
Rick Ho
authored
Feb 05, 2021
d6e7a429
pass pylint
· f2040d9f
Rick Ho
authored
Feb 05, 2021
f2040d9f
support multiple pytorch versions private apis
· bf2fd0c0
Rick Ho
authored
Feb 05, 2021
bf2fd0c0
add functions to support checkpointing in megatron ddp
· 481f5c4f
Rick Ho
authored
Feb 05, 2021
481f5c4f
04 Feb, 2021
5 commits
adapt with pytorch 1.8.0 (deprecated 1.6.0)
· 15f98a10
Rick Ho
authored
Feb 04, 2021
15f98a10
setup pylint and write docs for functions
· 585604fe
Rick Ho
authored
Feb 04, 2021
585604fe
fix no grad after all-gather bug
· 56c1bd63
Rick Ho
authored
Feb 04, 2021
56c1bd63
use parallel label in gate
· d83234b0
Rick Ho
authored
Feb 04, 2021
d83234b0
ddp module for sophisticated hybrid parallel
· 67c667f2
Rick Ho
authored
Feb 04, 2021
67c667f2
03 Feb, 2021
2 commits
fix pure data parallel
· ae2c434e
Rick Ho
authored
Feb 03, 2021
ae2c434e
fmoefy
· 6b8d2f2e
Rick Ho
authored
Feb 03, 2021
6b8d2f2e
02 Feb, 2021
5 commits
fix bmm out shape
· 4b650671
Rick Ho
authored
Feb 02, 2021
4b650671
fix replica condition and minor optimizations
· dc3db673
Rick Ho
authored
Feb 02, 2021
dc3db673
remove debug output and todo for replicated mp input
· a8ecd3d7
Rick Ho
authored
Feb 02, 2021
a8ecd3d7
Optimize redundancy communication
· 01ae2d72
Sengxian
authored
Feb 02, 2021
01ae2d72
Format using black and add model_parallel_rank
· fdbac1df
Sengxian
authored
Feb 02, 2021
fdbac1df
01 Feb, 2021
3 commits
split more tests
· 01d9b418
Rick Ho
authored
Feb 01, 2021
01d9b418
complete test for reconstruction
· 22e1eb45
Rick Ho
authored
Feb 01, 2021
22e1eb45
swap local scatter and gather kernel functions
· d2039fc7
Rick Ho
authored
Feb 01, 2021
d2039fc7
29 Jan, 2021
5 commits
fix another bug to make global moe run
· 14c0eab4
Rick Ho
authored
Jan 29, 2021
14c0eab4
fix more bugs to make the layers run in the model
· 0fea2991
Rick Ho
authored
Jan 29, 2021
0fea2991
fix python bugs
· 6900f1de
Rick Ho
authored
Jan 29, 2021
6900f1de
reconstruct fmoe nn module
· 437afda2
Rick Ho
authored
Jan 29, 2021
437afda2
split fmoe functions
· 5e0af68d
Rick Ho
authored
Jan 29, 2021
5e0af68d
28 Jan, 2021
3 commits
fix scatter/gather bug to make it correct
· 952e3135
Rick Ho
authored
Jan 28, 2021
952e3135
make test run on nccl version, but fails in correctness
· 2d250fbf
Rick Ho
authored
Jan 28, 2021
2d250fbf
single node use torch cuda expert count
· a526f438
Rick Ho
authored
Jan 28, 2021
a526f438
27 Jan, 2021
1 commit
fix expert number
· bc8e8181
Rick Ho
authored
Jan 27, 2021
bc8e8181
26 Jan, 2021
1 commit
initial version to run with megatron
· 58e949cf
Rick Ho
authored
Jan 26, 2021
58e949cf
25 Jan, 2021
3 commits
basic megatron support frame
· d0f07ff7
Rick Ho
authored
Jan 25, 2021
d0f07ff7
pass test
· c5cfd5fb
Rick Ho
authored
Jan 25, 2021
c5cfd5fb
update fmoe files
· ed69591a
Rick Ho
authored
Jan 25, 2021
ed69591a