OpenDAS / ColossalAI

Commit 8432dc70
Authored Apr 01, 2022 by ver217; committed by GitHub, Apr 01, 2022

polish moe docsrting (#618)

parent c5b488ed
Showing 2 changed files with 13 additions and 6 deletions

colossalai/nn/layer/moe/layers.py  +10 -3
colossalai/nn/layer/moe/utils.py  +3 -3
colossalai/nn/layer/moe/layers.py

@@ -320,15 +320,22 @@ class MoeModule(nn.Module):
         capacity_factor_eval (float, optional): Capacity factor in routing during evaluation
         min_capacity (int, optional): The minimum number of the capacity of each expert
         noisy_policy (str, optional): The policy of noisy function. Now we have 'Jitter' and 'Gaussian'.
-            'Jitter' can be found in Switch Transformer paper (https://arxiv.org/abs/2101.03961).
-            'Gaussian' can be found in ViT-MoE paper (https://arxiv.org/abs/2106.05974).
+            'Jitter' can be found in `Switch Transformer paper`_.
+            'Gaussian' can be found in `ViT-MoE paper`_.
         drop_tks (bool, optional): Whether drops tokens in evaluation
         use_residual (bool, optional): Makes this MoE layer a Residual MoE.
-            More information can be found in Microsoft paper (https://arxiv.org/abs/2201.05596).
+            More information can be found in `Microsoft paper`_.
         residual_instance (nn.Module, optional): The instance of residual module in Resiual MoE
         expert_instance (MoeExperts, optional): The instance of experts module in MoeLayer
         expert_cls (Type[nn.Module], optional): The class of each expert when no instance is given
         expert_args (optional): The args of expert when no instance is given
+
+    .. _Switch Transformer paper:
+        https://arxiv.org/abs/2101.03961
+    .. _ViT-MoE paper:
+        https://arxiv.org/abs/2106.05974
+    .. _Microsoft paper:
+        https://arxiv.org/abs/2201.05596
     """
 
     def __init__(self,
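The change in layers.py replaces inline URLs with reStructuredText named hyperlink references: the text carries `Name`_ and a matching `.. _Name:` target at the end of the docstring supplies the URL. A minimal sketch of that pairing follows, assuming a hypothetical function `example_layer` and a hypothetical helper `unresolved_references`, neither of which is part of ColossalAI. The regex check is a simplification: reStructuredText itself matches reference names case-insensitively and normalizes whitespace, which this sketch does not.

```python
import inspect
import re


def example_layer():
    """Hypothetical docstring for illustration only, not ColossalAI code.

    Args:
        noisy_policy (str, optional): 'Jitter' is described in the
            `Switch Transformer paper`_.

    .. _Switch Transformer paper:
        https://arxiv.org/abs/2101.03961
    """


def unresolved_references(docstring: str) -> set:
    """Return named references that have no matching hyperlink target."""
    # Named references look like `some text`_ (a single trailing underscore).
    references = set(re.findall(r"`([^`<>]+)`_(?!_)", docstring))
    # Explicit targets look like ".. _some text:" at the start of a line.
    targets = set(re.findall(r"^\s*\.\. _([^:]+):", docstring, flags=re.MULTILINE))
    return references - targets


doc = inspect.getdoc(example_layer)
missing = unresolved_references(doc)
print(missing)
```

A reference without a target would show up in `missing`, which is the kind of mistake this docstring style makes easy to introduce when a target is renamed.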
colossalai/nn/layer/moe/utils.py

@@ -14,8 +14,8 @@ class ForceFP32Parameter(torch.nn.Parameter):
 class NormalNoiseGenerator:
     """Generates a random noisy mask for logtis tensor.
 
-    All noise is generated from a normal distribution (0, 1 / E^2), where
-    E = the number of experts.
+    All noise is generated from a normal distribution :math:`(0, 1 / E^2)`, where
+    `E = the number of experts`.
 
     Args:
         num_experts (int): The number of experts.
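The NormalNoiseGenerator docstring describes additive noise drawn from a normal distribution with parameters (0, 1 / E^2). A minimal sketch of that idea follows, using only the standard library rather than torch. The function name `add_normal_noise` is hypothetical, and reading 1 / E^2 as the variance (so the standard deviation is 1 / E) is an assumption: the docstring does not say whether the second parameter is a variance or a standard deviation.

```python
import random


def add_normal_noise(logits, num_experts, rng=None):
    """Add zero-mean Gaussian noise to each logit.

    Assumption: (0, 1 / E^2) is read as (mean, variance), so the
    standard deviation passed to the sampler is 1 / E.
    """
    rng = rng or random.Random()
    std = 1.0 / num_experts
    return [x + rng.gauss(0.0, std) for x in logits]


# Usage: perturb the routing logits of one token across 4 experts.
logits = [0.5, -1.2, 2.0, 0.1]
noisy = add_normal_noise(logits, num_experts=4, rng=random.Random(0))
print(noisy)
```

Under this reading the noise shrinks as the number of experts grows, so the perturbation stays small relative to the gaps between logits.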
@@ -34,7 +34,7 @@ class NormalNoiseGenerator:
 class UniformNoiseGenerator:
     """Generates a random noisy mask for logtis tensor.
     copied from mesh tensorflow:
-    Multiply values by a random number between 1-epsilon and 1+epsilon.
+    Multiply values by a random number between :math:`1-epsilon` and :math:`1+epsilon`.
     Makes models more resilient to rounding errors introduced by bfloat16.
     This seems particularly important for logits.
 
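The UniformNoiseGenerator docstring describes multiplicative jitter: each value is scaled by a random factor between 1-epsilon and 1+epsilon. A minimal sketch follows, again using only the standard library rather than torch. The function name `apply_jitter` and the default `epsilon=0.01` are hypothetical and chosen only for illustration; the real class may use a different default and a different sampling call.

```python
import random


def apply_jitter(values, epsilon=0.01, rng=None):
    """Multiply each value by an independent factor in [1 - epsilon, 1 + epsilon]."""
    rng = rng or random.Random()
    low, high = 1.0 - epsilon, 1.0 + epsilon
    return [v * rng.uniform(low, high) for v in values]


# Usage: jitter the logits of one token before routing.
logits = [0.5, -1.2, 2.0, 0.0]
jittered = apply_jitter(logits, epsilon=0.01, rng=random.Random(0))
print(jittered)
```

Because the noise is multiplicative, a logit of exactly zero is left unchanged and larger logits receive proportionally larger perturbations, unlike the additive Gaussian policy.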