OpenDAS / ColossalAI
Commit 8432dc70 (Unverified)
Authored Apr 01, 2022 by ver217; committed by GitHub Apr 01, 2022
polish moe docsrting (#618)
Parent: c5b488ed
Showing 2 changed files with 13 additions and 6 deletions
colossalai/nn/layer/moe/layers.py  +10 -3
colossalai/nn/layer/moe/utils.py   +3 -3
colossalai/nn/layer/moe/layers.py @ 8432dc70
@@ -320,15 +320,22 @@ class MoeModule(nn.Module):
         capacity_factor_eval (float, optional): Capacity factor in routing during evaluation
         min_capacity (int, optional): The minimum number of the capacity of each expert
         noisy_policy (str, optional): The policy of noisy function. Now we have 'Jitter' and 'Gaussian'.
-            'Jitter' can be found in Switch Transformer paper (https://arxiv.org/abs/2101.03961).
-            'Gaussian' can be found in ViT-MoE paper (https://arxiv.org/abs/2106.05974).
+            'Jitter' can be found in `Switch Transformer paper`_.
+            'Gaussian' can be found in `ViT-MoE paper`_.
         drop_tks (bool, optional): Whether drops tokens in evaluation
         use_residual (bool, optional): Makes this MoE layer a Residual MoE.
-            More information can be found in Microsoft paper (https://arxiv.org/abs/2201.05596).
+            More information can be found in `Microsoft paper`_.
         residual_instance (nn.Module, optional): The instance of residual module in Resiual MoE
         expert_instance (MoeExperts, optional): The instance of experts module in MoeLayer
         expert_cls (Type[nn.Module], optional): The class of each expert when no instance is given
         expert_args (optional): The args of expert when no instance is given
+
+    .. _Switch Transformer paper:
+        https://arxiv.org/abs/2101.03961
+    .. _ViT-MoE paper:
+        https://arxiv.org/abs/2106.05974
+    .. _Microsoft paper:
+        https://arxiv.org/abs/2201.05596
     """

     def __init__(self,
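The layers.py change swaps inline URLs for reStructuredText named hyperlink references (`` `text`_ ``), each resolved by a `.. _text:` target at the end of the docstring. A reference without a matching target produces a docutils error when the docs are built, so a quick check can be useful. The sketch below is only an illustration: `undefined_references` is a hypothetical helper, not part of ColossalAI, and its regular expressions cover only this simple reference form rather than the full reStructuredText grammar.

```python
import re
from typing import List

# Hypothetical docstring fragment written in the style this commit introduces.
DOC = """
    noisy_policy (str, optional): 'Jitter' can be found in `Switch Transformer paper`_.
        'Gaussian' can be found in `ViT-MoE paper`_.

.. _Switch Transformer paper:
    https://arxiv.org/abs/2101.03961
.. _ViT-MoE paper:
    https://arxiv.org/abs/2106.05974
"""

# `name`_ is a named reference; the lookahead skips anonymous `name`__ references.
REFERENCE = re.compile(r"`([^`<>]+)`_(?!_)")
# ".. _name:" at the start of a line is the matching hyperlink target.
TARGET = re.compile(r"^\s*\.\. _([^:]+):", re.MULTILINE)


def undefined_references(doc: str) -> List[str]:
    """Return reference names used in `doc` that have no `.. _name:` target."""
    # Reference names are compared case-insensitively, as reStructuredText does.
    targets = {name.strip().lower() for name in TARGET.findall(doc)}
    refs = [name.strip() for name in REFERENCE.findall(doc)]
    return [name for name in refs if name.lower() not in targets]
```

Calling `undefined_references(DOC)` on the fragment above returns an empty list, because both references have targets.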
colossalai/nn/layer/moe/utils.py @ 8432dc70
@@ -14,8 +14,8 @@ class ForceFP32Parameter(torch.nn.Parameter):
 class NormalNoiseGenerator:
     """Generates a random noisy mask for logtis tensor.

-    All noise is generated from a normal distribution (0, 1 / E^2), where
-    E = the number of experts.
+    All noise is generated from a normal distribution :math:`(0, 1 / E^2)`, where
+    `E = the number of experts`.

     Args:
         num_experts (int): The number of experts.
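To make the `NormalNoiseGenerator` docstring concrete, here is a minimal standard-library sketch of Gaussian noise scaled by the number of experts. It is not the ColossalAI implementation, which works on tensors. The docstring does not say whether `1 / E^2` is the variance or the standard deviation, and it does not say how the noise is combined with the logits. This sketch assumes `1 / E^2` is the standard deviation and that the noise is added. The name `add_normal_noise` and the sample values are hypothetical.

```python
import random
from typing import List, Optional


def add_normal_noise(logits: List[float], num_experts: int,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean Gaussian noise to each routing logit.

    Assumption: 1 / E^2 is used as the standard deviation, E = num_experts.
    """
    rng = rng or random.Random()
    std = 1.0 / num_experts ** 2
    return [x + rng.gauss(0.0, std) for x in logits]


# One token scored against four experts (made-up values).
logits = [0.2, 1.5, -0.3, 0.9]
noisy = add_normal_noise(logits, num_experts=4, rng=random.Random(0))
```

With four experts the assumed standard deviation is 1/16, so the perturbation stays small relative to typical logit gaps and shrinks quickly as the expert count grows.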
@@ -34,7 +34,7 @@ class NormalNoiseGenerator:
 class UniformNoiseGenerator:
     """Generates a random noisy mask for logtis tensor.
     copied from mesh tensorflow:
-    Multiply values by a random number between 1-epsilon and 1+epsilon.
+    Multiply values by a random number between :math:`1-epsilon` and :math:`1+epsilon`.
     Makes models more resilient to rounding errors introduced by bfloat16.
     This seems particularly important for logits.
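The multiplicative jitter described in the `UniformNoiseGenerator` docstring can be sketched in plain Python as follows. This is an illustration only, not the library's tensor-based code. The function name `jitter` and the default `eps=1e-2` are assumptions made for the example.

```python
import random
from typing import List, Optional


def jitter(logits: List[float], eps: float = 1e-2,
           rng: Optional[random.Random] = None) -> List[float]:
    """Multiply each logit by an independent factor drawn from [1 - eps, 1 + eps]."""
    rng = rng or random.Random()
    return [x * rng.uniform(1.0 - eps, 1.0 + eps) for x in logits]


# Made-up routing logits for a single token.
raw = [0.2, 1.5, -0.3, 0.9]
jittered = jitter(raw, eps=1e-2, rng=random.Random(0))
```

Because the noise is multiplicative, each perturbation is proportional to the magnitude of its logit, and a logit of exactly zero is left unchanged.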