@@ -139,6 +140,7 @@ class BCEWithLogitsLoss(Layer):
class CrossEntropyLoss(Layer):
r"""
By default, this operator implements the cross entropy loss function with softmax. It
combines the calculation of the softmax operation and the cross entropy loss function
to provide a more numerically stable computation.
...
...
@@ -251,60 +253,35 @@ class CrossEntropyLoss(Layer):
Parameters:
weight (Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size C and the data type is float32, float64.
Default is ``None`` .
ignore_index (int64, optional): Specifies a target value that is ignored
and does not contribute to the loss. A negative value means that no label
value needs to be ignored. Only valid when soft_label = False.
Default is ``-100`` .
reduction (str, optional): Indicate how to average the loss by batch_size,
the candidates are ``'none'`` | ``'mean'`` | ``'sum'``.
If :attr:`reduction` is ``'mean'``, the reduced mean loss is returned;
If :attr:`reduction` is ``'sum'``, the reduced sum loss is returned.
If :attr:`reduction` is ``'none'``, the unreduced loss is returned.
Default is ``'mean'``.
soft_label (bool, optional): Indicate whether label is soft.
If soft_label=False, the label is hard. If soft_label=True, the label is soft.
Default is ``False``.
axis (int, optional): The index of dimension to perform softmax calculations.
It should be in range :math:`[-1, rank - 1]`, where :math:`rank` is the number
of dimensions of input :attr:`input`.
Default is ``-1`` .
use_softmax (bool, optional): Indicate whether to compute softmax before cross_entropy.
Default is ``True``.
name (str, optional): The name of the operator. Default is ``None`` .
For more information, please refer to :ref:`api_guide_Name` .
Shape:
- **input** (Tensor), the data type is float32, float64. Shape is
:math:`[N_1, N_2, ..., N_k, C]`, where C is the number of classes and ``k >= 1`` .
Note:
1. When use_softmax=True, it expects unscaled logits. This operator should not be used with the
...
...
@@ -312,7 +289,6 @@ class CrossEntropyLoss(Layer):
2. When use_softmax=False, it expects the output of the softmax operator.
- **label** (Tensor)
1. If soft_label=False, the shape is
...
...
@@ -322,15 +298,10 @@ class CrossEntropyLoss(Layer):
2. If soft_label=True, the shape and data type should be same with ``input`` ,
and the sum of the labels for each sample should be 1.
- **output** (Tensor), the softmax cross_entropy loss of ``input`` and ``label``.
The data type is the same as input.
If :attr:`reduction` is ``'mean'`` or ``'sum'`` , the dimension of return value is ``1``.
If :attr:`reduction` is ``'none'``:
1. If soft_label = False, the dimension of the return value is the same as ``label`` .
...
...
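For concreteness, a minimal usage sketch of the interface documented above (hedged: it assumes
hard labels with the default ``use_softmax=True`` and ``reduction='mean'``):

.. code-block:: python

    import paddle

    # Hard labels: input holds unscaled logits of shape [N, C],
    # label holds int64 class indices of shape [N].
    ce = paddle.nn.CrossEntropyLoss(reduction='mean')  # softmax is applied internally
    logits = paddle.randn([4, 5])                      # N=4 samples, C=5 classes
    labels = paddle.to_tensor([0, 2, 1, 4])            # class indices in [0, C)
    loss = ce(logits, labels)                          # reduced mean loss over the batch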
@@ -634,6 +605,7 @@ class MSELoss(Layer):
class L1Loss(Layer):
r"""
Construct a callable object of the ``L1Loss`` class.
The L1Loss layer calculates the L1 Loss of ``input`` and ``label`` as follows.
...
...
@@ -663,11 +635,11 @@ class L1Loss(Layer):
name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
Shape:
- input (Tensor): The input tensor. The shape is ``[N, *]``, where N is batch size and `*` means any number of additional dimensions. Its data type should be float32, float64, int32, int64.
- label (Tensor): The label tensor. The shape is ``[N, *]``, the same shape as ``input`` . Its data type should be float32, float64, int32, int64.
- output (Tensor): The L1 Loss of ``input`` and ``label``.
If `reduction` is ``'none'``, the shape of output loss is ``[N, *]``, the same as ``input`` .
If `reduction` is ``'mean'`` or ``'sum'``, the shape of output loss is ``[1]``.
Examples:
.. code-block:: python
...
...
@@ -692,6 +664,7 @@ class L1Loss(Layer):
print(output)
# [[0.20000005 0.19999999]
# [0.2 0.79999995]]
"""
def __init__(self, reduction='mean', name=None):
...
...
@@ -712,6 +685,7 @@ class L1Loss(Layer):
class BCELoss(Layer):
"""
This interface is used to construct a callable object of the ``BCELoss`` class.
The BCELoss layer measures the binary_cross_entropy loss between input predictions ``input``
and target labels ``label`` . The binary_cross_entropy loss can be described as:
...
...
@@ -755,14 +729,14 @@ class BCELoss(Layer):
For more information, please refer to :ref:`api_guide_Name`.
Shape:
- input (Tensor): 2-D tensor with shape: ``[N, *]``, N is batch_size, `*` means
number of additional dimensions. The input ``input`` should always
be the output of sigmoid. Available dtype is float32, float64.
- label (Tensor): 2-D tensor with the same shape as ``input``. The values
of the target labels should be numbers between 0 and 1. Available
dtype is float32, float64.
- output (Tensor): If ``reduction`` is ``'none'``, the shape of output is
same as ``input`` , else the shape of output is scalar.
Returns:
A callable object of BCELoss.
...
...
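As a hedged usage sketch of the shapes documented above (``input`` must already be sigmoid
probabilities, not raw logits):

.. code-block:: python

    import paddle

    bce = paddle.nn.BCELoss()                      # reduction defaults to 'mean'
    logits = paddle.randn([3, 1])
    probs = paddle.nn.functional.sigmoid(logits)   # BCELoss expects probabilities
    label = paddle.to_tensor([[1.0], [0.0], [1.0]])
    loss = bce(probs, label)                       # mean binary cross entropy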
@@ -855,7 +829,7 @@ class NLLLoss(Layer):
if `reduction` is ``'sum'``, the reduced sum loss is returned;
if `reduction` is ``'none'``, no reduction will be applied.
Default is ``'mean'``.
name (str, optional): For details, please refer to :ref:`api_guide_Name`. Generally, no setting is required. Default: None.
Shape:
- input (Tensor): Input tensor, the shape is :math:`[N, C]`, `C` is the number of classes.
...
...
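A minimal sketch of the usual pairing with ``log_softmax`` (hedged: it assumes the
``paddle.nn.NLLLoss`` interface documented above):

.. code-block:: python

    import paddle

    # NLLLoss consumes log-probabilities, so pair it with log_softmax.
    nll = paddle.nn.NLLLoss(reduction='mean')
    logits = paddle.randn([4, 5])                                # shape [N, C]
    log_probs = paddle.nn.functional.log_softmax(logits, axis=1)
    labels = paddle.to_tensor([1, 0, 3, 2])                      # class indices, shape [N]
    loss = nll(log_probs, labels)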
@@ -914,6 +888,7 @@ class NLLLoss(Layer):
class KLDivLoss(Layer):
r"""
Generate a callable object of 'KLDivLoss' to calculate the
Kullback-Leibler divergence loss between Input(X) and
Input(Target). Note that Input(X) is the log-probability
...
...
@@ -933,14 +908,10 @@ class KLDivLoss(Layer):
Default is ``'mean'``.
Shape:
- input (Tensor): ``(N, *)``, where ``*`` means any number of additional dimensions.
- label (Tensor): ``(N, *)``, same shape as input.
- output (Tensor): tensor with shape: [1] by default.
Examples:
.. code-block:: python
...
...
@@ -970,6 +941,7 @@ class KLDivLoss(Layer):
kldiv_criterion = nn.KLDivLoss(reduction='none')
pred_loss = kldiv_criterion(x, target)
# shape=[5, 20]
"""
def __init__(self, reduction='mean'):
...
...
@@ -1720,6 +1692,7 @@ class TripletMarginLoss(Layer):
class SoftMarginLoss(Layer):
r"""
Creates a criterion that measures a two-class soft margin loss between input predictions ``input``
and target labels ``label`` . It can be described as:
...
...
@@ -1738,17 +1711,14 @@ class SoftMarginLoss(Layer):
name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
Shapes:
- Input (Tensor): The input tensor with shape: ``[N, *]``,
N is batch_size, `*` means any number of additional dimensions. The ``input`` ranges from -inf to inf.
Available dtype is float32, float64.
- Label (Tensor): The target labels tensor with the same shape as
``input``. The values of the target labels should be -1 or 1.
Available dtype is int32, int64, float32, float64.
- Output (Tensor): If ``reduction`` is ``'none'``, the shape of output is
same as ``input`` , else the shape of output is [1].
Returns:
A callable object of SoftMarginLoss.
...
...
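A short usage sketch under the shapes described above (hedged: assumes the default
``reduction='mean'``):

.. code-block:: python

    import paddle

    criterion = paddle.nn.SoftMarginLoss()
    scores = paddle.randn([4, 3])                 # unbounded predictions in (-inf, inf)
    labels = paddle.to_tensor([[ 1., -1.,  1.],
                               [-1.,  1.,  1.],
                               [ 1.,  1., -1.],
                               [-1., -1.,  1.]])  # labels must be -1 or 1
    loss = criterion(scores, labels)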
@@ -1780,6 +1750,7 @@ class SoftMarginLoss(Layer):
@@ -1661,6 +1661,7 @@ class MultiplicativeDecay(LRScheduler):
class OneCycleLR(LRScheduler):
r"""
Sets the learning rate according to the one cycle learning rate scheduler.
The scheduler adjusts the learning rate from an initial learning rate to the maximum learning rate and then
from that maximum learning rate to the minimum learning rate, which is much less than the initial learning rate.
...
...
@@ -1674,22 +1675,25 @@ class OneCycleLR(LRScheduler):
Also note that you should update the learning rate each step.
Args:
max_learning_rate (float): The maximum learning rate. It is a python float number. Functionally, it defines the initial learning rate by ``divide_factor`` .
total_steps (int): Number of total training steps.
divide_factor (float, optional): Initial learning rate will be determined by initial_learning_rate = max_learning_rate / divide_factor. Default: 25.
end_learning_rate (float, optional): The minimum learning rate during training; it should be much less than the initial learning rate.
phase_pct (float): The percentage of total steps used to increase the learning rate. Default: 0.3.
anneal_strategy (str, optional): Strategy of adjusting the learning rate. 'cos' for cosine annealing, 'linear' for linear annealing. Default: 'cos'.
three_phase (bool, optional): Whether to use a three-phase schedule.
If ``True``:
1. The learning rate will first increase from the initial learning rate to the maximum learning rate.
2. Then it will decrease to the initial learning rate. The number of steps in this phase is the same as that in the first phase.
3. Finally, it will decrease to the minimum learning rate, which is much less than the initial learning rate.
If ``False``:
1. The learning rate will increase to the maximum learning rate.
2. Then it will directly decrease to the minimum learning rate.
last_epoch (int, optional): The index of the last epoch. Can be set to restart training. Default: -1, meaning the initial learning rate.
verbose (bool, optional): If ``True``, prints a message to stdout for each update. Default: ``False`` .
...
...
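For concreteness, a minimal dygraph-style sketch (hedged: it assumes
``paddle.optimizer.lr.OneCycleLR`` as documented above, with ``scheduler.step()``
called once per step as the note requires):

.. code-block:: python

    import paddle

    linear = paddle.nn.Linear(10, 1)
    scheduler = paddle.optimizer.lr.OneCycleLR(
        max_learning_rate=0.1,    # initial lr = max_learning_rate / divide_factor
        total_steps=100,
        divide_factor=25,
        phase_pct=0.3,
        anneal_strategy='cos')
    sgd = paddle.optimizer.SGD(learning_rate=scheduler,
                               parameters=linear.parameters())
    for step in range(100):
        x = paddle.randn([4, 10])
        loss = linear(x).mean()
        loss.backward()
        sgd.step()
        sgd.clear_grad()
        scheduler.step()          # update the learning rate every step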
@@ -1741,6 +1745,7 @@ class OneCycleLR(LRScheduler):
},
fetch_list=loss.name)
scheduler.step() # You should update learning rate each step
- The tensor dimensions are labeled using uncased English letters. E.g., `ijk`
relates to a three dimensional tensor whose dimensions are labeled i, j, and k.
- The equation is `,` separated into terms, each being a distinct input's
dimension label string.
- Ellipsis `...` enables broadcasting by automatically converting the unlabeled
dimensions into broadcasting dimensions.
- Singular labels are called free labels, duplicates are dummy labels. Dummy labeled
dimensions will be reduced and removed in the output.
- Output labels can be explicitly specified on the right hand side of `->` or omitted. In the latter case, the output labels will be inferred from the input labels.
- Inference of output labels
- Broadcasting label `...`, if present, is put on the leftmost position.
- Free labels are reordered alphabetically and put after `...`.
- On explicit output labels
- If broadcasting is enabled, then `...` must be present.
- The output labels can be empty, an indication to output the sum over the original output as a scalar.
- Non-input labels are invalid.
- Duplicate labels are invalid.
- For any dummy label which is present for the output, it's promoted to
a free label.
- For any free label which is not present for the output, it's lowered to
a dummy label.
- Examples
- '...ij, ...jk', where i and k are free labels, j is dummy. The output label
string is '...ik'
- 'ij -> i', where i is a free label and j is a dummy label.
- '...ij, ...jk -> ...ijk', where i, j and k are all free labels.
- '...ij, ...jk -> ij', an invalid equation since `...` is not present for
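Putting the labeling rules above into practice, a hedged sketch using ``paddle.einsum``
(the output shapes follow from the free/dummy-label rules):

.. code-block:: python

    import paddle

    x = paddle.randn([2, 3])
    y = paddle.randn([3, 4])

    # Implied output: free labels i and k, reordered alphabetically -> 'ik'
    print(paddle.einsum('ij,jk', x, y).shape)       # [2, 4]

    # Explicit output: the dummy label j is promoted to a free label
    print(paddle.einsum('ij,jk->ijk', x, y).shape)  # [2, 3, 4]

    # Empty output labels: the sum over the original output, as a scalar
    print(paddle.einsum('ij,jk->', x, y))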