Commit 8835d93c authored by Myle Ott, committed by Facebook Github Bot

Standardize on 'teacher forcing' rather than 'input feeding' which is… (#769)

Summary:
Input feeding generally refers to a slightly different concept
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769

Differential Revision: D16491898

Pulled By: myleott

fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287
parent 3d764a3d
@@ -285,7 +285,7 @@ following contents::
max_source_positions=self.args.max_positions,
max_target_positions=1,
# Since our target is a single class label, there's no need for
-# input feeding. If we set this to ``True`` then our Model's
+# teacher forcing. If we set this to ``True`` then our Model's
# ``forward()`` method would receive an additional argument called
# *prev_output_tokens* that would contain a shifted version of the
# target sequence.
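For context, a minimal sketch (not part of the tutorial, all names illustrative) of a model matching this setting: with ``input_feeding=False`` the collated batch carries only the source, so ``forward()`` takes no ``prev_output_tokens``::

    import torch.nn as nn

    class SingleLabelClassifier(nn.Module):
        """Toy stand-in for a model trained without teacher forcing."""

        def __init__(self, vocab_size, hidden_dim, num_classes):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_dim)
            self.proj = nn.Linear(hidden_dim, num_classes)

        def forward(self, src_tokens, src_lengths):
            # src_lengths is unused in this toy example. There is no
            # prev_output_tokens argument: with input_feeding=True the batch
            # would also carry that shifted target and forward() would have to
            # accept it.
            x = self.embed(src_tokens).mean(dim=1)  # crude mean-pool over time
            return self.proj(x)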
@@ -125,9 +125,9 @@ Decoder
Our Decoder will predict the next word, conditioned on the Encoder's final
hidden state and an embedded representation of the previous target word -- which
-is sometimes called *input feeding* or *teacher forcing*. More specifically,
-we'll use a :class:`torch.nn.LSTM` to produce a sequence of hidden states that
-we'll project to the size of the output vocabulary to predict each target word.
+is sometimes called *teacher forcing*. More specifically, we'll use a
+:class:`torch.nn.LSTM` to produce a sequence of hidden states that we'll project
+to the size of the output vocabulary to predict each target word.
::
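A rough, self-contained sketch of that idea (illustrative only; the tutorial's actual ``FairseqDecoder`` differs): embed the previous target tokens, run them through an LSTM conditioned on the encoder's final hidden state, and project to the vocabulary::

    import torch
    import torch.nn as nn

    class ToyLSTMDecoder(nn.Module):
        """Minimal teacher-forced decoder; illustrative, not the tutorial's code."""

        def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # Condition on the encoder's final hidden state by concatenating it
            # onto every embedded target token.
            self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
            self.out_proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, prev_output_tokens, encoder_final_hidden):
            # prev_output_tokens: (bsz, tgt_len), the gold target shifted right
            # by one; feeding the gold history instead of the model's own
            # predictions is what "teacher forcing" refers to.
            bsz, tgt_len = prev_output_tokens.size()
            x = self.embed(prev_output_tokens)                  # (bsz, tgt_len, embed_dim)
            cond = encoder_final_hidden.unsqueeze(1).expand(bsz, tgt_len, -1)
            x, _ = self.lstm(torch.cat([x, cond], dim=2))
            return self.out_proj(x)                             # (bsz, tgt_len, vocab_size)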
@@ -171,7 +171,7 @@ we'll project to the size of the output vocabulary to predict each target word.
"""
Args:
prev_output_tokens (LongTensor): previous decoder outputs of shape
-`(batch, tgt_len)`, for input feeding/teacher forcing
+`(batch, tgt_len)`, for teacher forcing
encoder_out (Tensor, optional): output from the encoder, used for
encoder-side attention
@@ -387,8 +387,8 @@ previous hidden states.
In fairseq this is called :ref:`Incremental decoding`. Incremental decoding is a
special mode at inference time where the Model only receives a single timestep
-of input corresponding to the immediately previous output token (for input
-feeding) and must produce the next output incrementally. Thus the model must
+of input corresponding to the immediately previous output token (for teacher
+forcing) and must produce the next output incrementally. Thus the model must
cache any long-term state that is needed about the sequence, e.g., hidden
states, convolutional states, etc.
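A hedged sketch of such an inference loop (greedy, batch size 1; not fairseq's ``SequenceGenerator``, and it assumes the decoder returns per-step logits)::

    import torch

    def greedy_decode(decoder, encoder_out, bos, eos, max_len=50):
        """Illustrative greedy loop; not fairseq's SequenceGenerator."""
        incremental_state = {}  # the decoder caches hidden/conv states in here
        tokens = [bos]
        for _ in range(max_len):
            # Only the single most recent output token is fed back in; anything
            # the decoder needs about earlier timesteps must live in
            # incremental_state.
            prev = torch.tensor([[tokens[-1]]])  # shape (1, 1)
            logits = decoder(prev, encoder_out, incremental_state=incremental_state)
            next_tok = logits[:, -1, :].argmax(dim=-1).item()
            tokens.append(next_tok)
            if next_tok == eos:
                break
        return tokens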
@@ -88,8 +88,7 @@ class LanguagePairDataset(FairseqDataset):
shuffle (bool, optional): shuffle dataset elements before batching
(default: True).
input_feeding (bool, optional): create a shifted version of the targets
-to be passed into the model for input feeding/teacher forcing
-(default: True).
+to be passed into the model for teacher forcing (default: True).
remove_eos_from_source (bool, optional): if set, removes eos from end
of source if it's present (default: False).
append_eos_to_target (bool, optional): if set, appends eos to end of
@@ -167,10 +166,10 @@ class LanguagePairDataset(FairseqDataset):
- `src_lengths` (LongTensor): 1D Tensor of the unpadded
lengths of each source sentence of shape `(bsz)`
- `prev_output_tokens` (LongTensor): a padded 2D Tensor of
-tokens in the target sentence, shifted right by one position
-for input feeding/teacher forcing, of shape `(bsz,
-tgt_len)`. This key will not be present if *input_feeding*
-is ``False``. Padding will appear on the left if
+tokens in the target sentence, shifted right by one
+position for teacher forcing, of shape `(bsz, tgt_len)`.
+This key will not be present if *input_feeding* is
+``False``. Padding will appear on the left if
*left_pad_target* is ``True``.
- `target` (LongTensor): a padded 2D Tensor of tokens in the
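For illustration, roughly how that shifted tensor relates to ``target`` in the simple unpadded case (the real collater also handles padding and ``left_pad_target``)::

    import torch

    eos = 2
    target = torch.tensor([[11, 12, 13, eos],
                           [21, 22, 23, eos]])

    # "Shifted right by one": each position of prev_output_tokens holds the
    # token that precedes the one the model must predict, so the sequence
    # starts from eos and drops the final token.
    prev_output_tokens = torch.cat([target[:, -1:], target[:, :-1]], dim=1)
    print(prev_output_tokens)
    # tensor([[ 2, 11, 12, 13],
    #         [ 2, 21, 22, 23]])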
@@ -22,7 +22,7 @@ class FairseqDecoder(nn.Module):
"""
Args:
prev_output_tokens (LongTensor): shifted output tokens of shape
-`(batch, tgt_len)`, for input feeding/teacher forcing
+`(batch, tgt_len)`, for teacher forcing
encoder_out (dict, optional): output from the encoder, used for
encoder-side attention
@@ -13,7 +13,7 @@ class FairseqIncrementalDecoder(FairseqDecoder):
Incremental decoding is a special mode at inference time where the Model
only receives a single timestep of input corresponding to the previous
-output token (for input feeding) and must produce the next output
+output token (for teacher forcing) and must produce the next output
*incrementally*. Thus the model must cache any long-term state that is
needed about the sequence, e.g., hidden states, convolutional states, etc.
@@ -37,7 +37,7 @@ class FairseqIncrementalDecoder(FairseqDecoder):
"""
Args:
prev_output_tokens (LongTensor): shifted output tokens of shape
-`(batch, tgt_len)`, for input feeding/teacher forcing
+`(batch, tgt_len)`, for teacher forcing
encoder_out (dict, optional): output from the encoder, used for
encoder-side attention
incremental_state (dict, optional): dictionary used for storing
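As a loose sketch of the caching idea only: a decoder layer can stash per-sequence state in that dictionary between calls (real fairseq modules namespace their cache entries via helper utilities rather than a bare string key)::

    import torch
    import torch.nn as nn

    class CachingLayer(nn.Module):
        """Illustrative only; not how fairseq layers actually key the cache."""

        def forward(self, x, incremental_state=None):
            # x: (bsz, 1, dim) during incremental decoding -- just the newest step.
            if incremental_state is not None:
                prev = incremental_state.get("cached_input")
                if prev is not None:
                    x = torch.cat([prev, x], dim=1)  # re-attach the cached history
                incremental_state["cached_input"] = x  # store it for the next call
            # ... attend / convolve over x as usual ...
            return x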
@@ -202,8 +202,8 @@ class FairseqEncoderDecoderModel(BaseFairseqModel):
Run the forward pass for an encoder-decoder model.
First feed a batch of source tokens through the encoder. Then, feed the
-encoder output and previous decoder outputs (i.e., input feeding/teacher
-forcing) to the decoder to produce the next outputs::
+encoder output and previous decoder outputs (i.e., teacher forcing) to
+the decoder to produce the next outputs::
encoder_out = self.encoder(src_tokens, src_lengths)
return self.decoder(prev_output_tokens, encoder_out)
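A hedged sketch of how this teacher-forced forward pass feeds a training loss, assuming the batch layout documented above and a model that returns raw per-position logits (fairseq's criteria are more involved)::

    import torch.nn.functional as F

    def teacher_forced_loss(model, sample, padding_idx):
        """Illustrative training step; not fairseq's criterion code."""
        net_input = sample["net_input"]
        # A single forward pass yields logits for every target position at
        # once, because prev_output_tokens already contains the gold history.
        logits = model(net_input["src_tokens"],
                       net_input["src_lengths"],
                       net_input["prev_output_tokens"])
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),  # (bsz * tgt_len, vocab)
            sample["target"].reshape(-1),         # (bsz * tgt_len,)
            ignore_index=padding_idx,
        )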
@@ -213,7 +213,7 @@ class FairseqEncoderDecoderModel(BaseFairseqModel):
`(batch, src_len)`
src_lengths (LongTensor): source sentence lengths of shape `(batch)`
prev_output_tokens (LongTensor): previous decoder outputs of shape
-`(batch, tgt_len)`, for input feeding/teacher forcing
+`(batch, tgt_len)`, for teacher forcing
Returns:
tuple:
@@ -345,7 +345,7 @@ class LightConvDecoder(FairseqIncrementalDecoder):
"""
Args:
prev_output_tokens (LongTensor): previous decoder outputs of shape
-`(batch, tgt_len)`, for input feeding/teacher forcing
+`(batch, tgt_len)`, for teacher forcing
encoder_out (Tensor, optional): output from the encoder, used for
encoder-side attention
incremental_state (dict): dictionary used for storing state during
@@ -370,7 +370,7 @@ class TransformerDecoder(FairseqIncrementalDecoder):
"""
Args:
prev_output_tokens (LongTensor): previous decoder outputs of shape
-`(batch, tgt_len)`, for input feeding/teacher forcing
+`(batch, tgt_len)`, for teacher forcing
encoder_out (Tensor, optional): output from the encoder, used for
encoder-side attention
incremental_state (dict): dictionary used for storing state during