Unverified Commit e1f3156b authored by Santiago Castro, committed by GitHub

Fix many typos (#8708)

parent 9c0afdaf
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the Dataset 📚
-Dataset ID: ```wikisql``` from [HugginFace/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
+Dataset ID: ```wikisql``` from [Huggingface/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the Dataset 📚
-Dataset ID: ```wikisql``` from [HugginFace/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
+Dataset ID: ```wikisql``` from [Huggingface/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the downstream task (Question Paraphrasing) - Dataset 📚❓↔️❓
-Dataset ID: ```quora``` from [HugginFace/NLP](https://github.com/huggingface/nlp)
+Dataset ID: ```quora``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
-Dataset ID: ```squad``` from [HugginFace/NLP](https://github.com/huggingface/nlp)
+Dataset ID: ```squad``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
-Dataset ID: ```squad_v2``` from [HugginFace/NLP](https://github.com/huggingface/nlp)
+Dataset ID: ```squad_v2``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the Dataset 📚
-Dataset ID: ```wikisql``` from [HugginFace/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
+Dataset ID: ```wikisql``` from [Huggingface/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
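The model cards touched above all point at datasets hosted through Huggingface/NLP. A minimal sketch of pulling them, assuming the `nlp` library linked above is installed (it was later renamed to `datasets`, where the same call works):

```python
import nlp

wikisql = nlp.load_dataset("wikisql")      # text-to-SQL pairs
quora = nlp.load_dataset("quora")          # question-paraphrase pairs
squad = nlp.load_dataset("squad")          # extractive Q&A
squad_v2 = nlp.load_dataset("squad_v2")    # SQuAD plus unanswerable questions

print(wikisql["train"][0])                 # inspect one training sample
```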
@@ -39,7 +39,7 @@ def convert_tf_weight_name_to_pt_weight_name(tf_name, start_prefix_to_remove="")
 return tuple with:
 - pytorch model weight name
-- transpose: boolean indicating wether TF2.0 and PyTorch weights matrices are transposed with regards to each
+- transpose: boolean indicating whether TF2.0 and PyTorch weights matrices are transposed with regards to each
 other
 """
 tf_name = tf_name.replace(":0", "") # device ids
...
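The docstring fixed above describes the return value of the TF-to-PyTorch name converter. A small sketch of calling it, assuming the helper still lives in `transformers.modeling_tf_pytorch_utils` as in this revision; the variable name is made up for illustration:

```python
from transformers.modeling_tf_pytorch_utils import convert_tf_weight_name_to_pt_weight_name

# A hypothetical TF2.0 variable name; ":0" is the device id stripped in the snippet above.
tf_name = "bert/encoder/layer_._0/attention/self/query/kernel:0"

pt_name, transpose = convert_tf_weight_name_to_pt_weight_name(tf_name)
# `pt_name` is the dotted PyTorch parameter name; `transpose` is True for "kernel"
# variables, since TF dense kernels are stored transposed relative to torch.nn.Linear.
print(pt_name, transpose)
```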
@@ -951,7 +951,7 @@ class FSMTModel(PretrainedFSMTModel):
 output_hidden_states=output_hidden_states,
 return_dict=return_dict,
 )
-# If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOuput when return_dict=False
+# If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOutput when return_dict=False
 elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
 encoder_outputs = BaseModelOutput(
 last_hidden_state=encoder_outputs[0],
...
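For context on the comment fixed above, a tiny sketch of the same wrapping outside the model: a plain tuple of encoder outputs can be promoted to a `BaseModelOutput` so attribute access keeps working (the tensor shape is a dummy value):

```python
import torch
from transformers.modeling_outputs import BaseModelOutput

hidden = torch.zeros(1, 5, 16)        # (batch, sequence, hidden) placeholder tensor
encoder_outputs = (hidden,)           # what a caller might pass as a bare tuple

wrapped = BaseModelOutput(last_hidden_state=encoder_outputs[0])
assert torch.equal(wrapped.last_hidden_state, hidden)
```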
@@ -642,7 +642,7 @@ class TFT5MainLayer(tf.keras.layers.Layer):
 raise ValueError(f"You have to specify either {err_msg_prefix}inputs or {err_msg_prefix}inputs_embeds")
 if inputs_embeds is None:
-assert self.embed_tokens is not None, "You have to intialize the model with valid token embeddings"
+assert self.embed_tokens is not None, "You have to initialize the model with valid token embeddings"
 inputs_embeds = self.embed_tokens(input_ids)
 batch_size, seq_length = input_shape
...
@@ -667,9 +667,9 @@ class TransfoXLLMHeadModelOutput(ModelOutput):
 @property
 def logits(self):
-# prediciton scores are the output of the adaptive softmax, see
+# prediction scores are the output of the adaptive softmax, see
 # the file `modeling_transfo_xl_utilities`. Since the adaptive
-# softmax returns the log softmax value, `self.prediciton_scores`
+# softmax returns the log softmax value, `self.prediction_scores`
 # are strictly speaking not exactly `logits`, but behave the same
 # way logits do.
 return self.prediction_scores
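The comment corrected above is worth restating: because the adaptive softmax already applies a log-softmax, exponentiating the returned "logits" yields probabilities. A hedged check, assuming the standard `transfo-xl-wt103` checkpoint:

```python
import torch
from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.eval()

input_ids = tokenizer("The quick brown fox", return_tensors="pt")["input_ids"]
with torch.no_grad():
    outputs = model(input_ids, return_dict=True)

# The adaptive softmax already returned log probabilities, so exp() sums to ~1
# over the vocabulary at every position.
print(outputs.logits.exp().sum(dim=-1))
```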
@@ -886,7 +886,7 @@ class TransfoXLModel(TransfoXLPreTrainedModel):
 head_mask = head_mask.unsqueeze(1).unsqueeze(1).unsqueeze(1)
 head_mask = head_mask.to(
 dtype=next(self.parameters()).dtype
-) # switch to fload if need + fp16 compatibility
+) # switch to float if need + fp16 compatibility
 else:
 head_mask = [None] * self.n_layer
...
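The dtype cast fixed in the comment above goes hand in hand with the reshaping on the first line of the snippet. A standalone sketch, with hypothetical layer and head counts, of how a 2-D `(num_layers, num_heads)` mask gets extra singleton axes for broadcasting and is cast for fp16:

```python
import torch

n_layer, n_head = 18, 16
head_mask = torch.ones(n_layer, n_head)     # per-layer, per-head mask (1.0 = keep the head)
head_mask[0, 3] = 0.0                       # e.g. silence head 3 of layer 0

head_mask = head_mask.unsqueeze(1).unsqueeze(1).unsqueeze(1)   # -> (n_layer, 1, 1, 1, n_head)
head_mask = head_mask.to(dtype=torch.float16)                  # the fp16-compatibility cast
print(head_mask.shape)
```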
@@ -91,8 +91,8 @@ class ProjectedAdaptiveLogSoftmax(nn.Module):
 Return:
 if labels is None: out :: [len*bsz x n_tokens] log probabilities of tokens over the vocabulary else: out ::
-[(len-1)*bsz] Negative log likelihood We could replace this implementation by the native PyTorch one if
-their's had an option to set bias on all clusters in the native one. here:
+[(len-1)*bsz] Negative log likelihood. We could replace this implementation by the native PyTorch one if
+theirs had an option to set bias on all clusters in the native one. here:
 https://github.com/pytorch/pytorch/blob/dbe6a7a9ff1a364a8706bf5df58a1ca96d2fd9da/torch/nn/modules/adaptive.py#L138
 """
...
@@ -633,11 +633,11 @@ XLM_INPUTS_DOCSTRING = r"""
 A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are
 languages ids which can be obtained from the language names by using two conversion mappings provided in
 the configuration of the model (only provided for multilingual models). More precisely, the `language name
-to language id` mapping is in :obj:`model.config.lang2id` (which is a dictionary strring to int) and the
+to language id` mapping is in :obj:`model.config.lang2id` (which is a dictionary string to int) and the
 `language id to language name` mapping is in :obj:`model.config.id2lang` (dictionary int to string).
 See usage examples detailed in the :doc:`multilingual documentation <../multilingual>`.
-ttoken_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
+token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
 Segment token indices to indicate first and second portions of the inputs. Indices are selected in ``[0,
 1]``:
...
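The `langs` argument documented above can be built from the config mappings named in the corrected line. A sketch under the assumption that a multilingual checkpoint such as `xlm-mlm-xnli15-1024` is used (only multilingual XLM configs carry `lang2id` and `id2lang`):

```python
import torch
from transformers import XLMTokenizer, XLMWithLMHeadModel

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-xnli15-1024")
model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-xnli15-1024")

input_ids = tokenizer("Wikipedia was used to train this model.", return_tensors="pt")["input_ids"]

english_id = model.config.lang2id["en"]          # the `language name -> language id` mapping
langs = torch.full_like(input_ids, english_id)   # one language id per input token

outputs = model(input_ids, langs=langs, return_dict=True)
print(outputs.logits.shape)                      # (batch, sequence, vocab)
```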
@@ -54,7 +54,7 @@ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
 class XLMRobertaTokenizer(PreTrainedTokenizer):
 """
-Adapted from :class:`~transfomers.RobertaTokenizer` and class:`~transfomers.XLNetTokenizer`. Based on
+Adapted from :class:`~transformers.RobertaTokenizer` and class:`~transformers.XLNetTokenizer`. Based on
 `SentencePiece <https://github.com/google/sentencepiece>`__.
 This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
...
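A brief usage sketch for the SentencePiece-based tokenizer whose docstring is fixed above, using the standard `xlm-roberta-base` checkpoint (the `sentencepiece` package must be installed):

```python
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

print(tokenizer.tokenize("Hello world!"))      # SentencePiece subword pieces, e.g. '▁Hello'
print(tokenizer("Hello world!")["input_ids"])  # ids wrapped in <s> ... </s> special tokens
```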
@@ -904,7 +904,7 @@ XLNET_INPUTS_DOCSTRING = r"""
 Mask values selected in ``[0, 1]``:
 - 1 for tokens that are **masked**,
-- 0 for tokens that are **not maked**.
+- 0 for tokens that are **not masked**.
 You can only uses one of :obj:`input_mask` and :obj:`attention_mask`.
 head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
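As the docstring above notes, `input_mask` uses the opposite convention from `attention_mask`, so only one of the two should be supplied; one can be derived from the other. A minimal sketch:

```python
import torch

attention_mask = torch.tensor([[1, 1, 1, 0, 0]])   # 1 = real token, 0 = padding
input_mask = 1 - attention_mask                    # 1 = masked (padding), 0 = not masked
print(input_mask)                                  # tensor([[0, 0, 0, 1, 1]])
```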
@@ -1211,7 +1211,7 @@ class XLNetModel(XLNetPreTrainedModel):
 head_mask = head_mask.unsqueeze(1).unsqueeze(1).unsqueeze(1)
 head_mask = head_mask.to(
 dtype=next(self.parameters()).dtype
-) # switch to fload if need + fp16 compatibility
+) # switch to float if need + fp16 compatibility
 else:
 head_mask = [None] * self.n_layer
...
@@ -167,9 +167,9 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
 beta_2 (:obj:`float`, `optional`, defaults to 0.999):
 The beta2 parameter in Adam, which is the exponential decay rate for the 2nd momentum estimates.
 epsilon (:obj:`float`, `optional`, defaults to 1e-7):
-The epsilon paramenter in Adam, which is a small constant for numerical stability.
+The epsilon parameter in Adam, which is a small constant for numerical stability.
 amsgrad (:obj:`bool`, `optional`, default to `False`):
-Whether to apply AMSGrad varient of this algorithm or not, see `On the Convergence of Adam and Beyond
+Whether to apply AMSGrad variant of this algorithm or not, see `On the Convergence of Adam and Beyond
 <https://arxiv.org/abs/1904.09237>`__.
 weight_decay_rate (:obj:`float`, `optional`, defaults to 0):
 The weight decay to apply.
...
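Putting the corrected argument docs above together, a hedged sketch of constructing this TF optimizer (requires TensorFlow; the no-decay pattern list is a common choice, not something the class mandates):

```python
from transformers import AdamWeightDecay

optimizer = AdamWeightDecay(
    learning_rate=3e-5,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,                                     # small constant for numerical stability
    amsgrad=False,                                    # set True for the AMSGrad variant
    weight_decay_rate=0.01,                           # decoupled weight decay
    exclude_from_weight_decay=["LayerNorm", "bias"],  # parameters typically excluded from decay
)
```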