Fix doc errors and typos across the board (#8139)

* Fix doc errors and typos across the board
* Fix a typo
* Fix the CI
* Fix more typos
* Fix CI
* More fixes
* Fix CI
* More fixes
* More fixes

Commit 969859d5, authored Oct 29, 2020 by Santiago Castro, committed by GitHub Oct 29, 2020
20 changed files
--- a/examples/rag/use_own_knowledge_dataset.py
+++ b/examples/rag/use_own_knowledge_dataset.py
@@ -26,7 +26,7 @@ device = "cuda" if torch.cuda.is_available() else "cpu"


 def split_text(text: str, n=100, character=" ") -> List[str]:
-    """Split the text every ``n``-th occurence of ``character``"""
+    """Split the text every ``n``-th occurrence of ``character``"""
    text = text.split(character)
    return [character.join(text[i : i + n]).strip() for i in range(0, len(text), n)]


--- a/examples/seq2seq/bertabs/configuration_bertabs.py
+++ b/examples/seq2seq/bertabs/configuration_bertabs.py
@@ -44,7 +44,7 @@ class BertAbsConfig(PretrainedConfig):
        enc_ff_size: int
            The size of the encoder's feed-forward layers.
        enc_dropout: int
-            The dropout probabilitiy for all fully connected layers in the
+            The dropout probability for all fully connected layers in the
            embeddings, layers, pooler and also the attention probabilities in
            the encoder.
        dec_layer: int
@@ -56,7 +56,7 @@ class BertAbsConfig(PretrainedConfig):
        dec_ff_size: int
            The size of the decoder's feed-forward layers.
        dec_dropout: int
-            The dropout probabilitiy for all fully connected layers in the
+            The dropout probability for all fully connected layers in the
            embeddings, layers, pooler and also the attention probabilities in
            the decoder.
    """

--- a/examples/seq2seq/bertabs/modeling_bertabs.py
+++ b/examples/seq2seq/bertabs/modeling_bertabs.py
@@ -152,7 +152,7 @@ class TransformerDecoder(nn.Module):
       dropout (float): dropout parameters
       embeddings (:obj:`onmt.modules.Embeddings`):
          embeddings to use, should have positional encodings
-       attn_type (str): if using a seperate copy attention
+       attn_type (str): if using a separate copy attention
    """

    def __init__(self, num_layers, d_model, heads, d_ff, dropout, embeddings, vocab_size):
@@ -817,11 +817,7 @@ class Translator(object):

        Args:
           batch (:obj:`Batch`): a batch from a dataset object
-           data (:obj:`Dataset`): the dataset object
           fast (bool): enables fast beam search (may not support all features)
-
-        Todo:
-           Shouldn't need the original dataset.
        """
        with torch.no_grad():
            return self._fast_translate_batch(batch, self.max_length, min_length=self.min_length)

--- a/examples/seq2seq/convert_model_to_fp16.py
+++ b/examples/seq2seq/convert_model_to_fp16.py
@@ -12,7 +12,7 @@ def convert(src_path: str, map_location: str = "cpu", save_path: Union[str, None
    state_dict = torch.load(src_path, map_location=map_location)
    for k, v in tqdm(state_dict.items()):
        if not isinstance(v, torch.Tensor):
-            raise TypeError("FP16 conversion only works on paths that are saved state dics, like pytorch_model.bin")
+            raise TypeError("FP16 conversion only works on paths that are saved state dicts, like pytorch_model.bin")
        state_dict[k] = v.half()
    if save_path is None:  # overwrite src_path
        save_path = src_path

--- a/model_cards/aubmindlab/bert-base-arabert/README.md
+++ b/model_cards/aubmindlab/bert-base-arabert/README.md
@@ -7,7 +7,7 @@ language: ar

 **AraBERT** is an Arabic pretrained lanaguage model based on [Google's BERT architechture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT PAPER](https://arxiv.org/abs/2003.00104v2) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)

-There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were splitted using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).
+There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).

 The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words. The training corpora are a collection of publically available large scale raw arabic text ([Arabic Wikidumps](https://archive.org/details/arwiki-20190201), [The 1.5B words Arabic Corpus](https://www.semanticscholar.org/paper/1.5-billion-words-Arabic-Corpus-El-Khair/f3eeef4afb81223df96575adadf808fe7fe440b4), [The OSIAN Corpus](https://www.aclweb.org/anthology/W19-4619), Assafir news articles, and 4 other manually crawled news websites (Al-Akhbar, Annahar, AL-Ahram, AL-Wafd) from [the Wayback Machine](http://web.archive.org/))


--- a/model_cards/aubmindlab/bert-base-arabertv01/README.md
+++ b/model_cards/aubmindlab/bert-base-arabertv01/README.md
@@ -7,7 +7,7 @@ language: ar

 **AraBERT** is an Arabic pretrained lanaguage model based on [Google's BERT architechture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT PAPER](https://arxiv.org/abs/2003.00104v2) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)

-There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were splitted using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).
+There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).

 The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words. The training corpora are a collection of publically available large scale raw arabic text ([Arabic Wikidumps](https://archive.org/details/arwiki-20190201), [The 1.5B words Arabic Corpus](https://www.semanticscholar.org/paper/1.5-billion-words-Arabic-Corpus-El-Khair/f3eeef4afb81223df96575adadf808fe7fe440b4), [The OSIAN Corpus](https://www.aclweb.org/anthology/W19-4619), Assafir news articles, and 4 other manually crawled news websites (Al-Akhbar, Annahar, AL-Ahram, AL-Wafd) from [the Wayback Machine](http://web.archive.org/))


--- a/model_cards/elgeish/cs224n-squad2.0-albert-large-v2/README.md
+++ b/model_cards/elgeish/cs224n-squad2.0-albert-large-v2/README.md
@@ -4,7 +4,7 @@ tags:
 ---

 ## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establising
+The goal of this model is to save CS224n students GPU time when establishing
 baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
 The training set used to fine-tune this model is the same as
 the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,

--- a/model_cards/jannesg/takalane_afr_roberta/README.md
+++ b/model_cards/jannesg/takalane_afr_roberta/README.md
@@ -34,7 +34,7 @@ model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_afr_roberta")

 #### Limitations and bias

-Updates will be added continously to improve performance. 
+Updates will be added continuously to improve performance. 

 ## Training data


--- a/model_cards/mrm8488/CodeBERTaPy/README.md
+++ b/model_cards/mrm8488/CodeBERTaPy/README.md
@@ -94,7 +94,7 @@ fill_mask(PYTHON_CODE3)

 > Great! 🎉

-## This work is heavely inspired on [CodeBERTa](https://github.com/huggingface/transformers/blob/master/model_cards/huggingface/CodeBERTa-small-v1/README.md) by huggingface team
+## This work is heavily inspired on [CodeBERTa](https://github.com/huggingface/transformers/blob/master/model_cards/huggingface/CodeBERTa-small-v1/README.md) by huggingface team

 <br>


--- a/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md
+++ b/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md
@@ -11,7 +11,7 @@ This model is a fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corp

 - [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 

-I preprocessed the dataset and splitted it as train / dev (80/20)
+I preprocessed the dataset and split it as train / dev (80/20)

 | Dataset                | # Examples |
 | ---------------------- | ----- |

--- a/model_cards/mrm8488/bert-multi-cased-finetuned-xquadv1/README.md
+++ b/model_cards/mrm8488/bert-multi-cased-finetuned-xquadv1/README.md
@@ -65,7 +65,7 @@ Citation:

 </details>

-As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
+As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and split the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:

 | Dataset     | # samples |
 | ----------- | --------- |

--- a/model_cards/mrm8488/bert-multi-uncased-finetuned-xquadv1/README.md
+++ b/model_cards/mrm8488/bert-multi-uncased-finetuned-xquadv1/README.md
@@ -65,7 +65,7 @@ Citation:

 </details>

-As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
+As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and split the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:

 | Dataset     | # samples |
 | ----------- | --------- |

--- a/model_cards/mrm8488/bert-spanish-cased-finetuned-ner/README.md
+++ b/model_cards/mrm8488/bert-spanish-cased-finetuned-ner/README.md
@@ -11,7 +11,7 @@ This model is a fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corp

 - [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 

-I preprocessed the dataset and splitted it as train / dev (80/20)
+I preprocessed the dataset and split it as train / dev (80/20)

 | Dataset                | # Examples |
 | ---------------------- | ----- |

--- a/model_cards/mrm8488/bert-spanish-cased-finetuned-pos/README.md
+++ b/model_cards/mrm8488/bert-spanish-cased-finetuned-pos/README.md
@@ -11,7 +11,7 @@ This model is a fine-tuned on Spanish [CONLL CORPORA](https://www.kaggle.com/nlt

 - [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) with data augmentation techniques

-I preprocessed the dataset and splitted it as train / dev (80/20)
+I preprocessed the dataset and split it as train / dev (80/20)

 | Dataset                | # Examples |
 | ---------------------- | ----- |

--- a/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv1/README.md
+++ b/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv1/README.md
@@ -44,7 +44,7 @@ python transformers/examples/question-answering/run_squad.py \
  --save_steps 1000
 ```

-It is importatnt to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.
+It is important to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.

 ## Test set Results 🧾


--- a/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv2/README.md
+++ b/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv2/README.md
@@ -44,7 +44,7 @@ python transformers/examples/question-answering/run_squad.py \
  --version_2_with_negative
 ```

-It is importatnt to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.
+It is important to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.

 ## Test set Results 🧾


--- a/model_cards/mrm8488/spanbert-base-finetuned-squadv1/README.md
+++ b/model_cards/mrm8488/spanbert-base-finetuned-squadv1/README.md
@@ -48,7 +48,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)         | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action


--- a/model_cards/mrm8488/spanbert-base-finetuned-squadv2/README.md
+++ b/model_cards/mrm8488/spanbert-base-finetuned-squadv2/README.md
@@ -54,7 +54,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)          | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action


--- a/model_cards/mrm8488/spanbert-base-finetuned-tacred/README.md
+++ b/model_cards/mrm8488/spanbert-base-finetuned-tacred/README.md
@@ -45,7 +45,7 @@ python code/run_tacred.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)        | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-base-finetuned-tacred)   |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.


 > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)

--- a/model_cards/mrm8488/spanbert-large-finetuned-squadv1/README.md
+++ b/model_cards/mrm8488/spanbert-large-finetuned-squadv1/README.md
@@ -48,7 +48,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | **94.6** (this)         | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action