Unverified Commit 969859d5 authored by Santiago Castro's avatar Santiago Castro Committed by GitHub
Browse files

Fix doc errors and typos across the board (#8139)

* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
parent 4731a00c
......@@ -54,7 +54,7 @@ python code/run_squad.py \
| SpanBERT (large) | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1) | **88.7** (this) | 79.6 | [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred) |
Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
## Model in action
......
......@@ -45,7 +45,7 @@ python code/run_tacred.py \
| SpanBERT (large) | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1) | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2) | 79.6 | **70.8** (this one) |
Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
......
......@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")
def get_explanation(query):
input_text = "translante Sql to English: %s </s>" % query
input_text = "translate Sql to English: %s </s>" % query
features = tokenizer([input_text], return_tensors='pt')
output = model.generate(input_ids=features['input_ids'],
......
......@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
def get_sql(query):
input_text = "translante English to SQL: %s </s>" % query
input_text = "translate English to SQL: %s </s>" % query
features = tokenizer([input_text], return_tensors='pt')
output = model.generate(input_ids=features['input_ids'],
......
......@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
def get_sql(query):
input_text = "translante English to SQL: %s </s>" % query
input_text = "translate English to SQL: %s </s>" % query
features = tokenizer([input_text], return_tensors='pt')
output = model.generate(input_ids=features['input_ids'],
......
......@@ -71,7 +71,7 @@ Citation:
</details>
As XQuAD is just an evaluation dataset, I used Data augmentation techniques (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
As XQuAD is just an evaluation dataset, I used Data augmentation techniques (scraping, neural machine translation, etc) to obtain more samples and split the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
| Dataset | # samples |
| ----------- | --------- |
......
......@@ -172,7 +172,7 @@ class MemorySummary(NamedTuple):
`MemorySummary` namedtuple otherwise with the fields:
- `sequential`: a list of `MemoryState` namedtuple (see below) computed from the provided `memory_trace` by
substracting the memory after executing each line from the memory before executing said line.
subtracting the memory after executing each line from the memory before executing said line.
- `cumulative`: a list of `MemoryState` namedtuple (see below) with cumulative increase in memory for each line
obtained by summing repeated memory increase for a line if it's executed several times. The list is sorted
from the frame with the largest memory consumption to the frame with the smallest (can be negative if memory
......@@ -208,7 +208,7 @@ def measure_peak_memory_cpu(function: Callable[[], None], interval=0.5, device_i
Returns:
- `max_memory`: (`int`) cosumed memory peak in Bytes
- `max_memory`: (`int`) consumed memory peak in Bytes
"""
def get_cpu_memory(process_id: int) -> int:
......@@ -221,7 +221,7 @@ def measure_peak_memory_cpu(function: Callable[[], None], interval=0.5, device_i
Returns
- `memory`: (`int`) cosumed memory in Bytes
- `memory`: (`int`) consumed memory in Bytes
"""
process = psutil.Process(process_id)
try:
......@@ -367,7 +367,7 @@ def start_memory_tracing(
devices = list(range(nvml.nvmlDeviceGetCount())) if gpus_to_trace is None else gpus_to_trace
nvml.nvmlShutdown()
except (OSError, nvml.NVMLError):
logger.warning("Error while initializing comunication with GPU. " "We won't perform GPU memory tracing.")
logger.warning("Error while initializing communication with GPU. " "We won't perform GPU memory tracing.")
log_gpu = False
else:
log_gpu = is_torch_available() or is_tf_available()
......@@ -472,9 +472,10 @@ def stop_memory_tracing(
Args:
- `memory_trace` (optional output of start_memory_tracing, default: None): memory trace to convert in summary
- `ignore_released_memory` (boolean, default: None): if True we only sum memory increase to compute total
memory
`memory_trace` (optional output of start_memory_tracing, default: None):
memory trace to convert in summary
`ignore_released_memory` (boolean, default: None):
if True we only sum memory increase to compute total memory
Return:
......@@ -482,7 +483,7 @@ def stop_memory_tracing(
- `MemorySummary` namedtuple otherwise with the fields:
- `sequential`: a list of `MemoryState` namedtuple (see below) computed from the provided `memory_trace` by
substracting the memory after executing each line from the memory before executing said line.
subtracting the memory after executing each line from the memory before executing said line.
- `cumulative`: a list of `MemoryState` namedtuple (see below) with cumulative increase in memory for each
line obtained by summing repeated memory increase for a line if it's executed several times. The list is
sorted from the frame with the largest memory consumption to the frame with the smallest (can be negative
......
......@@ -41,7 +41,7 @@ class ConvertCommand(BaseTransformersCLICommand):
"--tf_checkpoint", type=str, required=True, help="TensorFlow checkpoint path or folder."
)
train_parser.add_argument(
"--pytorch_dump_output", type=str, required=True, help="Path to the PyTorch savd model output."
"--pytorch_dump_output", type=str, required=True, help="Path to the PyTorch saved model output."
)
train_parser.add_argument("--config", type=str, default="", help="Configuration file path or folder.")
train_parser.add_argument(
......
......@@ -61,7 +61,7 @@ class BartConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
......
......@@ -76,7 +76,7 @@ class BertConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
......
......@@ -42,7 +42,7 @@ class BertGenerationConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
......@@ -62,7 +62,7 @@ class BertGenerationConfig(PretrainedConfig):
>>> # Initializing a BertGeneration config
>>> configuration = BertGenerationConfig()
>>> # Initializing a modelfrom the config
>>> # Initializing a model from the config
>>> model = BertGenerationEncoder(configuration)
>>> # Accessing the model configuration
......
......@@ -58,7 +58,7 @@ class BlenderbotConfig(BartConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
......
......@@ -55,7 +55,7 @@ class DebertaConfig(PretrainedConfig):
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"`, :obj:`"gelu"`, :obj:`"tanh"`, :obj:`"gelu_fast"`,
:obj:`"mish"`, :obj:`"linear"`, :obj:`"sigmoid"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
......
......@@ -61,7 +61,7 @@ class DistilBertConfig(PretrainedConfig):
hidden_dim (:obj:`int`, `optional`, defaults to 3072):
The size of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
activation (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
......
......@@ -57,7 +57,7 @@ class DPRConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
......
......@@ -62,7 +62,7 @@ class ElectraConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
......
......@@ -59,11 +59,11 @@ class FlaubertConfig(XLMConfig):
attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for the attention mechanism
gelu_activation (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to use a `gelu` actibation instead of `relu`.
Whether or not to use a `gelu` activation instead of `relu`.
sinusoidal_embeddings (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to use sinusoidal positional embeddings instead of absolute positional embeddings.
causal (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the model shoul behave in a causal manner. Causal models use a triangular attention mask in
Whether or not the model should behave in a causal manner. Causal models use a triangular attention mask in
order to only attend to the left-side context instead if a bidirectional context.
asm (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to use an adaptive log softmax projection layer instead of a linear layer for the prediction
......
......@@ -73,7 +73,7 @@ class FSMTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
......
......@@ -68,7 +68,7 @@ class FunnelConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for the attention probabilities.
activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
......
......@@ -54,7 +54,7 @@ class LayoutLMConfig(BertConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment