Unverified commit 969859d5 authored by Santiago Castro, committed by GitHub

Fix doc errors and typos across the board (#8139)

* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
parent 4731a00c
@@ -54,7 +54,7 @@ python code/run_squad.py \
 | SpanBERT (large) | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1) | **88.7** (this) | 79.6 | [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred) |
-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
 ## Model in action
...
@@ -45,7 +45,7 @@ python code/run_tacred.py \
 | SpanBERT (large) | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1) | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2) | 79.6 | **70.8** (this one) |
-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
 > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
...
@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql
 model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")
 def get_explanation(query):
-    input_text = "translante Sql to English: %s </s>" % query
+    input_text = "translate Sql to English: %s </s>" % query
     features = tokenizer([input_text], return_tensors='pt')
     output = model.generate(input_ids=features['input_ids'],
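The `generate` call above is cut off by the diff context. For reference, a complete, runnable version of this model-card snippet might look like the sketch below; the extra generation arguments (`attention_mask`, `max_length`) and the final decode step are assumptions filled in for illustration, not part of the diff:

```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")

def get_explanation(query):
    # T5 is text-to-text: the task prefix selects the SQL-to-English mapping.
    input_text = "translate Sql to English: %s </s>" % query
    features = tokenizer([input_text], return_tensors='pt')
    # The arguments below are illustrative assumptions; the diff cuts off here.
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(get_explanation("SELECT COUNT Params FROM table WHERE Model = 't5-base'"))
```

The same pattern applies to the two `get_sql` snippets in the next hunks, with the task prefix reversed to "translate English to SQL".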
...
@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
 model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
 def get_sql(query):
-    input_text = "translante English to SQL: %s </s>" % query
+    input_text = "translate English to SQL: %s </s>" % query
     features = tokenizer([input_text], return_tensors='pt')
     output = model.generate(input_ids=features['input_ids'],
...
@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
 model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
 def get_sql(query):
-    input_text = "translante English to SQL: %s </s>" % query
+    input_text = "translate English to SQL: %s </s>" % query
     features = tokenizer([input_text], return_tensors='pt')
     output = model.generate(input_ids=features['input_ids'],
...
@@ -71,7 +71,7 @@ Citation:
 </details>
-As XQuAD is just an evaluation dataset, I used Data augmentation techniques (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
+As XQuAD is just an evaluation dataset, I used data augmentation techniques (scraping, neural machine translation, etc.) to obtain more samples and split the dataset in order to have a train and test set. The test set was created so that it contains the same number of samples for each language. Finally, I got:
 | Dataset | # samples |
 | ----------- | --------- |
...
@@ -172,7 +172,7 @@ class MemorySummary(NamedTuple):
     `MemorySummary` namedtuple otherwise with the fields:
     - `sequential`: a list of `MemoryState` namedtuple (see below) computed from the provided `memory_trace` by
-      substracting the memory after executing each line from the memory before executing said line.
+      subtracting the memory after executing each line from the memory before executing said line.
     - `cumulative`: a list of `MemoryState` namedtuple (see below) with cumulative increase in memory for each line
       obtained by summing repeated memory increase for a line if it's executed several times. The list is sorted
       from the frame with the largest memory consumption to the frame with the smallest (can be negative if memory
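These docstring fields describe the output of the `start_memory_tracing`/`stop_memory_tracing` pair defined in the same file (`benchmark_utils.py`, which this hunk patches). A minimal usage sketch, assuming a transformers version from this era:

```python
# Minimal sketch, assuming these helpers are importable from benchmark_utils
# as in transformers releases of this period.
from transformers.benchmark.benchmark_utils import start_memory_tracing, stop_memory_tracing

traces = start_memory_tracing("transformers")  # trace lines executed inside the package

# ... run the workload to profile here, e.g. a model forward pass ...

summary = stop_memory_tracing(traces)

# `sequential` holds one per-line delta per executed line; `cumulative` merges
# repeated lines and is sorted from largest to smallest memory increase.
for state in summary.cumulative[:5]:
    print(state)
```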
@@ -208,7 +208,7 @@ def measure_peak_memory_cpu(function: Callable[[], None], interval=0.5, device_i
     Returns:
-        - `max_memory`: (`int`) cosumed memory peak in Bytes
+        - `max_memory`: (`int`) consumed memory peak in Bytes
     """
     def get_cpu_memory(process_id: int) -> int:
@@ -221,7 +221,7 @@ def measure_peak_memory_cpu(function: Callable[[], None], interval=0.5, device_i
         Returns
-            - `memory`: (`int`) cosumed memory in Bytes
+            - `memory`: (`int`) consumed memory in Bytes
         """
         process = psutil.Process(process_id)
         try:
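The nested `get_cpu_memory` helper shown above reads a process's resident memory through `psutil`. A standalone sketch of the same idea, assuming only that `psutil` is installed:

```python
import os

import psutil

def get_cpu_memory(process_id: int) -> int:
    # Resident set size (RSS) of the process, in bytes.
    return psutil.Process(process_id).memory_info().rss

print(get_cpu_memory(os.getpid()))
```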
@@ -367,7 +367,7 @@ def start_memory_tracing(
         devices = list(range(nvml.nvmlDeviceGetCount())) if gpus_to_trace is None else gpus_to_trace
         nvml.nvmlShutdown()
     except (OSError, nvml.NVMLError):
-        logger.warning("Error while initializing comunication with GPU. " "We won't perform GPU memory tracing.")
+        logger.warning("Error while initializing communication with GPU. " "We won't perform GPU memory tracing.")
         log_gpu = False
     else:
         log_gpu = is_torch_available() or is_tf_available()
@@ -472,9 +472,10 @@ def stop_memory_tracing(
     Args:
-        - `memory_trace` (optional output of start_memory_tracing, default: None): memory trace to convert in summary
-        - `ignore_released_memory` (boolean, default: None): if True we only sum memory increase to compute total
-          memory
+        `memory_trace` (optional output of start_memory_tracing, default: None):
+            memory trace to convert in summary
+        `ignore_released_memory` (boolean, default: None):
+            if True we only sum memory increase to compute total memory
     Return:
@@ -482,7 +483,7 @@ def stop_memory_tracing(
     - `MemorySummary` namedtuple otherwise with the fields:
         - `sequential`: a list of `MemoryState` namedtuple (see below) computed from the provided `memory_trace` by
-          substracting the memory after executing each line from the memory before executing said line.
+          subtracting the memory after executing each line from the memory before executing said line.
         - `cumulative`: a list of `MemoryState` namedtuple (see below) with cumulative increase in memory for each
           line obtained by summing repeated memory increase for a line if it's executed several times. The list is
           sorted from the frame with the largest memory consumption to the frame with the smallest (can be negative
...
@@ -41,7 +41,7 @@ class ConvertCommand(BaseTransformersCLICommand):
             "--tf_checkpoint", type=str, required=True, help="TensorFlow checkpoint path or folder."
         )
         train_parser.add_argument(
-            "--pytorch_dump_output", type=str, required=True, help="Path to the PyTorch savd model output."
+            "--pytorch_dump_output", type=str, required=True, help="Path to the PyTorch saved model output."
         )
         train_parser.add_argument("--config", type=str, default="", help="Configuration file path or folder.")
         train_parser.add_argument(
...
@@ -61,7 +61,7 @@ class BartConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     dropout (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
         The dropout ratio for the attention probabilities.
     activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
...
@@ -76,7 +76,7 @@ class BertConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
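Several of the hunks below fix the same `probabilitiy` typo in config docstrings. For reference, a short sketch of how these documented parameters are set in practice; `BertConfig` and `BertModel` are the real classes, the chosen values are arbitrary:

```python
from transformers import BertConfig, BertModel

# Override the documented defaults (both dropout probabilities default to 0.1).
config = BertConfig(hidden_dropout_prob=0.2, attention_probs_dropout_prob=0.2)
model = BertModel(config)
print(model.config.hidden_dropout_prob)  # 0.2
```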
...
@@ -42,7 +42,7 @@ class BertGenerationConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
@@ -62,7 +62,7 @@ class BertGenerationConfig(PretrainedConfig):
     >>> # Initializing a BertGeneration config
     >>> configuration = BertGenerationConfig()
-    >>> # Initializing a modelfrom the config
+    >>> # Initializing a model from the config
     >>> model = BertGenerationEncoder(configuration)
     >>> # Accessing the model configuration
...
@@ -58,7 +58,7 @@ class BlenderbotConfig(BartConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     dropout (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
         The dropout ratio for the attention probabilities.
     activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
...
@@ -55,7 +55,7 @@ class DebertaConfig(PretrainedConfig):
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"`, :obj:`"gelu"`, :obj:`"tanh"`, :obj:`"gelu_fast"`,
         :obj:`"mish"`, :obj:`"linear"`, :obj:`"sigmoid"` and :obj:`"gelu_new"` are supported.
     hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
...
@@ -61,7 +61,7 @@ class DistilBertConfig(PretrainedConfig):
     hidden_dim (:obj:`int`, `optional`, defaults to 3072):
         The size of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
     dropout (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     activation (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
...
@@ -57,7 +57,7 @@ class DPRConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
...
@@ -62,7 +62,7 @@ class ElectraConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
...
@@ -59,11 +59,11 @@ class FlaubertConfig(XLMConfig):
     attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
         The dropout probability for the attention mechanism
     gelu_activation (:obj:`bool`, `optional`, defaults to :obj:`True`):
-        Whether or not to use a `gelu` actibation instead of `relu`.
+        Whether or not to use a `gelu` activation instead of `relu`.
     sinusoidal_embeddings (:obj:`bool`, `optional`, defaults to :obj:`False`):
         Whether or not to use sinusoidal positional embeddings instead of absolute positional embeddings.
     causal (:obj:`bool`, `optional`, defaults to :obj:`False`):
-        Whether or not the model shoul behave in a causal manner. Causal models use a triangular attention mask in
-        order to only attend to the left-side context instead if a bidirectional context.
+        Whether or not the model should behave in a causal manner. Causal models use a triangular attention mask in
+        order to only attend to the left-side context instead of a bidirectional context.
     asm (:obj:`bool`, `optional`, defaults to :obj:`False`):
         Whether or not to use an adaptive log softmax projection layer instead of a linear layer for the prediction
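As an aside, the triangular mask that the `causal` flag refers to is easy to visualize; a minimal sketch in plain PyTorch (illustrative, not Flaubert's actual code):

```python
import torch

# Causal (triangular) attention mask: position i may attend only to positions <= i,
# i.e. the left-side context the docstring describes.
seq_len = 5
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
print(mask)
```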
...
@@ -73,7 +73,7 @@ class FSMTConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     dropout (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
         The dropout ratio for the attention probabilities.
     activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
...
@@ -68,7 +68,7 @@ class FunnelConfig(PretrainedConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     hidden_dropout (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
         The dropout probability for the attention probabilities.
     activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
...
@@ -54,7 +54,7 @@ class LayoutLMConfig(BertConfig):
         The non-linear activation function (function or string) in the encoder and pooler. If string,
         :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
     hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
-        The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
     attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
         The dropout ratio for the attention probabilities.
     max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
...