transformers · Commit 5c00e344
Authored Dec 13, 2019 by thomwolf
update model doc - switch 3B/11B to 3b/11b
Parent: 110394b2
Showing 5 changed files with 20 additions and 25 deletions
docs/source/pretrained_models.rst   +10 -15
transformers/configuration_t5.py    +2 -2
transformers/modeling_t5.py         +2 -2
transformers/modeling_tf_t5.py      +2 -2
transformers/tokenization_t5.py     +4 -4
docs/source/pretrained_models.rst

@@ -217,25 +217,20 @@ Here is the full list of the currently provided pretrained models together with
 | | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
 | | | (see `details <https://github.com/google-research/ALBERT>`__) |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| T5 | ``t5-small`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
-| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
-| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
+| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``t5-base`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
-| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
-| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+| | ``t5-base`` | | ~220M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads, |
+| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``t5-large`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
-| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
-| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+| | ``t5-large`` | | ~770M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
+| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``t5-3b`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
-| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
-| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+| | ``t5-3B`` | | ~2.8B parameters with 24-layers, 1024-hidden-state, 16384 feed-forward hidden-state, 32-heads, |
+| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``t5-11b`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
-| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
-| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+| | ``t5-11B`` | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads, |
+| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
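The shortcut names in the table above are the exact strings passed to from_pretrained, which is why the casing matters. A minimal usage sketch, assuming the standard transformers loading API of this release (the checkpoint name is taken from the table):

from transformers import T5Tokenizer, T5Model

# 't5-small' must match an archive-map key in the files below,
# character for character.
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5Model.from_pretrained('t5-small')

ids = tokenizer.encode("translate English to German: Hello")
print(len(ids))  # token count under the shared SentencePiece vocabulary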
transformers/configuration_t5.py

@@ -30,8 +30,8 @@ T5_PRETRAINED_CONFIG_ARCHIVE_MAP = {
     't5-small': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-small-config.json",
     't5-base': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-base-config.json",
     't5-large': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-large-config.json",
-    't5-3B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3B-config.json",
-    't5-11B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-11B-config.json",
+    't5-3b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3b-config.json",
+    't5-11b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-11b-config.json",
 }
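The keys of T5_PRETRAINED_CONFIG_ARCHIVE_MAP are resolved by a plain, case-sensitive dictionary lookup, so 't5-3B' and 't5-3b' are different names. A simplified sketch of that resolution (illustration only, not the library's actual implementation; the URLs are the real ones from the hunk above):

# Hypothetical helper mirroring how a shortcut name maps to a config URL.
T5_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    't5-small': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-small-config.json",
    't5-3b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3b-config.json",
}

def resolve_config(name_or_path):
    # Exact dict lookup: before this commit, 't5-3b' missed the
    # uppercase key 't5-3B' and fell through to the local-path branch.
    if name_or_path in T5_PRETRAINED_CONFIG_ARCHIVE_MAP:
        return T5_PRETRAINED_CONFIG_ARCHIVE_MAP[name_or_path]
    return name_or_path  # treated as a local file or directory

assert resolve_config('t5-3b').endswith('t5-3b-config.json')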
transformers/modeling_t5.py

@@ -44,8 +44,8 @@ T5_PRETRAINED_MODEL_ARCHIVE_MAP = {
     't5-small': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-small-pytorch_model.bin",
     't5-base': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-base-pytorch_model.bin",
     't5-large': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-large-pytorch_model.bin",
-    't5-3B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3B-pytorch_model.bin",
-    't5-11B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-11B-pytorch_model.bin",
+    't5-3b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3b-pytorch_model.bin",
+    't5-11b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-11b-pytorch_model.bin",
 }
 ####################################################
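After the rename, the PyTorch weights for the two largest checkpoints resolve through the lowercase keys. A minimal sketch, assuming the standard from_pretrained API (the parameter count is the one quoted in the doc table above):

from transformers import T5Model

# Resolves via T5_PRETRAINED_MODEL_ARCHIVE_MAP['t5-3b'] after this commit.
model = T5Model.from_pretrained('t5-3b')
model.eval()

# Sanity check against the documented ~2.8B parameters for t5-3b.
n_params = sum(p.numel() for p in model.parameters())
print("{:.2f}B parameters".format(n_params / 1e9))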
transformers/modeling_tf_t5.py

@@ -34,8 +34,8 @@ TF_T5_PRETRAINED_MODEL_ARCHIVE_MAP = {
     't5-small': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-small-tf_model.h5",
     't5-base': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-base-tf_model.h5",
     't5-large': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-large-tf_model.h5",
-    't5-3B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3B-tf_model.h5",
-    't5-11B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-11B-tf_model.h5",
+    't5-3b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-3b-tf_model.h5",
+    't5-11b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-11b-tf_model.h5",
 }
 ####################################################
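The TensorFlow archive map mirrors the PyTorch one; only the artifact changes, from pytorch_model.bin to tf_model.h5. A minimal sketch of loading the TF variant, again assuming the standard from_pretrained API:

from transformers import TFT5Model

# Fetches t5-small-tf_model.h5 through the map above; the same
# lowercase spelling now works for 't5-3b' and 't5-11b' as well.
model = TFT5Model.from_pretrained('t5-small')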
transformers/tokenization_t5.py

@@ -44,8 +44,8 @@ PRETRAINED_VOCAB_FILES_MAP = {
         't5-small': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
         't5-base': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
         't5-large': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
-        't5-3B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
-        't5-11B': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
+        't5-3b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
+        't5-11b': "https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model",
     }
 }
@@ -56,8 +56,8 @@ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
     't5-small': 512,
     't5-base': 512,
     't5-large': 512,
-    't5-3B': 512,
-    't5-11B': 512,
+    't5-3b': 512,
+    't5-11b': 512,
 }
 class T5Tokenizer(PreTrainedTokenizer):
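Both hunks underline that the tokenizer is shared: every size points at the single t5-spiece.model file and the same 512 maximum sequence length from PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES. A short sketch (the max_len attribute name is an assumption about this transformers version):

from transformers import T5Tokenizer

# All five checkpoints share one SentencePiece model, so the two
# tokenizers below produce identical ids for the same text.
tok_small = T5Tokenizer.from_pretrained('t5-small')
tok_3b = T5Tokenizer.from_pretrained('t5-3b')

text = "The quick brown fox jumps over the lazy dog."
assert tok_small.encode(text) == tok_3b.encode(text)

# Populated from PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES above.
print(tok_small.max_len)  # 512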