chenpangpang / transformers · Commits

Commit 56d5d160 (unverified), authored Jun 05, 2020 by Sylvain Gugger, committed by GitHub on Jun 05, 2020

Add model and doc badges (#4811)

* Add badges for models and docs

parent 4ab74245

Changes: 1 changed file, docs/source/summary.rst, with 165 additions and 40 deletions (+165, -40)

@@ -50,6 +50,15 @@ that at each position, the model can only look at the tokens before in the atten
Original GPT
----------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=openai-gpt">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-openai--gpt-blueviolet">
    </a>
    <a href="/model_doc/gpt">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
    </a>

`Improving Language Understanding by Generative Pre-Training <https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf>`_,
Alec Radford et al.

@@ -58,11 +67,18 @@ The first autoregressive model based on the transformer architecture, pretrained
The library provides versions of the model for language modeling and multitask language modeling/multiple choice
classification. More information in this :doc:`model documentation </model_doc/gpt>`.
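
For instance, a minimal sketch of the language-modeling version (``openai-gpt`` is the pretrained checkpoint
distributed with the library; passing the inputs as ``labels`` is one way to get the causal LM loss):

.. code-block:: python

    from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

    input_ids = tokenizer.encode("Machine learning is", return_tensors="pt")
    # With labels, the first element of the output tuple is the LM loss.
    loss = model(input_ids, labels=input_ids)[0]
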
GPT-2
----------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=gpt2">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-gpt2-blueviolet">
    </a>
    <a href="/model_doc/gpt2">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
    </a>

`Language Models are Unsupervised Multitask Learners <https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_,
Alec Radford et al.

@@ -72,11 +88,18 @@ more).
The library provides versions of the model for language modeling and multitask language modeling/multiple choice
classification. More information in this :doc:`model documentation </model_doc/gpt2>`.
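
A sketch of autoregressive generation with the language-modeling head (prompt and sampling settings here are
arbitrary):

.. code-block:: python

    from transformers import GPT2Tokenizer, GPT2LMHeadModel

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer.encode("In a shocking finding,", return_tensors="pt")
    # Each generated token only attends to the tokens before it.
    generated = model.generate(input_ids, max_length=30, do_sample=True, top_k=50)
    print(tokenizer.decode(generated[0]))
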
CTRL
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=ctrl">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-ctrl-blueviolet">
    </a>
    <a href="/model_doc/ctrl">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
    </a>

`CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_,
Nitish Shirish Keskar et al.

@@ -86,11 +109,18 @@ wikipedia article, a book or a movie review.
The library provides a version of the model for language modeling only. More information in this
:doc:`model documentation </model_doc/ctrl>`.
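
A sketch of controlled generation; ``Books`` is one of the control codes from the paper, prepended to the prompt to
steer the style of the output:

.. code-block:: python

    from transformers import CTRLTokenizer, CTRLLMHeadModel

    tokenizer = CTRLTokenizer.from_pretrained("ctrl")
    model = CTRLLMHeadModel.from_pretrained("ctrl")

    # The leading control code conditions the generation on a domain/style.
    input_ids = tokenizer.encode("Books Once upon a time", return_tensors="pt")
    generated = model.generate(input_ids, max_length=30, repetition_penalty=1.2)
    print(tokenizer.decode(generated[0]))
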
Transformer-XL
----------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=transfo-xl">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-transfo--xl-blueviolet">
    </a>
    <a href="/model_doc/transformerxl">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
    </a>

`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_,
Zihang Dai et al.

@@ -108,13 +138,20 @@ adjustments in the way attention scores are computed.
The library provides a version of the model for language modeling only. More information in this
:doc:`model documentation </model_doc/transformerxl>`.
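
A sketch of the recurrence mechanism: the ``mems`` returned for one segment are fed back for the next, so the second
segment can attend to the first (the output tuple layout assumed here follows the library at the time of writing):

.. code-block:: python

    from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

    tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

    first = tokenizer.encode("The quick brown fox", return_tensors="pt")
    # Without labels the model returns (prediction_scores, mems, ...).
    mems = model(first)[1]
    second = tokenizer.encode("jumps over the lazy dog", return_tensors="pt")
    # The cached hidden states give the second segment a longer context.
    outputs = model(second, mems=mems)
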
.. _reformer:

Reformer
----------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=reformer">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-reformer-blueviolet">
    </a>
    <a href="/model_doc/reformer">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
    </a>

`Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_,
Nikita Kitaev et al.

@@ -138,11 +175,18 @@ pretraining yet, though.
The library provides a version of the model for language modeling only.
More information in this :doc:`model documentation </model_doc/reformer>`.
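
A minimal sketch with the publicly released checkpoint trained on *Crime and Punishment* (the checkpoint name is the
one published with the model; the efficient attention is what makes long generations feasible):

.. code-block:: python

    from transformers import ReformerTokenizer, ReformerModelWithLMHead

    name = "google/reformer-crime-and-punishment"
    tokenizer = ReformerTokenizer.from_pretrained(name)
    model = ReformerModelWithLMHead.from_pretrained(name)

    input_ids = tokenizer.encode("A few months later", return_tensors="pt")
    generated = model.generate(input_ids, max_length=100, do_sample=True)
    print(tokenizer.decode(generated[0]))
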
XLNet
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=xlnet">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlnet-blueviolet">
    </a>
    <a href="/model_doc/xlnet">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
    </a>

`XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_,
Zhilin Yang et al.
@@ -156,20 +200,27 @@ XLNet also uses the same recurrence mechanism as TransformerXL to build long-ter
The library provides a version of the model for language modeling, token classification, sentence classification,
multiple choice classification and question answering.
More information in this :doc:`model documentation </model_doc/xlnet>`.
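
For example, the sentence-classification version wraps the base model with a classification head; that head is freshly
initialized, so it only produces meaningful scores after fine-tuning (a sketch):

.. code-block:: python

    import torch
    from transformers import XLNetTokenizer, XLNetForSequenceClassification

    tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
    model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased")

    input_ids = tokenizer.encode("This movie was great!", return_tensors="pt")
    logits = model(input_ids)[0]          # one score per class (untrained head)
    print(torch.argmax(logits, dim=-1))
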
.. _autoencoding-models:
Autoencoding models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
look at all the tokens in the attention heads. For pretraining, inputs are a corrupted version of the sentence, usually
obtained by masking tokens, and targets are the original sentences.
BERT
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=bert">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
    </a>
    <a href="/model_doc/bert">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
    </a>

`BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_,
Jacob Devlin et al.
@@ -187,11 +238,18 @@ they are not related. The model has to predict if the sentences are consecutive
The library provides a version of the model for language modeling (traditional or masked), next sentence prediction,
token classification, sentence classification, multiple choice classification and question answering.
More information in this :doc:`model documentation </model_doc/bert>`.
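
A sketch of the masked language modeling version filling in a masked token:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    input_ids = tokenizer.encode("The capital of France is [MASK].", return_tensors="pt")
    logits = model(input_ids)[0]
    # Find the masked position and take the highest-scoring token for it.
    mask_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    predicted_id = logits[0, mask_index].argmax().item()
    print(tokenizer.convert_ids_to_tokens(predicted_id))
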
ALBERT
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=albert">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-albert-blueviolet">
    </a>
    <a href="/model_doc/albert">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
    </a>

`ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_,
Zhenzhong Lan et al.
@@ -209,11 +267,18 @@ Same as BERT but with a few tweaks:
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
More information in this :doc:`model documentation </model_doc/albert>`.
RoBERTa
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=roberta">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-roberta-blueviolet">
    </a>
    <a href="/model_doc/roberta">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
    </a>

`RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_,
Yinhan Liu et al.
@@ -228,11 +293,18 @@ Same as BERT with better pretraining tricks:
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
More information in this :doc:`model documentation </model_doc/roberta>`.
DistilBERT
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=distilbert">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-distilbert-blueviolet">
    </a>
    <a href="/model_doc/distilbert">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
    </a>

`DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_,
Victor Sanh et al.
@@ -246,11 +318,18 @@ the same probabilities as the larger model. The actual objective is a combinatio
The library provides a version of the model for masked language modeling, token classification, sentence
classification and question answering. More information in this :doc:`model documentation </model_doc/distilbert>`.
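
For question answering, a sketch with a checkpoint already fine-tuned on SQuAD (the checkpoint name is the distilled
SQuAD model published alongside DistilBERT):

.. code-block:: python

    import torch
    from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

    name = "distilbert-base-uncased-distilled-squad"
    tokenizer = DistilBertTokenizer.from_pretrained(name)
    model = DistilBertForQuestionAnswering.from_pretrained(name)

    question = "Who wrote BERT?"
    context = "BERT was written by researchers at Google."
    input_ids = tokenizer.encode(question, context, return_tensors="pt")
    # The model scores every position as a possible answer start and end.
    start_scores, end_scores = model(input_ids)
    start, end = torch.argmax(start_scores), torch.argmax(end_scores)
    print(tokenizer.decode(input_ids[0, start:end + 1]))
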
XLM
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=xlm">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm-blueviolet">
    </a>
    <a href="/model_doc/xlm">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
    </a>

`Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_,
Guillaume Lample and Alexis Conneau

A transformer model trained on several languages. There are three different types of training for this model and the
@@ -274,11 +353,18 @@ language.
The library provides a version of the model for language modeling, token classification, sentence classification and
question answering. More information in this :doc:`model documentation </model_doc/xlm>`.
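
For checkpoints that use language embeddings, a language id has to be supplied for every position; the XLM tokenizer
exposes the mapping as ``lang2id`` (a sketch with a causal-LM checkpoint):

.. code-block:: python

    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
    model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")

    input_ids = tokenizer.encode("Wikipedia was used to", return_tensors="pt")
    # One language id per position selects the language embedding.
    langs = torch.full_like(input_ids, tokenizer.lang2id["en"])
    outputs = model(input_ids, langs=langs)
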
XLM-RoBERTa
----------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=xlm-roberta">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm--roberta-blueviolet">
    </a>
    <a href="/model_doc/xlmroberta">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
    </a>

`Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_,
Alexis Conneau et al.

@@ -289,22 +375,36 @@ masked language modeling on sentences coming from one language. However, the mod
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering. More information in this
:doc:`model documentation </model_doc/xlmroberta>`.
FlauBERT
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=flaubert">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-flaubert-blueviolet">
    </a>
    <a href="/model_doc/flaubert">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
    </a>

`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_,
Hang Le et al.

Like RoBERTa, without the sentence ordering prediction (so just trained on the MLM objective).
The library provides a version of the model for language modeling and sentence classification. More information in
this :doc:`model documentation </model_doc/flaubert>`.
ELECTRA
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=electra">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-electra-blueviolet">
    </a>
    <a href="/model_doc/electra">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
    </a>

`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://arxiv.org/abs/2003.10555>`_,
Kevin Clark et al.

@@ -317,13 +417,20 @@ traditional GAN setting) then the ELECTRA model is trained for a few steps.
The library provides a version of the model for masked language modeling, token classification and sentence
classification. More information in this :doc:`model documentation </model_doc/electra>`.
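
Beyond those heads, the pretraining discriminator itself is exposed; a sketch of it flagging which tokens look
replaced (the checkpoint name is the small discriminator published by the authors; positive logits mean "replaced"):

.. code-block:: python

    from transformers import ElectraTokenizer, ElectraForPreTraining

    name = "google/electra-small-discriminator"
    tokenizer = ElectraTokenizer.from_pretrained(name)
    model = ElectraForPreTraining.from_pretrained(name)

    # "ate" is an implausible token here, so the discriminator should flag it.
    input_ids = tokenizer.encode("The quick brown fox ate over the lazy dog", return_tensors="pt")
    scores = model(input_ids)[0]
    print((scores > 0).long())  # 1 = token predicted as replaced
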
.. _longformer:

Longformer
----------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=longformer">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-longformer-blueviolet">
    </a>
    <a href="/model_doc/longformer">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
    </a>

`Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_,
Iz Beltagy et al.

A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g.,
@@ -339,9 +446,6 @@ pretraining yet, though.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering. More information in this
:doc:`model documentation </model_doc/longformer>`.
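
A sketch showing why the sparse attention matters: an input of up to 4096 tokens fits where a dense transformer of the
same size would not (checkpoint name as published by AllenAI):

.. code-block:: python

    from transformers import LongformerTokenizer, LongformerForMaskedLM

    name = "allenai/longformer-base-4096"
    tokenizer = LongformerTokenizer.from_pretrained(name)
    model = LongformerForMaskedLM.from_pretrained(name)

    # Windowed (local) attention keeps memory roughly linear in sequence length.
    long_text = " ".join(["All work and no play makes Jack a dull boy."] * 300)
    input_ids = tokenizer.encode(long_text, return_tensors="pt", max_length=4096)
    outputs = model(input_ids)
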
.. _seq-to-seq-models:

Sequence-to-sequence models
@@ -352,8 +456,17 @@ As mentioned before, these models keep both the encoder and the decoder of the o
BART
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=bart">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-bart-blueviolet">
    </a>
    <a href="/model_doc/bart">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
    </a>

`BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension <https://arxiv.org/abs/1910.13461>`_,
Mike Lewis et al.

Sequence-to-sequence model with an encoder and a decoder. The encoder is fed a corrupted version of the tokens, the
decoder is fed the original tokens (but has a mask to hide the future words, like a regular transformer decoder). For
the encoder, on the
@@ -367,22 +480,36 @@ pretraining tasks, a composition of the following transformations are applied:
The library provides a version of this model for conditional generation and sequence classification. More information
in this :doc:`model documentation </model_doc/bart>`.
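
For instance, summarization is just conditional generation with a fine-tuned checkpoint
(``facebook/bart-large-cnn`` is the CNN/DailyMail summarization weight published with the model):

.. code-block:: python

    from transformers import BartTokenizer, BartForConditionalGeneration

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

    article = "The tower is 324 metres tall, about the same height as an 81-storey building."
    input_ids = tokenizer.encode(article, return_tensors="pt")
    # Beam search over the decoder produces the summary.
    summary_ids = model.generate(input_ids, num_beams=4, max_length=40, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
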
MarianMT
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=marian">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-marian-blueviolet">
    </a>
    <a href="/model_doc/marian">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
    </a>

`Marian: Fast Neural Machine Translation in C++ <https://arxiv.org/abs/1804.00344>`_,
Marcin Junczys-Dowmunt et al.

A framework for translation models, using the same models as BART.
The library provides a version of this model for conditional generation. More information in this
:doc:`model documentation </model_doc/marian>`.
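
A translation sketch; there is one checkpoint per language pair, named after the pair
(``prepare_translation_batch`` is the Marian tokenizer helper at the time of writing):

.. code-block:: python

    from transformers import MarianTokenizer, MarianMTModel

    # This checkpoint translates English to German.
    name = "Helsinki-NLP/opus-mt-en-de"
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    batch = tokenizer.prepare_translation_batch(["Where is the bus stop?"])
    translated = model.generate(**batch)
    print(tokenizer.decode(translated[0], skip_special_tokens=True))
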
T5
----------------------------------------------
.. raw:: html

    <a href="https://huggingface.co/models?filter=t5">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-t5-blueviolet">
    </a>
    <a href="/model_doc/t5">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
    </a>

`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`_,
Colin Raffel et al.

@@ -403,8 +530,6 @@ input becomes “My <x> very <y> .” and the target is “<x> dog is <y> . <z>
The library provides a version of this model for conditional generation. More information in this
:doc:`model documentation </model_doc/t5>`.
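
Every task is cast as text-to-text, so the task is selected with a textual prefix (a sketch using the public
``t5-small`` weights):

.. code-block:: python

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # The prefix tells the model which task to perform.
    input_ids = tokenizer.encode("translate English to German: The house is wonderful.",
                                 return_tensors="pt")
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
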
.. _multimodal-models:

Multimodal models