Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
chenpangpang
transformers
Commits
3b8b0e01
Commit
3b8b0e01
authored
Jul 16, 2019
by
thomwolf
Browse files
update readme
parent
76da9765
Changes
4
Expand all
Show whitespace changes
Inline
Side-by-side
Showing
4 changed files
with
365 additions
and
1613 deletions
+365
-1613
README.md
README.md
+184
-1603
docs/source/serialization.rst
docs/source/serialization.rst
+171
-0
pytorch_transformers/modeling_utils.py
pytorch_transformers/modeling_utils.py
+5
-5
pytorch_transformers/modeling_xlnet.py
pytorch_transformers/modeling_xlnet.py
+5
-5
No files found.
README.md
View file @
3b8b0e01
This diff is collapsed.
Click to expand it.
docs/source/serialization.rst
0 → 100644
View file @
3b8b0e01
###
Loading
Google
AI
or
OpenAI
pre
-
trained
weights
or
PyTorch
dump
###
`
from_pretrained
()`
method
To
load
one
of
Google
AI
's, OpenAI'
s
pre
-
trained
models
or
a
PyTorch
saved
model
(
an
instance
of
`
BertForPreTraining
`
saved
with
`
torch
.
save
()`),
the
PyTorch
model
classes
and
the
tokenizer
can
be
instantiated
using
the
`
from_pretrained
()`
method
:
```
python
model
=
BERT_CLASS
.
from_pretrained
(
PRE_TRAINED_MODEL_NAME_OR_PATH
,
cache_dir
=
None
,
from_tf
=
False
,
state_dict
=
None
,
*
input
,
**
kwargs
)
```
where
-
`
BERT_CLASS
`
is
either
a
tokenizer
to
load
the
vocabulary
(`
BertTokenizer
`
or
`
OpenAIGPTTokenizer
`
classes
)
or
one
of
the
eight
BERT
or
three
OpenAI
GPT
PyTorch
model
classes
(
to
load
the
pre
-
trained
weights
):
`
BertModel
`,
`
BertForMaskedLM
`,
`
BertForNextSentencePrediction
`,
`
BertForPreTraining
`,
`
BertForSequenceClassification
`,
`
BertForTokenClassification
`,
`
BertForMultipleChoice
`,
`
BertForQuestionAnswering
`,
`
OpenAIGPTModel
`,
`
OpenAIGPTLMHeadModel
`
or
`
OpenAIGPTDoubleHeadsModel
`,
and
-
`
PRE_TRAINED_MODEL_NAME_OR_PATH
`
is
either
:
-
the
shortcut
name
of
a
Google
AI
's or OpenAI'
s
pre
-
trained
model
selected
in
the
list
:
-
`
bert
-
base
-
uncased
`:
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
-
`
bert
-
large
-
uncased
`:
24
-
layer
,
1024
-
hidden
,
16
-
heads
,
340
M
parameters
-
`
bert
-
base
-
cased
`:
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
-
`
bert
-
large
-
cased
`:
24
-
layer
,
1024
-
hidden
,
16
-
heads
,
340
M
parameters
-
`
bert
-
base
-
multilingual
-
uncased
`:
(
Orig
,
not
recommended
)
102
languages
,
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
-
`
bert
-
base
-
multilingual
-
cased
`:
**(
New
,
recommended
)**
104
languages
,
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
-
`
bert
-
base
-
chinese
`:
Chinese
Simplified
and
Traditional
,
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
-
`
bert
-
base
-
german
-
cased
`:
Trained
on
German
data
only
,
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
[
Performance
Evaluation
](
https
://
deepset
.
ai
/
german
-
bert
)
-
`
bert
-
large
-
uncased
-
whole
-
word
-
masking
`:
24
-
layer
,
1024
-
hidden
,
16
-
heads
,
340
M
parameters
-
Trained
with
Whole
Word
Masking
(
mask
all
of
the
the
tokens
corresponding
to
a
word
at
once
)
-
`
bert
-
large
-
cased
-
whole
-
word
-
masking
`:
24
-
layer
,
1024
-
hidden
,
16
-
heads
,
340
M
parameters
-
Trained
with
Whole
Word
Masking
(
mask
all
of
the
the
tokens
corresponding
to
a
word
at
once
)
-
`
bert
-
large
-
uncased
-
whole
-
word
-
masking
-
finetuned
-
squad
`:
The
`
bert
-
large
-
uncased
-
whole
-
word
-
masking
`
model
finetuned
on
SQuAD
(
using
the
`
run_bert_squad
.
py
`
examples
).
Results
:
*
exact_match
:
86.91579943235573
,
f1
:
93.1532499015869
*
-
`
openai
-
gpt
`:
OpenAI
GPT
English
model
,
12
-
layer
,
768
-
hidden
,
12
-
heads
,
110
M
parameters
-
`
gpt2
`:
OpenAI
GPT
-
2
English
model
,
12
-
layer
,
768
-
hidden
,
12
-
heads
,
117
M
parameters
-
`
gpt2
-
medium
`:
OpenAI
GPT
-
2
English
model
,
24
-
layer
,
1024
-
hidden
,
16
-
heads
,
345
M
parameters
-
`
transfo
-
xl
-
wt103
`:
Transformer
-
XL
English
model
trained
on
wikitext
-
103
,
18
-
layer
,
1024
-
hidden
,
16
-
heads
,
257
M
parameters
-
a
path
or
url
to
a
pretrained
model
archive
containing
:
-
`
bert_config
.
json
`
or
`
openai_gpt_config
.
json
`
a
configuration
file
for
the
model
,
and
-
`
pytorch_model
.
bin
`
a
PyTorch
dump
of
a
pre
-
trained
instance
of
`
BertForPreTraining
`,
`
OpenAIGPTModel
`,
`
TransfoXLModel
`,
`
GPT2LMHeadModel
`
(
saved
with
the
usual
`
torch
.
save
()`)
If
`
PRE_TRAINED_MODEL_NAME_OR_PATH
`
is
a
shortcut
name
,
the
pre
-
trained
weights
will
be
downloaded
from
AWS
S3
(
see
the
links
[
here
](
pytorch_transformers
/
modeling
.
py
))
and
stored
in
a
cache
folder
to
avoid
future
download
(
the
cache
folder
can
be
found
at
`~/.
pytorch_transformers
/`).
-
`
cache_dir
`
can
be
an
optional
path
to
a
specific
directory
to
download
and
cache
the
pre
-
trained
model
weights
.
This
option
is
useful
in
particular
when
you
are
using
distributed
training
:
to
avoid
concurrent
access
to
the
same
weights
you
can
set
for
example
`
cache_dir
=
'./pretrained_model_{}'
.
format
(
args
.
local_rank
)`
(
see
the
section
on
distributed
training
for
more
information
).
-
`
from_tf
`:
should
we
load
the
weights
from
a
locally
saved
TensorFlow
checkpoint
-
`
state_dict
`:
an
optional
state
dictionnary
(
collections
.
OrderedDict
object
)
to
use
instead
of
Google
pre
-
trained
models
-
`*
inputs
`,
`**
kwargs
`:
additional
input
for
the
specific
Bert
class
(
ex
:
num_labels
for
BertForSequenceClassification
)
`
Uncased
`
means
that
the
text
has
been
lowercased
before
WordPiece
tokenization
,
e
.
g
.,
`
John
Smith
`
becomes
`
john
smith
`.
The
Uncased
model
also
strips
out
any
accent
markers
.
`
Cased
`
means
that
the
true
case
and
accent
markers
are
preserved
.
Typically
,
the
Uncased
model
is
better
unless
you
know
that
case
information
is
important
for
your
task
(
e
.
g
.,
Named
Entity
Recognition
or
Part
-
of
-
Speech
tagging
).
For
information
about
the
Multilingual
and
Chinese
model
,
see
the
[
Multilingual
README
](
https
://
github
.
com
/
google
-
research
/
bert
/
blob
/
master
/
multilingual
.
md
)
or
the
original
TensorFlow
repository
.
**
When
using
an
`
uncased
model
`,
make
sure
to
pass
`--
do_lower_case
`
to
the
example
training
scripts
(
or
pass
`
do_lower_case
=
True
`
to
FullTokenizer
if
you
're using your own script and loading the tokenizer your-self.).**
Examples:
```python
# BERT
tokenizer = BertTokenizer.from_pretrained('
bert
-
base
-
uncased
', do_lower_case=True, do_basic_tokenize=True)
model = BertForSequenceClassification.from_pretrained('
bert
-
base
-
uncased
')
# OpenAI GPT
tokenizer = OpenAIGPTTokenizer.from_pretrained('
openai
-
gpt
')
model = OpenAIGPTModel.from_pretrained('
openai
-
gpt
')
# Transformer-XL
tokenizer = TransfoXLTokenizer.from_pretrained('
transfo
-
xl
-
wt103
')
model = TransfoXLModel.from_pretrained('
transfo
-
xl
-
wt103
')
# OpenAI GPT-2
tokenizer = GPT2Tokenizer.from_pretrained('
gpt2
')
model = GPT2Model.from_pretrained('
gpt2
')
```
#### Cache directory
`pytorch_transformers` save the pretrained weights in a cache directory which is located at (in this order of priority):
- `cache_dir` optional arguments to the `from_pretrained()` method (see above),
- shell environment variable `PYTORCH_PRETRAINED_BERT_CACHE`,
- PyTorch cache home + `/pytorch_transformers/`
where PyTorch cache home is defined by (in this order):
- shell environment variable `ENV_TORCH_HOME`
- shell environment variable `ENV_XDG_CACHE_HOME` + `/torch/`)
- default: `~/.cache/torch/`
Usually, if you don'
t
set
any
specific
environment
variable
,
`
pytorch_transformers
`
cache
will
be
at
`~/.
cache
/
torch
/
pytorch_transformers
/`.
You
can
alsways
safely
delete
`
pytorch_transformers
`
cache
but
the
pretrained
model
weights
and
vocabulary
files
wil
have
to
be
re
-
downloaded
from
our
S3
.
###
Serialization
best
-
practices
This
section
explain
how
you
can
save
and
re
-
load
a
fine
-
tuned
model
(
BERT
,
GPT
,
GPT
-
2
and
Transformer
-
XL
).
There
are
three
types
of
files
you
need
to
save
to
be
able
to
reload
a
fine
-
tuned
model
:
-
the
model
it
-
self
which
should
be
saved
following
PyTorch
serialization
[
best
practices
](
https
://
pytorch
.
org
/
docs
/
stable
/
notes
/
serialization
.
html
#
best
-
practices
),
-
the
configuration
file
of
the
model
which
is
saved
as
a
JSON
file
,
and
-
the
vocabulary
(
and
the
merges
for
the
BPE
-
based
models
GPT
and
GPT
-
2
).
The
*
default
filenames
*
of
these
files
are
as
follow
:
-
the
model
weights
file
:
`
pytorch_model
.
bin
`,
-
the
configuration
file
:
`
config
.
json
`,
-
the
vocabulary
file
:
`
vocab
.
txt
`
for
BERT
and
Transformer
-
XL
,
`
vocab
.
json
`
for
GPT
/
GPT
-
2
(
BPE
vocabulary
),
-
for
GPT
/
GPT
-
2
(
BPE
vocabulary
)
the
additional
merges
file
:
`
merges
.
txt
`.
**
If
you
save
a
model
using
these
*
default
filenames
*,
you
can
then
re
-
load
the
model
and
tokenizer
using
the
`
from_pretrained
()`
method
.**
Here
is
the
recommended
way
of
saving
the
model
,
configuration
and
vocabulary
to
an
`
output_dir
`
directory
and
reloading
the
model
and
tokenizer
afterwards
:
```
python
from
pytorch_transformers
import
WEIGHTS_NAME
,
CONFIG_NAME
output_dir
=
"./models/"
#
Step
1
:
Save
a
model
,
configuration
and
vocabulary
that
you
have
fine
-
tuned
#
If
we
have
a
distributed
model
,
save
only
the
encapsulated
model
#
(
it
was
wrapped
in
PyTorch
DistributedDataParallel
or
DataParallel
)
model_to_save
=
model
.
module
if
hasattr
(
model
,
'module'
)
else
model
#
If
we
save
using
the
predefined
names
,
we
can
load
using
`
from_pretrained
`
output_model_file
=
os
.
path
.
join
(
output_dir
,
WEIGHTS_NAME
)
output_config_file
=
os
.
path
.
join
(
output_dir
,
CONFIG_NAME
)
torch
.
save
(
model_to_save
.
state_dict
(),
output_model_file
)
model_to_save
.
config
.
to_json_file
(
output_config_file
)
tokenizer
.
save_vocabulary
(
output_dir
)
#
Step
2
:
Re
-
load
the
saved
model
and
vocabulary
#
Example
for
a
Bert
model
model
=
BertForQuestionAnswering
.
from_pretrained
(
output_dir
)
tokenizer
=
BertTokenizer
.
from_pretrained
(
output_dir
,
do_lower_case
=
args
.
do_lower_case
)
#
Add
specific
options
if
needed
#
Example
for
a
GPT
model
model
=
OpenAIGPTDoubleHeadsModel
.
from_pretrained
(
output_dir
)
tokenizer
=
OpenAIGPTTokenizer
.
from_pretrained
(
output_dir
)
```
Here
is
another
way
you
can
save
and
reload
the
model
if
you
want
to
use
specific
paths
for
each
type
of
files
:
```
python
output_model_file
=
"./models/my_own_model_file.bin"
output_config_file
=
"./models/my_own_config_file.bin"
output_vocab_file
=
"./models/my_own_vocab_file.bin"
#
Step
1
:
Save
a
model
,
configuration
and
vocabulary
that
you
have
fine
-
tuned
#
If
we
have
a
distributed
model
,
save
only
the
encapsulated
model
#
(
it
was
wrapped
in
PyTorch
DistributedDataParallel
or
DataParallel
)
model_to_save
=
model
.
module
if
hasattr
(
model
,
'module'
)
else
model
torch
.
save
(
model_to_save
.
state_dict
(),
output_model_file
)
model_to_save
.
config
.
to_json_file
(
output_config_file
)
tokenizer
.
save_vocabulary
(
output_vocab_file
)
#
Step
2
:
Re
-
load
the
saved
model
and
vocabulary
#
We
didn
't save using the predefined WEIGHTS_NAME, CONFIG_NAME names, we cannot load using `from_pretrained`.
# Here is how to do it in this situation:
# Example for a Bert model
config = BertConfig.from_json_file(output_config_file)
model = BertForQuestionAnswering(config)
state_dict = torch.load(output_model_file)
model.load_state_dict(state_dict)
tokenizer = BertTokenizer(output_vocab_file, do_lower_case=args.do_lower_case)
# Example for a GPT model
config = OpenAIGPTConfig.from_json_file(output_config_file)
model = OpenAIGPTDoubleHeadsModel(config)
state_dict = torch.load(output_model_file)
model.load_state_dict(state_dict)
tokenizer = OpenAIGPTTokenizer(output_vocab_file)
```
pytorch_transformers/modeling_utils.py
View file @
3b8b0e01
...
...
@@ -614,19 +614,19 @@ class SQuADHead(nn.Module):
Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
**loss**: (`optional`, returned if both ``start_positions`` and ``end_positions`` are provided) ``torch.FloatTensor`` of shape ``(1,)``:
Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
**start_top_log_probs**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**start_top_log_probs**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.FloatTensor`` of shape ``(batch_size, config.start_n_top)``
Log probabilities for the top config.start_n_top start token possibilities (beam-search).
**start_top_index**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**start_top_index**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.LongTensor`` of shape ``(batch_size, config.start_n_top)``
Indices for the top config.start_n_top start token possibilities (beam-search).
**end_top_log_probs**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**end_top_log_probs**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.FloatTensor`` of shape ``(batch_size, config.start_n_top * config.end_n_top)``
Log probabilities for the top ``config.start_n_top * config.end_n_top`` end token possibilities (beam-search).
**end_top_index**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**end_top_index**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.LongTensor`` of shape ``(batch_size, config.start_n_top * config.end_n_top)``
Indices for the top ``config.start_n_top * config.end_n_top`` end token possibilities (beam-search).
**cls_logits**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**cls_logits**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.FloatTensor`` of shape ``(batch_size,)``
Log probabilities for the ``is_impossible`` label of the answers.
"""
...
...
pytorch_transformers/modeling_xlnet.py
View file @
3b8b0e01
...
...
@@ -1169,19 +1169,19 @@ class XLNetForQuestionAnswering(XLNetPreTrainedModel):
Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
**loss**: (`optional`, returned if both ``start_positions`` and ``end_positions`` are provided) ``torch.FloatTensor`` of shape ``(1,)``:
Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
**start_top_log_probs**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**start_top_log_probs**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.FloatTensor`` of shape ``(batch_size, config.start_n_top)``
Log probabilities for the top config.start_n_top start token possibilities (beam-search).
**start_top_index**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**start_top_index**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.LongTensor`` of shape ``(batch_size, config.start_n_top)``
Indices for the top config.start_n_top start token possibilities (beam-search).
**end_top_log_probs**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**end_top_log_probs**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.FloatTensor`` of shape ``(batch_size, config.start_n_top * config.end_n_top)``
Log probabilities for the top ``config.start_n_top * config.end_n_top`` end token possibilities (beam-search).
**end_top_index**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**end_top_index**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.LongTensor`` of shape ``(batch_size, config.start_n_top * config.end_n_top)``
Indices for the top ``config.start_n_top * config.end_n_top`` end token possibilities (beam-search).
**cls_logits**:
`
(`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
**cls_logits**: (`optional`, returned if ``start_positions`` or ``end_positions`` is not provided)
``torch.FloatTensor`` of shape ``(batch_size,)``
Log probabilities for the ``is_impossible`` label of the answers.
**mems**:
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment