chenpangpang / transformers · Commits · 2276bf69

Commit 2276bf69, authored Nov 14, 2019 by Rémi Louf (parent 022525b0)

    update the examples, docs and template

Showing 13 changed files with 35 additions and 39 deletions (+35 / -39):

README.md                                            +4  -4
docs/source/main_classes/optimizer_schedules.rst     +5  -9
docs/source/migration.md                             +4  -4
examples/contrib/run_openai_gpt.py                   +2  -2
examples/contrib/run_swag.py                         +2  -2
examples/distillation/distiller.py                   +4  -4
examples/distillation/run_squad_w_distillation.py    +2  -2
examples/run_glue.py                                 +2  -2
examples/run_lm_finetuning.py                        +2  -2
examples/run_multiple_choice.py                      +2  -2
examples/run_ner.py                                  +2  -2
examples/run_squad.py                                +2  -2
templates/adding_a_new_example_script/run_xxx.py     +2  -2
README.md  (+4 / -4)

@@ -521,12 +521,12 @@ Here is a conversion examples from `BertAdam` with a linear warmup and decay sch
 # Parameters:
 lr = 1e-3
 max_grad_norm = 1.0
-num_total_steps = 1000
+num_training_steps = 1000
 num_warmup_steps = 100
-warmup_proportion = float(num_warmup_steps) / float(num_total_steps)  # 0.1
+warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

 ### Previously BertAdam optimizer was instantiated like this:
-optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_total_steps)
+optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_training_steps)
 ### and used like this:
 for batch in train_data:
     loss = model(batch)
@@ -535,7 +535,7 @@ for batch in train_data:
 ### In Transformers, optimizer and schedules are splitted and instantiated like this:
 optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
-scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps)  # PyTorch scheduler
+scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)  # PyTorch scheduler
 ### and used like this:
 for batch in train_data:
     model.train()
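For reference, this commit renames the old `WarmupLinearSchedule` class to the `get_linear_schedule_with_warmup` helper throughout the docs and examples. A minimal sketch of the keyword mapping, using a dummy parameter group (everything except the two `transformers` calls is a placeholder):

    # Keyword mapping performed by this commit (old call kept as a comment, new call is live code).
    import torch
    from transformers import AdamW, get_linear_schedule_with_warmup

    optimizer = AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)   # dummy parameter group
    num_warmup_steps, num_training_steps = 100, 1000

    # Before: scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps,
    #                                          t_total=num_training_steps)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=num_warmup_steps,       # was warmup_steps=
        num_training_steps=num_training_steps,   # was t_total=
    )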
docs/source/main_classes/optimizer_schedules.rst  (+5 / -9)

@@ -18,19 +18,17 @@ Schedules
 Learning Rate Schedules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-.. autoclass:: transformers.ConstantLRSchedule
-    :members:
+.. autofunction:: transformers.get_constant_schedule

-.. autoclass:: transformers.WarmupConstantSchedule
-    :members:
+.. autofunction:: transformers.get_constant_schedule_with_warmup

 .. image:: /imgs/warmup_constant_schedule.png
     :target: /imgs/warmup_constant_schedule.png
     :alt:

-.. autoclass:: transformers.WarmupCosineSchedule
+.. autofunction:: transformers.get_cosine_schedule_with_warmup
     :members:

 .. image:: /imgs/warmup_cosine_schedule.png
@@ -38,8 +36,7 @@ Learning Rate Schedules
     :alt:

-.. autoclass:: transformers.WarmupCosineWithHardRestartsSchedule
-    :members:
+.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup

 .. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
     :target: /imgs/warmup_cosine_hard_restarts_schedule.png
@@ -47,8 +44,7 @@ Learning Rate Schedules
-.. autoclass:: transformers.WarmupLinearSchedule
-    :members:
+.. autofunction:: transformers.get_linear_schedule_with_warmup

 .. image:: /imgs/warmup_linear_schedule.png
     :target: /imgs/warmup_linear_schedule.png
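For reference, a short sketch instantiating each of the helpers now documented in this file; the `AdamW` optimizer on a throwaway layer and the step counts (100 warmup / 1000 training steps) are placeholders:

    # Sketch: creating the schedule helpers documented in optimizer_schedules.rst.
    import torch
    from transformers import (AdamW,
                              get_constant_schedule,
                              get_constant_schedule_with_warmup,
                              get_cosine_schedule_with_warmup,
                              get_cosine_with_hard_restarts_schedule_with_warmup,
                              get_linear_schedule_with_warmup)

    optimizer = AdamW(torch.nn.Linear(4, 4).parameters(), lr=5e-5)   # placeholder optimizer

    constant = get_constant_schedule(optimizer)
    constant_warmup = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)
    cosine = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=100, num_training_steps=1000)
    cosine_restarts = get_cosine_with_hard_restarts_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000)
    linear = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=100, num_training_steps=1000)

    # Each helper returns a torch.optim.lr_scheduler.LambdaLR, stepped once per optimizer update;
    # they are built side by side here purely for illustration.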
docs/source/migration.md  (+4 / -4)

@@ -84,12 +84,12 @@ Here is a conversion examples from `BertAdam` with a linear warmup and decay sch
 # Parameters:
 lr = 1e-3
 max_grad_norm = 1.0
-num_total_steps = 1000
+num_training_steps = 1000
 num_warmup_steps = 100
-warmup_proportion = float(num_warmup_steps) / float(num_total_steps)  # 0.1
+warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

 ### Previously BertAdam optimizer was instantiated like this:
-optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_total_steps)
+optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, num_training_steps=num_training_steps)
 ### and used like this:
 for batch in train_data:
     loss = model(batch)
@@ -98,7 +98,7 @@ for batch in train_data:
 ### In Transformers, optimizer and schedules are splitted and instantiated like this:
 optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
-scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps)  # PyTorch scheduler
+scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)  # PyTorch scheduler
 ### and used like this:
 for batch in train_data:
     loss = model(batch)
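The surrounding migration guide also defines `max_grad_norm = 1.0`. A hedged sketch of the full new-style loop, assuming gradient clipping is applied explicitly with `torch.nn.utils.clip_grad_norm_` (the old `BertAdam` clipped internally); the model and data below are placeholders:

    # Sketch of the new-style training loop around the parameters defined in the migration.md hunk.
    # The linear model and random batches are placeholders; clipping is done explicitly because
    # AdamW, unlike BertAdam, does not clip gradients itself.
    import torch
    from transformers import AdamW, get_linear_schedule_with_warmup

    model = torch.nn.Linear(10, 2)                       # placeholder model
    train_data = [torch.randn(4, 10) for _ in range(20)] # placeholder data

    lr, max_grad_norm = 1e-3, 1.0
    num_training_steps, num_warmup_steps = 1000, 100

    optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps,
                                                num_training_steps=num_training_steps)

    for batch in train_data:
        loss = model(batch).sum()                        # placeholder loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # explicit clipping
        optimizer.step()
        scheduler.step()                                 # the schedule is stepped explicitly, PyTorch-style
        optimizer.zero_grad()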
examples/contrib/run_openai_gpt.py  (+2 / -2)

@@ -41,7 +41,7 @@ from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,
 from transformers import (OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
                           AdamW, cached_path, WEIGHTS_NAME, CONFIG_NAME,
-                          WarmupLinearSchedule)
+                          get_linear_schedule_with_warmup)

 ROCSTORIES_URL = "https://s3.amazonaws.com/datasets.huggingface.co/ROCStories.tar.gz"
@@ -211,7 +211,7 @@ def main():
         {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)

     if args.do_train:
         nb_tr_steps, tr_loss, exp_average_loss = 0, 0, None
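Each example script in this commit feeds a precomputed step total (`t_total`) into the new helper. A hedged sketch of that wiring; the argparse defaults and the `t_total` formula are assumptions for illustration, not lines from this commit:

    # Hedged sketch of the pattern used in the example scripts: derive the total number of
    # optimizer updates, then build the scheduler. The Namespace values and the t_total
    # formula are assumed, not taken from this diff.
    import argparse
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import AdamW, get_linear_schedule_with_warmup

    args = argparse.Namespace(num_train_epochs=3, gradient_accumulation_steps=1,
                              learning_rate=5e-5, adam_epsilon=1e-8, warmup_steps=0)

    train_dataloader = DataLoader(TensorDataset(torch.randn(64, 10)), batch_size=8)  # placeholder data
    t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

    model = torch.nn.Linear(10, 2)                       # placeholder for the real model
    optimizer = AdamW(model.parameters(), lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(optimizer,
                                                num_warmup_steps=args.warmup_steps,
                                                num_training_steps=t_total)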
examples/contrib/run_swag.py  (+2 / -2)

@@ -42,7 +42,7 @@ from tqdm import tqdm, trange
 from transformers import (WEIGHTS_NAME, BertConfig,
                           BertForMultipleChoice, BertTokenizer)
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup

 logger = logging.getLogger(__name__)
@@ -322,7 +322,7 @@ def train(args, train_dataset, model, tokenizer):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
examples/distillation/distiller.py  (+4 / -4)

@@ -35,7 +35,7 @@ try:
 except:
     from tensorboardX import SummaryWriter

-from transformers import WarmupLinearSchedule
+from transformers import get_linear_schedule_with_warmup

 from utils import logger
 from lm_seqs_dataset import LmSeqsDataset
@@ -137,9 +137,9 @@ class Distiller:
                                      betas=(0.9, 0.98))

         warmup_steps = math.ceil(num_train_optimization_steps * params.warmup_prop)
-        self.scheduler = WarmupLinearSchedule(self.optimizer,
-                                              warmup_steps=warmup_steps,
-                                              t_total=num_train_optimization_steps)
+        self.scheduler = get_linear_schedule_with_warmup(self.optimizer,
+                                                         num_warmup_steps=warmup_steps,
+                                                         num_training_steps=num_train_optimization_steps)

         if self.fp16:
             try:
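In the distiller, the warmup length is derived as a proportion of the total number of optimization steps before being passed to the new helper. A small sketch of that computation; the step count, `warmup_prop` value, and the throwaway optimizer are placeholders:

    # Sketch of the Distiller's scheduler setup after this commit: warmup is a proportion
    # of the total optimization steps. Concrete numbers and the optimizer are placeholders.
    import math
    import torch
    from transformers import AdamW, get_linear_schedule_with_warmup

    num_train_optimization_steps = 5000      # placeholder total
    warmup_prop = 0.05                       # assumed proportion, for illustration

    optimizer = AdamW(torch.nn.Linear(8, 8).parameters(), lr=5e-4, betas=(0.9, 0.98))
    warmup_steps = math.ceil(num_train_optimization_steps * warmup_prop)
    scheduler = get_linear_schedule_with_warmup(optimizer,
                                                num_warmup_steps=warmup_steps,
                                                num_training_steps=num_train_optimization_steps)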
examples/distillation/run_squad_w_distillation.py  (+2 / -2)

@@ -46,7 +46,7 @@ from transformers import (WEIGHTS_NAME, BertConfig,
                           XLNetTokenizer,
                           DistilBertConfig, DistilBertForQuestionAnswering, DistilBertTokenizer)
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup

 from ..utils_squad import (read_squad_examples, convert_examples_to_features,
                            RawResult, write_predictions,
@@ -101,7 +101,7 @@ def train(args, train_dataset, model, tokenizer, teacher=None):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
examples/run_glue.py  (+2 / -2)

@@ -49,7 +49,7 @@ from transformers import (WEIGHTS_NAME, BertConfig,
                           DistilBertForSequenceClassification,
                           DistilBertTokenizer)
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup
 from transformers import glue_compute_metrics as compute_metrics
 from transformers import glue_output_modes as output_modes
@@ -100,7 +100,7 @@ def train(args, train_dataset, model, tokenizer):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
examples/run_lm_finetuning.py  (+2 / -2)

@@ -42,7 +42,7 @@ except:
 from tqdm import tqdm, trange

-from transformers import (WEIGHTS_NAME, AdamW, WarmupLinearSchedule,
+from transformers import (WEIGHTS_NAME, AdamW, get_linear_schedule_with_warmup,
                           BertConfig, BertForMaskedLM, BertTokenizer,
                           GPT2Config, GPT2LMHeadModel, GPT2Tokenizer,
                           OpenAIGPTConfig, OpenAIGPTLMHeadModel, OpenAIGPTTokenizer,
@@ -185,7 +185,7 @@ def train(args, train_dataset, model, tokenizer):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
examples/run_multiple_choice.py  (+2 / -2)

@@ -43,7 +43,7 @@ from transformers import (WEIGHTS_NAME, BertConfig,
                           XLNetTokenizer, RobertaConfig,
                           RobertaForMultipleChoice, RobertaTokenizer)
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup

 from utils_multiple_choice import (convert_examples_to_features, processors)
@@ -101,7 +101,7 @@ def train(args, train_dataset, model, tokenizer):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
examples/run_ner.py  (+2 / -2)

@@ -33,7 +33,7 @@ from torch.utils.data.distributed import DistributedSampler
 from tqdm import tqdm, trange

 from utils_ner import convert_examples_to_features, get_labels, read_examples_from_file
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup
 from transformers import WEIGHTS_NAME, BertConfig, BertForTokenClassification, BertTokenizer
 from transformers import RobertaConfig, RobertaForTokenClassification, RobertaTokenizer
@@ -80,7 +80,7 @@ def train(args, train_dataset, model, tokenizer, labels, pad_token_label_id):
         {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
examples/run_squad.py  (+2 / -2)

@@ -45,7 +45,7 @@ from transformers import (WEIGHTS_NAME, BertConfig,
                           XLNetTokenizer,
                           DistilBertConfig, DistilBertForQuestionAnswering, DistilBertTokenizer)
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup

 from utils_squad import (read_squad_examples, convert_examples_to_features,
                          RawResult, write_predictions,
@@ -100,7 +100,7 @@ def train(args, train_dataset, model, tokenizer):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
templates/adding_a_new_example_script/run_xxx.py  (+2 / -2)

@@ -43,7 +43,7 @@ from transformers import (WEIGHTS_NAME, BertConfig,
                           XLNetTokenizer,
                           DistilBertConfig, DistilBertForQuestionAnswering, DistilBertTokenizer)
-from transformers import AdamW, WarmupLinearSchedule
+from transformers import AdamW, get_linear_schedule_with_warmup

 from utils_squad import (read_squad_examples, convert_examples_to_features,
                          RawResult, write_predictions,
@@ -98,7 +98,7 @@ def train(args, train_dataset, model, tokenizer):
         {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
         ]
     optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
-    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=t_total)
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total)
     if args.fp16:
         try:
             from apex import amp
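For context, the scheduler line changed in each script and in the template sits inside the usual grouped-parameter setup, of which these diffs only show the no-decay group. A hedged sketch of the whole block; the first parameter group, the argparse values, and the placeholder model are assumptions, and only the final call reflects this commit:

    # Hedged sketch of the full optimizer/scheduler block around the changed line.
    # The weight-decay group and args values are assumed from the usual example-script pattern.
    import argparse
    import torch
    from transformers import AdamW, get_linear_schedule_with_warmup

    args = argparse.Namespace(learning_rate=5e-5, adam_epsilon=1e-8, weight_decay=0.01,
                              warmup_steps=0)
    model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.LayerNorm(10))  # placeholder model
    t_total = 1000                                                                # placeholder step total

    no_decay = ['bias', 'LayerNorm.weight']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
         'weight_decay': args.weight_decay},
        {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
         'weight_decay': 0.0}
        ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps,
                                                num_training_steps=t_total)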