dcuai / dlexamples / Commits / c0f05c10

Commit c0f05c10, authored Nov 29, 2022 by hepj

Update transformer code

Parent: c056df78

Changes: 321. Showing 20 changed files with 2142 additions and 0 deletions (+2142, -0).
PyTorch/NLP/new-Transformer/docs/overview.rst (+74, -0)
PyTorch/NLP/new-Transformer/docs/requirements.txt (+2, -0)
PyTorch/NLP/new-Transformer/docs/tasks.rst (+61, -0)
PyTorch/NLP/new-Transformer/docs/tutorial_classifying_names.rst (+415, -0)
PyTorch/NLP/new-Transformer/docs/tutorial_simple_lstm.rst (+518, -0)
PyTorch/NLP/new-Transformer/env.sh (+39, -0)
PyTorch/NLP/new-Transformer/examples/.gitignore (+0, -0)
PyTorch/NLP/new-Transformer/examples/translation/README.md (+0, -0)
PyTorch/NLP/new-Transformer/examples/translation/prepare-iwslt14.sh (+0, -0)
PyTorch/NLP/new-Transformer/examples/translation/prepare-wmt14en2de.sh (+0, -0)
PyTorch/NLP/new-Transformer/examples/translation/prepare-wmt14en2fr.sh (+0, -0)
PyTorch/NLP/new-Transformer/fairseq/__init__.py (+45, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/__init__.py (+7, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/benchmark_multihead_attention.py (+172, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_dataset.py (+36, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_lm.py (+83, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_masked_lm.py (+94, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_model.py (+96, -0)
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_mt.py (+119, -0)
PyTorch/NLP/new-Transformer/fairseq/binarizer.py (+381, -0)
Too many changes to show. To preserve performance, only 321 of 321+ files are displayed.
PyTorch/NLP/new-Transformer/docs/overview.rst
0 → 100644
Overview
========

Fairseq can be extended through user-supplied `plug-ins
<https://en.wikipedia.org/wiki/Plug-in_(computing)>`_. We support five kinds of
plug-ins:

- :ref:`Models` define the neural network architecture and encapsulate all of the
  learnable parameters.
- :ref:`Criterions` compute the loss function given the model outputs and targets.
- :ref:`Tasks` store dictionaries and provide helpers for loading/iterating over
  Datasets, initializing the Model/Criterion and calculating the loss.
- :ref:`Optimizers` update the Model parameters based on the gradients.
- :ref:`Learning Rate Schedulers` update the learning rate over the course of
  training.

**Training Flow**

Given a ``model``, ``criterion``, ``task``, ``optimizer`` and ``lr_scheduler``,
fairseq implements the following high-level training flow::

    for epoch in range(num_epochs):
        itr = task.get_batch_iterator(task.dataset('train'))
        for num_updates, batch in enumerate(itr):
            task.train_step(batch, model, criterion, optimizer)
            average_and_clip_gradients()
            optimizer.step()
            lr_scheduler.step_update(num_updates)
        lr_scheduler.step(epoch)

where the default implementation for ``task.train_step`` is roughly::

    def train_step(self, batch, model, criterion, optimizer, **unused):
        loss = criterion(model, batch)
        optimizer.backward(loss)
        return loss

**Registering new plug-ins**

New plug-ins are *registered* through a set of ``@register`` function
decorators, for example::

    @register_model('my_lstm')
    class MyLSTM(FairseqEncoderDecoderModel):
        (...)

Once registered, new plug-ins can be used with the existing :ref:`Command-line
Tools`. See the Tutorial sections for more detailed walkthroughs of how to add
new plug-ins.

**Loading plug-ins from another directory**

New plug-ins can be defined in a custom module stored in the user system. In
order to import the module, and make the plugin available to *fairseq*, the
command line supports the ``--user-dir`` flag that can be used to specify a
custom location for additional modules to load into *fairseq*.

For example, assuming this directory tree::

    /home/user/my-module/
    └── __init__.py

with ``__init__.py``::

    from fairseq.models import register_model_architecture
    from fairseq.models.transformer import transformer_vaswani_wmt_en_de_big

    @register_model_architecture('transformer', 'my_transformer')
    def transformer_mmt_big(args):
        transformer_vaswani_wmt_en_de_big(args)

it is possible to invoke the :ref:`fairseq-train` script with the new architecture with::

    fairseq-train ... --user-dir /home/user/my-module -a my_transformer --task translation
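
None of the files in this commit walk through a Criterion plug-in end to end, so
the following is only a rough, hedged sketch of the same registration pattern
applied to a criterion. The name ``toy_cross_entropy`` and the class body are
illustrative assumptions; only ``register_criterion``/``FairseqCriterion`` and
the model helpers (``get_normalized_probs``, ``get_targets``) come from fairseq
itself::

    import torch.nn.functional as F

    from fairseq.criterions import FairseqCriterion, register_criterion

    @register_criterion('toy_cross_entropy')
    class ToyCrossEntropyCriterion(FairseqCriterion):

        def forward(self, model, sample, reduce=True):
            # Run the model on the mini-batch prepared by the task.
            net_output = model(**sample['net_input'])
            lprobs = model.get_normalized_probs(net_output, log_probs=True)
            lprobs = lprobs.view(-1, lprobs.size(-1))
            target = model.get_targets(sample, net_output).view(-1)
            # Summed negative log-likelihood over non-padding target tokens.
            loss = F.nll_loss(
                lprobs, target,
                ignore_index=self.padding_idx,
                reduction='sum' if reduce else 'none',
            )
            sample_size = sample['ntokens']
            logging_output = {
                'loss': loss.data,
                'ntokens': sample['ntokens'],
                'sample_size': sample_size,
            }
            return loss, sample_size, logging_output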
PyTorch/NLP/new-Transformer/docs/requirements.txt
0 → 100644
sphinx<2.0
sphinx-argparse
PyTorch/NLP/new-Transformer/docs/tasks.rst
0 → 100644
.. role:: hidden
    :class: hidden-section

.. module:: fairseq.tasks

.. _Tasks:

Tasks
=====

Tasks store dictionaries and provide helpers for loading/iterating over
Datasets, initializing the Model/Criterion and calculating the loss.

Tasks can be selected via the ``--task`` command-line argument. Once selected, a
task may expose additional command-line arguments for further configuration.

Example usage::

    # setup the task (e.g., load dictionaries)
    task = fairseq.tasks.setup_task(args)

    # build model and criterion
    model = task.build_model(args)
    criterion = task.build_criterion(args)

    # load datasets
    task.load_dataset('train')
    task.load_dataset('valid')

    # iterate over mini-batches of data
    batch_itr = task.get_batch_iterator(
        task.dataset('train'), max_tokens=4096,
    )
    for batch in batch_itr:
        # compute the loss
        loss, sample_size, logging_output = task.get_loss(
            model, criterion, batch,
        )
        loss.backward()
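
The same selection works from the command line: once ``--task`` is given, that
task's extra arguments become available. A hedged illustration (the data
directory and the language pair are placeholders, and the remaining flags are
only one possible configuration):

.. code-block:: console

  > fairseq-train data-bin/my-corpus \
    --task translation --source-lang de --target-lang en \
    --arch transformer --optimizer adam --max-tokens 4096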
Translation
-----------

.. autoclass:: fairseq.tasks.translation.TranslationTask

.. _language modeling:

Language Modeling
-----------------

.. autoclass:: fairseq.tasks.language_modeling.LanguageModelingTask

Adding new tasks
----------------

.. autofunction:: fairseq.tasks.register_task
.. autoclass:: fairseq.tasks.FairseqTask
    :members:
    :undoc-members:
PyTorch/NLP/new-Transformer/docs/tutorial_classifying_names.rst
0 → 100644
Tutorial: Classifying Names with a Character-Level RNN
======================================================

In this tutorial we will extend fairseq to support *classification* tasks. In
particular we will re-implement the PyTorch tutorial for `Classifying Names
with a Character-Level RNN
<https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html>`_
in fairseq. It is recommended to quickly skim that tutorial before beginning
this one.

This tutorial covers:

1. **Preprocessing the data** to create dictionaries.
2. **Registering a new Model** that encodes an input sentence with a simple RNN
   and predicts the output label.
3. **Registering a new Task** that loads our dictionaries and dataset.
4. **Training the Model** using the existing command-line tools.
5. **Writing an evaluation script** that imports fairseq and allows us to
   interactively evaluate our model on new inputs.


1. Preprocessing the data
-------------------------

The original tutorial provides raw data, but we'll work with a modified version
of the data that is already tokenized into characters and split into separate
train, valid and test sets.

Download and extract the data from here:
`tutorial_names.tar.gz <https://dl.fbaipublicfiles.com/fairseq/data/tutorial_names.tar.gz>`_

Once extracted, let's preprocess the data using the :ref:`fairseq-preprocess`
command-line tool to create the dictionaries. While this tool is primarily
intended for sequence-to-sequence problems, we're able to reuse it here by
treating the label as a "target" sequence of length 1. We'll also output the
preprocessed files in "raw" format using the ``--dataset-impl`` option to
enhance readability:

.. code-block:: console

  > fairseq-preprocess \
    --trainpref names/train --validpref names/valid --testpref names/test \
    --source-lang input --target-lang label \
    --destdir names-bin --dataset-impl raw

After running the above command you should see a new directory,
:file:`names-bin/`, containing the dictionaries for *inputs* and *labels*.


2. Registering a new Model
--------------------------

Next we'll register a new model in fairseq that will encode an input sentence
with a simple RNN and predict the output label. Compared to the original PyTorch
tutorial, our version will also work with batches of data and GPU Tensors.

First let's copy the simple RNN module implemented in the `PyTorch tutorial
<https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html#creating-the-network>`_.
Create a new file named :file:`fairseq/models/rnn_classifier.py` with the
following contents::

    import torch
    import torch.nn as nn

    class RNN(nn.Module):

        def __init__(self, input_size, hidden_size, output_size):
            super(RNN, self).__init__()

            self.hidden_size = hidden_size

            self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
            self.i2o = nn.Linear(input_size + hidden_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, input, hidden):
            combined = torch.cat((input, hidden), 1)
            hidden = self.i2h(combined)
            output = self.i2o(combined)
            output = self.softmax(output)
            return output, hidden

        def initHidden(self):
            return torch.zeros(1, self.hidden_size)

We must also *register* this model with fairseq using the
:func:`~fairseq.models.register_model` function decorator. Once the model is
registered we'll be able to use it with the existing :ref:`Command-line Tools`.

All registered models must implement the :class:`~fairseq.models.BaseFairseqModel`
interface, so we'll create a small wrapper class in the same file and register it
in fairseq with the name ``'rnn_classifier'``::

    from fairseq.models import BaseFairseqModel, register_model

    # Note: the register_model "decorator" should immediately precede the
    # definition of the Model class.

    @register_model('rnn_classifier')
    class FairseqRNNClassifier(BaseFairseqModel):

        @staticmethod
        def add_args(parser):
            # Models can override this method to add new command-line arguments.
            # Here we'll add a new command-line argument to configure the
            # dimensionality of the hidden state.
            parser.add_argument(
                '--hidden-dim', type=int, metavar='N',
                help='dimensionality of the hidden state',
            )

        @classmethod
        def build_model(cls, args, task):
            # Fairseq initializes models by calling the ``build_model()``
            # function. This provides more flexibility, since the returned model
            # instance can be of a different type than the one that was called.
            # In this case we'll just return a FairseqRNNClassifier instance.

            # Initialize our RNN module
            rnn = RNN(
                # We'll define the Task in the next section, but for now just
                # notice that the task holds the dictionaries for the "source"
                # (i.e., the input sentence) and "target" (i.e., the label).
                input_size=len(task.source_dictionary),
                hidden_size=args.hidden_dim,
                output_size=len(task.target_dictionary),
            )

            # Return the wrapped version of the module
            return FairseqRNNClassifier(
                rnn=rnn,
                input_vocab=task.source_dictionary,
            )

        def __init__(self, rnn, input_vocab):
            super(FairseqRNNClassifier, self).__init__()

            self.rnn = rnn
            self.input_vocab = input_vocab

            # The RNN module in the tutorial expects one-hot inputs, so we can
            # precompute the identity matrix to help convert from indices to
            # one-hot vectors. We register it as a buffer so that it is moved to
            # the GPU when ``cuda()`` is called.
            self.register_buffer('one_hot_inputs', torch.eye(len(input_vocab)))

        def forward(self, src_tokens, src_lengths):
            # The inputs to the ``forward()`` function are determined by the
            # Task, and in particular the ``'net_input'`` key in each
            # mini-batch. We'll define the Task in the next section, but for
            # now just know that *src_tokens* has shape `(batch, src_len)` and
            # *src_lengths* has shape `(batch)`.
            bsz, max_src_len = src_tokens.size()

            # Initialize the RNN hidden state. Compared to the original PyTorch
            # tutorial we'll also handle batched inputs and work on the GPU.
            hidden = self.rnn.initHidden()
            hidden = hidden.repeat(bsz, 1)  # expand for batched inputs
            hidden = hidden.to(src_tokens.device)  # move to GPU

            for i in range(max_src_len):
                # WARNING: The inputs have padding, so we should mask those
                # elements here so that padding doesn't affect the results.
                # This is left as an exercise for the reader. The padding symbol
                # is given by ``self.input_vocab.pad()`` and the unpadded length
                # of each input is given by *src_lengths*.

                # One-hot encode a batch of input characters.
                input = self.one_hot_inputs[src_tokens[:, i].long()]

                # Feed the input to our RNN.
                output, hidden = self.rnn(input, hidden)

            # Return the final output state for making a prediction
            return output
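
The loop above leaves padding handling as an exercise. One hedged sketch of a
solution (not part of the original tutorial; it assumes right-padded inputs, as
configured by ``left_pad_source=False`` in the Task defined in the next section)
is to stop updating the hidden state once a sequence has run out of real tokens,
so padding can never change the final prediction::

    # Illustrative replacement for the loop in ``forward()`` above.
    output = None
    for i in range(max_src_len):
        input = self.one_hot_inputs[src_tokens[:, i].long()]
        step_output, step_hidden = self.rnn(input, hidden)
        # (batch, 1) mask: 1.0 while step i is a real token, 0.0 once we are
        # into the padding for that sequence (positions >= src_lengths).
        keep = (i < src_lengths).to(step_hidden.dtype).unsqueeze(1)
        hidden = keep * step_hidden + (1 - keep) * hidden
        output = step_output if output is None else \
            keep * step_output + (1 - keep) * output
    return output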
Finally let's define a *named architecture* with the configuration for our
model. This is done with the :func:`~fairseq.models.register_model_architecture`
function decorator. Thereafter this named architecture can be used with the
``--arch`` command-line argument, e.g., ``--arch pytorch_tutorial_rnn``::

    from fairseq.models import register_model_architecture

    # The first argument to ``register_model_architecture()`` should be the name
    # of the model we registered above (i.e., 'rnn_classifier'). The function we
    # register here should take a single argument *args* and modify it in-place
    # to match the desired architecture.

    @register_model_architecture('rnn_classifier', 'pytorch_tutorial_rnn')
    def pytorch_tutorial_rnn(args):
        # We use ``getattr()`` to prioritize arguments that are explicitly given
        # on the command-line, so that the defaults defined below are only used
        # when no other value has been specified.
        args.hidden_dim = getattr(args, 'hidden_dim', 128)


3. Registering a new Task
-------------------------

Now we'll register a new :class:`~fairseq.tasks.FairseqTask` that will load our
dictionaries and dataset. Tasks can also control how the data is batched into
mini-batches, but in this tutorial we'll reuse the batching provided by
:class:`fairseq.data.LanguagePairDataset`.

Create a new file named :file:`fairseq/tasks/simple_classification.py` with the
following contents::

    import os
    import torch

    from fairseq.data import Dictionary, LanguagePairDataset
    from fairseq.tasks import LegacyFairseqTask, register_task


    @register_task('simple_classification')
    class SimpleClassificationTask(LegacyFairseqTask):

        @staticmethod
        def add_args(parser):
            # Add some command-line arguments for specifying where the data is
            # located and the maximum supported input length.
            parser.add_argument('data', metavar='FILE',
                                help='file prefix for data')
            parser.add_argument('--max-positions', default=1024, type=int,
                                help='max input length')

        @classmethod
        def setup_task(cls, args, **kwargs):
            # Here we can perform any setup required for the task. This may include
            # loading Dictionaries, initializing shared Embedding layers, etc.
            # In this case we'll just load the Dictionaries.
            input_vocab = Dictionary.load(os.path.join(args.data, 'dict.input.txt'))
            label_vocab = Dictionary.load(os.path.join(args.data, 'dict.label.txt'))
            print('| [input] dictionary: {} types'.format(len(input_vocab)))
            print('| [label] dictionary: {} types'.format(len(label_vocab)))

            return SimpleClassificationTask(args, input_vocab, label_vocab)

        def __init__(self, args, input_vocab, label_vocab):
            super().__init__(args)
            self.input_vocab = input_vocab
            self.label_vocab = label_vocab

        def load_dataset(self, split, **kwargs):
            """Load a given dataset split (e.g., train, valid, test)."""

            prefix = os.path.join(self.args.data, '{}.input-label'.format(split))

            # Read input sentences.
            sentences, lengths = [], []
            with open(prefix + '.input', encoding='utf-8') as file:
                for line in file:
                    sentence = line.strip()

                    # Tokenize the sentence, splitting on spaces
                    tokens = self.input_vocab.encode_line(
                        sentence, add_if_not_exist=False,
                    )

                    sentences.append(tokens)
                    lengths.append(tokens.numel())

            # Read labels.
            labels = []
            with open(prefix + '.label', encoding='utf-8') as file:
                for line in file:
                    label = line.strip()
                    labels.append(
                        # Convert label to a numeric ID.
                        torch.LongTensor([self.label_vocab.add_symbol(label)])
                    )

            assert len(sentences) == len(labels)
            print('| {} {} {} examples'.format(self.args.data, split, len(sentences)))

            # We reuse LanguagePairDataset since classification can be modeled as a
            # sequence-to-sequence task where the target sequence has length 1.
            self.datasets[split] = LanguagePairDataset(
                src=sentences,
                src_sizes=lengths,
                src_dict=self.input_vocab,
                tgt=labels,
                tgt_sizes=torch.ones(len(labels)),  # targets have length 1
                tgt_dict=self.label_vocab,
                left_pad_source=False,
                # Since our target is a single class label, there's no need for
                # teacher forcing. If we set this to ``True`` then our Model's
                # ``forward()`` method would receive an additional argument called
                # *prev_output_tokens* that would contain a shifted version of the
                # target sequence.
                input_feeding=False,
            )

        def max_positions(self):
            """Return the max input length allowed by the task."""
            # The source should be less than *args.max_positions* and the "target"
            # has max length 1.
            return (self.args.max_positions, 1)

        @property
        def source_dictionary(self):
            """Return the source :class:`~fairseq.data.Dictionary`."""
            return self.input_vocab

        @property
        def target_dictionary(self):
            """Return the target :class:`~fairseq.data.Dictionary`."""
            return self.label_vocab

        # We could override this method if we wanted more control over how batches
        # are constructed, but it's not necessary for this tutorial since we can
        # reuse the batching provided by LanguagePairDataset.
        #
        # def get_batch_iterator(
        #     self, dataset, max_tokens=None, max_sentences=None, max_positions=None,
        #     ignore_invalid_inputs=False, required_batch_size_multiple=1,
        #     seed=1, num_shards=1, shard_id=0, num_workers=0, epoch=1,
        #     data_buffer_size=0, disable_iterator_cache=False,
        # ):
        #     (...)


4. Training the Model
---------------------

Now we're ready to train the model. We can use the existing :ref:`fairseq-train`
command-line tool for this, making sure to specify our new Task (``--task
simple_classification``) and Model architecture (``--arch pytorch_tutorial_rnn``):

.. note::

  You can also configure the dimensionality of the hidden state by passing the
  ``--hidden-dim`` argument to :ref:`fairseq-train`.

.. code-block:: console

  > fairseq-train names-bin \
    --task simple_classification \
    --arch pytorch_tutorial_rnn \
    --optimizer adam --lr 0.001 --lr-shrink 0.5 \
    --max-tokens 1000
    (...)
    | epoch 027 | loss 1.200 | ppl 2.30 | wps 15728 | ups 119.4 | wpb 116 | bsz 116 | num_updates 3726 | lr 1.5625e-05 | gnorm 1.290 | clip 0% | oom 0 | wall 32 | train_wall 21
    | epoch 027 | valid on 'valid' subset | valid_loss 1.41304 | valid_ppl 2.66 | num_updates 3726 | best 1.41208
    | done training in 31.6 seconds

The model files should appear in the :file:`checkpoints/` directory.


5. Writing an evaluation script
-------------------------------

Finally we can write a short script to evaluate our model on new inputs. Create
a new file named :file:`eval_classifier.py` with the following contents::

    from fairseq import checkpoint_utils, data, options, tasks

    # Parse command-line arguments for generation
    parser = options.get_generation_parser(default_task='simple_classification')
    args = options.parse_args_and_arch(parser)

    # Setup task
    task = tasks.setup_task(args)

    # Load model
    print('| loading model from {}'.format(args.path))
    models, _model_args = checkpoint_utils.load_model_ensemble([args.path], task=task)
    model = models[0]

    while True:
        sentence = input('\nInput: ')

        # Tokenize into characters
        chars = ' '.join(list(sentence.strip()))
        tokens = task.source_dictionary.encode_line(
            chars, add_if_not_exist=False,
        )

        # Build mini-batch to feed to the model
        batch = data.language_pair_dataset.collate(
            samples=[{'id': -1, 'source': tokens}],  # bsz = 1
            pad_idx=task.source_dictionary.pad(),
            eos_idx=task.source_dictionary.eos(),
            left_pad_source=False,
            input_feeding=False,
        )

        # Feed batch to the model and get predictions
        preds = model(**batch['net_input'])

        # Print top 3 predictions and their log-probabilities
        top_scores, top_labels = preds[0].topk(k=3)
        for score, label_idx in zip(top_scores, top_labels):
            label_name = task.target_dictionary.string([label_idx])
            print('({:.2f})\t{}'.format(score, label_name))

Now we can evaluate our model interactively. Note that we have included the
original data path (:file:`names-bin/`) so that the dictionaries can be loaded:

.. code-block:: console

  > python eval_classifier.py names-bin --path checkpoints/checkpoint_best.pt
  | [input] dictionary: 64 types
  | [label] dictionary: 24 types
  | loading model from checkpoints/checkpoint_best.pt

  Input: Satoshi
  (-0.61) Japanese
  (-1.20) Arabic
  (-2.86) Italian

  Input: Sinbad
  (-0.30) Arabic
  (-1.76) English
  (-4.08) Russian
PyTorch/NLP/new-Transformer/docs/tutorial_simple_lstm.rst
0 → 100644
Tutorial: Simple LSTM
=====================

In this tutorial we will extend fairseq by adding a new
:class:`~fairseq.models.FairseqEncoderDecoderModel` that encodes a source
sentence with an LSTM and then passes the final hidden state to a second LSTM
that decodes the target sentence (without attention).

This tutorial covers:

1. **Writing an Encoder and Decoder** to encode/decode the source/target
   sentence, respectively.
2. **Registering a new Model** so that it can be used with the existing
   :ref:`Command-line tools`.
3. **Training the Model** using the existing command-line tools.
4. **Making generation faster** by modifying the Decoder to use
   :ref:`Incremental decoding`.


1. Building an Encoder and Decoder
----------------------------------

In this section we'll define a simple LSTM Encoder and Decoder. All Encoders
should implement the :class:`~fairseq.models.FairseqEncoder` interface and
Decoders should implement the :class:`~fairseq.models.FairseqDecoder` interface.
These interfaces themselves extend :class:`torch.nn.Module`, so FairseqEncoders
and FairseqDecoders can be written and used in the same ways as ordinary PyTorch
Modules.

Encoder
~~~~~~~

Our Encoder will embed the tokens in the source sentence, feed them to a
:class:`torch.nn.LSTM` and return the final hidden state. To create our encoder
save the following in a new file named :file:`fairseq/models/simple_lstm.py`::

    import torch.nn as nn
    from fairseq import utils
    from fairseq.models import FairseqEncoder

    class SimpleLSTMEncoder(FairseqEncoder):

        def __init__(
            self, args, dictionary, embed_dim=128, hidden_dim=128, dropout=0.1,
        ):
            super().__init__(dictionary)
            self.args = args

            # Our encoder will embed the inputs before feeding them to the LSTM.
            self.embed_tokens = nn.Embedding(
                num_embeddings=len(dictionary),
                embedding_dim=embed_dim,
                padding_idx=dictionary.pad(),
            )
            self.dropout = nn.Dropout(p=dropout)

            # We'll use a single-layer, unidirectional LSTM for simplicity.
            self.lstm = nn.LSTM(
                input_size=embed_dim,
                hidden_size=hidden_dim,
                num_layers=1,
                bidirectional=False,
                batch_first=True,
            )

        def forward(self, src_tokens, src_lengths):
            # The inputs to the ``forward()`` function are determined by the
            # Task, and in particular the ``'net_input'`` key in each
            # mini-batch. We discuss Tasks in the next tutorial, but for now just
            # know that *src_tokens* has shape `(batch, src_len)` and *src_lengths*
            # has shape `(batch)`.

            # Note that the source is typically padded on the left. This can be
            # configured by adding the `--left-pad-source "False"` command-line
            # argument, but here we'll make the Encoder handle either kind of
            # padding by converting everything to be right-padded.
            if self.args.left_pad_source:
                # Convert left-padding to right-padding.
                src_tokens = utils.convert_padding_direction(
                    src_tokens,
                    padding_idx=self.dictionary.pad(),
                    left_to_right=True
                )

            # Embed the source.
            x = self.embed_tokens(src_tokens)

            # Apply dropout.
            x = self.dropout(x)

            # Pack the sequence into a PackedSequence object to feed to the LSTM.
            x = nn.utils.rnn.pack_padded_sequence(x, src_lengths, batch_first=True)

            # Get the output from the LSTM.
            _outputs, (final_hidden, _final_cell) = self.lstm(x)

            # Return the Encoder's output. This can be any object and will be
            # passed directly to the Decoder.
            return {
                # this will have shape `(bsz, hidden_dim)`
                'final_hidden': final_hidden.squeeze(0),
            }

        # Encoders are required to implement this method so that we can rearrange
        # the order of the batch elements during inference (e.g., beam search).
        def reorder_encoder_out(self, encoder_out, new_order):
            """
            Reorder encoder output according to `new_order`.

            Args:
                encoder_out: output from the ``forward()`` method
                new_order (LongTensor): desired order

            Returns:
                `encoder_out` rearranged according to `new_order`
            """
            final_hidden = encoder_out['final_hidden']
            return {
                'final_hidden': final_hidden.index_select(0, new_order),
            }

Decoder
~~~~~~~

Our Decoder will predict the next word, conditioned on the Encoder's final
hidden state and an embedded representation of the previous target word -- which
is sometimes called *teacher forcing*. More specifically, we'll use a
:class:`torch.nn.LSTM` to produce a sequence of hidden states that we'll project
to the size of the output vocabulary to predict each target word.

::

    import torch
    from fairseq.models import FairseqDecoder

    class SimpleLSTMDecoder(FairseqDecoder):

        def __init__(
            self, dictionary, encoder_hidden_dim=128, embed_dim=128, hidden_dim=128,
            dropout=0.1,
        ):
            super().__init__(dictionary)

            # Our decoder will embed the inputs before feeding them to the LSTM.
            self.embed_tokens = nn.Embedding(
                num_embeddings=len(dictionary),
                embedding_dim=embed_dim,
                padding_idx=dictionary.pad(),
            )
            self.dropout = nn.Dropout(p=dropout)

            # We'll use a single-layer, unidirectional LSTM for simplicity.
            self.lstm = nn.LSTM(
                # For the first layer we'll concatenate the Encoder's final hidden
                # state with the embedded target tokens.
                input_size=encoder_hidden_dim + embed_dim,
                hidden_size=hidden_dim,
                num_layers=1,
                bidirectional=False,
            )

            # Define the output projection.
            self.output_projection = nn.Linear(hidden_dim, len(dictionary))

        # During training Decoders are expected to take the entire target sequence
        # (shifted right by one position) and produce logits over the vocabulary.
        # The *prev_output_tokens* tensor begins with the end-of-sentence symbol,
        # ``dictionary.eos()``, followed by the target sequence.
        def forward(self, prev_output_tokens, encoder_out):
            """
            Args:
                prev_output_tokens (LongTensor): previous decoder outputs of shape
                    `(batch, tgt_len)`, for teacher forcing
                encoder_out (Tensor, optional): output from the encoder, used for
                    encoder-side attention

            Returns:
                tuple:
                    - the last decoder layer's output of shape
                      `(batch, tgt_len, vocab)`
                    - the last decoder layer's attention weights of shape
                      `(batch, tgt_len, src_len)`
            """
            bsz, tgt_len = prev_output_tokens.size()

            # Extract the final hidden state from the Encoder.
            final_encoder_hidden = encoder_out['final_hidden']

            # Embed the target sequence, which has been shifted right by one
            # position and now starts with the end-of-sentence symbol.
            x = self.embed_tokens(prev_output_tokens)

            # Apply dropout.
            x = self.dropout(x)

            # Concatenate the Encoder's final hidden state to *every* embedded
            # target token.
            x = torch.cat(
                [x, final_encoder_hidden.unsqueeze(1).expand(bsz, tgt_len, -1)],
                dim=2,
            )

            # Using PackedSequence objects in the Decoder is harder than in the
            # Encoder, since the targets are not sorted in descending length order,
            # which is a requirement of ``pack_padded_sequence()``. Instead we'll
            # feed nn.LSTM directly.
            initial_state = (
                final_encoder_hidden.unsqueeze(0),  # hidden
                torch.zeros_like(final_encoder_hidden).unsqueeze(0),  # cell
            )
            output, _ = self.lstm(
                x.transpose(0, 1),  # convert to shape `(tgt_len, bsz, dim)`
                initial_state,
            )
            x = output.transpose(0, 1)  # convert to shape `(bsz, tgt_len, hidden)`

            # Project the outputs to the size of the vocabulary.
            x = self.output_projection(x)

            # Return the logits and ``None`` for the attention weights
            return x, None


2. Registering the Model
------------------------

Now that we've defined our Encoder and Decoder we must *register* our model with
fairseq using the :func:`~fairseq.models.register_model` function decorator.
Once the model is registered we'll be able to use it with the existing
:ref:`Command-line Tools`.

All registered models must implement the
:class:`~fairseq.models.BaseFairseqModel` interface. For sequence-to-sequence
models (i.e., any model with a single Encoder and Decoder), we can instead
implement the :class:`~fairseq.models.FairseqEncoderDecoderModel` interface.

Create a small wrapper class in the same file and register it in fairseq with
the name ``'simple_lstm'``::

    from fairseq.models import FairseqEncoderDecoderModel, register_model

    # Note: the register_model "decorator" should immediately precede the
    # definition of the Model class.

    @register_model('simple_lstm')
    class SimpleLSTMModel(FairseqEncoderDecoderModel):

        @staticmethod
        def add_args(parser):
            # Models can override this method to add new command-line arguments.
            # Here we'll add some new command-line arguments to configure dropout
            # and the dimensionality of the embeddings and hidden states.
            parser.add_argument(
                '--encoder-embed-dim', type=int, metavar='N',
                help='dimensionality of the encoder embeddings',
            )
            parser.add_argument(
                '--encoder-hidden-dim', type=int, metavar='N',
                help='dimensionality of the encoder hidden state',
            )
            parser.add_argument(
                '--encoder-dropout', type=float, default=0.1,
                help='encoder dropout probability',
            )
            parser.add_argument(
                '--decoder-embed-dim', type=int, metavar='N',
                help='dimensionality of the decoder embeddings',
            )
            parser.add_argument(
                '--decoder-hidden-dim', type=int, metavar='N',
                help='dimensionality of the decoder hidden state',
            )
            parser.add_argument(
                '--decoder-dropout', type=float, default=0.1,
                help='decoder dropout probability',
            )

        @classmethod
        def build_model(cls, args, task):
            # Fairseq initializes models by calling the ``build_model()``
            # function. This provides more flexibility, since the returned model
            # instance can be of a different type than the one that was called.
            # In this case we'll just return a SimpleLSTMModel instance.

            # Initialize our Encoder and Decoder.
            encoder = SimpleLSTMEncoder(
                args=args,
                dictionary=task.source_dictionary,
                embed_dim=args.encoder_embed_dim,
                hidden_dim=args.encoder_hidden_dim,
                dropout=args.encoder_dropout,
            )
            decoder = SimpleLSTMDecoder(
                dictionary=task.target_dictionary,
                encoder_hidden_dim=args.encoder_hidden_dim,
                embed_dim=args.decoder_embed_dim,
                hidden_dim=args.decoder_hidden_dim,
                dropout=args.decoder_dropout,
            )
            model = SimpleLSTMModel(encoder, decoder)

            # Print the model architecture.
            print(model)

            return model

        # We could override the ``forward()`` if we wanted more control over how
        # the encoder and decoder interact, but it's not necessary for this
        # tutorial since we can inherit the default implementation provided by
        # the FairseqEncoderDecoderModel base class, which looks like:
        #
        # def forward(self, src_tokens, src_lengths, prev_output_tokens):
        #     encoder_out = self.encoder(src_tokens, src_lengths)
        #     decoder_out = self.decoder(prev_output_tokens, encoder_out)
        #     return decoder_out

Finally let's define a *named architecture* with the configuration for our
model. This is done with the :func:`~fairseq.models.register_model_architecture`
function decorator. Thereafter this named architecture can be used with the
``--arch`` command-line argument, e.g., ``--arch tutorial_simple_lstm``::

    from fairseq.models import register_model_architecture

    # The first argument to ``register_model_architecture()`` should be the name
    # of the model we registered above (i.e., 'simple_lstm'). The function we
    # register here should take a single argument *args* and modify it in-place
    # to match the desired architecture.

    @register_model_architecture('simple_lstm', 'tutorial_simple_lstm')
    def tutorial_simple_lstm(args):
        # We use ``getattr()`` to prioritize arguments that are explicitly given
        # on the command-line, so that the defaults defined below are only used
        # when no other value has been specified.
        args.encoder_embed_dim = getattr(args, 'encoder_embed_dim', 256)
        args.encoder_hidden_dim = getattr(args, 'encoder_hidden_dim', 256)
        args.decoder_embed_dim = getattr(args, 'decoder_embed_dim', 256)
        args.decoder_hidden_dim = getattr(args, 'decoder_hidden_dim', 256)


3. Training the Model
---------------------

Now we're ready to train the model. We can use the existing :ref:`fairseq-train`
command-line tool for this, making sure to specify our new Model architecture
(``--arch tutorial_simple_lstm``).

.. note::

  Make sure you've already preprocessed the data from the IWSLT example in the
  :file:`examples/translation/` directory.

.. code-block:: console

  > fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch tutorial_simple_lstm \
    --encoder-dropout 0.2 --decoder-dropout 0.2 \
    --optimizer adam --lr 0.005 --lr-shrink 0.5 \
    --max-tokens 12000
    (...)
    | epoch 052 | loss 4.027 | ppl 16.30 | wps 420805 | ups 39.7 | wpb 9841 | bsz 400 | num_updates 20852 | lr 1.95313e-05 | gnorm 0.218 | clip 0% | oom 0 | wall 529 | train_wall 396
    | epoch 052 | valid on 'valid' subset | valid_loss 4.74989 | valid_ppl 26.91 | num_updates 20852 | best 4.74954

The model files should appear in the :file:`checkpoints/` directory. While this
model architecture is not very good, we can use the :ref:`fairseq-generate`
script to generate translations and compute our BLEU score over the test set:

.. code-block:: console

  > fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/checkpoint_best.pt \
    --beam 5 \
    --remove-bpe
    (...)
    | Translated 6750 sentences (153132 tokens) in 17.3s (389.12 sentences/s, 8827.68 tokens/s)
    | Generate test with beam=5: BLEU4 = 8.18, 38.8/12.1/4.7/2.0 (BP=1.000, ratio=1.066, syslen=139865, reflen=131146)


4. Making generation faster
---------------------------

While autoregressive generation from sequence-to-sequence models is inherently
slow, our implementation above is especially slow because it recomputes the
entire sequence of Decoder hidden states for every output token (i.e., it is
``O(n^2)``). We can make this significantly faster by instead caching the
previous hidden states.

In fairseq this is called :ref:`Incremental decoding`. Incremental decoding is a
special mode at inference time where the Model only receives a single timestep
of input corresponding to the immediately previous output token (for teacher
forcing) and must produce the next output incrementally. Thus the model must
cache any long-term state that is needed about the sequence, e.g., hidden
states, convolutional states, etc.

To implement incremental decoding we will modify our model to implement the
:class:`~fairseq.models.FairseqIncrementalDecoder` interface. Compared to the
standard :class:`~fairseq.models.FairseqDecoder` interface, the incremental
decoder interface allows ``forward()`` methods to take an extra keyword argument
(*incremental_state*) that can be used to cache state across time-steps.

Let's replace our ``SimpleLSTMDecoder`` with an incremental one::

    import torch
    from fairseq.models import FairseqIncrementalDecoder

    class SimpleLSTMDecoder(FairseqIncrementalDecoder):

        def __init__(
            self, dictionary, encoder_hidden_dim=128, embed_dim=128, hidden_dim=128,
            dropout=0.1,
        ):
            # This remains the same as before.
            super().__init__(dictionary)
            self.embed_tokens = nn.Embedding(
                num_embeddings=len(dictionary),
                embedding_dim=embed_dim,
                padding_idx=dictionary.pad(),
            )
            self.dropout = nn.Dropout(p=dropout)
            self.lstm = nn.LSTM(
                input_size=encoder_hidden_dim + embed_dim,
                hidden_size=hidden_dim,
                num_layers=1,
                bidirectional=False,
            )
            self.output_projection = nn.Linear(hidden_dim, len(dictionary))

        # We now take an additional kwarg (*incremental_state*) for caching the
        # previous hidden and cell states.
        def forward(self, prev_output_tokens, encoder_out, incremental_state=None):
            if incremental_state is not None:
                # If the *incremental_state* argument is not ``None`` then we are
                # in incremental inference mode. While *prev_output_tokens* will
                # still contain the entire decoded prefix, we will only use the
                # last step and assume that the rest of the state is cached.
                prev_output_tokens = prev_output_tokens[:, -1:]

            # This remains the same as before.
            bsz, tgt_len = prev_output_tokens.size()
            final_encoder_hidden = encoder_out['final_hidden']
            x = self.embed_tokens(prev_output_tokens)
            x = self.dropout(x)
            x = torch.cat(
                [x, final_encoder_hidden.unsqueeze(1).expand(bsz, tgt_len, -1)],
                dim=2,
            )

            # We will now check the cache and load the cached previous hidden and
            # cell states, if they exist, otherwise we will initialize them to
            # zeros (as before). We will use the ``utils.get_incremental_state()``
            # and ``utils.set_incremental_state()`` helpers.
            initial_state = utils.get_incremental_state(
                self, incremental_state, 'prev_state',
            )
            if initial_state is None:
                # first time initialization, same as the original version
                initial_state = (
                    final_encoder_hidden.unsqueeze(0),  # hidden
                    torch.zeros_like(final_encoder_hidden).unsqueeze(0),  # cell
                )

            # Run one step of our LSTM.
            output, latest_state = self.lstm(x.transpose(0, 1), initial_state)

            # Update the cache with the latest hidden and cell states.
            utils.set_incremental_state(
                self, incremental_state, 'prev_state', latest_state,
            )

            # This remains the same as before
            x = output.transpose(0, 1)
            x = self.output_projection(x)
            return x, None

        # The ``FairseqIncrementalDecoder`` interface also requires implementing a
        # ``reorder_incremental_state()`` method, which is used during beam search
        # to select and reorder the incremental state.
        def reorder_incremental_state(self, incremental_state, new_order):
            # Load the cached state.
            prev_state = utils.get_incremental_state(
                self, incremental_state, 'prev_state',
            )

            # Reorder batches according to *new_order*.
            reordered_state = (
                prev_state[0].index_select(1, new_order),  # hidden
                prev_state[1].index_select(1, new_order),  # cell
            )

            # Update the cached state.
            utils.set_incremental_state(
                self, incremental_state, 'prev_state', reordered_state,
            )

Finally, we can rerun generation and observe the speedup:

.. code-block:: console

  # Before

  > fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/checkpoint_best.pt \
    --beam 5 \
    --remove-bpe
    (...)
    | Translated 6750 sentences (153132 tokens) in 17.3s (389.12 sentences/s, 8827.68 tokens/s)
    | Generate test with beam=5: BLEU4 = 8.18, 38.8/12.1/4.7/2.0 (BP=1.000, ratio=1.066, syslen=139865, reflen=131146)

  # After

  > fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/checkpoint_best.pt \
    --beam 5 \
    --remove-bpe
    (...)
    | Translated 6750 sentences (153132 tokens) in 5.5s (1225.54 sentences/s, 27802.94 tokens/s)
    | Generate test with beam=5: BLEU4 = 8.18, 38.8/12.1/4.7/2.0 (BP=1.000, ratio=1.066, syslen=139865, reflen=131146)
PyTorch/NLP/new-Transformer/env.sh
0 → 100644
#module load compiler/intel/2021.3.0
export ROCM_PATH=/work/home/hepj/app/dtk-22.04.2
echo $ROCM_PATH
export HIP_PATH=${ROCM_PATH}/hip
export AMDGPU_TARGETS="gfx900;gfx906"
export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${ROCM_PATH}/hcc/bin:${ROCM_PATH}/hip/bin:$PATH
export LD_LIBRARY_PATH=${ROCM_PATH}/lib:${ROCM_PATH}/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${ROCM_PATH}/hip/lib:${ROCM_PATH}/llvm/lib:${ROCM_PATH}/opencl/lib/x86_64:$LD_LIBRARY_PATH
#export LD_LIBRARY_PATH=${ROCM_PATH}/hip/lib:${ROCM_PATH}/llvm/lib:$LD_LIBRARY_PATH
#export C_INCLUDE_PATH=${ROCM_PATH}/include:${ROCM_PATH}/llvm/include${C_INCLUDE_PATH:+:${C_INCLUDE_PATH}}
export C_INCLUDE_PATH=${ROCM_PATH}/include:${ROCM_PATH}/llvm/include:/opencl/include
export CPLUS_INCLUDE_PATH=${ROCM_PATH}/include:${ROCM_PATH}/llvm/include
export PATH=${ROCM_PATH}/miopen/bin:${ROCM_PATH}/rocblas/bin:${ROCM_PATH}/hipsparse/bin:$PATH
export LD_LIBRARY_PATH=${ROCM_PATH}/miopen/lib:${ROCM_PATH}/rocblas/lib:$LD_LIBRARY_PATH
export MIOPEN_SYSTEM_DB_PATH=${ROCM_PATH}/miopen/share/miopen/db/
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=/usr/lib64:$LIBRARY_PATH
export C_INCLUDE_PATH=/public/software/apps/deeplearning-depend/gflags-2.1.2-build/include:/public/software/apps/DeepLearning/PyTorch/glog-build/include:$C_INCLUDE_PATH
export DEEP_PATH=/public/software/apps/deeplearning-depend
export LD_LIBRARY_PATH=/work/home/hepj/.pyenv/versions/3.7.0/envs/torch/lib/python3.7/site-packages/Pillow.libs/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/public/software/apps/deeplearning-depend/lmdb-0.9.24-build/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/public/software/apps/deeplearning-depend/opencv-2.4.13.6-build/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEP_PATH}/glog-build/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEP_PATH}/opencv-2.4.13.6-build/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEP_PATH}/openblas-0.3.7-build/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEP_PATH}/gflags-2.1.2-build/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${DEEP_PATH}/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/public/software/apps/DeepLearning/PyTorch/openmp-build/lib:$LD_LIBRARY_PATH
# paths added for using rocblas
export LD_LIBRARY_PATH=/work/home/hepj/app/dtk-22.04.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/work/home/hepj/app/dtk-22.04.2/rocblas/lib/benchmark_tool:$LD_LIBRARY_PATH
\ No newline at end of file
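
A hedged usage sketch (every DTK/ROCm and Python path in the script is specific
to this cluster, so treat them as assumptions on any other machine): source the
script in the job shell before launching training, then confirm that PyTorch can
see the DCU devices:

.. code-block:: console

  > source env.sh
  > python -c "import torch; print(torch.__version__, torch.cuda.is_available())"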
PyTorch/NLP/Transformer/examples/.gitignore → PyTorch/NLP/new-Transformer/examples/.gitignore (file moved)
PyTorch/NLP/Transformer/examples/translation/README.md → PyTorch/NLP/new-Transformer/examples/translation/README.md (file moved)
PyTorch/NLP/Transformer/examples/translation/prepare-iwslt14.sh → PyTorch/NLP/new-Transformer/examples/translation/prepare-iwslt14.sh (file moved)
PyTorch/NLP/Transformer/examples/translation/prepare-wmt14en2de.sh → PyTorch/NLP/new-Transformer/examples/translation/prepare-wmt14en2de.sh (file moved)
PyTorch/NLP/Transformer/examples/translation/prepare-wmt14en2fr.sh → PyTorch/NLP/new-Transformer/examples/translation/prepare-wmt14en2fr.sh (file moved)
PyTorch/NLP/new-Transformer/fairseq/__init__.py
0 → 100644
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""isort:skip_file"""

import os
import sys

try:
    from .version import __version__  # noqa
except ImportError:
    version_txt = os.path.join(os.path.dirname(__file__), "version.txt")
    with open(version_txt) as f:
        __version__ = f.read().strip()

__all__ = ["pdb"]

# backwards compatibility to support `from fairseq.X import Y`
from fairseq.distributed import utils as distributed_utils
from fairseq.logging import meters, metrics, progress_bar  # noqa

sys.modules["fairseq.distributed_utils"] = distributed_utils
sys.modules["fairseq.meters"] = meters
sys.modules["fairseq.metrics"] = metrics
sys.modules["fairseq.progress_bar"] = progress_bar

# initialize hydra
from fairseq.dataclass.initialize import hydra_init

hydra_init()

import fairseq.criterions  # noqa
import fairseq.distributed  # noqa
import fairseq.models  # noqa
import fairseq.modules  # noqa
import fairseq.optim  # noqa
import fairseq.optim.lr_scheduler  # noqa
import fairseq.pdb  # noqa
import fairseq.scoring  # noqa
import fairseq.tasks  # noqa
import fairseq.token_generation_constraints  # noqa

import fairseq.benchmark  # noqa
import fairseq.model_parallel  # noqa
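
The ``sys.modules`` assignments above are what keep pre-refactor import paths
working after the metering and distributed helpers moved into subpackages. As a
small hedged sketch of what they enable (``AverageMeter`` is used here as an
assumed member of ``fairseq.logging.meters``)::

    # Both spellings resolve to the same class, because the legacy module name
    # "fairseq.meters" is aliased to fairseq.logging.meters at import time.
    from fairseq.logging.meters import AverageMeter as new_style
    from fairseq.meters import AverageMeter as old_style  # legacy path

    assert new_style is old_style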
PyTorch/NLP/new-Transformer/fairseq/benchmark/__init__.py
0 → 100644
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

# import models/tasks to register them
from . import dummy_dataset, dummy_lm, dummy_masked_lm, dummy_model, dummy_mt  # noqa
PyTorch/NLP/new-Transformer/fairseq/benchmark/benchmark_multihead_attention.py
0 → 100644
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import itertools
import random

import torch
from torch.utils import benchmark

from fairseq.modules.multihead_attention import MultiheadAttention

BATCH = [20, 41, 97]
SEQ = 64
EMB = 48
HEADS = 4
DROP = 0.1
DEVICE = torch.device("cuda")
ATTN_MASK_DTYPE = [torch.uint8, torch.bool, torch.float]
KEY_PADDING_MASK_DTYPE = [torch.uint8, torch.bool]


def _reset_seeds():
    torch.manual_seed(0)
    random.seed(0)


def _get_mask(to_dtype: torch.dtype, dim0: int, dim1: int):
    if to_dtype == torch.float:
        mask = torch.randint(0, 2, (dim0, dim1)).to(dtype=torch.bool)
        return mask.to(dtype=to_dtype).masked_fill(mask, -float("inf"))
    return torch.randint(0, 2, (dim0, dim1)).to(dtype=to_dtype)


def benchmark_multihead_attention(
    label="",
    attn_dtype=torch.uint8,
    key_padding_dtype=torch.uint8,
    add_bias_kv=False,
    add_zero_attn=False,
    static_kv=False,
    batch_size=20,
    embedding=EMB,
    seq_len=SEQ,
    num_heads=HEADS,
):

    results = []
    # device = torch.device("cuda")

    xformers_att_config = '{"name": "scaled_dot_product"}'

    attn_mask = _get_mask(to_dtype=attn_dtype, dim0=seq_len, dim1=seq_len)
    key_padding_mask = _get_mask(
        to_dtype=key_padding_dtype, dim0=batch_size, dim1=seq_len
    )

    q = torch.rand(seq_len, batch_size, embedding, requires_grad=True)
    k = torch.rand(seq_len, batch_size, embedding, requires_grad=True)
    v = torch.rand(seq_len, batch_size, embedding, requires_grad=True)

    _reset_seeds()

    original_mha = MultiheadAttention(
        embedding,
        num_heads,
        dropout=0.0,
        xformers_att_config=None,
        add_bias_kv=add_bias_kv,
        add_zero_attn=add_zero_attn,
    )

    xformers_mha = MultiheadAttention(
        embedding,
        num_heads,
        dropout=0.0,
        xformers_att_config=xformers_att_config,
        add_bias_kv=add_bias_kv,
        add_zero_attn=add_zero_attn,
    )

    def original_bench_fw(q, k, v, key_padding_mask, attn_mask, static_kv):
        original_mha(
            query=q,
            key=k,
            value=v,
            key_padding_mask=key_padding_mask,
            attn_mask=attn_mask,
            static_kv=static_kv,
        )

    def xformers_bench_fw(q, k, v, key_padding_mask, attn_mask, static_kv):
        xformers_mha(
            query=q,
            key=k,
            value=v,
            key_padding_mask=key_padding_mask,
            attn_mask=attn_mask,
            static_kv=static_kv,
        )

    def original_bench_fw_bw(q, k, v, key_padding_mask, attn_mask, static_kv):
        output, _ = original_mha(
            query=q,
            key=k,
            value=v,
            key_padding_mask=key_padding_mask,
            attn_mask=attn_mask,
            static_kv=static_kv,
        )
        loss = torch.norm(output)
        loss.backward()

    def xformers_bench_fw_bw(q, k, v, key_padding_mask, attn_mask, static_kv):
        output, _ = xformers_mha(
            query=q,
            key=k,
            value=v,
            key_padding_mask=key_padding_mask,
            attn_mask=attn_mask,
            static_kv=static_kv,
        )
        loss = torch.norm(output)
        loss.backward()

    fns = [
        original_bench_fw,
        xformers_bench_fw,
        original_bench_fw_bw,
        xformers_bench_fw_bw,
    ]

    for fn in fns:
        results.append(
            benchmark.Timer(
                stmt="fn(q, k, v, key_padding_mask, attn_mask, static_kv)",
                globals={
                    "q": q,
                    "k": k,
                    "v": v,
                    "key_padding_mask": key_padding_mask,
                    "attn_mask": attn_mask,
                    "static_kv": static_kv,
                    "fn": fn,
                },
                label="multihead fw + bw",
                sub_label=f"{fn.__name__}",
                description=label,
            ).blocked_autorange(min_run_time=1)
        )

    compare = benchmark.Compare(results)
    compare.print()


def run_benchmarks():
    for attn_dtype, key_padding_dtype, add_bias_kv, add_zero_attn in itertools.product(
        ATTN_MASK_DTYPE, KEY_PADDING_MASK_DTYPE, [True, False], [True, False]
    ):
        label = f"attn_dtype {attn_dtype}, key_padding_dtype {key_padding_dtype}, \
            add_bias_kv {add_bias_kv}, add_zero_attn {add_zero_attn}"
        benchmark_multihead_attention(
            label=label,
            attn_dtype=attn_dtype,
            key_padding_dtype=key_padding_dtype,
            add_bias_kv=add_bias_kv,
            add_zero_attn=add_zero_attn,
        )


run_benchmarks()
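
Because the module ends with an unconditional ``run_benchmarks()`` call, the
whole sweep executes at import time. A hedged way to reproduce the comparison
(assuming a GPU build of PyTorch and an xformers installation that fairseq can
see) is simply:

.. code-block:: console

  > python -m fairseq.benchmark.benchmark_multihead_attention

``benchmark.Compare`` then prints forward and forward+backward timings for the
stock ``MultiheadAttention`` against the xformers-backed variant for every
mask-dtype combination in the sweep.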
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_dataset.py
0 → 100644
import numpy as np

from fairseq.data import FairseqDataset


class DummyDataset(FairseqDataset):
    def __init__(self, batch, num_items, item_size):
        super().__init__()
        self.batch = batch
        self.num_items = num_items
        self.item_size = item_size

    def __getitem__(self, index):
        return index

    def __len__(self):
        return self.num_items

    def collater(self, samples):
        return self.batch

    @property
    def sizes(self):
        return np.array([self.item_size] * self.num_items)

    def num_tokens(self, index):
        return self.item_size

    def size(self, index):
        return self.item_size

    def ordered_indices(self):
        return np.arange(self.num_items)

    @property
    def supports_prefetch(self):
        return False
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_lm.py
0 → 100644
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import logging
from dataclasses import dataclass, field
from typing import Optional

import torch

from .dummy_dataset import DummyDataset
from fairseq.data import Dictionary
from fairseq.dataclass import FairseqDataclass
from fairseq.tasks import FairseqTask, register_task
from omegaconf import II

logger = logging.getLogger(__name__)


@dataclass
class DummyLMConfig(FairseqDataclass):
    dict_size: int = 49996
    dataset_size: int = 100000
    tokens_per_sample: int = field(
        default=512, metadata={"help": "max sequence length"}
    )
    add_bos_token: bool = False
    batch_size: Optional[int] = II("dataset.batch_size")
    max_tokens: Optional[int] = II("dataset.max_tokens")
    max_target_positions: int = II("task.tokens_per_sample")


@register_task("dummy_lm", dataclass=DummyLMConfig)
class DummyLMTask(FairseqTask):
    def __init__(self, cfg: DummyLMConfig):
        super().__init__(cfg)

        # load dictionary
        self.dictionary = Dictionary()
        for i in range(cfg.dict_size):
            self.dictionary.add_symbol("word{}".format(i))
        self.dictionary.pad_to_multiple_(8)  # often faster if divisible by 8
        logger.info("dictionary: {} types".format(len(self.dictionary)))

        seq = torch.arange(cfg.tokens_per_sample + 1) + self.dictionary.pad() + 1

        self.dummy_src = seq[:-1]
        self.dummy_tgt = seq[1:]

    def load_dataset(self, split, epoch=1, combine=False, **kwargs):
        """Load a given dataset split.

        Args:
            split (str): name of the split (e.g., train, valid, test)
        """
        if self.cfg.batch_size is not None:
            bsz = self.cfg.batch_size
        else:
            bsz = max(1, self.cfg.max_tokens // self.cfg.tokens_per_sample)
        self.datasets[split] = DummyDataset(
            {
                "id": 1,
                "net_input": {
                    "src_tokens": torch.stack([self.dummy_src for _ in range(bsz)]),
                    "src_lengths": torch.full(
                        (bsz,), self.cfg.tokens_per_sample, dtype=torch.long
                    ),
                },
                "target": torch.stack([self.dummy_tgt for _ in range(bsz)]),
                "nsentences": bsz,
                "ntokens": bsz * self.cfg.tokens_per_sample,
            },
            num_items=self.cfg.dataset_size,
            item_size=self.cfg.tokens_per_sample,
        )

    @property
    def source_dictionary(self):
        return self.dictionary

    @property
    def target_dictionary(self):
        return self.dictionary
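
The dummy tasks in this directory fabricate identical batches in memory, so they
are typically used to measure raw trainer throughput without any dataset on
disk. A hedged example of driving this task (the architecture and the optimizer
flags below are illustrative and may differ between fairseq versions):

.. code-block:: console

  > fairseq-train --task dummy_lm --arch transformer_lm \
    --optimizer adam --lr 0.0001 --batch-size 8 --max-update 100 \
    --log-format simple --log-interval 10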
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_masked_lm.py
0 → 100644
View file @
c0f05c10
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import logging
from dataclasses import dataclass, field
from typing import Optional

import torch
from omegaconf import II

from .dummy_dataset import DummyDataset
from fairseq.data import Dictionary
from fairseq.dataclass import FairseqDataclass
from fairseq.tasks import FairseqTask, register_task

logger = logging.getLogger(__name__)


@dataclass
class DummyMaskedLMConfig(FairseqDataclass):
    dict_size: int = 49996
    dataset_size: int = 100000
    tokens_per_sample: int = field(
        default=512,
        metadata={
            "help": "max number of total tokens over all"
            " segments per sample for BERT dataset"
        },
    )
    batch_size: Optional[int] = II("dataset.batch_size")
    max_tokens: Optional[int] = II("dataset.max_tokens")
    max_target_positions: int = II("task.tokens_per_sample")


@register_task("dummy_masked_lm", dataclass=DummyMaskedLMConfig)
class DummyMaskedLMTask(FairseqTask):
    def __init__(self, cfg: DummyMaskedLMConfig):
        super().__init__(cfg)

        self.dictionary = Dictionary()
        for i in range(cfg.dict_size):
            self.dictionary.add_symbol("word{}".format(i))
        logger.info("dictionary: {} types".format(len(self.dictionary)))
        # add mask token
        self.mask_idx = self.dictionary.add_symbol("<mask>")
        self.dictionary.pad_to_multiple_(8)  # often faster if divisible by 8

        mask_idx = 0
        pad_idx = 1
        seq = torch.arange(cfg.tokens_per_sample) + pad_idx + 1
        mask = torch.arange(2, cfg.tokens_per_sample, 7)  # ~15%
        src = seq.clone()
        src[mask] = mask_idx
        tgt = torch.full_like(seq, pad_idx)
        tgt[mask] = seq[mask]

        self.dummy_src = src
        self.dummy_tgt = tgt

    def load_dataset(self, split, epoch=1, combine=False, **kwargs):
        """Load a given dataset split.

        Args:
            split (str): name of the split (e.g., train, valid, test)
        """
        if self.cfg.batch_size is not None:
            bsz = self.cfg.batch_size
        else:
            bsz = max(1, self.cfg.max_tokens // self.cfg.tokens_per_sample)
        self.datasets[split] = DummyDataset(
            {
                "id": 1,
                "net_input": {
                    "src_tokens": torch.stack([self.dummy_src for _ in range(bsz)]),
                    "src_lengths": torch.full(
                        (bsz,), self.cfg.tokens_per_sample, dtype=torch.long
                    ),
                },
                "target": torch.stack([self.dummy_tgt for _ in range(bsz)]),
                "nsentences": bsz,
                "ntokens": bsz * self.cfg.tokens_per_sample,
            },
            num_items=self.cfg.dataset_size,
            item_size=self.cfg.tokens_per_sample,
        )

    @property
    def source_dictionary(self):
        return self.dictionary

    @property
    def target_dictionary(self):
        return self.dictionary
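The deterministic masking pattern built in __init__ above (every 7th position starting at index 2) can be checked in isolation. This is a standalone sketch, not part of the commit; the values are illustrative and the "~15%" comment in the source is approximate.

import torch

tokens_per_sample = 512
mask_idx, pad_idx = 0, 1

seq = torch.arange(tokens_per_sample) + pad_idx + 1   # synthetic "token ids" 2..513
mask = torch.arange(2, tokens_per_sample, 7)          # every 7th position
src = seq.clone()
src[mask] = mask_idx                                  # masked model input
tgt = torch.full_like(seq, pad_idx)
tgt[mask] = seq[mask]                                 # targets only at masked positions

print(len(mask) / tokens_per_sample)                  # ~0.143, close to the 15% BERT masking rate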
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_model.py
0 → 100644
View file @
c0f05c10
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import torch.nn as nn
import torch.nn.functional as F

from fairseq.data import Dictionary
from fairseq.models import (
    FairseqDecoder,
    FairseqLanguageModel,
    register_model,
    register_model_architecture,
)


@register_model("dummy_model")
class DummyModel(FairseqLanguageModel):
    def __init__(self, args, encoder):
        super().__init__(encoder)
        self.args = args

    @staticmethod
    def add_args(parser):
        parser.add_argument("--num-layers", type=int, default=24)
        parser.add_argument("--embed-dim", type=int, default=1024)

    @classmethod
    def build_model(cls, args, task):
        encoder = DummyEncoder(
            num_embed=len(task.target_dictionary),
            embed_dim=args.embed_dim,
            num_layers=args.num_layers,
        )
        return cls(args, encoder)

    def forward(self, src_tokens, masked_tokens=None, **kwargs):
        return self.decoder(src_tokens, masked_tokens=masked_tokens)


class DummyEncoder(FairseqDecoder):
    def __init__(self, num_embed=50000, embed_dim=1024, num_layers=24):
        super().__init__(Dictionary())
        self.embed = nn.Embedding(
            num_embeddings=num_embed, embedding_dim=embed_dim, padding_idx=0
        )
        self.layers_a = nn.ModuleList(
            [
                nn.Sequential(
                    nn.LayerNorm(embed_dim),
                    nn.Linear(embed_dim, 3 * embed_dim),  # q, k, v input projection
                    nn.Linear(3 * embed_dim, embed_dim),  # skip self-attention
                    nn.Linear(embed_dim, embed_dim),  # output projection
                    nn.Dropout(),
                )
                for i in range(num_layers)
            ]
        )
        self.layers_b = nn.ModuleList(
            [
                nn.Sequential(
                    nn.LayerNorm(embed_dim),
                    nn.Linear(embed_dim, 4 * embed_dim),  # FFN
                    nn.ReLU(),
                    nn.Linear(4 * embed_dim, embed_dim),  # FFN
                    nn.Dropout(0.1),
                )
                for i in range(num_layers)
            ]
        )
        self.out_proj = nn.Linear(embed_dim, num_embed)

    def forward(self, tokens, masked_tokens=None):
        x = self.embed(tokens)
        for layer_a, layer_b in zip(self.layers_a, self.layers_b):
            x = x + layer_a(x)
            x = x + layer_b(x)
        x = self.out_proj(x)
        if masked_tokens is not None:
            x = x[masked_tokens]
        return (x,)

    def max_positions(self):
        return 1024

    def get_normalized_probs(self, net_output, log_probs, sample=None):
        logits = net_output[0].float()
        if log_probs:
            return F.log_softmax(logits, dim=-1)
        else:
            return F.softmax(logits, dim=-1)


@register_model_architecture("dummy_model", "dummy_model")
def base_architecture(args):
    pass
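DummyEncoder replaces real self-attention with attention-shaped linear projections plus a feed-forward block, each wrapped in a residual connection. Below is a minimal pure-PyTorch sketch of one such layer pair; the tiny sizes are chosen only for illustration and the code is not part of the commit.

import torch
import torch.nn as nn

embed_dim, vocab = 64, 100  # tiny sizes for a quick shape check

embed = nn.Embedding(vocab, embed_dim, padding_idx=0)
layer_a = nn.Sequential(    # attention-shaped projections; the attention itself is skipped
    nn.LayerNorm(embed_dim),
    nn.Linear(embed_dim, 3 * embed_dim),
    nn.Linear(3 * embed_dim, embed_dim),
    nn.Linear(embed_dim, embed_dim),
    nn.Dropout(),
)
layer_b = nn.Sequential(    # feed-forward block
    nn.LayerNorm(embed_dim),
    nn.Linear(embed_dim, 4 * embed_dim),
    nn.ReLU(),
    nn.Linear(4 * embed_dim, embed_dim),
    nn.Dropout(0.1),
)
out_proj = nn.Linear(embed_dim, vocab)

tokens = torch.randint(1, vocab, (2, 16))  # (batch, seq_len)
x = embed(tokens)
x = x + layer_a(x)   # residual around the attention-shaped branch
x = x + layer_b(x)   # residual around the FFN branch
logits = out_proj(x)
print(logits.shape)  # torch.Size([2, 16, 100])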
PyTorch/NLP/new-Transformer/fairseq/benchmark/dummy_mt.py
0 → 100644
View file @
c0f05c10
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import logging

import numpy as np
import torch

from fairseq.data import Dictionary, FairseqDataset
from fairseq.tasks import LegacyFairseqTask, register_task

logger = logging.getLogger(__name__)


@register_task("dummy_mt")
class DummyMTTask(LegacyFairseqTask):
    @staticmethod
    def add_args(parser):
        """Add task-specific arguments to the parser."""
        parser.add_argument("--dict-size", default=49996, type=int)
        parser.add_argument("--dataset-size", default=100000, type=int)
        parser.add_argument("--src-len", default=30, type=int)
        parser.add_argument("--tgt-len", default=30, type=int)

    def __init__(self, args, dictionary):
        super().__init__(args)
        self.dictionary = dictionary
        self.seed = args.seed
        dictionary.pad_to_multiple_(8)  # often faster if divisible by 8

        self.dummy_src = torch.arange(args.src_len + 1) + dictionary.pad() + 1
        self.dummy_tgt = torch.arange(args.tgt_len + 1) + dictionary.pad() + 1

    @classmethod
    def setup_task(cls, args, **kwargs):
        """Setup the task."""
        dictionary = Dictionary()
        for i in range(args.dict_size):
            dictionary.add_symbol("word{}".format(i))
        logger.info("dictionary: {} types".format(len(dictionary)))

        args.max_source_positions = args.src_len + dictionary.pad() + 2
        args.max_target_positions = args.tgt_len + dictionary.pad() + 2

        return cls(args, dictionary)

    def load_dataset(self, split, epoch=1, combine=False, **kwargs):
        """Load a given dataset split.

        Args:
            split (str): name of the split (e.g., train, valid, test)
        """
        item_size = max(self.args.src_len, self.args.tgt_len)
        if self.args.batch_size is not None:
            bsz = self.args.batch_size
        else:
            bsz = max(1, self.args.max_tokens // item_size)
        tgt = torch.stack([self.dummy_tgt for _ in range(bsz)])
        self.datasets[split] = DummyDataset(
            {
                "id": 1,
                "net_input": {
                    "src_tokens": torch.stack([self.dummy_src for _ in range(bsz)]),
                    "src_lengths": torch.full(
                        (bsz,), self.args.src_len, dtype=torch.long
                    ),
                    "prev_output_tokens": tgt.clone(),
                },
                "target": tgt,
                "nsentences": bsz,
                "ntokens": bsz * self.args.tgt_len,
            },
            num_items=self.args.dataset_size,
            item_size=item_size,
        )

    @property
    def source_dictionary(self):
        return self.dictionary

    @property
    def target_dictionary(self):
        return self.dictionary


class DummyDataset(FairseqDataset):
    def __init__(self, batch, num_items, item_size):
        super().__init__()
        self.batch = batch
        self.num_items = num_items
        self.item_size = item_size

    def __getitem__(self, index):
        return index

    def __len__(self):
        return self.num_items

    def collater(self, samples):
        return self.batch

    @property
    def sizes(self):
        return np.array([self.item_size] * self.num_items)

    def num_tokens(self, index):
        return self.item_size

    def size(self, index):
        return self.item_size

    def ordered_indices(self):
        return np.arange(self.num_items)

    @property
    def supports_prefetch(self):
        return False
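The DummyDataset above ignores the sampled indices and always returns one pre-built batch from collater(), which is what makes these tasks useful for benchmarking compute without any I/O. A stripped-down stand-in (hypothetical class name, not part of the commit) behaves the same way:

import torch

class FixedBatchDataset:                       # hypothetical stand-in mirroring DummyDataset
    def __init__(self, batch, num_items, item_size):
        self.batch, self.num_items, self.item_size = batch, num_items, item_size

    def __getitem__(self, index):
        return index                           # items are just indices

    def __len__(self):
        return self.num_items

    def collater(self, samples):
        return self.batch                      # every "collated" batch is the same dict

batch = {"net_input": {"src_tokens": torch.zeros(8, 30, dtype=torch.long)}, "ntokens": 8 * 30}
ds = FixedBatchDataset(batch, num_items=100000, item_size=30)
print(len(ds), ds.collater([0, 1, 2])["ntokens"])   # 100000 240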
PyTorch/NLP/new-Transformer/fairseq/binarizer.py
0 → 100644
View file @
c0f05c10
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import logging
import os
import typing as tp
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass
from multiprocessing import Pool

import torch

from fairseq.data import Dictionary, indexed_dataset
from fairseq.file_chunker_utils import Chunker, find_offsets
from fairseq.file_io import PathManager
from fairseq.tokenizer import tokenize_line

logger = logging.getLogger("binarizer")


@dataclass
class BinarizeSummary:
    """
    Keep track of what's going on in the binarizer.
    """

    num_seq: int = 0
    replaced: tp.Optional[Counter] = None
    num_tok: int = 0

    @property
    def num_replaced(self) -> int:
        if self.replaced is None:
            return 0
        return sum(self.replaced.values())

    @property
    def replaced_percent(self) -> float:
        return 100 * self.num_replaced / self.num_tok

    def __str__(self) -> str:
        base = f"{self.num_seq} sents, {self.num_tok} tokens"
        if self.replaced is None:
            return base
        return f"{base}, {self.replaced_percent:.3}% replaced"

    def merge(self, other: "BinarizeSummary"):
        replaced = None
        if self.replaced is not None:
            replaced = self.replaced
        if other.replaced is not None:
            if replaced is None:
                replaced = other.replaced
            else:
                replaced += other.replaced
        self.replaced = replaced
        self.num_seq += other.num_seq
        self.num_tok += other.num_tok


class Binarizer(ABC):
    """
    A binarizer describes how to take a string and build a tensor out of it.
    """

    @abstractmethod
    def binarize_line(
        self,
        line: str,
        summary: BinarizeSummary,
    ) -> torch.IntTensor:
        ...


def _worker_prefix(output_prefix: str, worker_id: int):
    return f"{output_prefix}.pt{worker_id}"


class FileBinarizer:
    """
    A file binarizer can take a file, tokenize it, and binarize each line into a tensor.
    """

    @classmethod
    def multiprocess_dataset(
        cls,
        input_file: str,
        dataset_impl: str,
        binarizer: Binarizer,
        output_prefix: str,
        vocab_size=None,
        num_workers=1,
    ) -> BinarizeSummary:
        final_summary = BinarizeSummary()

        offsets = find_offsets(input_file, num_workers)
        # find_offsets returns a list of positions [pos1, pos2, pos3, pos4], but we want pairs:
        # [(pos1, pos2), (pos2, pos3), (pos3, pos4)] to process the chunks with start/end info.
        # We zip the list with itself shifted by one to get all the pairs.
        (first_chunk, *more_chunks) = zip(offsets, offsets[1:])
        pool = None
        if num_workers > 1:
            pool = Pool(processes=num_workers - 1)
            worker_results = [
                pool.apply_async(
                    cls._binarize_chunk_and_finalize,
                    args=(
                        binarizer,
                        input_file,
                        start_offset,
                        end_offset,
                        _worker_prefix(
                            output_prefix,
                            worker_id,
                        ),
                        dataset_impl,
                    ),
                    kwds={
                        "vocab_size": vocab_size,
                    }
                    if vocab_size is not None
                    else {},
                )
                for worker_id, (start_offset, end_offset) in enumerate(
                    more_chunks, start=1
                )
            ]

            pool.close()
            pool.join()
            for r in worker_results:
                summ = r.get()
                final_summary.merge(summ)

        # do not close the bin file as we need to merge the worker results in
        final_ds, summ = cls._binarize_file_chunk(
            binarizer,
            input_file,
            offset_start=first_chunk[0],
            offset_end=first_chunk[1],
            output_prefix=output_prefix,
            dataset_impl=dataset_impl,
            vocab_size=vocab_size if vocab_size is not None else None,
        )
        final_summary.merge(summ)

        if num_workers > 1:
            for worker_id in range(1, num_workers):
                # merge the worker outputs
                worker_output_prefix = _worker_prefix(
                    output_prefix,
                    worker_id,
                )
                final_ds.merge_file_(worker_output_prefix)
                try:
                    os.remove(indexed_dataset.data_file_path(worker_output_prefix))
                    os.remove(indexed_dataset.index_file_path(worker_output_prefix))
                except Exception as e:
                    logger.error(
                        f"couldn't remove {worker_output_prefix}.*", exc_info=e
                    )

        # now we can close the file
        idx_file = indexed_dataset.index_file_path(output_prefix)
        final_ds.finalize(idx_file)
        return final_summary

    @staticmethod
    def _binarize_file_chunk(
        binarizer: Binarizer,
        filename: str,
        offset_start: int,
        offset_end: int,
        output_prefix: str,
        dataset_impl: str,
        vocab_size=None,
    ) -> tp.Tuple[tp.Any, BinarizeSummary]:  # (dataset builder, BinarizeSummary)
        """
        Creates a dataset builder and appends binarized items to it. This function does not
        finalize the builder, which is useful if you want to do other things with your bin file,
        like appending/merging other files.
        """
        bin_file = indexed_dataset.data_file_path(output_prefix)
        ds = indexed_dataset.make_builder(
            bin_file,
            impl=dataset_impl,
            vocab_size=vocab_size,
        )
        summary = BinarizeSummary()

        with Chunker(
            PathManager.get_local_path(filename), offset_start, offset_end
        ) as line_iterator:
            for line in line_iterator:
                ds.add_item(binarizer.binarize_line(line, summary))

        return ds, summary

    @classmethod
    def _binarize_chunk_and_finalize(
        cls,
        binarizer: Binarizer,
        filename: str,
        offset_start: int,
        offset_end: int,
        output_prefix: str,
        dataset_impl: str,
        vocab_size=None,
    ):
        """
        Same as above, but also finalizes the builder.
        """
        ds, summ = cls._binarize_file_chunk(
            binarizer,
            filename,
            offset_start,
            offset_end,
            output_prefix,
            dataset_impl,
            vocab_size=vocab_size,
        )

        idx_file = indexed_dataset.index_file_path(output_prefix)
        ds.finalize(idx_file)

        return summ


class VocabularyDatasetBinarizer(Binarizer):
    """
    Takes a Dictionary/Vocabulary and assigns ids to each
    token using the dictionary's encode_line function.
    """

    def __init__(
        self,
        dict: Dictionary,
        tokenize: tp.Callable[[str], tp.List[str]] = tokenize_line,
        append_eos: bool = True,
        reverse_order: bool = False,
        already_numberized: bool = False,
    ) -> None:
        self.dict = dict
        self.tokenize = tokenize
        self.append_eos = append_eos
        self.reverse_order = reverse_order
        self.already_numberized = already_numberized
        super().__init__()

    def binarize_line(
        self,
        line: str,
        summary: BinarizeSummary,
    ):
        if summary.replaced is None:
            summary.replaced = Counter()

        def replaced_consumer(word, idx):
            if idx == self.dict.unk_index and word != self.dict.unk_word:
                summary.replaced.update([word])

        if self.already_numberized:
            id_strings = line.strip().split()
            id_list = [int(id_string) for id_string in id_strings]
            if self.reverse_order:
                id_list.reverse()
            if self.append_eos:
                id_list.append(self.dict.eos())
            ids = torch.IntTensor(id_list)
        else:
            ids = self.dict.encode_line(
                line=line,
                line_tokenizer=self.tokenize,
                add_if_not_exist=False,
                consumer=replaced_consumer,
                append_eos=self.append_eos,
                reverse_order=self.reverse_order,
            )

        summary.num_seq += 1
        summary.num_tok += len(ids)
        return ids


class AlignmentDatasetBinarizer(Binarizer):
    """
    Binarize by parsing a set of alignments and packing
    them into a tensor (see utils.parse_alignment).
    """

    def __init__(
        self,
        alignment_parser: tp.Callable[[str], torch.IntTensor],
    ) -> None:
        super().__init__()
        self.alignment_parser = alignment_parser

    def binarize_line(
        self,
        line: str,
        summary: BinarizeSummary,
    ):
        ids = self.alignment_parser(line)
        summary.num_seq += 1
        summary.num_tok += len(ids)
        return ids


class LegacyBinarizer:
    @classmethod
    def binarize(
        cls,
        filename: str,
        dico: Dictionary,
        consumer: tp.Callable[[torch.IntTensor], None],
        tokenize: tp.Callable[[str], tp.List[str]] = tokenize_line,
        append_eos: bool = True,
        reverse_order: bool = False,
        offset: int = 0,
        end: int = -1,
        already_numberized: bool = False,
    ) -> tp.Dict[str, int]:
        binarizer = VocabularyDatasetBinarizer(
            dict=dico,
            tokenize=tokenize,
            append_eos=append_eos,
            reverse_order=reverse_order,
            already_numberized=already_numberized,
        )
        return cls._consume_file(
            filename,
            binarizer,
            consumer,
            offset_start=offset,
            offset_end=end,
        )

    @classmethod
    def binarize_alignments(
        cls,
        filename: str,
        alignment_parser: tp.Callable[[str], torch.IntTensor],
        consumer: tp.Callable[[torch.IntTensor], None],
        offset: int = 0,
        end: int = -1,
    ) -> tp.Dict[str, int]:
        binarizer = AlignmentDatasetBinarizer(alignment_parser)
        return cls._consume_file(
            filename,
            binarizer,
            consumer,
            offset_start=offset,
            offset_end=end,
        )

    @staticmethod
    def _consume_file(
        filename: str,
        binarizer: Binarizer,
        consumer: tp.Callable[[torch.IntTensor], None],
        offset_start: int,
        offset_end: int,
    ) -> tp.Dict[str, int]:
        summary = BinarizeSummary()

        with Chunker(
            PathManager.get_local_path(filename), offset_start, offset_end
        ) as line_iterator:
            for line in line_iterator:
                consumer(binarizer.binarize_line(line, summary))

        return {
            "nseq": summary.num_seq,
            "nunk": summary.num_replaced,
            "ntok": summary.num_tok,
            "replaced": summary.replaced,
        }
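A short usage sketch for the vocabulary binarizer, assuming fairseq is installed so that fairseq.binarizer and fairseq.data.Dictionary are importable; it only exercises calls that appear in the file above, and the example tokens are made up for illustration.

from fairseq.binarizer import BinarizeSummary, VocabularyDatasetBinarizer
from fairseq.data import Dictionary

dico = Dictionary()
for w in ["hello", "world"]:
    dico.add_symbol(w)

binarizer = VocabularyDatasetBinarizer(dict=dico, append_eos=True)
summary = BinarizeSummary()

ids = binarizer.binarize_line("hello world oov_token", summary)
print(ids)      # IntTensor of ids; "oov_token" maps to <unk> and is counted in summary.replaced
print(summary)  # e.g. "1 sents, 4 tokens, ...% replaced"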