Commit 6381cc97 authored by Myle Ott's avatar Myle Ott
Browse files

Add documentation

parent 0e101e9c
# Introduction
Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence-to-sequence models, including:
Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks. It provides reference implementations
of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](https://arxiv.org/abs/1612.08083)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122)
......@@ -16,10 +19,12 @@ Fairseq(-py) is a sequence modeling toolkit that allows researchers and develope
Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast beam search generation on both CPU and GPU
- large mini-batch training (even on a single GPU) via delayed updates
- large mini-batch training even on a single GPU via delayed updates
- fast half-precision floating point (FP16) training
- extensible: easily register new models, criterions, and tasks
We also provide [pre-trained models](#pre-trained-models) for several benchmark translation datasets.
We also provide [pre-trained models](#pre-trained-models) for several benchmark
translation and language modeling datasets.
![Model](fairseq.gif)
......@@ -31,112 +36,20 @@ We also provide [pre-trained models](#pre-trained-models) for several benchmark
Currently fairseq requires PyTorch version >= 0.4.0.
Please follow the instructions here: https://github.com/pytorch/pytorch#installation.
If you use Docker make sure to increase the shared memory size either with `--ipc=host` or `--shm-size` as command line
options to `nvidia-docker run`.
If you use Docker make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.
After PyTorch is installed, you can install fairseq with:
```
pip install -r requirements.txt
python setup.py build
python setup.py develop
python setup.py build develop
```
# Quick Start
# Getting Started
The following command-line tools are provided:
* `python preprocess.py`: Data pre-processing: build vocabularies and binarize training data
* `python train.py`: Train a new model on one or multiple GPUs
* `python generate.py`: Translate pre-processed data with a trained model
* `python interactive.py`: Translate raw text with a trained model
* `python score.py`: BLEU scoring of generated translations against reference translations
* `python eval_lm.py`: Language model evaluation
## Evaluating Pre-trained Models
First, download a pre-trained model along with its vocabularies:
```
$ curl https://s3.amazonaws.com/fairseq-py/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf -
```
This model uses a [Byte Pair Encoding (BPE) vocabulary](https://arxiv.org/abs/1508.07909), so we'll have to apply the encoding to the source text before it can be translated.
This can be done with the [apply_bpe.py](https://github.com/rsennrich/subword-nmt/blob/master/apply_bpe.py) script using the `wmt14.en-fr.fconv-cuda/bpecodes` file.
`@@` is used as a continuation marker and the original text can be easily recovered with e.g. `sed s/@@ //g` or by passing the `--remove-bpe` flag to `generate.py`.
Prior to BPE, input text needs to be tokenized using `tokenizer.perl` from [mosesdecoder](https://github.com/moses-smt/mosesdecoder).
Let's use `python interactive.py` to generate translations interactively.
Here, we use a beam size of 5:
```
$ MODEL_DIR=wmt14.en-fr.fconv-py
$ python interactive.py \
--path $MODEL_DIR/model.pt $MODEL_DIR \
--beam 5
| loading model(s) from wmt14.en-fr.fconv-py/model.pt
| [en] dictionary: 44206 types
| [fr] dictionary: 44463 types
| Type the input sentence and press return:
> Why is it rare to discover new marine mam@@ mal species ?
O Why is it rare to discover new marine mam@@ mal species ?
H -0.06429661810398102 Pourquoi est-il rare de découvrir de nouvelles espèces de mammifères marins ?
A 0 1 3 3 5 6 6 8 8 8 7 11 12
```
This generation script produces four types of outputs: a line prefixed with *S* shows the supplied source sentence after applying the vocabulary; *O* is a copy of the original source sentence; *H* is the hypothesis along with an average log-likelihood; and *A* is the attention maxima for each word in the hypothesis, including the end-of-sentence marker which is omitted from the text.
Check [below](#pre-trained-models) for a full list of pre-trained models available.
## Training a New Model
The following tutorial is for machine translation.
For an example of how to use Fairseq for other tasks, such as [language modeling](examples/language_model/README.md), please see the `examples/` directory.
### Data Pre-processing
Fairseq contains example pre-processing scripts for several translation datasets: IWSLT 2014 (German-English), WMT 2014 (English-French) and WMT 2014 (English-German).
To pre-process and binarize the IWSLT dataset:
```
$ cd examples/translation/
$ bash prepare-iwslt14.sh
$ cd ../..
$ TEXT=examples/translation/iwslt14.tokenized.de-en
$ python preprocess.py --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
```
This will write binarized data that can be used for model training to `data-bin/iwslt14.tokenized.de-en`.
### Training
Use `python train.py` to train a new model.
Here a few example settings that work well for the IWSLT 2014 dataset:
```
$ mkdir -p checkpoints/fconv
$ CUDA_VISIBLE_DEVICES=0 python train.py data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
```
By default, `python train.py` will use all available GPUs on your machine.
Use the [CUDA_VISIBLE_DEVICES](http://acceleware.com/blog/cudavisibledevices-masking-gpus) environment variable to select specific GPUs and/or to change the number of GPU devices that will be used.
Also note that the batch size is specified in terms of the maximum number of tokens per batch (`--max-tokens`).
You may need to use a smaller value depending on the available GPU memory on your system.
### Generation
Once your model is trained, you can generate translations using `python generate.py` **(for binarized data)** or `python interactive.py` **(for raw text)**:
```
$ python generate.py data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5
| [de] dictionary: 35475 types
| [en] dictionary: 24739 types
| data-bin/iwslt14.tokenized.de-en test 6750 examples
| model fconv
| loaded checkpoint trainings/fconv/checkpoint_best.pt
S-721 danke .
T-721 thank you .
...
```
To generate translations with only a CPU, use the `--cpu` flag.
BPE continuation markers can be removed with the `--remove-bpe` flag.
The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.
# Pre-trained Models
......@@ -185,68 +98,6 @@ $ python score.py --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref
BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
```
# Large mini-batch training with delayed updates
The `--update-freq` option can be used to accumulate gradients from multiple mini-batches and delay updating,
creating a larger effective batch size.
Delayed updates can also improve training speed by reducing inter-GPU communication costs and by saving idle time caused by variance in workload across GPUs.
See [Ott et al. (2018)](https://arxiv.org/abs/1806.00187) for more details.
To train on a single GPU with an effective batch size that is equivalent to training on 8 GPUs:
```
CUDA_VISIBLE_DEVICES=0 python train.py --update-freq 8 (...)
```
# Training with half precision floating point (FP16)
> Note: FP16 training requires a Volta GPU and CUDA 9.1 or greater
Recent GPUs enable efficient half precision floating point computation, e.g., using [Nvidia Tensor Cores](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html).
Fairseq supports FP16 training with the `--fp16` flag:
```
python train.py --fp16 (...)
```
# Distributed training
Distributed training in fairseq is implemented on top of [torch.distributed](http://pytorch.org/docs/master/distributed.html).
Training begins by launching one worker process per GPU.
These workers discover each other via a unique host and port (required) that can be used to establish an initial connection.
Additionally, each worker has a rank, that is a unique number from 0 to n-1 where n is the total number of GPUs.
If you run on a cluster managed by [SLURM](https://slurm.schedmd.com/) you can train a large English-French model on the WMT 2014 dataset on 16 nodes with 8 GPUs each (in total 128 GPUs) using this command:
```
$ DATA=... # path to the preprocessed dataset, must be visible from all nodes
$ PORT=9218 # any available TCP port that can be used by the trainer to establish initial connection
$ sbatch --job-name fairseq-py --gres gpu:8 --cpus-per-task 10 \
--nodes 16 --ntasks-per-node 8 \
--wrap 'srun --output train.log.node%t --error train.stderr.node%t.%j \
python train.py $DATA \
--distributed-world-size 128 \
--distributed-port $PORT \
--force-anneal 50 --lr-scheduler fixed --max-epoch 55 \
--arch fconv_wmt_en_fr --optimizer nag --lr 0.1,4 --max-tokens 3000 \
--clip-norm 0.1 --dropout 0.1 --criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 --wd 0.0001'
```
Alternatively you can manually start one process per GPU:
```
$ DATA=... # path to the preprocessed dataset, must be visible from all nodes
$ HOST_PORT=master.devserver.com:9218 # one of the hosts used by the job
$ RANK=... # the rank of this process, from 0 to 127 in case of 128 GPUs
$ python train.py $DATA \
--distributed-world-size 128 \
--distributed-init-method 'tcp://$HOST_PORT' \
--distributed-rank $RANK \
--force-anneal 50 --lr-scheduler fixed --max-epoch 55 \
--arch fconv_wmt_en_fr --optimizer nag --lr 0.1,4 --max-tokens 3000 \
--clip-norm 0.1 --dropout 0.1 --criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 --wd 0.0001
```
# Join the fairseq community
* Facebook page: https://www.facebook.com/groups/fairseq.users
......@@ -271,4 +122,8 @@ The license applies to the pre-trained models as well.
We also provide an additional patent grant.
# Credits
This is a PyTorch version of [fairseq](https://github.com/facebookresearch/fairseq), a sequence-to-sequence learning toolkit from Facebook AI Research. The original authors of this reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam Gross.
This is a PyTorch version of
[fairseq](https://github.com/facebookresearch/fairseq), a sequence-to-sequence
learning toolkit from Facebook AI Research. The original authors of this
reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam
Gross.
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = fairseq
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
.wy-table-responsive table td kbd {
white-space: nowrap;
}
.wy-table-responsive table td {
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
.. _Command-line Tools:
Command-line Tools
==================
Fairseq provides several command-line tools for training and evaluating models:
- :ref:`preprocess.py`: Data pre-processing: build vocabularies and binarize training data
- :ref:`train.py`: Train a new model on one or multiple GPUs
- :ref:`generate.py`: Translate pre-processed data with a trained model
- :ref:`interactive.py`: Translate raw text with a trained model
- :ref:`score.py`: BLEU scoring of generated translations against reference translations
- :ref:`eval_lm.py`: Language model evaluation
.. _preprocess.py:
preprocess.py
~~~~~~~~~~~~~
.. automodule:: preprocess
.. argparse::
:module: preprocess
:func: get_parser
:prog: preprocess.py
.. _train.py:
train.py
~~~~~~~~
.. automodule:: train
.. argparse::
:module: fairseq.options
:func: get_training_parser
:prog: train.py
.. _generate.py:
generate.py
~~~~~~~~~~~
.. automodule:: generate
.. argparse::
:module: fairseq.options
:func: get_generation_parser
:prog: generate.py
.. _interactive.py:
interactive.py
~~~~~~~~~~~~~~
.. automodule:: interactive
.. argparse::
:module: fairseq.options
:func: get_interactive_generation_parser
:prog: interactive.py
.. _score.py:
score.py
~~~~~~~~
.. automodule:: score
.. argparse::
:module: score
:func: get_parser
:prog: score.py
.. _eval_lm.py:
eval_lm.py
~~~~~~~~~~
.. automodule:: eval_lm
.. argparse::
:module: fairseq.options
:func: get_eval_lm_parser
:prog: eval_lm.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# fairseq documentation build configuration file, created by
# sphinx-quickstart on Fri Aug 17 21:45:30 2018.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import os
import sys
import fairseq
import sphinx_rtd_theme
# source code directory, relative to this file, for sphinx-autobuild
sys.path.insert(0, os.path.abspath('..'))
import recommonmark
from recommonmark.parser import CommonMarkParser
from recommonmark.transform import AutoStructify
source_parsers = {
'.md': CommonMarkParser
}
source_suffix = ['.rst', '.md']
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx',
'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
'sphinx.ext.githubpages',
'sphinx.ext.napoleon',
'sphinxarg.ext',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = 'fairseq'
copyright = '2018, Facebook AI Research (FAIR)'
author = 'Facebook AI Research (FAIR)'
github_doc_root = 'https://github.com/pytorch/fairseq/tree/master/docs/'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.5.0'
# The full version, including alpha/beta/rc tags.
release = '0.5.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
highlight_language = 'python'
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_context = {
'css_files': [
'_static/theme_overrides.css', # override wide tables in RTD theme
],
}
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
#html_sidebars = {
# '**': [
# 'about.html',
# 'navigation.html',
# 'relations.html', # needs 'show_related': True theme option to display
# 'searchbox.html',
# 'donate.html',
# ]
#}
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
'numpy': ('http://docs.scipy.org/doc/numpy/', None),
'python': ('https://docs.python.org/', None),
'torch': ('https://pytorch.org/docs/master/', None),
}
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'fairseqdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'fairseq.tex', 'fairseq Documentation',
'Facebook AI Research (FAIR)', 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'fairseq', 'fairseq Documentation',
[author], 1)
]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'fairseq', 'fairseq Documentation',
author, 'fairseq', 'One line description of project.',
'Miscellaneous'),
]
# app setup hook
def setup(app):
app.add_config_value('recommonmark_config', {
'url_resolver': lambda url: github_doc_root + url,
'auto_toc_tree_section': 'Contents',
'enable_eval_rst': True,
'enable_auto_doc_ref': True,
}, True)
app.add_transform(AutoStructify)
.. role:: hidden
:class: hidden-section
.. _Criterions:
Criterions
==========
.. automodule:: fairseq.criterions
:members:
.. autoclass:: fairseq.criterions.FairseqCriterion
:members:
:undoc-members:
.. role:: hidden
:class: hidden-section
.. module:: fairseq.data
Data Loading and Utilities
==========================
.. _datasets:
Datasets
--------
**Datasets** define the data format and provide helpers for creating
mini-batches.
.. autoclass:: fairseq.data.FairseqDataset
:members:
.. autoclass:: fairseq.data.LanguagePairDataset
:members:
.. autoclass:: fairseq.data.MonolingualDataset
:members:
Dictionary
----------
.. autoclass:: fairseq.data.Dictionary
:members:
Iterators
---------
.. autoclass:: fairseq.data.CountingIterator
:members:
.. autoclass:: fairseq.data.EpochBatchIterator
:members:
.. autoclass:: fairseq.data.ShardedIterator
:members:
[writers]
option-limit=0
Evaluating Pre-trained Models
=============================
First, download a pre-trained model along with its vocabularies:
.. code-block:: console
> curl https://s3.amazonaws.com/fairseq-py/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf -
This model uses a `Byte Pair Encoding (BPE)
vocabulary <https://arxiv.org/abs/1508.07909>`__, so we'll have to apply
the encoding to the source text before it can be translated. This can be
done with the
`apply\_bpe.py <https://github.com/rsennrich/subword-nmt/blob/master/apply_bpe.py>`__
script using the ``wmt14.en-fr.fconv-cuda/bpecodes`` file. ``@@`` is
used as a continuation marker and the original text can be easily
recovered with e.g. ``sed s/@@ //g`` or by passing the ``--remove-bpe``
flag to :ref:`generate.py`. Prior to BPE, input text needs to be tokenized
using ``tokenizer.perl`` from
`mosesdecoder <https://github.com/moses-smt/mosesdecoder>`__.
Let's use :ref:`interactive.py` to generate translations
interactively. Here, we use a beam size of 5:
.. code-block:: console
> MODEL_DIR=wmt14.en-fr.fconv-py
> python interactive.py \
--path $MODEL_DIR/model.pt $MODEL_DIR \
--beam 5
| loading model(s) from wmt14.en-fr.fconv-py/model.pt
| [en] dictionary: 44206 types
| [fr] dictionary: 44463 types
| Type the input sentence and press return:
> Why is it rare to discover new marine mam@@ mal species ?
O Why is it rare to discover new marine mam@@ mal species ?
H -0.06429661810398102 Pourquoi est-il rare de découvrir de nouvelles espèces de mammifères marins ?
A 0 1 3 3 5 6 6 8 8 8 7 11 12
This generation script produces four types of outputs: a line prefixed
with *S* shows the supplied source sentence after applying the
vocabulary; *O* is a copy of the original source sentence; *H* is the
hypothesis along with an average log-likelihood; and *A* is the
attention maxima for each word in the hypothesis, including the
end-of-sentence marker which is omitted from the text.
See the `README <https://github.com/pytorch/fairseq#pre-trained-models>`__ for a
full list of pre-trained models available.
Training a New Model
====================
The following tutorial is for machine translation. For an example of how
to use Fairseq for other tasks, such as :ref:`language modeling`, please see the
``examples/`` directory.
Data Pre-processing
-------------------
Fairseq contains example pre-processing scripts for several translation
datasets: IWSLT 2014 (German-English), WMT 2014 (English-French) and WMT
2014 (English-German). To pre-process and binarize the IWSLT dataset:
.. code-block:: console
> cd examples/translation/
> bash prepare-iwslt14.sh
> cd ../..
> TEXT=examples/translation/iwslt14.tokenized.de-en
> python preprocess.py --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
This will write binarized data that can be used for model training to
``data-bin/iwslt14.tokenized.de-en``.
Training
--------
Use :ref:`train.py` to train a new model. Here a few example settings that work
well for the IWSLT 2014 dataset:
.. code-block:: console
> mkdir -p checkpoints/fconv
> CUDA_VISIBLE_DEVICES=0 python train.py data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
By default, :ref:`train.py` will use all available GPUs on your machine. Use the
``CUDA_VISIBLE_DEVICES`` environment variable to select specific GPUs and/or to
change the number of GPU devices that will be used.
Also note that the batch size is specified in terms of the maximum
number of tokens per batch (``--max-tokens``). You may need to use a
smaller value depending on the available GPU memory on your system.
Generation
----------
Once your model is trained, you can generate translations using
:ref:`generate.py` **(for binarized data)** or
:ref:`interactive.py` **(for raw text)**:
.. code-block:: console
> python generate.py data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5
| [de] dictionary: 35475 types
| [en] dictionary: 24739 types
| data-bin/iwslt14.tokenized.de-en test 6750 examples
| model fconv
| loaded checkpoint trainings/fconv/checkpoint_best.pt
S-721 danke .
T-721 thank you .
...
To generate translations with only a CPU, use the ``--cpu`` flag. BPE
continuation markers can be removed with the ``--remove-bpe`` flag.
Advanced Training Options
=========================
Large mini-batch training with delayed updates
----------------------------------------------
The ``--update-freq`` option can be used to accumulate gradients from
multiple mini-batches and delay updating, creating a larger effective
batch size. Delayed updates can also improve training speed by reducing
inter-GPU communication costs and by saving idle time caused by variance
in workload across GPUs. See `Ott et al.
(2018) <https://arxiv.org/abs/1806.00187>`__ for more details.
To train on a single GPU with an effective batch size that is equivalent
to training on 8 GPUs:
.. code-block:: console
> CUDA_VISIBLE_DEVICES=0 python train.py --update-freq 8 (...)
Training with half precision floating point (FP16)
--------------------------------------------------
.. note::
FP16 training requires a Volta GPU and CUDA 9.1 or greater
Recent GPUs enable efficient half precision floating point computation,
e.g., using `Nvidia Tensor Cores
<https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html>`__.
Fairseq supports FP16 training with the ``--fp16`` flag:
.. code-block:: console
> python train.py --fp16 (...)
Distributed training
--------------------
Distributed training in fairseq is implemented on top of
`torch.distributed <http://pytorch.org/docs/master/distributed.html>`__.
Training begins by launching one worker process per GPU. These workers
discover each other via a unique host and port (required) that can be
used to establish an initial connection. Additionally, each worker has a
rank, that is a unique number from 0 to n-1 where n is the total number
of GPUs.
If you run on a cluster managed by
`SLURM <https://slurm.schedmd.com/>`__ you can train a large
English-French model on the WMT 2014 dataset on 16 nodes with 8 GPUs
each (in total 128 GPUs) using this command:
.. code-block:: console
> DATA=... # path to the preprocessed dataset, must be visible from all nodes
> PORT=9218 # any available TCP port that can be used by the trainer to establish initial connection
> sbatch --job-name fairseq-py --gres gpu:8 --cpus-per-task 10 \
--nodes 16 --ntasks-per-node 8 \
--wrap 'srun --output train.log.node%t --error train.stderr.node%t.%j \
python train.py $DATA \
--distributed-world-size 128 \
--distributed-port $PORT \
--force-anneal 50 --lr-scheduler fixed --max-epoch 55 \
--arch fconv_wmt_en_fr --optimizer nag --lr 0.1,4 --max-tokens 3000 \
--clip-norm 0.1 --dropout 0.1 --criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 --wd 0.0001'
Alternatively you can manually start one process per GPU:
.. code-block:: console
> DATA=... # path to the preprocessed dataset, must be visible from all nodes
> HOST_PORT=master.example.com:9218 # one of the hosts used by the job
> RANK=... # the rank of this process, from 0 to 127 in case of 128 GPUs
> python train.py $DATA \
--distributed-world-size 128 \
--distributed-init-method 'tcp://$HOST_PORT' \
--distributed-rank $RANK \
--force-anneal 50 --lr-scheduler fixed --max-epoch 55 \
--arch fconv_wmt_en_fr --optimizer nag --lr 0.1,4 --max-tokens 3000 \
--clip-norm 0.1 --dropout 0.1 --criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 --wd 0.0001
.. fairseq documentation master file, created by
sphinx-quickstart on Fri Aug 17 21:45:30 2018.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
:github_url: https://github.com/pytorch/fairseq
fairseq documentation
=====================
Fairseq is a sequence modeling toolkit written in `PyTorch
<http://pytorch.org/>`_ that allows researchers and developers to
train custom models for translation, summarization, language modeling and other
text generation tasks.
.. toctree::
:maxdepth: 1
:caption: Getting Started
getting_started
command_line_tools
.. toctree::
:maxdepth: 1
:caption: Extending Fairseq
overview
tutorial_simple_lstm
tutorial_classifying_names
.. toctree::
:maxdepth: 2
:caption: Library Reference
tasks
models
criterions
optim
lr_scheduler
data
modules
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
.. role:: hidden
:class: hidden-section
.. _Learning Rate Schedulers:
Learning Rate Schedulers
========================
TODO
.. automodule:: fairseq.optim.lr_scheduler
:members:
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=python -msphinx
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=fairseq
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The Sphinx module was not found. Make sure you have Sphinx installed,
echo.then set the SPHINXBUILD environment variable to point to the full
echo.path of the 'sphinx-build' executable. Alternatively you may add the
echo.Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
.. role:: hidden
:class: hidden-section
.. module:: fairseq.models
.. _Models:
Models
======
A Model defines the neural network's ``forward()`` method and encapsulates all
of the learnable parameters in the network. Each model also provides a set of
named *architectures* that define the precise network configuration (e.g.,
embedding dimension, number of layers, etc.).
Both the model type and architecture are selected via the ``--arch``
command-line argument. Once selected, a model may expose additional command-line
arguments for further configuration.
.. note::
All fairseq Models extend :class:`BaseFairseqModel`, which in turn extends
:class:`torch.nn.Module`. Thus any fairseq Model can be used as a
stand-alone Module in other PyTorch code.
Convolutional Neural Networks (CNN)
-----------------------------------
.. module:: fairseq.models.fconv
.. autoclass:: fairseq.models.fconv.FConvModel
:members:
.. autoclass:: fairseq.models.fconv.FConvEncoder
:members:
:undoc-members:
.. autoclass:: fairseq.models.fconv.FConvDecoder
:members:
Long Short-Term Memory (LSTM) networks
--------------------------------------
.. module:: fairseq.models.lstm
.. autoclass:: fairseq.models.lstm.LSTMModel
:members:
.. autoclass:: fairseq.models.lstm.LSTMEncoder
:members:
.. autoclass:: fairseq.models.lstm.LSTMDecoder
:members:
Transformer (self-attention) networks
-------------------------------------
.. module:: fairseq.models.transformer
.. autoclass:: fairseq.models.transformer.TransformerModel
:members:
.. autoclass:: fairseq.models.transformer.TransformerEncoder
:members:
.. autoclass:: fairseq.models.transformer.TransformerEncoderLayer
:members:
.. autoclass:: fairseq.models.transformer.TransformerDecoder
:members:
.. autoclass:: fairseq.models.transformer.TransformerDecoderLayer
:members:
Adding new models
-----------------
.. currentmodule:: fairseq.models
.. autofunction:: fairseq.models.register_model
.. autofunction:: fairseq.models.register_model_architecture
.. autoclass:: fairseq.models.BaseFairseqModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqLanguageModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqEncoder
:members:
.. autoclass:: fairseq.models.CompositeEncoder
:members:
.. autoclass:: fairseq.models.FairseqDecoder
:members:
.. _Incremental decoding:
Incremental decoding
--------------------
.. autoclass:: fairseq.models.FairseqIncrementalDecoder
:members:
:undoc-members:
Modules
=======
Fairseq provides several stand-alone :class:`torch.nn.Module` s that may be
helpful when implementing a new :class:`FairseqModel`.
.. automodule:: fairseq.modules
:members:
:undoc-members:
.. role:: hidden
:class: hidden-section
.. _optimizers:
Optimizers
==========
.. automodule:: fairseq.optim
:members:
Overview
========
Fairseq can be extended through user-supplied `plug-ins
<https://en.wikipedia.org/wiki/Plug-in_(computing)>`_. We support five kinds of
plug-ins:
- :ref:`Models` define the neural network architecture and encapsulate all of the
learnable parameters.
- :ref:`Criterions` compute the loss function given the model outputs and targets.
- :ref:`Tasks` store dictionaries and provide helpers for loading/iterating over
Datasets, initializing the Model/Criterion and calculating the loss.
- :ref:`Optimizers` update the Model parameters based on the gradients.
- :ref:`Learning Rate Schedulers` update the learning rate over the course of
training.
**Training Flow**
Given a ``model``, ``criterion``, ``task``, ``optimizer`` and ``lr_scheduler``,
fairseq implements the following high-level training flow::
for epoch in range(num_epochs):
itr = task.get_batch_iterator(task.dataset('train'))
for num_updates, batch in enumerate(itr):
loss = criterion(model, batch)
optimizer.backward(loss)
optimizer.step()
lr_scheduler.step_update(num_updates)
lr_scheduler.step(epoch)
**Registering new plug-ins**
New plug-ins are *registered* through a set of ``@register`` function
decorators, for example::
@register_model('my_lstm')
class MyLSTM(FairseqModel):
(...)
Once registered, new plug-ins can be used with the existing :ref:`Command-line
Tools`. See the Tutorial sections for more detailed walkthroughs of how to add
new plug-ins.
.. role:: hidden
:class: hidden-section
.. module:: fairseq.tasks
.. _Tasks:
Tasks
=====
Tasks store dictionaries and provide helpers for loading/iterating over
Datasets, initializing the Model/Criterion and calculating the loss.
Tasks can be selected via the ``--task`` command-line argument. Once selected, a
task may expose additional command-line arguments for further configuration.
Example usage::
# setup the task (e.g., load dictionaries)
task = fairseq.tasks.setup_task(args)
# build model and criterion
model = task.build_model(args)
criterion = task.build_criterion(args)
# load datasets
task.load_dataset('train')
task.load_dataset('valid')
# iterate over mini-batches of data
batch_itr = task.get_batch_iterator(
task.dataset('train'), max_tokens=4096,
)
for batch in batch_itr:
# compute the loss
loss, sample_size, logging_output = task.get_loss(
model, criterion, batch,
)
loss.backward()
Translation
-----------
.. autoclass:: fairseq.tasks.translation.TranslationTask
.. _language modeling:
Language Modeling
-----------------
.. autoclass:: fairseq.tasks.language_modeling.LanguageModelingTask
Adding new tasks
----------------
.. autofunction:: fairseq.tasks.register_task
.. autoclass:: fairseq.tasks.FairseqTask
:members:
:undoc-members:
Tutorial: Classifying Names with a Character-Level RNN
======================================================
In this tutorial we will extend fairseq to support *classification* tasks. In
particular we will re-implement the PyTorch tutorial for `Classifying Names with
a Character-Level RNN <https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html>`_
in fairseq. It is recommended to quickly skim that tutorial before beginning
this one.
This tutorial covers:
1. **Preprocessing the data** to create dictionaries.
2. **Registering a new Model** that encodes an input sentence with a simple RNN
and predicts the output label.
3. **Registering a new Task** that loads our dictionaries and dataset.
4. **Training the Model** using the existing command-line tools.
5. **Writing an evaluation script** that imports fairseq and allows us to
interactively evaluate our model on new inputs.
1. Preprocessing the data
-------------------------
The original tutorial provides raw data, but we'll work with a modified version
of the data that is already tokenized into characters and split into separate
train, valid and test sets.
Download and extract the data from here:
`tutorial_names.tar.gz <https://s3.amazonaws.com/fairseq-py/data/tutorial_names.tar.gz>`_
Once extracted, let's preprocess the data using the :ref:`preprocess.py`
command-line tool to create the dictionaries. While this tool is primarily
intended for sequence-to-sequence problems, we're able to reuse it here by
treating the label as a "target" sequence of length 1. We'll also output the
preprocessed files in "raw" format using the ``--output-format`` option to
enhance readability:
.. code-block:: console
> python preprocess.py \
--trainpref names/train --validpref names/valid --testpref names/test \
--source-lang input --target-lang label \
--destdir names-bin --output-format raw
After running the above command you should see a new directory,
:file:`names-bin/`, containing the dictionaries for *inputs* and *labels*.
2. Registering a new Model
--------------------------
Next we'll register a new model in fairseq that will encode an input sentence
with a simple RNN and predict the output label. Compared to the original PyTorch
tutorial, our version will also work with batches of data and GPU Tensors.
First let's copy the simple RNN module implemented in the `PyTorch tutorial
<https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html#creating-the-network>`_.
Create a new file named :file:`fairseq/models/rnn_classifier.py` with the
following contents::
import torch
import torch.nn as nn
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(RNN, self).__init__()
self.hidden_size = hidden_size
self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
self.i2o = nn.Linear(input_size + hidden_size, output_size)
self.softmax = nn.LogSoftmax(dim=1)
def forward(self, input, hidden):
combined = torch.cat((input, hidden), 1)
hidden = self.i2h(combined)
output = self.i2o(combined)
output = self.softmax(output)
return output, hidden
def initHidden(self):
return torch.zeros(1, self.hidden_size)
We must also *register* this model with fairseq using the
:func:`~fairseq.models.register_model` function decorator. Once the model is
registered we'll be able to use it with the existing :ref:`Command-line Tools`.
All registered models must implement the :class:`~fairseq.models.BaseFairseqModel`
interface, so we'll create a small wrapper class in the same file and register
it in fairseq with the name ``'rnn_classifier'``::
from fairseq.models import BaseFairseqModel, register_model
# Note: the register_model "decorator" should immediately precede the
# definition of the Model class.
@register_model('rnn_classifier')
class FairseqRNNClassifier(BaseFairseqModel):
@staticmethod
def add_args(parser):
# Models can override this method to add new command-line arguments.
# Here we'll add a new command-line argument to configure the
# dimensionality of the hidden state.
parser.add_argument(
'--hidden-dim', type=int, metavar='N',
help='dimensionality of the hidden state',
)
@classmethod
def build_model(cls, args, task):
# Fairseq initializes models by calling the ``build_model()``
# function. This provides more flexibility, since the returned model
# instance can be of a different type than the one that was called.
# In this case we'll just return a FairseqRNNClassifier instance.
# Initialize our RNN module
rnn = RNN(
# We'll define the Task in the next section, but for now just
# notice that the task holds the dictionaries for the "source"
# (i.e., the input sentence) and "target" (i.e., the label).
input_size=len(task.source_dictionary),
hidden_size=args.hidden_dim,
output_size=len(task.target_dictionary),
)
# Return the wrapped version of the module
return FairseqRNNClassifier(
rnn=rnn,
input_vocab=task.source_dictionary,
)
def __init__(self, rnn, input_vocab):
super(FairseqRNNClassifier, self).__init__()
self.rnn = rnn
self.input_vocab = input_vocab
# The RNN module in the tutorial expects one-hot inputs, so we can
# precompute the identity matrix to help convert from indices to
# one-hot vectors. We register it as a buffer so that it is moved to
# the GPU when ``cuda()`` is called.
self.register_buffer('one_hot_inputs', torch.eye(len(input_vocab)))
def forward(self, src_tokens, src_lengths):
# The inputs to the ``forward()`` function are determined by the
# Task, and in particular the ``'net_input'`` key in each
# mini-batch. We'll define the Task in the next section, but for
# now just know that *src_tokens* has shape `(batch, src_len)` and
# *src_lengths* has shape `(batch)`.
bsz, max_src_len = src_tokens.size()
# Initialize the RNN hidden state. Compared to the original PyTorch
# tutorial we'll also handle batched inputs and work on the GPU.
hidden = self.rnn.initHidden()
hidden = hidden.repeat(bsz, 1) # expand for batched inputs
hidden = hidden.to(src_tokens.device) # move to GPU
for i in range(max_src_len):
# WARNING: The inputs have padding, so we should mask those
# elements here so that padding doesn't affect the results.
# This is left as an exercise for the reader. The padding symbol
# is given by ``self.input_vocab.pad()`` and the unpadded length
# of each input is given by *src_lengths*.
# One-hot encode a batch of input characters.
input = self.one_hot_inputs[src_tokens[:, i].long()]
# Feed the input to our RNN.
output, hidden = self.rnn(input, hidden)
# Return the final output state for making a prediction
return output
Finally let's define a *named architecture* with the configuration for our
model. This is done with the :func:`~fairseq.models.register_model_architecture`
function decorator. Thereafter this named architecture can be used with the
``--arch`` command-line argument, e.g., ``--arch pytorch_tutorial_rnn``::
from fairseq.models import register_model_architecture
# The first argument to ``register_model_architecture()`` should be the name
# of the model we registered above (i.e., 'rnn_classifier'). The function we
# register here should take a single argument *args* and modify it in-place
# to match the desired architecture.
@register_model_architecture('rnn_classifier', 'pytorch_tutorial_rnn')
def pytorch_tutorial_rnn(args):
# We use ``getattr()`` to prioritize arguments that are explicitly given
# on the command-line, so that the defaults defined below are only used
# when no other value has been specified.
args.hidden_dim = getattr(args, 'hidden_dim', 128)
3. Registering a new Task
-------------------------
Now we'll register a new :class:`~fairseq.tasks.FairseqTask` that will load our
dictionaries and dataset. Tasks can also control how the data is batched into
mini-batches, but in this tutorial we'll reuse the batching provided by
:class:`fairseq.data.LanguagePairDataset`.
Create a new file named :file:`fairseq/tasks/simple_classification.py` with the
following contents::
import os
import torch
from fairseq.data import Dictionary, LanguagePairDataset
from fairseq.tasks import FairseqTask, register_task
from fairseq.tokenizer import Tokenizer
@register_task('simple_classification')
class SimpleClassificationTask(FairseqTask):
@staticmethod
def add_args(parser):
# Add some command-line arguments for specifying where the data is
# located and the maximum supported input length.
parser.add_argument('data', metavar='FILE',
help='file prefix for data')
parser.add_argument('--max-positions', default=1024, type=int,
help='max input length')
@classmethod
def setup_task(cls, args, **kwargs):
# Here we can perform any setup required for the task. This may include
# loading Dictionaries, initializing shared Embedding layers, etc.
# In this case we'll just load the Dictionaries.
input_vocab = Dictionary.load(os.path.join(args.data, 'dict.input.txt'))
label_vocab = Dictionary.load(os.path.join(args.data, 'dict.label.txt'))
print('| [input] dictionary: {} types'.format(len(input_vocab)))
print('| [label] dictionary: {} types'.format(len(label_vocab)))
return SimpleClassificationTask(args, input_vocab, label_vocab)
def __init__(self, args, input_vocab, label_vocab):
super().__init__(args)
self.input_vocab = input_vocab
self.label_vocab = label_vocab
def load_dataset(self, split, **kwargs):
"""Load a given dataset split (e.g., train, valid, test)."""
prefix = os.path.join(self.args.data, '{}.input-label'.format(split))
# Read input sentences.
sentences, lengths = [], []
with open(prefix + '.input', encoding='utf-8') as file:
for line in file:
sentence = line.strip()
# Tokenize the sentence, splitting on spaces
tokens = Tokenizer.tokenize(
sentence, self.input_vocab, add_if_not_exist=False,
)
sentences.append(tokens)
lengths.append(tokens.numel())
# Read labels.
labels = []
with open(prefix + '.label', encoding='utf-8') as file:
for line in file:
label = line.strip()
labels.append(
# Convert label to a numeric ID.
torch.LongTensor([self.label_vocab.add_symbol(label)])
)
assert len(sentences) == len(labels)
print('| {} {} {} examples'.format(self.args.data, split, len(sentences)))
# We reuse LanguagePairDataset since classification can be modeled as a
# sequence-to-sequence task where the target sequence has length 1.
self.datasets[split] = LanguagePairDataset(
src=sentences,
src_sizes=lengths,
src_dict=self.input_vocab,
tgt=labels,
tgt_sizes=torch.ones(len(labels)), # targets have length 1
tgt_dict=self.label_vocab,
left_pad_source=False,
max_source_positions=self.args.max_positions,
max_target_positions=1,
# Since our target is a single class label, there's no need for
# input feeding. If we set this to ``True`` then our Model's
# ``forward()`` method would receive an additional argument called
# *prev_output_tokens* that would contain a shifted version of the
# target sequence.
input_feeding=False,
)
def max_positions(self):
"""Return the max input length allowed by the task."""
# The source should be less than *args.max_positions* and the "target"
# has max length 1.
return (self.args.max_positions, 1)
@property
def source_dictionary(self):
"""Return the source :class:`~fairseq.data.Dictionary`."""
return self.input_vocab
@property
def target_dictionary(self):
"""Return the target :class:`~fairseq.data.Dictionary`."""
return self.label_vocab
# We could override this method if we wanted more control over how batches
# are constructed, but it's not necessary for this tutorial since we can
# reuse the batching provided by LanguagePairDataset.
#
# def get_batch_iterator(
# self, dataset, max_tokens=None, max_sentences=None, max_positions=None,
# ignore_invalid_inputs=False, required_batch_size_multiple=1,
# seed=1, num_shards=1, shard_id=0,
# ):
# (...)
4. Training the Model
---------------------
Now we're ready to train the model. We can use the existing :ref:`train.py`
command-line tool for this, making sure to specify our new Task (``--task
simple_classification``) and Model architecture (``--arch
pytorch_tutorial_rnn``):
.. note::
You can also configure the dimensionality of the hidden state by passing the
``--hidden-dim`` argument to :ref:`train.py`.
.. code-block:: console
> python train.py names-bin \
--task simple_classification \
--arch pytorch_tutorial_rnn \
--optimizer adam --lr 0.001 --lr-shrink 0.5 \
--max-tokens 1000
(...)
| epoch 027 | loss 1.200 | ppl 2.30 | wps 15728 | ups 119.4 | wpb 116 | bsz 116 | num_updates 3726 | lr 1.5625e-05 | gnorm 1.290 | clip 0% | oom 0 | wall 32 | train_wall 21
| epoch 027 | valid on 'valid' subset | valid_loss 1.41304 | valid_ppl 2.66 | num_updates 3726 | best 1.41208
| done training in 31.6 seconds
The model files should appear in the :file:`checkpoints/` directory.
5. Writing an evaluation script
-------------------------------
Finally we can write a short script to evaluate our model on new inputs. Create
a new file named :file:`eval_classify.py` with the following contents::
from fairseq import data, options, tasks, utils
from fairseq.tokenizer import Tokenizer
# Parse command-line arguments for generation
parser = options.get_generation_parser()
args = options.parse_args_and_arch(parser)
# Setup task
args.task = 'simple_classification'
task = tasks.setup_task(args)
# Load model
print('| loading model from {}'.format(args.path))
models, _model_args = utils.load_ensemble_for_inference([args.path], task)
model = models[0]
while True:
sentence = input('\nInput: ')
# Tokenize into characters
chars = ' '.join(list(sentence.strip()))
tokens = Tokenizer.tokenize(
chars, task.source_dictionary, add_if_not_exist=False,
)
# Build mini-batch to feed to the model
batch = data.language_pair_dataset.collate(
samples=[{'id': -1, 'source': tokens}], # bsz = 1
pad_idx=task.source_dictionary.pad(),
eos_idx=task.source_dictionary.eos(),
left_pad_source=False,
input_feeding=False,
)
# Feed batch to the model and get predictions
preds = model(**batch['net_input'])
# Print top 3 predictions and their log-probabilities
top_scores, top_labels = preds[0].topk(k=3)
for score, label_idx in zip(top_scores, top_labels):
label_name = task.target_dictionary.string([label_idx])
print('({:.2f})\t{}'.format(score, label_name))
Now we can evaluate our model interactively. Note that we have included the
original data path (:file:`names-bin/`) so that the dictionaries can be loaded:
.. code-block:: console
> python eval_classifier.py names-bin --path checkpoints/checkpoint_best.pt
| [input] dictionary: 64 types
| [label] dictionary: 24 types
| loading model from checkpoints/checkpoint_best.pt
Input: Satoshi
(-0.61) Japanese
(-1.20) Arabic
(-2.86) Italian
Input: Sinbad
(-0.30) Arabic
(-1.76) English
(-4.08) Russian
This diff is collapsed.
......@@ -5,6 +5,9 @@
# This source code is licensed under the license found in the LICENSE file in
# the root directory of this source tree. An additional grant of patent rights
# can be found in the PATENTS file in the same directory.
"""
Evaluate the perplexity of a trained language model.
"""
import numpy as np
import torch
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment