PyTorch XLNet

Commit cd656fb2 authored Jan 17, 2020 by Lysandre; committed by Lysandre Debut Jan 23, 2020

355 additions and 388 deletions

docs/source/model_doc/xlnet.rst +16 -0

src/transformers/modeling_xlnet.py +339 -388

--- a/docs/source/model_doc/xlnet.rst
+++ b/docs/source/model_doc/xlnet.rst
 XLNet
 ----------------------------------------------------

+The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding`_
+by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
+XLNet is an extension of the Transformer-XL model, pre-trained using an autoregressive method
+to learn bidirectional contexts by maximizing the expected likelihood over all permutations
+of the input sequence factorization order.
+
+The specific attention pattern can be controlled at training and test time using the ``perm_mask`` input.
+
+Due to the difficulty of training a fully autoregressive model over various factorization orders,
+XLNet is pretrained using only a subset of the output tokens as targets, which are selected
+with the ``target_mapping`` input.
+
+To use XLNet for sequential decoding (i.e. not in a fully bidirectional setting), use the ``perm_mask`` and
+``target_mapping`` inputs to control the attention span and outputs (see examples in ``examples/run_generation.py``).
+
+
 ``XLNetConfig``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

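The permutation objective and the `perm_mask` input described in the XLNet section above can be illustrated without the library. The pure-Python sketch below is a simplified illustration, not the `modeling_xlnet.py` implementation, and the helper names `contexts_for_order` and `perm_mask_from_order` are hypothetical. It assumes the `perm_mask` convention of shape `(batch_size, seq_len, seq_len)`, where `perm_mask[k][i][j] = 1` means token `i` of batch item `k` may not attend to token `j`. It also assumes a token may attend only to tokens that come strictly earlier in the chosen factorization order.

```python
from itertools import permutations


def contexts_for_order(order):
    """Map each position to the positions it is conditioned on under `order`.

    `order` is a factorization order, i.e. a permutation of range(seq_len).
    The t-th position in the order is predicted from the positions before it.
    """
    return {pos: list(order[:t]) for t, pos in enumerate(order)}


def perm_mask_from_order(order):
    """Build a (1, seq_len, seq_len) perm_mask as nested lists of floats.

    mask[0][i][j] == 1.0 means position i may NOT attend to position j.
    Position i may attend to j only if j comes strictly earlier in `order`.
    Blocking the diagonal is a simplifying assumption of this sketch.
    """
    rank = {pos: t for t, pos in enumerate(order)}
    n = len(order)
    return [[[0.0 if rank[j] < rank[i] else 1.0 for j in range(n)] for i in range(n)]]


# Across all factorization orders of a 3-token sequence, every position is
# conditioned on every other position in at least one order. This is how an
# autoregressive objective can still learn bidirectional context.
seq_len = 3
seen = {pos: set() for pos in range(seq_len)}
for order in permutations(range(seq_len)):
    for pos, ctx in contexts_for_order(order).items():
        seen[pos].update(ctx)

print(contexts_for_order([2, 0, 1]))    # {2: [], 0: [2], 1: [2, 0]}
print(perm_mask_from_order([2, 0, 1]))  # [[[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]]
print({pos: sorted(s) for pos, s in seen.items()})  # {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

With the identity order `[0, 1, ..., n-1]`, the mask reduces to a strictly left-to-right pattern. Wrapping the nested list with `torch.tensor(...)` gives a tensor of the shape described above.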

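The `target_mapping` input and the sequential-decoding recipe from the XLNet section can be sketched the same way. The helpers below, `target_mapping_for` and `next_token_inputs`, are hypothetical. They assume `target_mapping` has shape `(batch_size, num_predict, seq_len)`, where `target_mapping[k][i][j] = 1` marks the `i`-th prediction as being on token `j`. The first example follows the partial-prediction idea from the XLNet paper, which treats only the last positions of a factorization order as targets. The decoding helper assumes a common pattern: append one placeholder position that no token may attend to, and make it the single prediction target. It is not copied from `examples/run_generation.py`.

```python
def target_mapping_for(seq_len, target_positions):
    """One-hot (1, num_predict, seq_len) target_mapping as nested float lists.

    Row i selects the position of the i-th token to predict.
    """
    return [[[1.0 if j == pos else 0.0 for j in range(seq_len)] for pos in target_positions]]


def next_token_inputs(seq_len):
    """perm_mask and target_mapping for predicting the last position.

    Assumes the last of the seq_len positions is a placeholder for the token
    to generate: no position may attend to it (column -1 of perm_mask is 1.0),
    and it is the only prediction target.
    """
    perm_mask = [[[1.0 if j == seq_len - 1 else 0.0 for j in range(seq_len)]
                  for _ in range(seq_len)]]
    target_mapping = target_mapping_for(seq_len, [seq_len - 1])
    return perm_mask, target_mapping


# Partial prediction: only the last 2 positions of a factorization order
# are used as targets (the cutoff of 2 is arbitrary here).
order = [3, 0, 4, 1, 2]
print(target_mapping_for(5, order[-2:]))
# [[[0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]]

perm_mask, target_mapping = next_token_inputs(3)
print(perm_mask)       # [[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]
print(target_mapping)  # [[[0.0, 0.0, 1.0]]]
```

To feed these to a PyTorch model, each nested list can be wrapped with `torch.tensor(...)` and passed as the `perm_mask=` and `target_mapping=` keyword arguments.
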
--- a/src/transformers/modeling_xlnet.py
+++ b/src/transformers/modeling_xlnet.py