XLM_ROBERTA_START_DOCSTRING = r"""
The XLM-RoBERTa model was proposed in
`Unsupervised Cross-lingual Representation Learning at Scale`_
by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019.
It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.

This implementation is the same as RoBERTa.

This model is a `tf.keras.Model`_ sub-class. Use it as a regular TF 2.0 Keras model and
refer to the TF 2.0 documentation for all matters related to general usage and behavior.

.. _`Unsupervised Cross-lingual Representation Learning at Scale`:
    https://arxiv.org/abs/1911.02116

.. _`tf.keras.Model`:
    https://www.tensorflow.org/api_docs/python/tf/keras/Model

.. note::

    TF 2.0 models accept two formats as inputs:

    - having all inputs as keyword arguments (like PyTorch models), or
    - having all inputs as a list, tuple or dict in the first positional argument.

    This second option is useful when using the :obj:`tf.keras.Model.fit()` method, which currently requires
    having all the tensors in the first argument of the model call function: :obj:`model(inputs)`.
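    For example, the two formats look like this. This is a minimal sketch: the ``xlm-roberta-base``
    checkpoint name and the exact tokenizer calls are assumptions for illustration, not part of this
    docstring's contract::

        import tensorflow as tf
        from transformers import XLMRobertaTokenizer, TFXLMRobertaModel

        # Assumed checkpoint for illustration; any XLM-RoBERTa checkpoint works the same way.
        tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
        model = TFXLMRobertaModel.from_pretrained('xlm-roberta-base')

        input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]  # batch size 1
        attention_mask = tf.ones_like(input_ids)

        # Format 1: all inputs as keyword arguments (like PyTorch models)
        outputs = model(inputs=input_ids, attention_mask=attention_mask)

        # Format 2: all inputs gathered in the first positional argument, either as a
        # list/tuple (in the order given in the docstring) or as a dict keyed by input
        # name -- the form tf.keras.Model.fit() expects
        outputs = model([input_ids, attention_mask])
        outputs = model({'input_ids': input_ids, 'attention_mask': attention_mask})

        last_hidden_states = outputs[0]  # the last hidden-states are the first element of the output tuple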
    If you choose this second option, there are three possibilities you can use to gather all the input Tensors