DistilBERT is a small, fast, cheap and light Transformer model
trained by distilling BERT base. It has 40% fewer parameters than
`bert-base-uncased` and runs 60% faster, while preserving over 95% of
BERT's performance as measured on the GLUE language understanding benchmark.
Here are the differences between the interfaces of BERT and DistilBERT:
- DistilBERT doesn't have `token_type_ids`: you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`), as shown in the example below.
- DistilBERT doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though; just let us know if you need this option.
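As a quick illustration, here is a minimal sketch of feeding a two-segment input to DistilBERT. It assumes the `DistilBertTokenizer` and `DistilBertModel` classes and the `distilbert-base-uncased` shortcut from this library; note that no `token_type_ids` are built or passed:

    import torch
    from pytorch_transformers import DistilBertTokenizer, DistilBertModel

    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    model = DistilBertModel.from_pretrained('distilbert-base-uncased')

    # The two segments are separated with the [SEP] token only; no segment ids.
    text = "Who was Jim Henson?" + tokenizer.sep_token + "Jim Henson was a puppeteer."
    input_ids = torch.tensor([tokenizer.encode(text)])
    outputs = model(input_ids)  # no token_type_ids argument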
For more information on DistilBERT, please refer to our detailed blog post.
@add_start_docstrings("""DistilBert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of
    the hidden-states output to compute `span start logits` and `span end logits`). """,
r"""A placeholder identity operator that is argument-insensitive.
"""
def__init__(self,*args,**kwargs):
super(Identity,self).__init__()
defforward(self,input):
returninput
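A quick usage sketch (an editorial addition, not part of the original snippet): `Identity` ignores its constructor arguments and returns its input unchanged, mirroring `torch.nn.Identity` from more recent PyTorch versions:

    import torch

    layer = Identity(54, some_kwarg=0.1)  # constructor arguments are ignored
    x = torch.randn(2, 3)
    assert torch.equal(layer(x), x)       # forward just returns the input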
if not six.PY2:
    def add_start_docstrings(*docstr):
        def docstring_decorator(fn):
...
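For context, a minimal sketch of what this decorator plausibly does; the body is elided above, so this is an assumption rather than the library's exact code. It prepends the supplied doc fragments to the decorated function's docstring:

    def add_start_docstrings(*docstr):
        def docstring_decorator(fn):
            # Prepend the supplied fragments to the function's own docstring.
            fn.__doc__ = ''.join(docstr) + (fn.__doc__ or '')
            return fn
        return docstring_decorator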
@@ -54,8 +68,22 @@ else:
class PretrainedConfig(object):
    r""" Base class for all configuration classes.
        Handles a few parameters common to all models' configurations as well as methods for loading/downloading/saving configurations.

        Note:
            A configuration file can be loaded and saved to disk. Loading the configuration file and using this file to initialize a model does **not** load the model weights.
            It only affects the model's configuration.

        Class attributes (overridden by derived classes):
            - ``pretrained_config_archive_map``: a python ``dict`` with `short-cut-names` (string) as keys and `url` (string) of associated pretrained model configurations as values.

        Parameters:
            ``finetuning_task``: string, default `None`. Name of the task used to fine-tune the model. This can be used when converting from an original (TensorFlow or PyTorch) checkpoint.
            ``num_labels``: integer, default `2`. Number of classes to use when the model is a classification model (sequences/tokens).
            ``output_attentions``: boolean, default `False`. Whether the model should return attention weights.
            ``output_hidden_states``: boolean, default `False`. Whether the model should return all hidden states.
            ``torchscript``: boolean, default `False`. Whether the model is to be used with TorchScript.
    """
    pretrained_config_archive_map = {}
...
@@ -67,8 +95,8 @@ class PretrainedConfig(object):
        self.torchscript = kwargs.pop('torchscript', False)

    def save_pretrained(self, save_directory):
        """ Save a configuration object to the directory `save_directory`, so that it
            can be re-loaded using the :func:`~pytorch_transformers.PretrainedConfig.from_pretrained` class method.
        """
        assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
...
@@ -78,33 +106,56 @@ class PretrainedConfig(object):
r""" Instantiate a PretrainedConfig from a pre-trained model configuration.
r""" Instantiate a :class:`~pytorch_transformers.PretrainedConfig` (or a derived class) from a pre-trained model configuration.
Params:
Parameters:
**pretrained_model_name_or_path**: either:
pretrained_model_name_or_path: either:
- a string with the `shortcut name` of a pre-trained model configuration to load from cache
or download and cache if not already stored in cache (e.g. 'bert-base-uncased').
- a string with the `shortcut name` of a pre-trained model configuration to load from cache or download, e.g.: ``bert-base-uncased``.
- a path to a `directory` containing a configuration file saved
- a path to a `directory` containing a configuration file saved using the :func:`~pytorch_transformers.PretrainedConfig.save_pretrained` method, e.g.: ``./my_model_directory/``.
using the `save_pretrained(save_directory)` method.
- a path or url to a saved configuration JSON `file`, e.g.: ``./my_model_directory/configuration.json``.
- a path or url to a saved configuration `file`.
**cache_dir**: (`optional`) string:
cache_dir: (`optional`) string:
Path to a directory in which a downloaded pre-trained model
Path to a directory in which a downloaded pre-trained model
configuration should be cached if the standard cache should not be used.
configuration should be cached if the standard cache should not be used.
**kwargs**: (`optional`) dict:
Dictionnary of key, values to update the configuration object after loading.
kwargs: (`optional`) dict: key/value pairs with which to update the configuration object after loading.
Can be used to override selected configuration parameters.
- The values in kwargs of any keys which are configuration attributes will be used to override the loaded values.
- Behavior concerning key/value pairs whose keys are *not* configuration attributes is controlled by the `return_unused_kwargs` keyword parameter.
Force to (re-)download the model weights and configuration files and override the cached versions if they exists.
proxies: (`optional`) dict, default None:
A dictionary of proxy servers to use by protocol or endpoint, e.g.: {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.
The proxies are used on each request.
return_unused_kwargs: (`optional`) bool:
- If False, then this function returns just the final configuration object.
- If True, then this functions returns a tuple `(config, unused_kwargs)` where `unused_kwargs` is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: ie the part of kwargs which has not been used to update `config` and is otherwise ignored.
        Examples::

            # We can't instantiate directly the base class `PretrainedConfig` so let's show the examples on a
            # derived class: BertConfig
            config = BertConfig.from_pretrained('bert-base-uncased')    # Download configuration from S3 and cache.
            config = BertConfig.from_pretrained('./test/saved_model/')  # E.g. config (or model) was saved using `save_pretrained('./test/saved_model/')`
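            # Editor's sketch, not part of the original snippet: it illustrates the
            # `kwargs` and `return_unused_kwargs` behaviour documented above, assuming
            # the configuration attribute is named `output_attentions` as in this library.
            config = BertConfig.from_pretrained('bert-base-uncased', output_attentions=True, foo=False)
            assert config.output_attentions == True
            config, unused_kwargs = BertConfig.from_pretrained('bert-base-uncased', output_attentions=True,
                                                               foo=False, return_unused_kwargs=True)
            assert config.output_attentions == True
            assert unused_kwargs == {'foo': False}  # `foo` is not a config attribute, so it comes back unused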
...
@@ -148,7 +199,10 @@ class PretrainedConfig(object):
            kwargs.pop(key, None)

        logger.info("Model config %s", config)
        if return_unused_kwargs:
            return config, kwargs
        else:
            return config
    @classmethod
    def from_dict(cls, json_object):
...
@@ -187,14 +241,26 @@ class PretrainedConfig(object):
class PreTrainedModel(nn.Module):
    r""" Base class for all models.

        :class:`~pytorch_transformers.PreTrainedModel` takes care of storing the configuration of the models and handles methods for loading/downloading/saving models
        as well as a few methods common to all models to (i) resize the input embeddings and (ii) prune heads in the self-attention layers.

        Class attributes (overridden by derived classes):
            - ``config_class``: a class derived from :class:`~pytorch_transformers.PretrainedConfig` to use as configuration class for this model architecture.
            - ``pretrained_model_archive_map``: a python ``dict`` with `short-cut-names` (string) as keys and `url` (string) of associated pretrained weights as values.
            - ``load_tf_weights``: a python ``method`` for loading a TensorFlow checkpoint in a PyTorch model, taking as arguments:

                - ``model``: an instance of the relevant subclass of :class:`~pytorch_transformers.PreTrainedModel`,
                - ``config``: an instance of the relevant subclass of :class:`~pytorch_transformers.PretrainedConfig`,
                - ``path``: a path (string) to the TensorFlow checkpoint.

            - ``base_model_prefix``: a string indicating the attribute associated with the base model in derived classes of the same architecture that add modules on top of the base model.
    """
    config_class = None
    pretrained_model_archive_map = {}
    load_tf_weights = lambda model, config, path: None
    base_model_prefix = ""

    def __init__(self, config, *inputs, **kwargs):
        super(PreTrainedModel, self).__init__()
...
@@ -252,17 +318,16 @@ class PreTrainedModel(nn.Module):
""" Resize input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
""" Resize input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
Take care of tying weights embeddings afterwards if the model class has a `tie_weights()` method.
Take care of tying weights embeddings afterwards if the model class has a `tie_weights()` method.
Args:
Arguments:
new_num_tokens: (`optional`) int
New number of tokens in the embedding matrix.
new_num_tokens: (`optional`) int:
Increasing the size will add newly initialized vectors at the end
New number of tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end.
Reducing the size will remove vectors from the end
If not provided or None: does nothing and just returns a pointer to the input tokens ``torch.nn.Embeddings`` Module of the model.
If not provided or None: does nothing and just returns a pointer to the input tokens Embedding Module of the model.
Return: ``torch.nn.Embeddings``
Return: ``torch.nn.Embeddings``
Pointer to the input tokens Embedding Module of the model
Pointer to the input tokens Embeddings Module of the model
"""
"""
base_model=getattr(self,self.base_model_prefix,self)# get the base model if needed
base_model=getattr(self,self.base_model_prefix,self)# get the base model if needed
...
@@ -281,15 +346,17 @@ class PreTrainedModel(nn.Module):
    def prune_heads(self, heads_to_prune):
        """ Prunes heads of the base model.

            Arguments:
                heads_to_prune: dict with keys being selected layer indices (`int`) and associated values being the list of heads to prune in said layer (list of `int`).
        """
        base_model = getattr(self, self.base_model_prefix, self)  # get the base model if needed
        base_model._prune_heads(heads_to_prune)
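A short usage sketch for the two methods above (an editorial addition; it assumes the resizing method shown earlier is exposed as `resize_token_embeddings`, as in this library):

    model = BertModel.from_pretrained('bert-base-uncased')

    # Grow the vocabulary by two tokens; new vectors are appended at the end
    # and weights are re-tied afterwards if the model defines `tie_weights()`.
    embeddings = model.resize_token_embeddings(model.config.vocab_size + 2)

    # Prune heads 0 and 2 in layer 0, and head 1 in layer 2.
    model.prune_heads({0: [0, 2], 2: [1]})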
    def save_pretrained(self, save_directory):
        """ Save a model and its configuration file to a directory, so that it
            can be re-loaded using the :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` class method.
        """
        assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
...
@@ -305,61 +372,88 @@ class PreTrainedModel(nn.Module):
r"""Instantiate a pretrained pytorch model from a pre-trained model configuration.
r"""Instantiate a pretrained pytorch model from a pre-trained model configuration.
The model is set in evaluation mode by default using `model.eval()` (Dropout modules are desactivated)
The model is set in evaluation mode by default using ``model.eval()`` (Dropout modules are deactivated)
To train the model, you should first set it back in training mode with `model.train()`
To train the model, you should first set it back in training mode with ``model.train()``
Params:
The warning ``Weights from XXX not initialized from pretrained model`` means that the weights of XXX do not come pre-trained with the rest of the model.
**pretrained_model_name_or_path**: either:
It is up to you to train those weights with a downstream fine-tuning task.
- a string with the `shortcut name` of a pre-trained model to load from cache
or download and cache if not already stored in cache (e.g. 'bert-base-uncased').
The warning ``Weights from XXX not used in YYY`` means that the layer XXX is not used by YYY, therefore those weights are discarded.
- a path to a `directory` containing a configuration file saved
using the `save_pretrained(save_directory)` method.
Parameters:
- a path or url to a tensorflow index checkpoint `file` (e.g. `./tf_model/model.ckpt.index`).
pretrained_model_name_or_path: either:
In this case, ``from_tf`` should be set to True and a configuration object should be
provided as `config` argument. This loading option is slower than converting the TensorFlow
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
checkpoint in a PyTorch model using the provided conversion scripts and loading
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
the PyTorch model afterwards.
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
**config**: an optional configuration for the model to use instead of an automatically loaded configuation.
Configuration can be automatically loaded when:
model_args: (`optional`) Sequence of positional arguments:
- the model is a model provided by the library (loaded with a `shortcut name` of a pre-trained model), or
All remaning positional arguments will be passed to the underlying model's ``__init__`` method
- the model was saved using the `save_pretrained(save_directory)` (loaded by suppling the save directory).
**state_dict**: an optional state dictionnary for the model to use instead of a state dictionary loaded
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
from saved weights file.
Configuration for the model to use instead of an automatically loaded configuation. Configuration can be automatically loaded when:
This option can be used if you want to create a model from a pretrained configuraton but load your own weights.
In this case though, you should check if using `save_pretrained(dir)` and `from_pretrained(save_directory)` is not
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
a simpler option.
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by suppling the save directory.
**cache_dir**: (`optional`) string:
- the model is loaded by suppling a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
state_dict: (`optional`) dict:
an optional state dictionnary for the model to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
cache_dir: (`optional`) string:
Path to a directory in which a downloaded pre-trained model
Path to a directory in which a downloaded pre-trained model
configuration should be cached if the standard cache should not be used.
configuration should be cached if the standard cache should not be used.
Force to (re-)download the model weights and configuration files and override the cached versions if they exists.
proxies: (`optional`) dict, default None:
A dictionary of proxy servers to use by protocol or endpoint, e.g.: {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.
The proxies are used on each request.
output_loading_info: (`optional`) boolean:
Set to ``True`` to also return a dictionnary containing missing keys, unexpected keys and error messages.
Set to ``True`` to also return a dictionnary containing missing keys, unexpected keys and error messages.
**kwargs**: (`optional`) dict:
Dictionnary of key, values to update the configuration object after loading.
kwargs: (`optional`) Remaining dictionary of keyword arguments:
Can be used to override selected configuration parameters. E.g. ``output_attention=True``
Can be used to update the configuration object (after it being loaded) and initiate the model. (e.g. ``output_attention=True``). Behave differently depending on whether a `config` is provided or automatically loaded:
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
        Examples::

            model = BertModel.from_pretrained('bert-base-uncased')    # Download model and configuration from S3 and cache.
            model = BertModel.from_pretrained('./test/saved_model/')  # E.g. model was saved using `save_pretrained('./test/saved_model/')`
            model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)  # Update configuration during loading
            assert model.config.output_attentions == True
            # Loading from a TF checkpoint file instead of a PyTorch model (slower)
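            # Editor's completion of this truncated example (a hedged sketch, not the
            # original lines; the file paths below are hypothetical):
            config = BertConfig.from_json_file('./tf_model/my_tf_model_config.json')
            model = BertModel.from_pretrained('./tf_model/my_tf_checkpoint.ckpt.index', from_tf=True, config=config)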