Commit 30968d70 authored by Julien Chaumond

misc doc

parent de890ae6
@@ -106,7 +106,7 @@ This section explain how you can save and re-load a fine-tuned model (BERT, GPT,
There are three types of files you need to save to be able to reload a fine-tuned model:
* the model itself which should be saved following PyTorch serialization `best practices <https://pytorch.org/docs/stable/notes/serialization.html#best-practices>`__\ ,
* the configuration file of the model which is saved as a JSON file, and
* the vocabulary (and the merges for the BPE-based models GPT and GPT-2).
......
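For context, a minimal sketch of saving those three files by hand (assuming a fine-tuned `model` and `tokenizer` are already in scope; the directory name is illustrative, and `WEIGHTS_NAME`/`CONFIG_NAME` are the library's standard file-name constants):

```python
import os
import torch
from transformers import WEIGHTS_NAME, CONFIG_NAME

output_dir = "./my_finetuned_model"  # illustrative path
os.makedirs(output_dir, exist_ok=True)

# 1. the model weights, using plain PyTorch serialization of the state dict
torch.save(model.state_dict(), os.path.join(output_dir, WEIGHTS_NAME))
# 2. the configuration, as a JSON file
model.config.to_json_file(os.path.join(output_dir, CONFIG_NAME))
# 3. the vocabulary (and merges for BPE-based models such as GPT/GPT-2)
tokenizer.save_vocabulary(output_dir)
```

The higher-level `model.save_pretrained(output_dir)` and `tokenizer.save_pretrained(output_dir)` calls cover the same files.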
@@ -237,7 +237,7 @@ def main():
    # Save a trained model
    if args.do_train:
        # Save a trained model, configuration and tokenizer
        model_to_save = model.module if hasattr(model, 'module') else model  # Only save the model itself
        # If we save using the predefined names, we can load using `from_pretrained`
        output_model_file = os.path.join(args.output_dir, WEIGHTS_NAME)
......
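Because the predefined names are used, reloading later is just a `from_pretrained` call on the output directory. A sketch (the model and tokenizer classes shown are illustrative; use the ones matching your checkpoint):

```python
from transformers import BertForQuestionAnswering, BertTokenizer  # illustrative classes

# args.output_dir is the same directory used when saving above
model = BertForQuestionAnswering.from_pretrained(args.output_dir)
tokenizer = BertTokenizer.from_pretrained(args.output_dir)
```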
@@ -7,7 +7,7 @@ The library is designed to incorporate a variety of models and code bases. As su
One important point though is that the library has the following goals impacting the way models are incorporated:
- one specific feature of the API is the capability to run the model and tokenizer inline. The tokenization code thus often has to be slightly adapted to allow for running in the python interpreter.
- the package is also designed to be as self-consistent as possible, with a small and reliable set of package dependencies. In consequence, additional dependencies are usually not allowed when adding a model but can be allowed for the inclusion of a new tokenizer (recent examples of dependencies added for tokenizer specificities include `sentencepiece` and `sacremoses`). Please make sure to check the existing dependencies when possible before adding a new one.
For a quick overview of the library organization, please check the [QuickStart section of the documentation](https://huggingface.co/transformers/quickstart.html).
@@ -20,7 +20,7 @@ Here an overview of the general workflow:
- [ ] add tests
- [ ] finalize

Let's detail what should be done at each step

## Adding model/configuration/tokenization classes
@@ -28,16 +28,16 @@ Here is the workflow for adding model/configuration/tokenization classes:

- [ ] copy the python files from the present folder to the main folder and rename them, replacing `xxx` with your model name,
- [ ] edit the files to replace `XXX` (with various casing) with your model name
- [ ] copy-paste or create a simple configuration class for your model in the `configuration_...` file (a minimal sketch follows below)
- [ ] copy-paste or create the code for your model in the `modeling_...` files (PyTorch and TF 2.0)
- [ ] copy-paste or create a tokenizer class for your model in the `tokenization_...` file
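As an illustration of the configuration step above, a bare-bones configuration class might look like the following (a sketch only; real configurations expose many more hyper-parameters, and the attribute names are illustrative):

```python
from transformers import PretrainedConfig


class XxxConfig(PretrainedConfig):
    """Minimal illustrative configuration for the hypothetical `Xxx` model."""

    def __init__(self, vocab_size=30522, hidden_size=768, num_hidden_layers=12, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
```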
# Adding conversion scripts

Here is the workflow for the conversion scripts:

- [ ] copy the conversion script (`convert_...`) from the present folder to the main folder.
- [ ] edit this script to convert your original checkpoint weights to the current pytorch ones (a skeleton follows below).
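The conversion scripts generally follow the same pattern: rebuild the PyTorch model from its configuration, copy the original weights into it, and save the state dict. A hedged skeleton, where the `Xxx`/`xxx` names are the template placeholders to be replaced with your model's actual classes:

```python
import argparse
import torch

# Placeholder names from the templates; replace `Xxx`/`xxx` with your model name.
from transformers import XxxConfig, XxxModel
from transformers import load_tf_weights_in_xxx


def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, config_file, pytorch_dump_path):
    # Build the PyTorch model from its JSON configuration
    config = XxxConfig.from_json_file(config_file)
    model = XxxModel(config)
    # Copy the original (here: TensorFlow) weights into the PyTorch modules
    load_tf_weights_in_xxx(model, config, tf_checkpoint_path)
    # Save with standard PyTorch serialization so `from_pretrained` can reload it
    torch.save(model.state_dict(), pytorch_dump_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--tf_checkpoint_path", type=str, required=True)
    parser.add_argument("--config_file", type=str, required=True)
    parser.add_argument("--pytorch_dump_path", type=str, required=True)
    args = parser.parse_args()
    convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.config_file, args.pytorch_dump_path)
```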
# Adding tests:
@@ -58,5 +58,5 @@ You can then finish the addition step by adding imports for your classes in the

- [ ] add your models and tokenizer to `pipeline.py`
- [ ] add a link to your conversion script in the main conversion utility (currently in `__main__` but will be moved to the `commands` subfolder in the near future)
- [ ] edit the PyTorch to TF 2.0 conversion script to add your model in the `convert_pytorch_checkpoint_to_tf2.py` file
- [ ] add a mention of your model in the doc: `README.md` and the documentation itself at `docs/source/pretrained_models.rst`.
- [ ] upload the pretrained weights, configurations and vocabulary files.
@@ -49,7 +49,7 @@ TF_XXX_PRETRAINED_MODEL_ARCHIVE_MAP = {
####################################################
# TF 2.0 Models are constructed using Keras imperative API by sub-classing
# - tf.keras.layers.Layer for the layers and
# - TFPreTrainedModel for the models (itself a sub-class of tf.keras.Model)
####################################################
####################################################
......
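In other words, the individual building blocks are plain Keras layers, while the full model derives from `TFPreTrainedModel`. A minimal sketch of the layer side (the class name and the `hidden_size` attribute are illustrative):

```python
import tensorflow as tf


class TFXxxSelfOutput(tf.keras.layers.Layer):
    """Illustrative sub-layer built by sub-classing tf.keras.layers.Layer."""

    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(config.hidden_size, name="dense")

    def call(self, hidden_states):
        return self.dense(hidden_states)
```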
@@ -120,7 +120,7 @@ def load_tf_weights_in_xxx(model, config, tf_checkpoint_path):
####################################################
# PyTorch Models are constructed by sub-classing
# - torch.nn.Module for the layers and
# - PreTrainedModel for the models (itself a sub-class of torch.nn.Module)
####################################################
####################################################
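The PyTorch side mirrors this: sub-layers derive from `torch.nn.Module`, and the top-level model derives from `PreTrainedModel`, which is what provides `from_pretrained`/`save_pretrained`. A minimal layer sketch (the class name and config attributes are illustrative):

```python
import torch
import torch.nn as nn


class XxxIntermediate(nn.Module):
    """Illustrative sub-layer built by sub-classing torch.nn.Module."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)

    def forward(self, hidden_states):
        return torch.relu(self.dense(hidden_states))
```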
@@ -300,10 +300,19 @@ class XxxModel(XxxPreTrainedModel):
            self.encoder.layer[layer].attention.prune_heads(heads)

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None):
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is not None:
            input_shape = input_ids.size()
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.size()[:-1]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if attention_mask is None:
            attention_mask = torch.ones(input_shape)
        if token_type_ids is None:
            token_type_ids = torch.zeros(input_shape, dtype=torch.long)

        # We create a 3D attention mask from a 2D tensor mask.
        # Sizes are [batch_size, 1, 1, to_seq_length]
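For readers unfamiliar with that comment, the broadcastable mask is typically built roughly as follows (a sketch, not part of this diff; `attention_mask` comes from the code above, and real implementations cast to the model's parameter dtype):

```python
import torch

# [batch_size, seq_length] -> [batch_size, 1, 1, seq_length], broadcastable over heads and query positions
extended_attention_mask = attention_mask[:, None, None, :].to(dtype=torch.float32)
# Convert to an additive mask: 0.0 where tokens should be attended to, -10000.0 at padding positions
extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
```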
@@ -337,7 +346,7 @@ class XxxModel(XxxPreTrainedModel):
        ##################################
        # Replace this with your model code
        embedding_output = self.embeddings(input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds)
        encoder_outputs = self.encoder(embedding_output, extended_attention_mask, head_mask=head_mask)
        sequence_output = encoder_outputs[0]
        outputs = (sequence_output,) + encoder_outputs[1:]  # add hidden_states and attentions if they are here
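The practical effect of this change is that callers can now skip the embedding lookup entirely. A hypothetical usage sketch (the token ids and the `model` instance are illustrative):

```python
import torch

input_ids = torch.tensor([[101, 2023, 2003, 1037, 7953, 102]])  # illustrative token ids

# Usual path: let the model embed the token ids itself
outputs = model(input_ids=input_ids)

# New path: feed pre-computed embeddings directly, e.g. to perturb or mix them
inputs_embeds = model.get_input_embeddings()(input_ids)  # [batch_size, seq_len, hidden_size]
outputs = model(inputs_embeds=inputs_embeds)

# Passing both input_ids and inputs_embeds at once raises a ValueError, as enforced above.
```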
@@ -388,14 +397,15 @@ class XxxForMaskedLM(XxxPreTrainedModel):
    def get_output_embeddings(self):
        return self.lm_head

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None,
                masked_lm_labels=None):
        outputs = self.transformer(input_ids,
                                   attention_mask=attention_mask,
                                   token_type_ids=token_type_ids,
                                   position_ids=position_ids,
                                   head_mask=head_mask,
                                   inputs_embeds=inputs_embeds)
        sequence_output = outputs[0]
        prediction_scores = self.cls(sequence_output)
......
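Downstream of this excerpt, `prediction_scores` is typically turned into a loss when `masked_lm_labels` is given, along these lines (a sketch assuming `config.vocab_size` and the tensors above are in scope; the ignore index used for unmasked positions varies by version):

```python
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # non-masked positions are usually excluded via an ignore index
masked_lm_loss = loss_fct(
    prediction_scores.view(-1, config.vocab_size),  # [batch_size * seq_len, vocab_size]
    masked_lm_labels.view(-1),                      # [batch_size * seq_len]
)
```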
@@ -238,7 +238,7 @@ class PreTrainedModel(nn.Module):
        """
        assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"

        # Only save the model itself if we are using distributed training
        model_to_save = self.module if hasattr(self, 'module') else self

        # Save configuration file
......
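The `.module` unwrapping matters because multi-GPU wrappers hide the real model one level down. A short sketch of the caller-side pattern (the path and the choice of wrapper are illustrative):

```python
import torch

model = torch.nn.DataParallel(model)  # hypothetical multi-GPU wrapper around a pretrained model

# Only the inner model carries save_pretrained and the real weights
model_to_save = model.module if hasattr(model, "module") else model
model_to_save.save_pretrained("./checkpoint")
```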