Unverified Commit 1551e2dc authored by NielsRogge, committed by GitHub

[WIP] Tapas v4 (tres) (#9117)



* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent ad895af9
...@@ -79,6 +79,7 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-{{ checksum "setup.py" }}
paths:
...@@ -105,6 +106,7 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
...@@ -183,6 +185,7 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
...
...@@ -50,6 +50,7 @@ jobs:
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-$(python -c "import torch; print(''.join(torch.__version__))")+$(python -c "import torch; print(''.join(torch.version.cuda.split('.')))").html
- name: Are GPUs recognized by our DL frameworks
run: |
...@@ -187,6 +188,7 @@ jobs:
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-$(python -c "import torch; print(''.join(torch.__version__))")+$(python -c "import torch; print(''.join(torch.version.cuda.split('.')))").html
- name: Are GPUs recognized by our DL frameworks
run: |
...
...@@ -222,6 +222,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/transformers/master/model_doc/tapas.html)** released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
...
...@@ -176,19 +176,22 @@ and conversion utilities for the following models:
30. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
31. `TAPAS <https://huggingface.co/transformers/master/model_doc/tapas.html>`__ released with the paper `TAPAS: Weakly
Supervised Table Parsing via Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof
Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
32. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
33. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
34. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
35. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
36. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
...@@ -269,6 +272,8 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| T5 | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| TAPAS | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
...@@ -382,6 +387,7 @@ TensorFlow and/or Flax.
model_doc/roberta
model_doc/squeezebert
model_doc/t5
model_doc/tapas
model_doc/transformerxl
model_doc/xlm
model_doc/xlmprophetnet
...
---
language: en
tags:
- tapas
- masked-lm
license: apache-2.0
---
# TAPAS base model
This model corresponds to the `tapas_inter_masklm_base_reset` checkpoint of the [original GitHub repository](https://github.com/google-research/tapas).
Disclaimer: The team releasing TAPAS did not write a model card for this model, so this model card has been written by
the Hugging Face team and contributors.
## Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion.
This means it was pretrained on the raw tables and associated texts only, with no humans labelling them in any way (which is why it
can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it
was pretrained with two objectives:
- Masked language modeling (MLM): taking a (flattened) table and associated context, the model randomly masks 15% of the words in
the input, then runs the entire (partially masked) sequence through the model. The model then has to predict the masked words.
This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other,
or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional
representation of a table and associated text.
- Intermediate pre-training: to encourage numerical reasoning on tables, the authors additionally pre-trained the model by creating
a balanced dataset of millions of syntactically created training examples. Here, the model must predict (classify) whether a sentence
is supported or refuted by the contents of a table. The training examples are created based on synthetic as well as counterfactual statements.
This way, the model learns an inner representation of the English language used in tables and associated texts, which can then be used
to extract features useful for downstream tasks such as answering questions about a table, or determining whether a sentence is entailed
or refuted by the contents of a table. Fine-tuning is done by adding classification heads on top of the pre-trained model, and then jointly
training the randomly initialized classification heads with the base model on a labelled dataset.
## Intended uses & limitations
You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
See the [model hub](https://huggingface.co/models?filter=tapas) to look for fine-tuned versions on a task that interests you.
Here is how to use this model to get the features of a given table-text pair in PyTorch:
```python
from transformers import TapasTokenizer, TapasModel
import pandas as pd

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base")
model = TapasModel.from_pretrained("google/tapas-base")

data = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Age": ["56", "45", "59"],
    "Number of movies": ["87", "53", "69"],
}
table = pd.DataFrame.from_dict(data)
queries = ["How many movies has George Clooney played in?"]

encoded_input = tokenizer(table=table, queries=queries, return_tensors="pt")
output = model(**encoded_input)
```
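Beyond feature extraction, the fine-tuned checkpoints can be used for table question answering. Below is a minimal sketch (not part of the original card), reusing `table` and `queries` from above and assuming the `google/tapas-base-finetuned-wtq` checkpoint plus an installed torch-scatter; `TapasTokenizer.convert_logits_to_predictions` (the method covered by a new test in this PR) turns the logits into cell coordinates and aggregation indices:
```python
import torch
from transformers import TapasTokenizer, TapasForQuestionAnswering

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")
model = TapasForQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

# Reuses the `table` DataFrame and `queries` list defined in the example above
inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the cell logits (and aggregation logits) back to table coordinates and operator indices
predicted_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
```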
## Training data
For masked language modeling (MLM), a collection of 6.2 million tables was extracted from English Wikipedia: 3.3M of class [Infobox](https://en.wikipedia.org/wiki/Help:Infobox)
and 2.9M of class WikiTable. The authors only considered tables with at most 500 cells. As a proxy for questions that appear in the
downstream tasks, the authors extracted the table caption, article title, article description, segment title and text of the segment
the table occurs in as relevant text snippets. In this way, 21.3M snippets were created. For more info, see the original [TAPAS paper](https://www.aclweb.org/anthology/2020.acl-main.398.pdf).
For intermediate pre-training, two tasks are introduced: one based on synthetic statements and the other on counterfactual statements. The first one
generates a sentence by sampling from a set of logical expressions that filter, combine and compare the information in the table, the kind of reasoning
required for table entailment (e.g., knowing whether Gerald Ford is taller than the average president requires summing the heights of
all presidents and dividing by the number of presidents). The second one corrupts sentences about tables appearing on Wikipedia by swapping
entities for plausible alternatives. Examples of the two tasks can be seen in Figure 1 of the [TAPAS follow-up paper](https://www.aclweb.org/anthology/2020.findings-emnlp.27.pdf),
which describes the procedure in detail in its section 3.
## Training procedure
### Preprocessing
The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are
then of the form:
```
[CLS] Context [SEP] Flattened table [SEP]
```
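As a quick sanity check (a sketch only, reusing the `tokenizer`, `table`, `queries` and `encoded_input` from the usage example above), decoding the encoded input makes this layout visible: the lowercased query, a `[SEP]`, then the flattened table (header row followed by the cell values row by row):
```python
# Sketch: inspect the flattened input produced by TapasTokenizer
# (reuses tokenizer and encoded_input from the usage example above)
print(tokenizer.decode(encoded_input["input_ids"][0]))
# Roughly: "[CLS] how many movies has george clooney played in? [SEP] actors age
#           number of movies brad pitt 56 87 leonardo di caprio 45 53 george clooney 59 69"
```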
The details of the masking procedure for each sequence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- In the 10% remaining cases, the masked tokens are left as is.
The details of the creation of the synthetic and counterfactual examples can be found in the [follow-up paper](https://arxiv.org/abs/2010.00571).
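For the MLM objective itself, the corruption is the standard BERT-style 15% / 80-10-10 rule listed above. A minimal, framework-free sketch of that rule (illustrative only, not the actual pre-training code):
```python
import random


def mask_tokens(token_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """Sketch of the 15% / 80-10-10 masking rule described above.

    Returns corrupted input ids and MLM labels (-100 marks positions ignored by the loss).
    """
    input_ids, labels = list(token_ids), []
    for i, token_id in enumerate(token_ids):
        if random.random() < mlm_probability:
            labels.append(token_id)  # this position is predicted by the MLM head
            draw = random.random()
            if draw < 0.8:  # 80%: replace with the [MASK] token
                input_ids[i] = mask_token_id
            elif draw < 0.9:  # 10%: replace with a random vocabulary token
                input_ids[i] = random.randrange(vocab_size)  # (sketch does not exclude the original id)
            # remaining 10%: keep the original token unchanged
        else:
            labels.append(-100)
    return input_ids, labels
```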
### Pretraining
The model was trained on 32 Cloud TPU v3 cores for one million steps with maximum sequence length 512 and batch size of 512.
In this setup, pre-training takes around 3 days. The optimizer used is Adam with a learning rate of 5e-5, and a warmup ratio
of 0.10.
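For orientation, the optimization setup described above can be approximated with the standard utilities of this library; the sketch below is only an illustration of the stated hyperparameters (1M steps, learning rate 5e-5, 10% warmup), assumes torch and torch-scatter are installed, and is not the original TensorFlow pre-training code:
```python
from transformers import AdamW, TapasConfig, TapasForMaskedLM, get_linear_schedule_with_warmup

# Re-create the schedule described above: 1M steps, lr 5e-5, warmup ratio 0.10 (i.e. 100k warmup steps)
model = TapasForMaskedLM(TapasConfig())
num_train_steps = 1_000_000
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.10 * num_train_steps), num_training_steps=num_train_steps
)
```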
### BibTeX entry and citation info
```bibtex
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
```
```bibtex
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
...@@ -69,3 +69,5 @@ Pull Request so it can be included under the Community notebooks.
|[Classify text with DistilBERT and Tensorflow](https://github.com/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb) | How to fine-tune DistilBERT for text classification in TensorFlow | [Peter Bayerle](https://github.com/peterbayerle) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb)|
|[Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | How to warm-start a *EncoderDecoderModel* with a *bert-base-uncased* checkpoint for summarization on CNN/Dailymail | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)|
|[Leverage RoBERTa for Encoder-Decoder Summarization on BBC XSum](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | How to warm-start a shared *EncoderDecoderModel* with a *roberta-base* checkpoint for summarization on BBC/XSum | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)|
|[Fine-tuning TAPAS on Sequential Question Answering (SQA)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb) | How to fine-tune *TapasForQuestionAnswering* with a *tapas-base* checkpoint on the Sequential Question Answering (SQA) dataset | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb)|
|[Evaluating TAPAS on Table Fact Checking (TabFact)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb) | How to evaluate a fine-tuned *TapasForSequenceClassification* with a *tapas-base-finetuned-tabfact* checkpoint using a combination of the 🤗 datasets and 🤗 transformers libraries | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb)|
...@@ -164,6 +164,7 @@ from .models.retribert import RETRIBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, RetriBert
from .models.roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig, RobertaTokenizer
from .models.squeezebert import SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, SqueezeBertConfig, SqueezeBertTokenizer
from .models.t5 import T5_PRETRAINED_CONFIG_ARCHIVE_MAP, T5Config
from .models.tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig, TapasTokenizer
from .models.transfo_xl import (
TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP,
TransfoXLConfig,
...@@ -605,6 +606,13 @@ if is_torch_available():
T5PreTrainedModel,
load_tf_weights_in_t5,
)
from .models.tapas import (
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
)
from .models.transfo_xl import (
TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_LIST,
AdaptiveEmbedding,
...
...@@ -216,6 +216,29 @@ except ImportError:
_tokenizers_available = False
try:
import pandas # noqa: F401
_pandas_available = True
except ImportError:
_pandas_available = False
try:
import torch_scatter
# Check we're not importing a "torch_scatter" directory somewhere
_scatter_available = hasattr(torch_scatter, "__version__") and hasattr(torch_scatter, "scatter")
if _scatter_available:
logger.debug(f"Succesfully imported torch-scatter version {torch_scatter.__version__}")
else:
logger.debug("Imported a torch_scatter object but this doesn't seem to be the torch-scatter library.")
except ImportError:
_scatter_available = False
old_default_cache_path = os.path.join(torch_cache_home, "transformers")
# New default cache, shared with the Datasets library
hf_cache_home = os.path.expanduser(
...@@ -325,6 +348,14 @@ def is_in_notebook():
return _in_notebook
def is_scatter_available():
return _scatter_available
def is_pandas_available():
return _pandas_available
def torch_only_method(fn):
def wrapper(*args, **kwargs):
if not _torch_available:
...@@ -427,6 +458,13 @@ installation page: https://github.com/google/flax and follow the ones that match
"""
# docstyle-ignore
SCATTER_IMPORT_ERROR = """
{0} requires the torch-scatter library but it was not found in your environment. You can install it with pip as
explained here: https://github.com/rusty1s/pytorch_scatter.
"""
def requires_datasets(obj):
name = obj.__name__ if hasattr(obj, "__name__") else obj.__class__.__name__
if not is_datasets_available():
...@@ -481,6 +519,12 @@ def requires_protobuf(obj):
raise ImportError(PROTOBUF_IMPORT_ERROR.format(name))
def requires_scatter(obj):
name = obj.__name__ if hasattr(obj, "__name__") else obj.__class__.__name__
if not is_scatter_available():
raise ImportError(SCATTER_IMPORT_ERROR.format(name))
def add_start_docstrings(*docstr):
def docstring_decorator(fn):
fn.__doc__ = "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
...
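The `is_scatter_available()` / `requires_scatter()` pair added above follows the library's existing soft-dependency pattern. A minimal sketch of how such a guard is typically consumed (hypothetical module, not part of this diff):
```python
# Hypothetical module guarding an optional torch-scatter dependency,
# mirroring the pattern introduced in file_utils.py above.
from transformers.file_utils import is_scatter_available, requires_scatter

if is_scatter_available():
    from torch_scatter import scatter  # only imported when the real package is installed


class NeedsScatter:
    def __init__(self):
        # Raises a helpful ImportError (SCATTER_IMPORT_ERROR) when torch-scatter is missing
        requires_scatter(self)
```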
...@@ -51,6 +51,7 @@ from ..retribert.configuration_retribert import RETRIBERT_PRETRAINED_CONFIG_ARCH
from ..roberta.configuration_roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig
from ..squeezebert.configuration_squeezebert import SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, SqueezeBertConfig
from ..t5.configuration_t5 import T5_PRETRAINED_CONFIG_ARCHIVE_MAP, T5Config
from ..tapas.configuration_tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig
from ..transfo_xl.configuration_transfo_xl import TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP, TransfoXLConfig
from ..xlm.configuration_xlm import XLM_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMConfig
from ..xlm_prophetnet.configuration_xlm_prophetnet import (
...@@ -95,6 +96,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP,
]
for key, value, in pretrained_map.items()
)
...@@ -141,6 +143,7 @@ CONFIG_MAPPING = OrderedDict(
("dpr", DPRConfig),
("layoutlm", LayoutLMConfig),
("rag", RagConfig),
("tapas", TapasConfig),
]
)
...@@ -185,6 +188,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("prophetnet", "ProphetNet"),
("mt5", "mT5"),
("mpnet", "MPNet"),
("tapas", "TAPAS"),
]
)
...
...@@ -165,6 +165,12 @@ from ..squeezebert.modeling_squeezebert import (
SqueezeBertModel,
)
from ..t5.modeling_t5 import T5ForConditionalGeneration, T5Model
from ..tapas.modeling_tapas import (
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
)
from ..transfo_xl.modeling_transfo_xl import TransfoXLForSequenceClassification, TransfoXLLMHeadModel, TransfoXLModel
from ..xlm.modeling_xlm import (
XLMForMultipleChoice,
...@@ -230,6 +236,7 @@ from .configuration_auto import (
RobertaConfig,
SqueezeBertConfig,
T5Config,
TapasConfig,
TransfoXLConfig,
XLMConfig,
XLMProphetNetConfig,
...@@ -277,6 +284,7 @@ MODEL_MAPPING = OrderedDict(
(XLMProphetNetConfig, XLMProphetNetModel),
(ProphetNetConfig, ProphetNetModel),
(MPNetConfig, MPNetModel),
(TapasConfig, TapasModel),
]
)
...@@ -308,6 +316,7 @@ MODEL_FOR_PRETRAINING_MAPPING = OrderedDict(
(LxmertConfig, LxmertForPreTraining),
(FunnelConfig, FunnelForPreTraining),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
]
)
...@@ -340,6 +349,7 @@ MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
(ReformerConfig, ReformerModelWithLMHead),
(FunnelConfig, FunnelForMaskedLM),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
]
)
...@@ -386,6 +396,7 @@ MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
(ReformerConfig, ReformerForMaskedLM),
(FunnelConfig, FunnelForMaskedLM),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
]
)
...@@ -431,6 +442,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
(CTRLConfig, CTRLForSequenceClassification),
(TransfoXLConfig, TransfoXLForSequenceClassification),
(MPNetConfig, MPNetForSequenceClassification),
(TapasConfig, TapasForSequenceClassification),
]
)
...@@ -455,6 +467,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
(FunnelConfig, FunnelForQuestionAnswering),
(LxmertConfig, LxmertForQuestionAnswering),
(MPNetConfig, MPNetForQuestionAnswering),
(TapasConfig, TapasForQuestionAnswering),
]
)
...
...@@ -47,6 +47,7 @@ from ..rag.tokenization_rag import RagTokenizer
from ..retribert.tokenization_retribert import RetriBertTokenizer
from ..roberta.tokenization_roberta import RobertaTokenizer
from ..squeezebert.tokenization_squeezebert import SqueezeBertTokenizer
from ..tapas.tokenization_tapas import TapasTokenizer
from ..transfo_xl.tokenization_transfo_xl import TransfoXLTokenizer
from ..xlm.tokenization_xlm import XLMTokenizer
from .configuration_auto import (
...@@ -84,6 +85,7 @@ from .configuration_auto import (
RobertaConfig,
SqueezeBertConfig,
T5Config,
TapasConfig,
TransfoXLConfig,
XLMConfig,
XLMProphetNetConfig,
...@@ -223,6 +225,7 @@ TOKENIZER_MAPPING = OrderedDict(
(XLMProphetNetConfig, (XLMProphetNetTokenizer, None)),
(ProphetNetConfig, (ProphetNetTokenizer, None)),
(MPNetConfig, (MPNetTokenizer, MPNetTokenizerFast)),
(TapasConfig, (TapasTokenizer, None)),
]
)
...
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ...file_utils import is_torch_available
from .configuration_tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig
from .tokenization_tapas import TapasTokenizer
if is_torch_available():
from .modeling_tapas import (
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
)
# coding=utf-8
# Copyright 2020 Google Research and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
TAPAS configuration. Based on the BERT configuration with added parameters.
Hyperparameters are taken from run_task_main.py and hparam_utils.py of the original implementation. URLS:
- https://github.com/google-research/tapas/blob/master/tapas/run_task_main.py
- https://github.com/google-research/tapas/blob/master/tapas/utils/hparam_utils.py
"""
from ...configuration_utils import PretrainedConfig
TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"google/tapas-base-finetuned-sqa": "https://huggingface.co/google/tapas-base-finetuned-sqa/resolve/main/config.json",
"google/tapas-base-finetuned-wtq": "https://huggingface.co/google/tapas-base-finetuned-wtq/resolve/main/config.json",
"google/tapas-base-finetuned-wikisql-supervised": "https://huggingface.co/google/tapas-base-finetuned-wikisql-supervised/resolve/main/config.json",
"google/tapas-base-finetuned-tabfact": "https://huggingface.co/google/tapas-base-finetuned-tabfact/resolve/main/config.json",
}
class TapasConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.TapasModel`. It is used to
instantiate a TAPAS model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the TAPAS `tapas-base-finetuned-sqa`
architecture. Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control
the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Hyperparameters additional to BERT are taken from run_task_main.py and hparam_utils.py of the original
implementation. Original implementation available at https://github.com/google-research/tapas/tree/master.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
Vocabulary size of the TAPAS model. Defines the number of different tokens that can be represented by the
:obj:`inputs_ids` passed when calling :class:`~transformers.TapasModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_sizes (:obj:`List[int]`, `optional`, defaults to :obj:`[3, 256, 256, 2, 256, 256, 10]`):
The vocabulary sizes of the :obj:`token_type_ids` passed when calling :class:`~transformers.TapasModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
The epsilon used by the layer normalization layers.
gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to use gradient checkpointing to save memory at the expense of a slower backward pass.
positive_label_weight (:obj:`float`, `optional`, defaults to 10.0):
Weight for positive labels.
num_aggregation_labels (:obj:`int`, `optional`, defaults to 0):
The number of aggregation operators to predict.
aggregation_loss_weight (:obj:`float`, `optional`, defaults to 1.0):
Importance weight for the aggregation loss.
use_answer_as_supervision (:obj:`bool`, `optional`):
Whether to use the answer as the only supervision for aggregation examples.
answer_loss_importance (:obj:`float`, `optional`, defaults to 1.0):
Importance weight for the regression loss.
use_normalized_answer_loss (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to normalize the answer loss by the maximum of the predicted and expected value.
huber_loss_delta: (:obj:`float`, `optional`):
Delta parameter used to calculate the regression loss.
temperature: (:obj:`float`, `optional`, defaults to 1.0):
Value used to control (OR change) the skewness of cell logits probabilities.
aggregation_temperature: (:obj:`float`, `optional`, defaults to 1.0):
Scales aggregation logits to control the skewness of probabilities.
use_gumbel_for_cells: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to apply Gumbel-Softmax to cell selection.
use_gumbel_for_aggregation: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to apply Gumbel-Softmax to aggregation selection.
average_approximation_function: (:obj:`string`, `optional`, defaults to :obj:`"ratio"`):
Method to calculate the expected average of cells in the weak supervision case. One of :obj:`"ratio"`,
:obj:`"first_order"` or :obj:`"second_order"`.
cell_selection_preference: (:obj:`float`, `optional`):
Preference for cell selection in ambiguous cases. Only applicable in case of weak supervision for
aggregation (WTQ, WikiSQL). If the total mass of the aggregation probabilities (excluding the "NONE"
operator) is higher than this hyperparameter, then aggregation is predicted for an example.
answer_loss_cutoff: (:obj:`float`, `optional`):
Ignore examples with answer loss larger than cutoff.
max_num_rows: (:obj:`int`, `optional`, defaults to 64):
Maximum number of rows.
max_num_columns: (:obj:`int`, `optional`, defaults to 32):
Maximum number of columns.
average_logits_per_cell: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to average logits per cell.
select_one_column: (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to constrain the model to only select cells from a single column.
allow_empty_column_selection: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to allow not to select any column.
init_cell_selection_weights_to_zero: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to initialize cell selection weights to 0 so that the initial probabilities are 50%.
reset_position_index_per_cell: (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to restart position indexes at every cell (i.e. use relative position embeddings).
disable_per_token_loss: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to disable any (strong or weak) supervision on cells.
Example::
>>> from transformers import TapasModel, TapasConfig
>>> # Initializing a default (SQA) Tapas configuration
>>> configuration = TapasConfig()
>>> # Initializing a model from the configuration
>>> model = TapasModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
"""
model_type = "tapas"
def __init__(
self,
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=1024,
type_vocab_sizes=[3, 256, 256, 2, 256, 256, 10],
initializer_range=0.02,
layer_norm_eps=1e-12,
pad_token_id=0,
gradient_checkpointing=False,
positive_label_weight=10.0,
num_aggregation_labels=0,
aggregation_loss_weight=1.0,
use_answer_as_supervision=None,
answer_loss_importance=1.0,
use_normalized_answer_loss=False,
huber_loss_delta=None,
temperature=1.0,
aggregation_temperature=1.0,
use_gumbel_for_cells=False,
use_gumbel_for_aggregation=False,
average_approximation_function="ratio",
cell_selection_preference=None,
answer_loss_cutoff=None,
max_num_rows=64,
max_num_columns=32,
average_logits_per_cell=False,
select_one_column=True,
allow_empty_column_selection=False,
init_cell_selection_weights_to_zero=False,
reset_position_index_per_cell=True,
disable_per_token_loss=False,
**kwargs
):
super().__init__(pad_token_id=pad_token_id, **kwargs)
# BERT hyperparameters (with updated max_position_embeddings and type_vocab_sizes)
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_act = hidden_act
self.intermediate_size = intermediate_size
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.type_vocab_sizes = type_vocab_sizes
self.initializer_range = initializer_range
self.layer_norm_eps = layer_norm_eps
self.gradient_checkpointing = gradient_checkpointing
# Fine-tuning task hyperparameters
self.positive_label_weight = positive_label_weight
self.num_aggregation_labels = num_aggregation_labels
self.aggregation_loss_weight = aggregation_loss_weight
self.use_answer_as_supervision = use_answer_as_supervision
self.answer_loss_importance = answer_loss_importance
self.use_normalized_answer_loss = use_normalized_answer_loss
self.huber_loss_delta = huber_loss_delta
self.temperature = temperature
self.aggregation_temperature = aggregation_temperature
self.use_gumbel_for_cells = use_gumbel_for_cells
self.use_gumbel_for_aggregation = use_gumbel_for_aggregation
self.average_approximation_function = average_approximation_function
self.cell_selection_preference = cell_selection_preference
self.answer_loss_cutoff = answer_loss_cutoff
self.max_num_rows = max_num_rows
self.max_num_columns = max_num_columns
self.average_logits_per_cell = average_logits_per_cell
self.select_one_column = select_one_column
self.allow_empty_column_selection = allow_empty_column_selection
self.init_cell_selection_weights_to_zero = init_cell_selection_weights_to_zero
self.reset_position_index_per_cell = reset_position_index_per_cell
self.disable_per_token_loss = disable_per_token_loss
# coding=utf-8
# Copyright 2020 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert TAPAS checkpoint."""
import argparse
import os
from transformers.models.tapas.modeling_tapas import (
TapasConfig,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
load_tf_weights_in_tapas,
)
from transformers.models.tapas.tokenization_tapas import TapasTokenizer
from transformers.utils import logging
logging.set_verbosity_info()
def convert_tf_checkpoint_to_pytorch(
task, reset_position_index_per_cell, tf_checkpoint_path, tapas_config_file, pytorch_dump_path
):
# Initialise PyTorch model.
# If you want to convert a checkpoint that uses absolute position embeddings, make sure to set reset_position_index_per_cell of
# TapasConfig to False.
# initialize configuration from json file
config = TapasConfig.from_json_file(tapas_config_file)
# set absolute/relative position embeddings parameter
config.reset_position_index_per_cell = reset_position_index_per_cell
# set remaining parameters of TapasConfig as well as the model based on the task
if task == "SQA":
model = TapasForQuestionAnswering(config=config)
elif task == "WTQ":
# run_task_main.py hparams
config.num_aggregation_labels = 4
config.use_answer_as_supervision = True
# hparam_utils.py hparams
config.answer_loss_cutoff = 0.664694
config.cell_selection_preference = 0.207951
config.huber_loss_delta = 0.121194
config.init_cell_selection_weights_to_zero = True
config.select_one_column = True
config.allow_empty_column_selection = False
config.temperature = 0.0352513
model = TapasForQuestionAnswering(config=config)
elif task == "WIKISQL_SUPERVISED":
# run_task_main.py hparams
config.num_aggregation_labels = 4
config.use_answer_as_supervision = False
# hparam_utils.py hparams
config.answer_loss_cutoff = 36.4519
config.cell_selection_preference = 0.903421
config.huber_loss_delta = 222.088
config.init_cell_selection_weights_to_zero = True
config.select_one_column = True
config.allow_empty_column_selection = True
config.temperature = 0.763141
model = TapasForQuestionAnswering(config=config)
elif task == "TABFACT":
model = TapasForSequenceClassification(config=config)
elif task == "MLM":
model = TapasForMaskedLM(config=config)
elif task == "INTERMEDIATE_PRETRAINING":
model = TapasModel(config=config)
print("Building PyTorch model from configuration: {}".format(str(config)))
# Load weights from tf checkpoint
load_tf_weights_in_tapas(model, config, tf_checkpoint_path)
# Save pytorch-model (weights and configuration)
print("Save PyTorch model to {}".format(pytorch_dump_path))
# pytorch_dump_path points at the pytorch_model.bin file, while save_pretrained expects its parent directory
model.save_pretrained(os.path.dirname(pytorch_dump_path))
# Save tokenizer files, reusing the vocab.txt that ships next to the TensorFlow checkpoint
tokenizer = TapasTokenizer(vocab_file=os.path.join(os.path.dirname(tf_checkpoint_path), "vocab.txt"), model_max_length=512)
print("Save tokenizer files to {}".format(pytorch_dump_path))
tokenizer.save_pretrained(os.path.dirname(pytorch_dump_path))
print("Used relative position embeddings:", model.config.reset_position_index_per_cell)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--task", default="SQA", type=str, help="Model task for which to convert a checkpoint. Defaults to SQA."
)
parser.add_argument(
"--reset_position_index_per_cell",
default=False,
action="store_true",
help="Whether to use relative position embeddings or not. Defaults to True.",
)
parser.add_argument(
"--tf_checkpoint_path", default=None, type=str, required=True, help="Path to the TensorFlow checkpoint path."
)
parser.add_argument(
"--tapas_config_file",
default=None,
type=str,
required=True,
help="The config json file corresponding to the pre-trained TAPAS model. \n"
"This specifies the model architecture.",
)
parser.add_argument(
"--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
)
args = parser.parse_args()
convert_tf_checkpoint_to_pytorch(
args.task,
args.reset_position_index_per_cell,
args.tf_checkpoint_path,
args.tapas_config_file,
args.pytorch_dump_path,
)
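For reference, a hedged example of driving the conversion function directly from Python; all paths are placeholders and must point at a downloaded TAPAS TensorFlow checkpoint and its config file, and the WTQ-specific hyperparameters are set inside the function as shown above:
```python
# Hypothetical invocation of the conversion function defined above (paths are placeholders)
convert_tf_checkpoint_to_pytorch(
    task="WTQ",
    reset_position_index_per_cell=True,
    tf_checkpoint_path="/path/to/tapas_wtq_checkpoint/model.ckpt",
    tapas_config_file="/path/to/tapas_wtq_checkpoint/bert_config.json",
    pytorch_dump_path="/path/to/output/pytorch_model.bin",
)
```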
...@@ -28,6 +28,8 @@ from .file_utils import (
_datasets_available,
_faiss_available,
_flax_available,
_pandas_available,
_scatter_available,
_sentencepiece_available,
_tf_available,
_tokenizers_available,
...@@ -221,6 +223,27 @@ def require_tokenizers(test_case):
return test_case
def require_pandas(test_case):
"""
Decorator marking a test that requires pandas. These tests are skipped when pandas isn't installed.
"""
if not _pandas_available:
return unittest.skip("test requires pandas")(test_case)
else:
return test_case
def require_scatter(test_case):
"""
Decorator marking a test that requires PyTorch Scatter. These tests are skipped when PyTorch Scatter isn't
installed.
"""
if not _scatter_available:
return unittest.skip("test requires PyTorch Scatter")(test_case)
else:
return test_case
def require_torch_multi_gpu(test_case):
"""
Decorator marking a test that requires a multi-GPU setup (in PyTorch).
...
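A small sketch of how these decorators are meant to be stacked on a test (hypothetical test class, not part of this diff); the test is skipped automatically whenever torch, torch-scatter or pandas is missing:
```python
import unittest

from transformers.testing_utils import require_pandas, require_scatter, require_torch


@require_torch
@require_scatter
@require_pandas
class TapasDependencyTest(unittest.TestCase):
    def test_dependencies_present(self):
        # Only runs when all three optional dependencies are importable
        import pandas  # noqa: F401
        import torch_scatter  # noqa: F401
```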
...@@ -1867,6 +1867,45 @@ def load_tf_weights_in_t5(*args, **kwargs):
requires_pytorch(load_tf_weights_in_t5)
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST = None
class TapasForMaskedLM:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
class TapasForQuestionAnswering:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
class TapasForSequenceClassification:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
class TapasModel:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_LIST = None
...