Unverified Commit 3bd0007e authored by Jordan Clive's avatar Jordan Clive Committed by GitHub
Browse files

Update documentation on seq2seq models with absolute positional embeddings, to...


Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)
Co-authored-by: default avatarjordiclive <jordiclive19@imperial.ac.uk>
parent 6e1c5786
...@@ -32,6 +32,11 @@ According to the abstract, ...@@ -32,6 +32,11 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE. of up to 6 ROUGE.
Tips:
- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The Authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart). This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The Authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).
......
...@@ -46,6 +46,8 @@ Tips: ...@@ -46,6 +46,8 @@ Tips:
- Sequence length must be divisible by block size. - Sequence length must be divisible by block size.
- Current implementation supports only **ITC**. - Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0** - Current implementation doesn't support **num_random_blocks = 0**
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found
[here](https://github.com/google-research/bigbird). [here](https://github.com/google-research/bigbird).
......
...@@ -47,6 +47,8 @@ Tips: ...@@ -47,6 +47,8 @@ Tips:
- Current implementation supports only **ITC**. - Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**. - Current implementation doesn't support **num_random_blocks = 0**.
- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py). - BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
The original code can be found [here](https://github.com/google-research/bigbird). The original code can be found [here](https://github.com/google-research/bigbird).
......
...@@ -36,6 +36,11 @@ and code publicly available. Human evaluations show our best models are superior ...@@ -36,6 +36,11 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.* failure cases of our models.*
Tips:
- Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
found [here](https://github.com/facebookresearch/ParlAI) . found [here](https://github.com/facebookresearch/ParlAI) .
......
...@@ -32,6 +32,11 @@ and code publicly available. Human evaluations show our best models are superior ...@@ -32,6 +32,11 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.* failure cases of our models.*
Tips:
- Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) . This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) .
......
...@@ -50,6 +50,8 @@ Tips: ...@@ -50,6 +50,8 @@ Tips:
flag can be used to disable the caching mechanism to save memory. flag can be used to disable the caching mechanism to save memory.
- A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing). - A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
- A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing). - A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
- LED is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
......
...@@ -35,6 +35,11 @@ dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Giga ...@@ -35,6 +35,11 @@ dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Giga
abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new
state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.* state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.*
Tips:
- ProphetNet is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
The Authors' code can be found [here](https://github.com/microsoft/ProphetNet). The Authors' code can be found [here](https://github.com/microsoft/ProphetNet).
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment