"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "c3a93d8d821bc1df3601ba858e7385eada8db3a5"
Unverified Commit b971c769 authored by Younes Belkada, committed by GitHub

Add OPT (#17088)



* First version - OPT model

* Final changes

- setting use_cache to False

* few changes

- remove commented block

* few changes

- remove unnecessary files

* fix style issues

* few changes

- remove a test file
- added the logits test

* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add gen tests

* few changes

- rm mask filling example on docstring

* few changes

- remove useless args

* some changes

- more tests should pass now
- needs to clean more
- documentation still needs to be done

* fix code quality

* major changes

- change attention architecture to BART-like
- modify some tests
- style fix

* rm useless classes

- remove opt for:
- QA
- cond generation
- seq classif

* Removed autodoc calls to non-existent classes

Tokenizers are not implemented

* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/auto/modeling_tf_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Replaced OPTTokenizer with GPT2 tokenizer

* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")

* Removed OPTTokenizer

* make style

* Make style replaces `...).unsqueeze(` by `>>>).unsqueeze(`

* make repo consistency

* Removed PretrainedOPTModel

* fix opt.mdx removed other heads

* fix init, removed 3 heads

* removed heads

* finished cleaning head

* removed sequence classification and question answering

* removed unused imports

* removed useless dummy object for QA, SC and CG

* removed tests for removed useless dummy object for QA, SC and CG

* Removed head_mask using encoder layers which don't exist

* fixed test

* fix line

* added OPT to toctree

* Updated model path with pushed weights

* fix model path

* fixed code quality

* fixed embeddings and generation tests

* update paths

* clean comments

* removed OPTClassificationHead for sentence classification

* renamed hidden layer

* renamed num layers to standard num_hidden_layers

* num_attention_heads fix

* changes for 125m

* add first version for 125m

* add first version - flax

* add new version

* causal LM output

* replace output type with BaseModelOutputWithPastAndCrossAttentions

* revert working config from 150m to 350m

* clean

* removed decoder input ids

* fixed embed dim

* more embed_dim issues

* make style + removed enc_dec test

* update flax model

* removed troublesome copy

* added is_encoder_decoder=False to config

* added set_input_embeddings function to model class

* requires torch on embed test

* use head mask instead of decoder head mask input param solves a test

* 8 test remaining, update

* Updated create_and_check_decoder_model_past_large_inputs

* Make style

* update opt tokenizer with condition

* make style

* See if I can push

* some clean up

* remove linear head hack

* save intermediate

* save correct attention

* add copied from from bart

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix part of the reviews
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* same changes in naming / conversion

* correct mask

* more fixes

* delete FlaxOPT and TfOPT

* clean traces of Flax and Tf

* fix mask

* fixed positional embedding length when past key value is provided

* get 125m, 6.7b to work

* Added do_layer_norm

* solved mismatch in load dictionary

* clean up prepare opt input dict

* fixed past key value as bool

* fix previous

* fixed return dict False tuple issue

* All tests are passing

* Make style

* Ignore OPTDecoder non tested

* make fix-copies

* make repo consistency

* small fix

* removed useless @torch.no_grad decorator

* make style

* fix previous opt test

* style

* make style

* added opt documentation

* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST

* up

* more fixes

* model & config work

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* added comment on padding hack (+2)

* cleaup

* review update

* docstring for missing arg

* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update pretrained map

* update path and tests

* make style

* styling

* make consistency

* add gpt2 tok new

* more tok fixes

* Update src/transformers/models/auto/tokenization_auto.py

* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/models/opt/test_modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update based on reviews

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

* make tokenizer auto tests pass

* apply Lysandre suggestion

* finish tests

* add some good tokenizer tests

* improve docs slightly
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent 8c7481f3
@@ -294,6 +294,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
......
@@ -273,6 +273,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
......
@@ -297,6 +297,7 @@ conda install -c huggingface transformers
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (来自 Meta AI) 伴随论文 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 由 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 发布。
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
......
@@ -309,6 +309,7 @@ conda install -c huggingface transformers
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
......
@@ -270,6 +270,8 @@
title: Nyströmformer
- local: model_doc/openai-gpt
title: OpenAI GPT
- local: model_doc/opt
title: OPT
- local: model_doc/gpt2
title: OpenAI GPT2
- local: model_doc/gptj
......
@@ -115,6 +115,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
@@ -231,6 +232,7 @@ Flax), PyTorch, and/or TensorFlow.
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| OPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
......
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# OPT
## Overview
The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI.
OPT is a series of open-sourced large causal language models which perform similarly to GPT-3.
The abstract from the paper is the following:
*Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.*
Tips:
- OPT has the same architecture as [`BartDecoder`].
- Contrary to GPT2, OPT adds the EOS token `</s>` to the beginning of every prompt. **Note**: Make sure to pass `use_fast=False` when loading OPT's tokenizer with [`AutoTokenizer`] to get the correct tokenizer.
This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Younes Belkada](https://huggingface.co/ybelkada), and [Patrick Von Platen](https://huggingface.co/patrickvonplaten).
The original code can be found [here](https://github.com/facebookresearch/metaseq).
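A short generation sketch following the tips above (the checkpoint name and prompt are only illustrative; `use_fast=False` follows the tokenizer note):

```python
>>> from transformers import AutoTokenizer, OPTForCausalLM

>>> # any released OPT checkpoint should behave the same way; "facebook/opt-350m" is used as an example
>>> model = OPTForCausalLM.from_pretrained("facebook/opt-350m")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)

>>> prompt = "Hello, I am conscious and"
>>> inputs = tokenizer(prompt, return_tensors="pt")  # the tokenizer prepends </s> on its own

>>> generated_ids = model.generate(inputs.input_ids, max_length=20)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```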
## OPTConfig
[[autodoc]] OPTConfig
## OPTModel
[[autodoc]] OPTModel
- forward
## OPTForCausalLM
[[autodoc]] OPTForCausalLM
- forward
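For completeness, a minimal forward-pass sketch for the base [`OPTModel`] (checkpoint name is again only an example):

```python
>>> import torch
>>> from transformers import AutoTokenizer, OPTModel

>>> model = OPTModel.from_pretrained("facebook/opt-125m")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", use_fast=False)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> last_hidden_state = outputs.last_hidden_state  # (batch_size, sequence_length, hidden_size)
```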
@@ -247,6 +247,7 @@ _import_structure = {
"NystromformerConfig",
],
"models.openai": ["OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OpenAIGPTConfig", "OpenAIGPTTokenizer"],
"models.opt": ["OPTConfig"],
"models.pegasus": ["PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP", "PegasusConfig", "PegasusTokenizer"],
"models.perceiver": ["PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP", "PerceiverConfig", "PerceiverTokenizer"],
"models.phobert": ["PhobertTokenizer"],
@@ -1323,6 +1324,14 @@ else:
"load_tf_weights_in_openai_gpt",
]
)
_import_structure["models.opt"].extend(
[
"OPT_PRETRAINED_MODEL_ARCHIVE_LIST",
"OPTForCausalLM",
"OPTModel",
"OPTPreTrainedModel",
]
)
_import_structure["models.pegasus"].extend( _import_structure["models.pegasus"].extend(
["PegasusForCausalLM", "PegasusForConditionalGeneration", "PegasusModel", "PegasusPreTrainedModel"] ["PegasusForCausalLM", "PegasusForConditionalGeneration", "PegasusModel", "PegasusPreTrainedModel"]
) )
@@ -2373,7 +2382,6 @@ else:
"FlaxBartPreTrainedModel",
]
)
_import_structure["models.beit"].extend(
[
"FlaxBeitForImageClassification",
@@ -2382,6 +2390,7 @@ else:
"FlaxBeitPreTrainedModel",
]
)
_import_structure["models.bert"].extend(
[
"FlaxBertForCausalLM",
@@ -2718,6 +2727,7 @@ if TYPE_CHECKING:
from .models.mt5 import MT5Config
from .models.nystromformer import NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, NystromformerConfig
from .models.openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig, OpenAIGPTTokenizer
from .models.opt import OPTConfig
from .models.pegasus import PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP, PegasusConfig, PegasusTokenizer
from .models.perceiver import PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP, PerceiverConfig, PerceiverTokenizer
from .models.phobert import PhobertTokenizer
@@ -3630,6 +3640,7 @@ if TYPE_CHECKING:
OpenAIGPTPreTrainedModel,
load_tf_weights_in_openai_gpt,
)
from .models.opt import OPT_PRETRAINED_MODEL_ARCHIVE_LIST, OPTForCausalLM, OPTModel, OPTPreTrainedModel
from .models.pegasus import (
PegasusForCausalLM,
PegasusForConditionalGeneration,
......
@@ -87,6 +87,7 @@ from . import (
mt5,
nystromformer,
openai,
opt,
pegasus,
perceiver,
phobert,
......
@@ -98,6 +98,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("megatron-bert", "MegatronBertConfig"),
("mpnet", "MPNetConfig"),
("bart", "BartConfig"),
("opt", "OPTConfig"),
("blenderbot", "BlenderbotConfig"),
("reformer", "ReformerConfig"),
("longformer", "LongformerConfig"),
@@ -190,6 +191,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bert", "BERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bart", "BART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("opt", "OPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("mbart", "MBART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("openai-gpt", "OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -301,6 +303,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("mbart", "mBART"),
("megatron-bert", "MegatronBert"),
("bart", "BART"),
("opt", "OPT"),
("reformer", "Reformer"),
("longformer", "Longformer"),
("roberta", "RoBERTa"),
......
@@ -93,6 +93,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
("xlm-roberta-xl", "XLMRobertaXLModel"),
("xlm-roberta", "XLMRobertaModel"),
("bart", "BartModel"),
("opt", "OPTModel"),
("longformer", "LongformerModel"),
("roberta", "RobertaModel"),
("data2vec-text", "Data2VecTextModel"),
@@ -261,6 +262,7 @@ MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
("xlm-prophetnet", "XLMProphetNetForCausalLM"),
("prophetnet", "ProphetNetForCausalLM"),
("bart", "BartForCausalLM"),
("opt", "OPTForCausalLM"),
("mbart", "MBartForCausalLM"),
("pegasus", "PegasusForCausalLM"),
("marian", "MarianForCausalLM"),
......
@@ -137,6 +137,7 @@ else:
("openai-gpt", ("OpenAIGPTTokenizer", "OpenAIGPTTokenizerFast" if is_tokenizers_available() else None)),
("gpt2", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("gptj", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("opt", ("GPT2Tokenizer", None)),
("transfo-xl", ("TransfoXLTokenizer", None)),
(
"xlnet",
......
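With the auto-mapping entries above in place, OPT is also reachable through the Auto classes. A minimal sketch (the checkpoint name is just an example; `use_fast=False` because only the slow GPT-2 tokenizer is registered for OPT):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-125m"  # example checkpoint listed in the archive map of this PR

config = AutoConfig.from_pretrained(checkpoint)                        # resolves to OPTConfig
model = AutoModelForCausalLM.from_pretrained(checkpoint)               # resolves to OPTForCausalLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)  # resolves to GPT2Tokenizer

print(type(config).__name__, type(model).__name__, type(tokenizer).__name__)
```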
@@ -162,20 +162,26 @@ class GPT2Tokenizer(PreTrainedTokenizer):
unk_token="<|endoftext|>",
bos_token="<|endoftext|>",
eos_token="<|endoftext|>",
pad_token=None,
add_prefix_space=False,
add_bos_token=False,
**kwargs
):
bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token
eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
unk_token = AddedToken(unk_token, lstrip=False, rstrip=False) if isinstance(unk_token, str) else unk_token
pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token
super().__init__(
errors=errors,
unk_token=unk_token,
bos_token=bos_token,
eos_token=eos_token,
pad_token=pad_token,
add_prefix_space=add_prefix_space,
add_bos_token=add_bos_token,
**kwargs,
)
self.add_bos_token = add_bos_token
with open(vocab_file, encoding="utf-8") as vocab_handle:
self.encoder = json.load(vocab_handle)
@@ -242,6 +248,19 @@ class GPT2Tokenizer(PreTrainedTokenizer):
self.cache[token] = word
return word
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
if self.add_bos_token:
bos_token_ids = [self.bos_token_id]
else:
bos_token_ids = []
output = bos_token_ids + token_ids_0
if token_ids_1 is None:
return output
return output + bos_token_ids + token_ids_1
def _tokenize(self, text):
"""Tokenize a string."""
bpe_tokens = []
......
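A small sketch of the two new `GPT2Tokenizer` arguments introduced above, `pad_token` and `add_bos_token` (the `gpt2` checkpoint and the pad string are assumptions for illustration):

```python
from transformers import GPT2Tokenizer

# pad_token enables batch padding with the slow tokenizer;
# add_bos_token makes build_inputs_with_special_tokens prepend bos_token_id to every sequence
tokenizer = GPT2Tokenizer.from_pretrained("gpt2", pad_token="<pad>", add_bos_token=True)

batch = tokenizer(["a short input", "a slightly longer input sentence"], padding=True)

print(batch["input_ids"][0][0] == tokenizer.bos_token_id)  # True: BOS prepended
print(tokenizer.pad_token_id in batch["input_ids"][0])     # True: the shorter sequence was padded
```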
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2022 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ...utils import _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = {
"configuration_opt": ["OPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OPTConfig"],
}
if is_torch_available():
_import_structure["modeling_opt"] = [
"OPT_PRETRAINED_MODEL_ARCHIVE_LIST",
"OPTForCausalLM",
"OPTModel",
"OPTPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_opt import OPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OPTConfig
if is_torch_available():
from .modeling_opt import OPT_PRETRAINED_MODEL_ARCHIVE_LIST, OPTForCausalLM, OPTModel, OPTPreTrainedModel
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
# coding=utf-8
# Copyright 2022 The Metaseq Authors and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" OPT model configuration"""
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
OPT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"facebook/opt-125m": "https://huggingface.co/facebook/opt-125m/blob/main/config.json",
"facebook/opt-350m": "https://huggingface.co/facebook/opt-350m/blob/main/config.json",
"facebook/opt-1.3b": "https://huggingface.co/facebook/opt-1.3b/blob/main/config.json",
"facebook/opt-2.7b": "https://huggingface.co/facebook/opt-2.7b/blob/main/config.json",
"facebook/opt-6.7b": "https://huggingface.co/facebook/opt-6.7b/blob/main/config.json",
"facebook/opt-13b": "https://huggingface.co/facebook/opt-13b/blob/main/config.json",
}
class OPTConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of an [`OPTModel`]. It is used to instantiate an OPT model
according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the OPT
[facebook/opt-350m](https://huggingface.co/facebook/opt-350m) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 50272):
Vocabulary size of the OPT model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`OPTModel`]
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of decoder layers.
ffn_dim (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer decoder.
activation_function (`str` or `function`, *optional*, defaults to `"relu"`):
The non-linear activation function (function or string) in the decoder. If string, `"gelu"`,
`"relu"`, `"silu"` and `"gelu_new"` are supported.
max_position_embeddings (`int`, *optional*, defaults to 2048):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
do_layer_norm_before (`bool`, *optional*, defaults to `True`):
Whether to perform layer normalization before the attention block.
word_embed_proj_dim (`int`, *optional*):
`word_embed_proj_dim` can be set to down-project word embeddings, *e.g.* `opt-350m`. Defaults to
`hidden_size`.
dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for activations inside the fully connected layer.
layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
details.
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
Example:
```python
>>> from transformers import OPTModel, OPTConfig
>>> # Initializing an OPT facebook/opt-large style configuration
>>> configuration = OPTConfig()
>>> # Initializing a model from the facebook/opt-large style configuration
>>> model = OPTModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "opt"
keys_to_ignore_at_inference = ["past_key_values"]
def __init__(
self,
vocab_size=50272,
hidden_size=768,
num_hidden_layers=12,
ffn_dim=3072,
max_position_embeddings=2048,
do_layer_norm_before=True,
word_embed_proj_dim=None,
dropout=0.1,
attention_dropout=0.0,
activation_dropout=0.0,
num_attention_heads=12,
activation_function="relu",
layerdrop=0.0,
init_std=0.02,
use_cache=True,
pad_token_id=1,
bos_token_id=0,
eos_token_id=2,
**kwargs
):
super().__init__(
pad_token_id=pad_token_id,
bos_token_id=bos_token_id,
eos_token_id=eos_token_id,
**kwargs,
)
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.num_attention_heads = num_attention_heads
self.word_embed_proj_dim = word_embed_proj_dim if word_embed_proj_dim is not None else hidden_size
self.ffn_dim = ffn_dim
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.dropout = dropout
self.attention_dropout = attention_dropout
self.activation_dropout = activation_dropout
self.activation_function = activation_function
self.init_std = init_std
self.layerdrop = layerdrop
self.use_cache = use_cache
self.do_layer_norm_before = do_layer_norm_before
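The `word_embed_proj_dim` fallback above can be checked quickly; a small sketch (the 350m-style hyperparameters below are illustrative assumptions, not taken from this file):

```python
from transformers import OPTConfig

# default: the embedding projection dim falls back to hidden_size
config = OPTConfig()
assert config.word_embed_proj_dim == config.hidden_size

# opt-350m-style setup: word embeddings are kept at 512 and projected up to the 1024-d hidden states
config_350m_like = OPTConfig(
    hidden_size=1024,
    num_hidden_layers=24,
    ffn_dim=4096,
    num_attention_heads=16,
    word_embed_proj_dim=512,
)
assert config_350m_like.word_embed_proj_dim == 512
```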
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert OPT checkpoint."""
import argparse
from pathlib import Path
import torch
from transformers import OPTConfig, OPTModel
from transformers.utils import logging
logging.set_verbosity_info()
logger = logging.get_logger(__name__)
def load_checkpoint(checkpoint_path):
"""Checkpoint path should end in model.pt"""
sd = torch.load(checkpoint_path, map_location="cpu")
if "model" in sd.keys():
sd = sd["model"]  # reuse the already-loaded dict instead of reading the file a second time
# pop unnecessary weights
keys_to_delete = [
"decoder.version",
"decoder.layer_norm.weight",
"decoder.layer_norm.bias",
"decoder.output_projection.weight",
]
for key in keys_to_delete:
if key in sd:
sd.pop(key)
keys_to_rename = {
"decoder.project_in_dim.weight": "decoder.project_in.weight",
"decoder.project_out_dim.weight": "decoder.project_out.weight",
}
for old_key, new_key in keys_to_rename.items():
if old_key in sd:
sd[new_key] = sd.pop(old_key)
return sd
@torch.no_grad()
def convert_opt_checkpoint(checkpoint_path, pytorch_dump_folder_path, config=None):
"""
Copy/paste/tweak the original checkpoint's weights into our OPT structure.
"""
state_dict = load_checkpoint(checkpoint_path)
if config is not None:
config = OPTConfig.from_pretrained(config)
else:
config = OPTConfig()
model = OPTModel(config).half().eval()
model.load_state_dict(state_dict)
# Check results
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--fairseq_path",
type=str,
help="path to fairseq checkpoint in correct format. You can find all checkpoints in the correct format here: https://huggingface.co/models?other=opt_metasq",
)
parser.add_argument("--pytorch_dump_folder_path", default=None, type=str, help="Path to the output PyTorch model.")
parser.add_argument("--hf_config", default=None, type=str, help="Define HF config.")
args = parser.parse_args()
convert_opt_checkpoint(args.fairseq_path, args.pytorch_dump_folder_path, config=args.hf_config)
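The converter can also be driven from Python instead of the CLI; a hedged sketch, assuming it is run from the bottom of this script so `convert_opt_checkpoint` is in scope (all paths are placeholders, and the checkpoint must already be in the single-file format the script expects):

```python
convert_opt_checkpoint(
    checkpoint_path="/path/to/reshaped/model.pt",  # placeholder: metaseq checkpoint ending in model.pt
    pytorch_dump_folder_path="./opt-hf",           # placeholder: output folder for save_pretrained
    config=None,                                   # or a config name/path, e.g. "facebook/opt-350m"
)
```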
This diff is collapsed.
@@ -2997,6 +2997,30 @@ def load_tf_weights_in_openai_gpt(*args, **kwargs):
requires_backends(load_tf_weights_in_openai_gpt, ["torch"])
OPT_PRETRAINED_MODEL_ARCHIVE_LIST = None
class OPTForCausalLM(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class OPTModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class OPTPreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class PegasusForCausalLM(metaclass=DummyObject):
_backends = ["torch"]
......
@@ -175,6 +175,78 @@ class GPT2TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
padding="max_length",
)
def test_padding_if_pad_token_set_slow(self):
tokenizer = GPT2Tokenizer.from_pretrained(self.tmpdirname, pad_token="<pad>")
# Simple input
s = "This is a simple input"
s2 = ["This is a simple input looooooooong", "This is a simple input"]
p = ("This is a simple input", "This is a pair")
p2 = [
("This is a simple input loooooong", "This is a simple input"),
("This is a simple pair loooooong", "This is a simple pair"),
]
pad_token_id = tokenizer.pad_token_id
out_s = tokenizer(s, padding="max_length", max_length=30, return_tensors="np")
out_s2 = tokenizer(s2, padding=True, truncate=True, return_tensors="np")
out_p = tokenizer(*p, padding="max_length", max_length=60, return_tensors="np")
out_p2 = tokenizer(p2, padding=True, truncate=True, return_tensors="np")
# s
# test single string max_length padding
self.assertEqual(out_s["input_ids"].shape[-1], 30)
self.assertTrue(pad_token_id in out_s["input_ids"])
self.assertTrue(0 in out_s["attention_mask"])
# s2
# test automatic padding
self.assertEqual(out_s2["input_ids"].shape[-1], 33)
# long slice doesn't have padding
self.assertFalse(pad_token_id in out_s2["input_ids"][0])
self.assertFalse(0 in out_s2["attention_mask"][0])
# short slice does have padding
self.assertTrue(pad_token_id in out_s2["input_ids"][1])
self.assertTrue(0 in out_s2["attention_mask"][1])
# p
# test single pair max_length padding
self.assertEqual(out_p["input_ids"].shape[-1], 60)
self.assertTrue(pad_token_id in out_p["input_ids"])
self.assertTrue(0 in out_p["attention_mask"])
# p2
# test automatic padding pair
self.assertEqual(out_p2["input_ids"].shape[-1], 52)
# long slice pair doesn't have padding
self.assertFalse(pad_token_id in out_p2["input_ids"][0])
self.assertFalse(0 in out_p2["attention_mask"][0])
# short slice pair does have padding
self.assertTrue(pad_token_id in out_p2["input_ids"][1])
self.assertTrue(0 in out_p2["attention_mask"][1])
def test_add_bos_token_slow(self):
bos_token = "$$$"
tokenizer = GPT2Tokenizer.from_pretrained(self.tmpdirname, bos_token=bos_token, add_bos_token=True)
s = "This is a simple input"
s2 = ["This is a simple input 1", "This is a simple input 2"]
bos_token_id = tokenizer.bos_token_id
out_s = tokenizer(s)
out_s2 = tokenizer(s2)
self.assertEqual(out_s.input_ids[0], bos_token_id)
self.assertTrue(all(o[0] == bos_token_id for o in out_s2.input_ids))
decode_s = tokenizer.decode(out_s.input_ids)
decode_s2 = tokenizer.batch_decode(out_s2.input_ids)
self.assertEqual(decode_s.split()[0], bos_token)
self.assertTrue(all(d.split()[0] == bos_token for d in decode_s2))
# tokenizer has no padding token
def test_padding_different_model_input_name(self):
pass