Unverified Commit aff9bc40 authored by Edward Beeching's avatar Edward Beeching Committed by GitHub
Browse files

Decision transformer gym (#15845)



* Created the Decision Transformer Modle

* updating tests, copy to other machine

* Added last hidden size to Decision Transformer modelling outputs

* Removed copy of original DT file

* made a temporary change to gpt2 to have it conform with the Decision Transformer version

* Updated tests

* Ignoring a file used to test the DT model

* added comments to config file

* added comments and argument descriptions to decision transformer file

* Updated doc

* Ran "make style"

* Remove old model imports

* Removed unused imports, cleaned up init file

* Update docs/source/model_doc/decision_transformer.mdx

added my username
Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>

* Reverted changes made to gpt2

* Removed datasets submodule

* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states

* Added support for return of hidden states, attentions and return dict of gpt2 model.

* Updated tests to include many of the ModelTesterMixin tests. 

The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes

* Added missing line to the end of gpt2 file

* Added an integration test for the Decision Transformer

Test performs and autoregressive evaluation for two time steps

* Set done and info to _ to fix failing test

* Updated integration test to be deterministic and check expected outputs

* Apply suggestions from code review
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unnecessary config options

* Cleaned up commented code and old comments.

* Cleaned up commented code.

* Changed DecisionTransformer to Decision Transformer

* Added Decision Transformer to the main README file

* Added copy of GTP2 called DecisionTranformerGPT2Model

* isorted imports

* isorted imports

* Added model to non-English README files

* Ran make fix-copies and corrected some cases.

* Updated index file to include Decision Transformer

* Added gpt2 model as copy inside the Decision Transformer model file

* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS

* Deleted redundant checkpoint files (I don't know how these got committed)

* Removed testing files. (These should have never been committed)

* Removed accidentally committed files

* Moved the Decision Transformer test to its own directory

* Add type hints for Pegasus (#16324)

* Funnel type hints (#16323)

* add pt funnel type hints

* add tf funnel type hints

* Add type hints for ProphetNet PyTorch (#16272)

* [GLPN] Improve docs (#16331)

* Add link to notebook

* Add link

* Fix bug
Co-authored-by: default avatarNiels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Added type hints for Pytorch Marian calls (#16200)

* Added type hinting for forward functions in pytorch marian

* typo correction

* Removed type hints on functions from BART per Suraj Patil request

* fix import pb

* fix typo

* corrected tuple call

* ran black

* after fix-copies
Some optional tags on primitives were removed, past_key_values in MarianForCausalLM changed from Tuple of Tuple to List

* Fixing copies to roformer and pegasus
Co-authored-by: default avatarClementine Fourrier <cfourrie@inria.fr>
Co-authored-by: default avatarmatt <rocketknight1@gmail.com>

* Moved DecisionTransformOutput to modeling_decision_transformer

* Moved the example usage to research project and cleaned comments

* Made tests ignore the copy of gpt2 in Decision Transformer

* Added module output to modelling decision transformer

* removed copied gpt2 model from list of transformers models

* Updated tests and created __init__ file for new test location

* Update README.md
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unneeded summary type from config file

* Fixed copies

* Updated pretrained config map to refer to hopper-medium checkpoint

* done (#16340)

* Added Decision transformer to model docs

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add type annotations for Rembert/Splinter and copies (#16338)

* undo black autoformat

* minor fix to rembert forward with default

* make fix-copies, make quality

* Adding types to template model

* Removing List from the template types

* Remove `Optional` from a couple of types that don't accept `None`
Co-authored-by: default avatarmatt <rocketknight1@gmail.com>

* [Bug template] Shift responsibilities for long-range (#16344)

* Fix code repetition in serialization guide (#16346)

* Adopt framework-specific blocks for content (#16342)

*  refactor code samples with framework-specific blocks

*  update training.mdx

* 🖍

 apply feedback

* Updates the default branch from master to main (#16326)

* Updates the default branch from master to main

* Links from `master` to `main`

* Typo

* Update examples/flax/README.md
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updated model with custom docstring example

* Created the Decision Transformer Modle

* updating tests, copy to other machine

* Added last hidden size to Decision Transformer modelling outputs

* Removed copy of original DT file

* made a temporary change to gpt2 to have it conform with the Decision Transformer version

* Updated tests

* Ignoring a file used to test the DT model

* added comments to config file

* added comments and argument descriptions to decision transformer file

* Updated doc

* Ran "make style"

* Remove old model imports

* Removed unused imports, cleaned up init file

* Update docs/source/model_doc/decision_transformer.mdx

added my username
Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>

* Reverted changes made to gpt2

* Removed datasets submodule

* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states

* Added support for return of hidden states, attentions and return dict of gpt2 model.

* Updated tests to include many of the ModelTesterMixin tests. 

The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes

* Added missing line to the end of gpt2 file

* Added an integration test for the Decision Transformer

Test performs and autoregressive evaluation for two time steps

* Set done and info to _ to fix failing test

* Updated integration test to be deterministic and check expected outputs

* Apply suggestions from code review
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unnecessary config options

* Cleaned up commented code and old comments.

* Cleaned up commented code.

* Changed DecisionTransformer to Decision Transformer

* Added Decision Transformer to the main README file

* Added copy of GTP2 called DecisionTranformerGPT2Model

* isorted imports

* isorted imports

* Added model to non-English README files

* Ran make fix-copies and corrected some cases.

* Updated index file to include Decision Transformer

* Added gpt2 model as copy inside the Decision Transformer model file

* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS

* Deleted redundant checkpoint files (I don't know how these got committed)

* Removed testing files. (These should have never been committed)

* Removed accidentally committed files

* Moved the Decision Transformer test to its own directory

* Moved DecisionTransformOutput to modeling_decision_transformer

* Moved the example usage to research project and cleaned comments

* Made tests ignore the copy of gpt2 in Decision Transformer

* Added module output to modelling decision transformer

* removed copied gpt2 model from list of transformers models

* Updated tests and created __init__ file for new test location

* Update README.md
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unneeded summary type from config file

* Fixed copies

* Updated pretrained config map to refer to hopper-medium checkpoint

* Added Decision transformer to model docs

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updated model with custom docstring example

* Updated copies, config auto, and readme files.
Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: default avatarDan Tegzes <48134725+Tegzes@users.noreply.github.com>
Co-authored-by: default avatarAdam Montgomerie <adam@avanssion.com>
Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: default avatarNiels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: default avatarClémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: default avatarClementine Fourrier <cfourrie@inria.fr>
Co-authored-by: default avatarmatt <rocketknight1@gmail.com>
Co-authored-by: default avatarFrancesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: default avatarJacob Dineen <54680234+jacobdineen@users.noreply.github.com>
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: default avatarOmar Sanseviero <osanseviero@gmail.com>
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: default avatarLysandre Debut <lysandre.debut@reseau.eseo.fr>
parent c595b6e6
......@@ -252,7 +252,8 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DiT](https://huggingface.co/docs/transformers/main/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
......
......@@ -233,11 +233,12 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/main/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
......
......@@ -257,11 +257,12 @@ conda install -c huggingface transformers
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (来自 Facebook) 伴随论文 [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) 由 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli 发布。
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (来自 Berkeley/Facebook/Google) 伴随论文 [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) 由 Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch 发布。
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (来自 Facebook) 伴随论文 [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) 由 Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou 发布。
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (来自 Facebook) 伴随论文 [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) 由 Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko 发布。
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (来自 Microsoft Research) 伴随论文 [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) 由 Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan 发布。
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (来自 HuggingFace), 伴随论文 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) 由 Victor Sanh, Lysandre Debut and Thomas Wolf 发布。 同样的方法也应用于压缩 GPT-2 到 [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa 到 [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT 到 [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) 和德语版 DistilBERT。
1. **[DiT](https://huggingface.co/docs/transformers/main/model_doc/dit)** (来自 Microsoft Research) 伴随论文 [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) 由 Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei 发布。
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (来自 Microsoft Research) 伴随论文 [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) 由 Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei 发布。
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (来自 Facebook) 伴随论文 [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) 由 Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih 发布。
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
......
......@@ -269,11 +269,12 @@ conda install -c huggingface transformers
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/main/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
......
......@@ -194,6 +194,8 @@
title: DeBERTa
- local: model_doc/deberta-v2
title: DeBERTa-v2
- local: model_doc/decision_transformer
title: Decision Transformer
- local: model_doc/deit
title: DeiT
- local: model_doc/detr
......
......@@ -78,6 +78,7 @@ conversion utilities for the following models.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
......@@ -191,6 +192,7 @@ Flax), PyTorch, and/or TensorFlow.
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ❌ | ✅ | ✅ | ❌ |
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
......
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Decision Transformer
## Overview
The Decision Transformer model was proposed in [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
The abstract from the paper is the following:
*We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem.
This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances
in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that
casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or
compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked
Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our
Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity,
Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on
Atari, OpenAI Gym, and Key-to-Door tasks.*
Tips:
This version of the model is for tasks where the state is a vector, image-based states will come soon.
This model was contributed by [edbeeching](https://huggingface.co/edbeeching). The original code can be found [here](https://github.com/kzl/decision-transformer).
## DecisionTransformerConfig
[[autodoc]] DecisionTransformerConfig
## DecisionTransformerGPT2Model
[[autodoc]] DecisionTransformerGPT2Model
- forward
## DecisionTransformerModel
[[autodoc]] DecisionTransformerModel
- forward
absl-py==1.0.0
aiohttp==3.8.1
aiosignal==1.2.0
alembic==1.7.7
appdirs==1.4.4
APScheduler==3.9.1
arrow==1.2.2
asttokens==2.0.5
astunparse==1.6.3
async-timeout==4.0.2
attrs==21.4.0
audioread==2.1.9
autopage==0.5.0
backcall==0.2.0
backoff==1.11.1
backports.zoneinfo==0.2.1
binaryornot==0.4.4
black==22.1.0
boto3==1.16.34
botocore==1.19.63
Brotli==1.0.9
cachetools==5.0.0
certifi==2021.10.8
cffi==1.15.0
chardet==4.0.0
charset-normalizer==2.0.12
chex==0.1.1
click==8.0.4
cliff==3.10.1
clldutils==3.11.1
cloudpickle==2.0.0
cmaes==0.8.2
cmd2==2.4.0
codecarbon==1.2.0
colorlog==6.6.0
cookiecutter==1.7.2
cryptography==36.0.2
csvw==2.0.0
cycler==0.11.0
Cython==0.29.28
dash==2.3.0
dash-bootstrap-components==1.0.3
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-table==5.0.0
datasets==2.0.0
decorator==5.1.1
Deprecated==1.2.13
dill==0.3.4
dlinfo==1.2.1
dm-tree==0.1.6
docker==4.4.4
execnet==1.9.0
executing==0.8.3
faiss-cpu==1.7.2
fasteners==0.17.3
filelock==3.6.0
fire==0.4.0
flake8==4.0.1
Flask==2.0.3
Flask-Compress==1.11
flatbuffers==2.0
flax==0.4.0
fonttools==4.31.1
frozenlist==1.3.0
fsspec==2022.2.0
fugashi==1.1.2
gast==0.5.3
gitdb==4.0.9
GitPython==3.1.18
glfw==2.5.1
google-auth==2.6.2
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
greenlet==1.1.2
grpcio==1.44.0
gym==0.23.1
gym-notices==0.0.6
h5py==3.6.0
huggingface-hub==0.4.0
hypothesis==6.39.4
idna==3.3
imageio==2.16.1
importlib-metadata==4.11.3
importlib-resources==5.4.0
iniconfig==1.1.1
ipadic==1.0.0
ipython==8.1.1
isodate==0.6.1
isort==5.10.1
itsdangerous==2.1.1
jax==0.3.4
jaxlib==0.3.2
jedi==0.18.1
Jinja2==2.11.3
jinja2-time==0.2.0
jmespath==0.10.0
joblib==1.1.0
jsonschema==4.4.0
keras==2.8.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.0
kubernetes==12.0.1
libclang==13.0.0
librosa==0.9.1
llvmlite==0.38.0
Mako==1.2.0
Markdown==3.3.6
MarkupSafe==1.1.1
matplotlib==3.5.1
matplotlib-inline==0.1.3
mccabe==0.6.1
msgpack==1.0.3
mujoco-py==2.1.2.14
multidict==6.0.2
multiprocess==0.70.12.2
mypy-extensions==0.4.3
nltk==3.7
numba==0.55.1
numpy==1.22.3
oauthlib==3.2.0
onnx==1.11.0
onnxconverter-common==1.9.0
opt-einsum==3.3.0
optax==0.1.1
optuna==2.10.0
packaging==21.3
pandas==1.4.1
parameterized==0.8.1
parso==0.8.3
pathspec==0.9.0
pbr==5.8.1
pexpect==4.8.0
phonemizer==3.0.1
pickleshare==0.7.5
Pillow==9.0.1
Pint==0.16.1
plac==1.3.4
platformdirs==2.5.1
plotly==5.6.0
pluggy==1.0.0
pooch==1.6.0
portalocker==2.0.0
poyo==0.5.0
prettytable==3.2.0
prompt-toolkit==3.0.28
protobuf==3.19.4
psutil==5.9.0
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
py-cpuinfo==8.0.0
pyarrow==7.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.8.0
pycparser==2.21
pyctcdecode==0.3.0
pyflakes==2.4.0
Pygments==2.11.2
pygtrie==2.4.2
pynvml==11.4.1
pyOpenSSL==22.0.0
pyparsing==3.0.7
pyperclip==1.8.2
pypng==0.0.21
pyrsistent==0.18.1
pytest==7.1.1
pytest-forked==1.4.0
pytest-timeout==2.1.0
pytest-xdist==2.5.0
python-dateutil==2.8.2
python-slugify==6.1.1
pytz==2022.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
ray==1.11.0
redis==4.1.4
regex==2022.3.15
requests==2.27.1
requests-oauthlib==1.3.1
resampy==0.2.2
responses==0.18.0
rfc3986==1.5.0
rouge-score==0.0.4
rsa==4.8
s3transfer==0.3.7
sacrebleu==1.5.1
sacremoses==0.0.49
scikit-learn==1.0.2
scipy==1.8.0
segments==2.2.0
sentencepiece==0.1.96
sigopt==8.2.0
six==1.16.0
smmap==5.0.0
sortedcontainers==2.4.0
SoundFile==0.10.3.post1
SQLAlchemy==1.4.32
stack-data==0.2.0
stevedore==3.5.0
tabulate==0.8.9
tenacity==8.0.1
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorboardX==2.5
tensorflow==2.8.0
tensorflow-io-gcs-filesystem==0.24.0
termcolor==1.1.0
text-unidecode==1.3
tf-estimator-nightly==2.8.0.dev2021122109
tf2onnx==1.9.3
threadpoolctl==3.1.0
timeout-decorator==0.5.0
timm==0.5.4
tokenizers==0.11.6
tomli==2.0.1
toolz==0.11.2
torch==1.11.0
torchaudio==0.11.0
torchvision==0.12.0
tqdm==4.63.0
traitlets==5.1.1
-e git+git@github.com:edbeeching/transformers.git@77b90113ca0a0e4058b046796c874bdc98f1da61#egg=transformers
typing-extensions==4.1.1
tzdata==2022.1
tzlocal==4.1
unidic==1.1.0
unidic-lite==1.0.8
uritemplate==4.1.1
urllib3==1.26.9
wasabi==0.9.0
wcwidth==0.2.5
websocket-client==1.3.1
Werkzeug==2.0.3
wrapt==1.14.0
xxhash==3.0.0
yarl==1.7.2
zipp==3.7.0
\ No newline at end of file
import numpy as np
import torch
import gym
from mujoco_py import GlfwContext
from transformers import DecisionTransformerModel
GlfwContext(offscreen=True) # Create a window to init GLFW.
def get_action(model, states, actions, rewards, returns_to_go, timesteps):
# we don't care about the past rewards in this model
states = states.reshape(1, -1, model.config.state_dim)
actions = actions.reshape(1, -1, model.config.act_dim)
returns_to_go = returns_to_go.reshape(1, -1, 1)
timesteps = timesteps.reshape(1, -1)
if model.config.max_length is not None:
states = states[:, -model.config.max_length :]
actions = actions[:, -model.config.max_length :]
returns_to_go = returns_to_go[:, -model.config.max_length :]
timesteps = timesteps[:, -model.config.max_length :]
# pad all tokens to sequence length
attention_mask = torch.cat(
[torch.zeros(model.config.max_length - states.shape[1]), torch.ones(states.shape[1])]
)
attention_mask = attention_mask.to(dtype=torch.long, device=states.device).reshape(1, -1)
states = torch.cat(
[
torch.zeros(
(states.shape[0], model.config.max_length - states.shape[1], model.config.state_dim),
device=states.device,
),
states,
],
dim=1,
).to(dtype=torch.float32)
actions = torch.cat(
[
torch.zeros(
(actions.shape[0], model.config.max_length - actions.shape[1], model.config.act_dim),
device=actions.device,
),
actions,
],
dim=1,
).to(dtype=torch.float32)
returns_to_go = torch.cat(
[
torch.zeros(
(returns_to_go.shape[0], model.config.max_length - returns_to_go.shape[1], 1),
device=returns_to_go.device,
),
returns_to_go,
],
dim=1,
).to(dtype=torch.float32)
timesteps = torch.cat(
[
torch.zeros(
(timesteps.shape[0], model.config.max_length - timesteps.shape[1]), device=timesteps.device
),
timesteps,
],
dim=1,
).to(dtype=torch.long)
else:
attention_mask = None
_, action_preds, _ = model(
states=states,
actions=actions,
rewards=rewards,
returns_to_go=returns_to_go,
timesteps=timesteps,
attention_mask=attention_mask,
return_dict=False,
)
return action_preds[0, -1]
# build the environment
env = gym.make("Hopper-v3")
state_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]
max_ep_len = 1000
device = "cuda"
scale = 1000.0 # normalization for rewards/returns
TARGET_RETURN = 3600 / scale # evaluation conditioning targets, 3600 is reasonable from the paper LINK
state_mean = np.array(
[
1.311279,
-0.08469521,
-0.5382719,
-0.07201576,
0.04932366,
2.1066856,
-0.15017354,
0.00878345,
-0.2848186,
-0.18540096,
-0.28461286,
]
)
state_std = np.array(
[
0.17790751,
0.05444621,
0.21297139,
0.14530419,
0.6124444,
0.85174465,
1.4515252,
0.6751696,
1.536239,
1.6160746,
5.6072536,
]
)
state_mean = torch.from_numpy(state_mean).to(device=device)
state_std = torch.from_numpy(state_std).to(device=device)
# Create the decision transformer model
model = DecisionTransformerModel.from_pretrained("edbeeching/decision-transformer-gym-hopper-medium")
model = model.to(device)
model.eval()
for ep in range(10):
episode_return, episode_length = 0, 0
state = env.reset()
target_return = torch.tensor(TARGET_RETURN, device=device, dtype=torch.float32).reshape(1, 1)
states = torch.from_numpy(state).reshape(1, state_dim).to(device=device, dtype=torch.float32)
actions = torch.zeros((0, act_dim), device=device, dtype=torch.float32)
rewards = torch.zeros(0, device=device, dtype=torch.float32)
timesteps = torch.tensor(0, device=device, dtype=torch.long).reshape(1, 1)
for t in range(max_ep_len):
env.render()
# add padding
actions = torch.cat([actions, torch.zeros((1, act_dim), device=device)], dim=0)
rewards = torch.cat([rewards, torch.zeros(1, device=device)])
action = get_action(
model,
(states.to(dtype=torch.float32) - state_mean) / state_std,
actions.to(dtype=torch.float32),
rewards.to(dtype=torch.float32),
target_return.to(dtype=torch.float32),
timesteps.to(dtype=torch.long),
)
actions[-1] = action
action = action.detach().cpu().numpy()
state, reward, done, _ = env.step(action)
cur_state = torch.from_numpy(state).to(device=device).reshape(1, state_dim)
states = torch.cat([states, cur_state], dim=0)
rewards[-1] = reward
pred_return = target_return[0, -1] - (reward / scale)
target_return = torch.cat([target_return, pred_return.reshape(1, 1)], dim=1)
timesteps = torch.cat([timesteps, torch.ones((1, 1), device=device, dtype=torch.long) * (t + 1)], dim=1)
episode_return += reward
episode_length += 1
if done:
break
......@@ -173,6 +173,7 @@ _import_structure = {
"models.data2vec": ["DATA2VEC_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP", "Data2VecAudioConfig", "Data2VecTextConfig"],
"models.deberta": ["DEBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP", "DebertaConfig", "DebertaTokenizer"],
"models.deberta_v2": ["DEBERTA_V2_PRETRAINED_CONFIG_ARCHIVE_MAP", "DebertaV2Config"],
"models.decision_transformer": ["DECISION_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "DecisionTransformerConfig"],
"models.deit": ["DEIT_PRETRAINED_CONFIG_ARCHIVE_MAP", "DeiTConfig"],
"models.detr": ["DETR_PRETRAINED_CONFIG_ARCHIVE_MAP", "DetrConfig"],
"models.dialogpt": [],
......@@ -901,6 +902,15 @@ if is_torch_available():
"DebertaV2PreTrainedModel",
]
)
_import_structure["models.decision_transformer"].extend(
[
"DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"DecisionTransformerGPT2Model",
"DecisionTransformerGPT2PreTrainedModel",
"DecisionTransformerModel",
"DecisionTransformerPreTrainedModel",
]
)
_import_structure["models.deit"].extend(
[
"DEIT_PRETRAINED_MODEL_ARCHIVE_LIST",
......@@ -2509,6 +2519,10 @@ if TYPE_CHECKING:
from .models.data2vec import DATA2VEC_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP, Data2VecAudioConfig, Data2VecTextConfig
from .models.deberta import DEBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, DebertaConfig, DebertaTokenizer
from .models.deberta_v2 import DEBERTA_V2_PRETRAINED_CONFIG_ARCHIVE_MAP, DebertaV2Config
from .models.decision_transformer import (
DECISION_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
DecisionTransformerConfig,
)
from .models.deit import DEIT_PRETRAINED_CONFIG_ARCHIVE_MAP, DeiTConfig
from .models.detr import DETR_PRETRAINED_CONFIG_ARCHIVE_MAP, DetrConfig
from .models.distilbert import DISTILBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, DistilBertConfig, DistilBertTokenizer
......@@ -3128,6 +3142,13 @@ if TYPE_CHECKING:
DebertaV2Model,
DebertaV2PreTrainedModel,
)
from .models.decision_transformer import (
DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
DecisionTransformerGPT2Model,
DecisionTransformerGPT2PreTrainedModel,
DecisionTransformerModel,
DecisionTransformerPreTrainedModel,
)
from .models.deit import (
DEIT_PRETRAINED_MODEL_ARCHIVE_LIST,
DeiTForImageClassification,
......
......@@ -43,6 +43,7 @@ from . import (
data2vec,
deberta,
deberta_v2,
decision_transformer,
deit,
detr,
dialogpt,
......
......@@ -29,8 +29,10 @@ logger = logging.get_logger(__name__)
CONFIG_MAPPING_NAMES = OrderedDict(
[
# Add configs here
("decision_transformer", "DecisionTransformerConfig"),
("glpn", "GLPNConfig"),
("maskformer", "MaskFormerConfig"),
("decision_transformer", "DecisionTransformerConfig"),
("poolformer", "PoolFormerConfig"),
("convnext", "ConvNextConfig"),
("van", "VanConfig"),
......@@ -222,6 +224,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
MODEL_NAMES_MAPPING = OrderedDict(
[
# Add full (and cased) model names here
("decision_transformer", "Decision Transformer"),
("glpn", "GLPN"),
("maskformer", "MaskFormer"),
("poolformer", "PoolFormer"),
......
......@@ -28,8 +28,11 @@ logger = logging.get_logger(__name__)
MODEL_MAPPING_NAMES = OrderedDict(
[
# Base model mapping
("decision_transformer", "DecisionTransformerModel"),
("glpn", "GLPNModel"),
("maskformer", "MaskFormerModel"),
("decision_transformer", "DecisionTransformerModel"),
("decision_transformer_gpt2", "DecisionTransformerGPT2Model"),
("poolformer", "PoolFormerModel"),
("convnext", "ConvNextModel"),
("van", "VanModel"),
......
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
# rely on isort to merge the imports
from ...file_utils import _LazyModule, is_torch_available
_import_structure = {
"configuration_decision_transformer": [
"DECISION_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP",
"DecisionTransformerConfig",
],
}
if is_torch_available():
_import_structure["modeling_decision_transformer"] = [
"DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"DecisionTransformerGPT2Model",
"DecisionTransformerGPT2PreTrainedModel",
"DecisionTransformerModel",
"DecisionTransformerPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_decision_transformer import (
DECISION_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
DecisionTransformerConfig,
)
if is_torch_available():
from .modeling_decision_transformer import (
DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
DecisionTransformerGPT2Model,
DecisionTransformerGPT2PreTrainedModel,
DecisionTransformerModel,
DecisionTransformerPreTrainedModel,
)
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
# coding=utf-8
# Copyright 2022 The HuggingFace Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Decision Transformer model configuration"""
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
DECISION_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"edbeeching/decision-transformer-gym-hopper-medium": "https://huggingface.co/edbeeching/decision-transformer-gym-hopper-medium/resolve/main/config.json",
# See all DecisionTransformer models at https://huggingface.co/models?filter=decision_transformer
}
class DecisionTransformerConfig(PretrainedConfig):
"""
This is the configuration class to store the configuration of a [`DecisionTransformerModel`]. It is used to
instantiate a Decision Transformer model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the standard
DecisionTransformer architecture. Many of the config options are used to instatiate the GPT2 model that is used as
part of the architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
state_dim (`int`, *optional*, defaults to 17):
The state size for the RL environment
act_dim (`int`, *optional*, defaults to 4):
The size of the output action space
hidden_size (`int`, *optional*, defaults to 128):
The size of the hidden layers
max_ep_len (`int`, *optional*, defaults to 4096):
The maximum length of an episode in the environment
action_tanh (`bool`, *optional*, defaults to True):
Whether to use a tanh activation on action prediction
vocab_size (`int`, *optional*, defaults to 50257):
Vocabulary size of the GPT-2 model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`DecisionTransformerModel`].
n_positions (`int`, *optional*, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
n_embd (`int`, *optional*, defaults to 768):
Dimensionality of the embeddings and hidden states.
n_layer (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
n_head (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
n_inner (`int`, *optional*):
Dimensionality of the inner feed-forward layers. If unset, will default to 4 times `n_embd`.
activation_function (`str`, *optional*, defaults to `"gelu"`):
Activation function, to be selected in the list `["relu", "silu", "gelu", "tanh", "gelu_new"]`.
resid_pdrop (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
embd_pdrop (`int`, *optional*, defaults to 0.1):
The dropout ratio for the embeddings.
attn_pdrop (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
The epsilon to use in the layer normalization layers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
scale_attn_weights (`bool`, *optional*, defaults to `True`):
Scale attention weights by dividing by sqrt(hidden_size)..
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
scale_attn_by_inverse_layer_idx (`bool`, *optional*, defaults to `False`):
Whether to additionally scale attention weights by `1 / layer_idx + 1`.
reorder_and_upcast_attn (`bool`, *optional*, defaults to `False`):
Whether to scale keys (K) prior to computing attention (dot-product) and upcast attention
dot-product/softmax to float() when training with mixed precision.
Example:
```python
>>> from transformers import DecisionTransformerModel, DecisionTransformerConfig
>>> # Initializing a DecisionTransformer configuration
>>> configuration = DecisionTransformerConfig()
>>> # Initializing a model from the configuration
>>> model = DecisionTransformerConfig(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "decision_transformer"
keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {
"max_position_embeddings": "n_positions",
"num_attention_heads": "n_head",
"num_hidden_layers": "n_layer",
}
def __init__(
self,
state_dim=17,
act_dim=4,
hidden_size=128,
max_ep_len=4096,
action_tanh=True,
vocab_size=1,
n_positions=1024,
n_embd=768,
n_layer=3,
n_head=1,
n_inner=None,
activation_function="relu",
resid_pdrop=0.1,
embd_pdrop=0.1,
attn_pdrop=0.1,
layer_norm_epsilon=1e-5,
initializer_range=0.02,
summary_type="cls_index",
summary_use_proj=True,
summary_activation=None,
summary_proj_to_labels=True,
summary_first_dropout=0.1,
scale_attn_weights=True,
use_cache=True,
bos_token_id=50256,
eos_token_id=50256,
scale_attn_by_inverse_layer_idx=False,
reorder_and_upcast_attn=False,
**kwargs,
):
self.state_dim = state_dim
self.act_dim = act_dim
self.hidden_size = hidden_size
self.max_ep_len = max_ep_len
self.action_tanh = action_tanh
self.vocab_size = vocab_size
self.n_positions = n_positions
self.n_embd = n_embd
self.n_layer = n_layer
self.n_head = n_head
self.n_inner = n_inner
self.activation_function = activation_function
self.resid_pdrop = resid_pdrop
self.embd_pdrop = embd_pdrop
self.attn_pdrop = attn_pdrop
self.layer_norm_epsilon = layer_norm_epsilon
self.initializer_range = initializer_range
self.summary_type = summary_type
self.summary_use_proj = summary_use_proj
self.summary_activation = summary_activation
self.summary_first_dropout = summary_first_dropout
self.summary_proj_to_labels = summary_proj_to_labels
self.scale_attn_weights = scale_attn_weights
self.use_cache = use_cache
self.scale_attn_by_inverse_layer_idx = scale_attn_by_inverse_layer_idx
self.reorder_and_upcast_attn = reorder_and_upcast_attn
self.bos_token_id = bos_token_id
self.eos_token_id = eos_token_id
super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
......@@ -1425,6 +1425,37 @@ class DebertaV2PreTrainedModel(metaclass=DummyObject):
requires_backends(self, ["torch"])
DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = None
class DecisionTransformerGPT2Model(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class DecisionTransformerGPT2PreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class DecisionTransformerModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class DecisionTransformerPreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
DEIT_PRETRAINED_MODEL_ARCHIVE_LIST = None
......
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Testing suite for the PyTorch DecisionTransformer model. """
import inspect
import unittest
from transformers import DecisionTransformerConfig, is_torch_available
from transformers.testing_utils import require_torch, slow, torch_device
from ..generation.test_generation_utils import GenerationTesterMixin
from ..test_configuration_common import ConfigTester
from ..test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask
if is_torch_available():
import torch
from transformers import DecisionTransformerModel
from transformers.models.decision_transformer.modeling_decision_transformer import (
DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
)
class DecisionTransformerModelTester:
def __init__(
self,
parent,
batch_size=13,
seq_length=7,
act_dim=6,
state_dim=17,
hidden_size=23,
max_length=11,
is_training=True,
):
self.parent = parent
self.batch_size = batch_size
self.seq_length = seq_length
self.act_dim = act_dim
self.state_dim = state_dim
self.hidden_size = hidden_size
self.max_length = max_length
self.is_training = is_training
def prepare_config_and_inputs(self):
states = floats_tensor((self.batch_size, self.seq_length, self.state_dim))
actions = floats_tensor((self.batch_size, self.seq_length, self.act_dim))
rewards = floats_tensor((self.batch_size, self.seq_length, 1))
returns_to_go = floats_tensor((self.batch_size, self.seq_length, 1))
timesteps = ids_tensor((self.batch_size, self.seq_length), vocab_size=1000)
attention_mask = random_attention_mask((self.batch_size, self.seq_length))
config = self.get_config()
return (
config,
states,
actions,
rewards,
returns_to_go,
timesteps,
attention_mask,
)
def get_config(self):
return DecisionTransformerConfig(
batch_size=self.batch_size,
seq_length=self.seq_length,
act_dim=self.act_dim,
state_dim=self.state_dim,
hidden_size=self.hidden_size,
max_length=self.max_length,
)
def create_and_check_model(
self,
config,
states,
actions,
rewards,
returns_to_go,
timesteps,
attention_mask,
):
model = DecisionTransformerModel(config=config)
model.to(torch_device)
model.eval()
result = model(states, actions, rewards, returns_to_go, timesteps, attention_mask)
self.parent.assertEqual(result.state_preds.shape, states.shape)
self.parent.assertEqual(result.action_preds.shape, actions.shape)
self.parent.assertEqual(result.return_preds.shape, returns_to_go.shape)
self.parent.assertEqual(
result.last_hidden_state.shape, (self.batch_size, self.seq_length * 3, self.hidden_size)
) # seq length *3 as there are 3 modelities: states, returns and actions
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
config,
states,
actions,
rewards,
returns_to_go,
timesteps,
attention_mask,
) = config_and_inputs
inputs_dict = {
"states": states,
"actions": actions,
"rewards": rewards,
"returns_to_go": returns_to_go,
"timesteps": timesteps,
"attention_mask": attention_mask,
}
return config, inputs_dict
@require_torch
class DecisionTransformerModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):
all_model_classes = (DecisionTransformerModel,) if is_torch_available() else ()
all_generative_model_classes = ()
# Ignoring of a failing test from GenerationTesterMixin, as the model does not use inputs_ids
test_generate_without_input_ids = False
# Ignoring of a failing tests from ModelTesterMixin, as the model does not implement these features
test_pruning = False
test_resize_embeddings = False
test_head_masking = False
test_attention_outputs = False
test_hidden_states_output = False
test_inputs_embeds = False
test_model_common_attributes = False
test_gradient_checkpointing = False
def setUp(self):
self.model_tester = DecisionTransformerModelTester(self)
self.config_tester = ConfigTester(self, config_class=DecisionTransformerConfig, hidden_size=37)
def test_config(self):
self.config_tester.run_common_tests()
def test_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_model(*config_and_inputs)
@slow
def test_model_from_pretrained(self):
for model_name in DECISION_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
model = DecisionTransformerModel.from_pretrained(model_name)
self.assertIsNotNone(model)
def test_forward_signature(self):
config, _ = self.model_tester.prepare_config_and_inputs_for_common()
for model_class in self.all_model_classes:
model = model_class(config)
signature = inspect.signature(model.forward)
# signature.parameters is an OrderedDict => so arg_names order is deterministic
arg_names = [*signature.parameters.keys()]
expected_arg_names = [
"states",
"actions",
"rewards",
"returns_to_go",
"timesteps",
"attention_mask",
]
self.assertListEqual(arg_names[: len(expected_arg_names)], expected_arg_names)
@require_torch
class DecisionTransformerModelIntegrationTest(unittest.TestCase):
@slow
def test_autoregressive_prediction(self):
"""
An integration test that performs autoregressive prediction of state, action and return
from a sequence of state, actions and returns. Test is performed over two timesteps.
"""
NUM_STEPS = 2 # number of steps of autoregressive prediction we will perform
TARGET_RETURN = 10 # defined by the RL environment, may be normalized
model = DecisionTransformerModel.from_pretrained("edbeeching/decision-transformer-gym-hopper-expert")
model = model.to(torch_device)
config = model.config
torch.manual_seed(0)
state = torch.randn(1, 1, config.state_dim).to(device=torch_device, dtype=torch.float32) # env.reset()
expected_outputs = torch.tensor([[0.2384, -0.2955, 0.8741], [0.6765, -0.0793, -0.1298]], device=torch_device)
returns_to_go = torch.tensor(TARGET_RETURN, device=torch_device, dtype=torch.float32).reshape(1, 1, 1)
states = state
actions = torch.zeros(1, 0, config.act_dim, device=torch_device, dtype=torch.float32)
rewards = torch.zeros(1, 0, device=torch_device, dtype=torch.float32)
timesteps = torch.tensor(0, device=torch_device, dtype=torch.long).reshape(1, 1)
for step in range(NUM_STEPS):
actions = torch.cat([actions, torch.zeros(1, 1, config.act_dim, device=torch_device)], dim=1)
rewards = torch.cat([rewards, torch.zeros(1, 1, device=torch_device)], dim=1)
attention_mask = torch.ones(1, states.shape[1]).to(dtype=torch.long, device=states.device)
with torch.no_grad():
_, action_pred, _ = model(
states=states,
actions=actions,
rewards=rewards,
returns_to_go=returns_to_go,
timesteps=timesteps,
attention_mask=attention_mask,
return_dict=False,
)
self.assertEqual(action_pred.shape, actions.shape)
self.assertTrue(torch.allclose(action_pred[0, -1], expected_outputs[step], atol=1e-4))
state, reward, _, _ = ( # env.step(action)
torch.randn(1, 1, config.state_dim).to(device=torch_device, dtype=torch.float32),
1.0,
False,
{},
)
actions[-1] = action_pred[0, -1]
states = torch.cat([states, state], dim=1)
pred_return = returns_to_go[0, -1] - reward
returns_to_go = torch.cat([returns_to_go, pred_return.reshape(1, 1, 1)], dim=1)
timesteps = torch.cat(
[timesteps, torch.ones((1, 1), device=torch_device, dtype=torch.long) * (step + 1)], dim=1
)
......@@ -45,6 +45,7 @@ PRIVATE_MODELS = [
# Being in this list is an exception and should **not** be the rule.
IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
# models to ignore for not tested
"DecisionTransformerGPT2Model", # Building part of bigger (tested) model.
"SegformerDecodeHead", # Building part of bigger (tested) model.
"PLBartEncoder", # Building part of bigger (tested) model.
"PLBartDecoder", # Building part of bigger (tested) model.
......@@ -95,6 +96,7 @@ IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
# Update this list with test files that don't have a tester with a `all_model_classes` variable and which don't
# trigger the common tests.
TEST_FILES_WITH_NO_COMMON_TESTS = [
"decision_transformer/test_modeling_decision_transformer.py",
"camembert/test_modeling_camembert.py",
"mt5/test_modeling_flax_mt5.py",
"mbart/test_modeling_mbart.py",
......@@ -108,12 +110,14 @@ TEST_FILES_WITH_NO_COMMON_TESTS = [
"xlm_roberta/test_modeling_xlm_roberta.py",
"vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py",
"vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py",
"decision_transformer/test_modeling_decision_transformer.py",
]
# Update this list for models that are not in any of the auto MODEL_XXX_MAPPING. Being in this list is an exception and
# should **not** be the rule.
IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
# models to ignore for model xxx mapping
"DecisionTransformerGPT2Model",
"GLPNForDepthEstimation",
"ViltForQuestionAnswering",
"ViltForImagesAndTextClassification",
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment