[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)

* rm all model cards
* Update the .rst
  @sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler
* Add a rootlevel README.md with simple instructions/context
* Update docs/source/model_sharing.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3552d0e0 · Julien Chaumond · GitHub · 29e45979
Unverified Commit 3552d0e0 authored Dec 12, 2020 by Julien Chaumond Committed by GitHub Dec 11, 2020
20 changed files
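For readers following an old link into `model_cards/`, here is a minimal sketch of how a removed card path could be mapped to its new home in the model repo. It assumes the `model_cards/<org>/<name>/README.md` layout seen in this diff, the public `https://huggingface.co` endpoint, and a `main` default branch; the helper names `card_path_to_model_id` and `hub_card_url` are hypothetical and not part of this commit.

```python
from pathlib import PurePosixPath

# Assumption: the public Hub endpoint and a "main" default branch.
HUB_BASE = "https://huggingface.co"


def card_path_to_model_id(path: str) -> str:
    """Turn 'model_cards/<org>/<name>/README.md' into '<org>/<name>'."""
    parts = PurePosixPath(path).parts
    # Only the directory-per-model layout shown in this diff is handled.
    if len(parts) < 3 or parts[0] != "model_cards" or parts[-1] != "README.md":
        raise ValueError(f"not a model card path: {path!r}")
    return "/".join(parts[1:-1])


def hub_card_url(path: str) -> str:
    """Build the URL where the migrated card is expected to live."""
    return f"{HUB_BASE}/{card_path_to_model_id(path)}/blob/main/README.md"


if __name__ == "__main__":
    print(hub_card_url("model_cards/jannesg/takalane_sot_roberta/README.md"))
```

Paths that do not match the expected layout raise `ValueError` rather than guessing a model id.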
--- a/model_cards/jannesg/takalane_sot_roberta/README.md
+++ b/model_cards/jannesg/takalane_sot_roberta/README.md
---
-language: 
- sot
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- sot
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Southern Sotho 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_sot_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_sot_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 20000
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jannesg/takalane_ssw_roberta/README.md
+++ b/model_cards/jannesg/takalane_ssw_roberta/README.md
---
-language: 
- tn
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- tn
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Tswana 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_ssw_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_ssw_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 380
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jannesg/takalane_tsn_roberta/README.md
+++ b/model_cards/jannesg/takalane_tsn_roberta/README.md
---
-language: 
- tn
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- tn
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Tswana 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_tsn_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_tsn_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 10000
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jannesg/takalane_tso_roberta/README.md
+++ b/model_cards/jannesg/takalane_tso_roberta/README.md
---
-language: 
- ts
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- ts
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Tsonga 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_tso_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_tso_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 20000
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jannesg/takalane_ven_roberta/README.md
+++ b/model_cards/jannesg/takalane_ven_roberta/README.md
---
-language: 
- ven
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- ven
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Venda 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_ven_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_ven_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 9279
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jannesg/takalane_xho_roberta/README.md
+++ b/model_cards/jannesg/takalane_xho_roberta/README.md
---
-language: 
- xho
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- xho
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Xhosa 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_xho_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_xho_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 100000
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jannesg/takalane_zul_roberta/README.md
+++ b/model_cards/jannesg/takalane_zul_roberta/README.md
---
-language: 
- zul
-thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
-tags:
- zul
- fill-mask
- pytorch
- roberta
- masked-lm
-license: MIT
---
-
-# Takalani Sesame - Zulu 🇿🇦
-
-<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/> 
-
-## Model description
-
-Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular look at techniques for low-resource languages to equalise performance with larger languages around the world.
-
-## Intended uses & limitations
-
-#### How to use
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_zul_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_zul_roberta")
-```
-
-#### Limitations and bias
-
-Updates will be added continuously to improve performance.
-
-## Training data
-
-Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
-**Sentences:** 410000
-
-## Training procedure
-
-No preprocessing. Standard Huggingface hyperparameters. 
-
-## Author
-
-Jannes Germishuys [website](http://jannesgg.github.io)
--- a/model_cards/jcblaise/bert-tagalog-base-cased-WWM/README.md
+++ b/model_cards/jcblaise/bert-tagalog-base-cased-WWM/README.md
---
-language: tl
-tags:
- bert
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# BERT Tagalog Base Cased (Whole Word Masking)
-Tagalog version of BERT trained on a large preprocessed text corpus scraped and sourced from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community. This particular version uses whole word masking.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@inproceedings{localization2020cruz,
-  title={{Localization of Fake News Detection via Multitask Transfer Learning}},
-  author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
-  booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
-  pages={2589--2597},
-  year={2020},
-  url={https://www.aclweb.org/anthology/2020.lrec-1.315}
-}
-
-@article{cruz2020establishing,
-  title={Establishing Baselines for Text Classification in Low-Resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:2005.02068},
-  year={2020}
-}
-
-@article{cruz2019evaluating,
-  title={Evaluating Language Model Finetuning Techniques for Low-resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:1907.00409},
-  year={2019}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/bert-tagalog-base-cased/README.md
+++ b/model_cards/jcblaise/bert-tagalog-base-cased/README.md
---
-language: tl
-tags:
- bert
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# BERT Tagalog Base Cased
-Tagalog version of BERT trained on a large preprocessed text corpus scraped and sourced from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/bert-tagalog-base-cased', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-cased', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/bert-tagalog-base-cased')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-cased', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@inproceedings{localization2020cruz,
-  title={{Localization of Fake News Detection via Multitask Transfer Learning}},
-  author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
-  booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
-  pages={2589--2597},
-  year={2020},
-  url={https://www.aclweb.org/anthology/2020.lrec-1.315}
-}
-
-@article{cruz2020establishing,
-  title={Establishing Baselines for Text Classification in Low-Resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:2005.02068},
-  year={2020}
-}
-
-@article{cruz2019evaluating,
-  title={Evaluating Language Model Finetuning Techniques for Low-resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:1907.00409},
-  year={2019}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/bert-tagalog-base-uncased-WWM/README.md
+++ b/model_cards/jcblaise/bert-tagalog-base-uncased-WWM/README.md
---
-language: tl
-tags:
- bert
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# BERT Tagalog Base Uncased (Whole Word Masking)
-Tagalog version of BERT trained on a large preprocessed text corpus scraped and sourced from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community. This particular version uses whole word masking.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/bert-tagalog-base-uncased-WWM', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-uncased-WWM', do_lower_case=True)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/bert-tagalog-base-uncased-WWM')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-uncased-WWM', do_lower_case=True)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@inproceedings{localization2020cruz,
-  title={{Localization of Fake News Detection via Multitask Transfer Learning}},
-  author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
-  booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
-  pages={2589--2597},
-  year={2020},
-  url={https://www.aclweb.org/anthology/2020.lrec-1.315}
-}
-
-@article{cruz2020establishing,
-  title={Establishing Baselines for Text Classification in Low-Resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:2005.02068},
-  year={2020}
-}
-
-@article{cruz2019evaluating,
-  title={Evaluating Language Model Finetuning Techniques for Low-resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:1907.00409},
-  year={2019}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/bert-tagalog-base-uncased/README.md
+++ b/model_cards/jcblaise/bert-tagalog-base-uncased/README.md
---
-language: tl
-tags:
- bert
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# BERT Tagalog Base Uncased
-Tagalog version of BERT trained on a large preprocessed text corpus scraped and sourced from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/bert-tagalog-base-uncased', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-uncased', do_lower_case=True)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/bert-tagalog-base-uncased')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-uncased', do_lower_case=True)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@inproceedings{localization2020cruz,
-  title={{Localization of Fake News Detection via Multitask Transfer Learning}},
-  author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
-  booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
-  pages={2589--2597},
-  year={2020},
-  url={https://www.aclweb.org/anthology/2020.lrec-1.315}
-}
-
-@article{cruz2020establishing,
-  title={Establishing Baselines for Text Classification in Low-Resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:2005.02068},
-  year={2020}
-}
-
-@article{cruz2019evaluating,
-  title={Evaluating Language Model Finetuning Techniques for Low-resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:1907.00409},
-  year={2019}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/distilbert-tagalog-base-cased/README.md
+++ b/model_cards/jcblaise/distilbert-tagalog-base-cased/README.md
---
-language: tl
-tags:
- distilbert
- bert
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# DistilBERT Tagalog Base Cased
-Tagalog version of DistilBERT, distilled from [`bert-tagalog-base-cased`](https://huggingface.co/jcblaise/bert-tagalog-base-cased). This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/distilbert-tagalog-base-cased', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/distilbert-tagalog-base-cased', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/distilbert-tagalog-base-cased')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/distilbert-tagalog-base-cased', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@inproceedings{localization2020cruz,
-  title={{Localization of Fake News Detection via Multitask Transfer Learning}},
-  author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
-  booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
-  pages={2589--2597},
-  year={2020},
-  url={https://www.aclweb.org/anthology/2020.lrec-1.315}
-}
-
-@article{cruz2020establishing,
-  title={Establishing Baselines for Text Classification in Low-Resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:2005.02068},
-  year={2020}
-}
-
-@article{cruz2019evaluating,
-  title={Evaluating Language Model Finetuning Techniques for Low-resource Languages},
-  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
-  journal={arXiv preprint arXiv:1907.00409},
-  year={2019}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-base-cased-discriminator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-base-cased-discriminator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Base Cased Discriminator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the discriminator model, which is the main Transformer used for finetuning to downstream tasks. For generation, mask-filling, and retraining, refer to the Generator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-base-cased-discriminator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-cased-discriminator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-base-cased-discriminator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-cased-discriminator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-base-cased-generator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-base-cased-generator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Base Cased Generator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the generator model used to sample synthetic text and pretrain the discriminator. Only use this model for retraining and mask-filling. For the actual model for downstream tasks, please refer to the discriminator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-base-cased-generator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-cased-generator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-base-cased-generator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-cased-generator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-base-uncased-discriminator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-base-uncased-discriminator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Base Uncased Discriminator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the discriminator model, which is the main Transformer used for finetuning to downstream tasks. For generation, mask-filling, and retraining, refer to the Generator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-base-uncased-discriminator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-uncased-discriminator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-base-uncased-discriminator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-uncased-discriminator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
-
-## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-base-uncased-generator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-base-uncased-generator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Base Uncased Generator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the generator model used to sample synthetic text and pretrain the discriminator. Only use this model for retraining and mask-filling. For the actual model for downstream tasks, please refer to the discriminator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-base-uncased-generator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-uncased-generator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-base-uncased-generator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-base-uncased-generator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com
-
-## Contact
-If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-small-cased-discriminator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-small-cased-discriminator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Small Cased Discriminator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the discriminator model, which is the main Transformer used for finetuning to downstream tasks. For generation, mask-filling, and retraining, refer to the Generator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-small-cased-discriminator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-cased-discriminator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-small-cased-discriminator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-cased-discriminator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com
-
-## Contact
-If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-small-cased-generator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-small-cased-generator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Small Cased Generator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the generator model used to sample synthetic text and pretrain the discriminator. Only use this model for retraining and mask-filling. For the actual model for downstream tasks, please refer to the discriminator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-small-cased-generator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-cased-generator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-small-cased-generator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-cased-generator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com
-
-## Contact
-If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-small-uncased-discriminator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-small-uncased-discriminator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Small Uncased Discriminator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the discriminator model, which is the main Transformer used for finetuning to downstream tasks. For generation, mask-filling, and retraining, refer to the Generator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-discriminator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-discriminator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-discriminator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-discriminator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com
-
-## Contact
-If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
--- a/model_cards/jcblaise/electra-tagalog-small-uncased-generator/README.md
+++ b/model_cards/jcblaise/electra-tagalog-small-uncased-generator/README.md
---
-language: tl
-tags:
- electra
- tagalog
- filipino
-license: gpl-3.0
-inference: false
---
-
-# ELECTRA Tagalog Small Uncased Generator
-Tagalog ELECTRA model pretrained with a large corpus scraped from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
-
-This is the generator model used to sample synthetic text and pretrain the discriminator. Only use this model for retraining and mask-filling. For the actual model for downstream tasks, please refer to the discriminator models.
-
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
-## Citations
-All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
-
-```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
-}
-```
-
-## Data and Other Resources
-Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com
-
-## Contact
-If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph