"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "147f774671c72ab24d17547030bb1f2803925d3b"
Unverified Commit 9907dc52 authored by Hu Xu's avatar Hu Xu Committed by GitHub
Browse files

add BERT trained from review corpus. (#4405)

* add model_cards for BERT trained on reviews.

* add link to repository.

* refine README.md for each review model
parent efbc1c5a
# ReviewBERT
BERT (post-)trained on review corpora to understand sentiment, opinions, and various e-commerce aspects.
`BERT-DK_laptop` is trained on a 100MB corpus of laptop reviews under `Electronics/Computers & Accessories/Laptops`.
## Model Description
The original model is `BERT-base-uncased`, trained on Wikipedia+BookCorpus.
Models are post-trained on the [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and the [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights is as simple as:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the post-trained checkpoint from the Hugging Face model hub.
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_laptop")
model = AutoModel.from_pretrained("activebus/BERT-DK_laptop")
```
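For example, here is a minimal usage sketch (our own illustration, not part of the original card; the review text is invented, and it assumes a recent `transformers` release whose model outputs expose `last_hidden_state`):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_laptop")
model = AutoModel.from_pretrained("activebus/BERT-DK_laptop")
model.eval()

# Encode an (invented) laptop review and extract contextual embeddings.
review = "The battery life is great, but the keyboard feels cheap."
inputs = tokenizer(review, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch, seq_len, hidden); the [CLS] vector at
# position 0 is a common sentence-level feature.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```

Note that `AutoModel` loads only the encoder; task heads (for aspect extraction, sentiment classification, etc.) must be added and fine-tuned on labeled data.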
## Evaluation Results
See our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) for evaluation results.
## Citation
If you find this work useful, please cite as follows.
```
@inproceedings{xu_bert2019,
    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = "jun",
    year = "2019",
}
```
# ReviewBERT
BERT (post-)trained on review corpora to understand sentiment, opinions, and various e-commerce aspects.
`BERT-DK_rest` is trained on a 1GB corpus of reviews covering 19 types of restaurants from Yelp.
## Model Description
The original model is `BERT-base-uncased`, trained on Wikipedia+BookCorpus.
Models are post-trained on the [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and the [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights is as simple as:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the post-trained checkpoint from the Hugging Face model hub.
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_rest")
model = AutoModel.from_pretrained("activebus/BERT-DK_rest")
```
## Evaluation Results
See our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) for evaluation results.
## Citation
If you find this work useful, please cite as follows.
```
@inproceedings{xu_bert2019,
    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = "jun",
    year = "2019",
}
```
# ReviewBERT
BERT (post-)trained on review corpora to understand sentiment, opinions, and various e-commerce aspects.
`BERT-DK_laptop` is trained on a 100MB corpus of laptop reviews under `Electronics/Computers & Accessories/Laptops`.
`BERT-PT_*` additionally uses SQuAD 1.1.
## Model Description
The original model is `BERT-base-uncased`, trained on Wikipedia+BookCorpus.
Models are post-trained on the [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and the [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights is as simple as:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the post-trained checkpoint from the Hugging Face model hub.
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-PT_laptop")
model = AutoModel.from_pretrained("activebus/BERT-PT_laptop")
```
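As an illustrative probe (our own sketch, not from the original card, and it assumes the uploaded checkpoint retains BERT's masked-language-model head), the post-trained weights can be inspected with the fill-mask pipeline:

```python
from transformers import pipeline

# Hypothetical probe: assumes the checkpoint includes the masked-LM head.
fill_mask = pipeline("fill-mask", model="activebus/BERT-PT_laptop")

# After post-training, top predictions should skew toward laptop vocabulary.
for pred in fill_mask("The [MASK] on this laptop is too small."):
    print(pred["token_str"], round(pred["score"], 3))
```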
## Evaluation Results
See our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) for evaluation results.
## Citation
If you find this work useful, please cite as follows.
```
@inproceedings{xu_bert2019,
    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = "jun",
    year = "2019",
}
```
# ReviewBERT
BERT (post-)trained on review corpora to understand sentiment, opinions, and various e-commerce aspects.
`BERT-DK_rest` is trained on a 1GB corpus of reviews covering 19 types of restaurants from Yelp.
`BERT-PT_*` additionally uses SQuAD 1.1.
## Model Description
The original model is `BERT-base-uncased`, trained on Wikipedia+BookCorpus.
Models are post-trained on the [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and the [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights is as simple as:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the post-trained checkpoint from the Hugging Face model hub.
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-PT_rest")
model = AutoModel.from_pretrained("activebus/BERT-PT_rest")
```
## Evaluation Results
See our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) for evaluation results.
## Citation
If you find this work useful, please cite as follows.
```
@inproceedings{xu_bert2019,
    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = "jun",
    year = "2019",
}
```
# ReviewBERT
BERT (post-)trained on review corpora to understand sentiment, opinions, and various e-commerce aspects.
Please visit https://github.com/howardhsu/BERT-for-RRC-ABSA for details.
`BERT-XD_Review` is a cross-domain (beyond just `laptop` and `restaurant`) language model in which each training example comes from a single product or restaurant with the same rating. It is post-trained (fine-tuned) from `bert-base-uncased` for 4 epochs on a combination of 5-core Amazon reviews and all Yelp data, expected to be about 22GB in total.
The preprocessing code is [here](https://github.com/howardhsu/BERT-for-RRC-ABSA/transformers).
## Model Description
The original model is `BERT-base-uncased`.
Models are post-trained on the [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and the [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights is as simple as:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the post-trained checkpoint from the Hugging Face model hub.
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-XD_Review")
model = AutoModel.from_pretrained("activebus/BERT-XD_Review")
```
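Because the model is cross-domain, one natural use is embedding reviews from different domains into a shared space. Below is a small sketch (our own illustration, not from the card; the reviews are invented) that mean-pools the token embeddings while masking out padding:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-XD_Review")
model = AutoModel.from_pretrained("activebus/BERT-XD_Review")
model.eval()

# Invented reviews from two different domains.
reviews = [
    "The pasta was overcooked and the service was slow.",  # restaurant
    "The screen is sharp, but the fan noise is unbearable.",  # laptop
]
batch = tokenizer(reviews, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (2, seq_len, 768)

# Mean-pool over real tokens only, using the attention mask.
mask = batch["attention_mask"].unsqueeze(-1).float()  # (2, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```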
## Evaluation Results
See our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) for evaluation results.
`BERT-XD_Review` is expected to have similar performance on domain-specific tasks (such as aspect extraction) as `BERT-DK`, but to do much better on general tasks such as aspect sentiment classification (different domains mostly share similar sentiment words).
## Citation
If you find this work useful, please cite as follows.
```
@inproceedings{xu_bert2019,
    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = "jun",
    year = "2019",
}
```
# ReviewBERT
BERT (post-)trained on review corpora to understand sentiment, opinions, and various e-commerce aspects.
`BERT_Review` is a cross-domain (beyond just `laptop` and `restaurant`) language model in which each training example mixes randomly sampled domains. It is post-trained (fine-tuned) from `bert-base-uncased` for 4 epochs on a combination of 5-core Amazon reviews and all Yelp data, expected to be about 22GB in total.
The preprocessing code is [here](https://github.com/howardhsu/BERT-for-RRC-ABSA/transformers).
## Model Description
The original model is `BERT-base-uncased`, trained on Wikipedia+BookCorpus.
Models are post-trained on the [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and the [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights is as simple as:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Download the post-trained checkpoint from the Hugging Face model hub.
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT_Review")
model = AutoModel.from_pretrained("activebus/BERT_Review")
```
## Evaluation Results
See our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) for evaluation results.
`BERT_Review` is expected to have similar performance on domain-specific tasks (such as aspect extraction) as `BERT-DK`, but to do much better on general tasks such as aspect sentiment classification (different domains mostly share similar sentiment words).
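To make the aspect sentiment classification point concrete, here is an illustrative sketch (our own, not an official recipe): the aspect term and the review sentence are encoded as a sentence pair, and the `[CLS]` vector feeds a small classification head. The head below is untrained and purely hypothetical; it would need fine-tuning on labeled ASC data before its outputs mean anything.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("activebus/BERT_Review")
encoder = AutoModel.from_pretrained("activebus/BERT_Review")

# Sentence-pair input: aspect term first, review sentence second.
inputs = tokenizer("battery life", "The battery life is amazing.",
                   return_tensors="pt")
with torch.no_grad():
    cls = encoder(**inputs).last_hidden_state[:, 0, :]  # (1, 768)

# Hypothetical 3-way head (positive / negative / neutral), untrained here.
head = torch.nn.Linear(encoder.config.hidden_size, 3)
logits = head(cls)
print(logits.shape)  # torch.Size([1, 3])
```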
## Citation
If you find this work useful, please cite as follows.
```
@inproceedings{xu_bert2019,
    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = "jun",
    year = "2019",
}
```