"signatures/git@developer.sourcefind.cn:wangsen/mineru.git" did not exist on "3d969280ae3794541bca216de6743e1ecb09c8dd"
Commit 35575091 authored by Sagor Sarker, committed by GitHub

added evaluation results for classification task (#7790)

parent bb9559a7
Our final vocab file is available at [https://github.com/sagorbrur/bangla-bert](https://github.com/sagorbrur/bangla-bert)
## Evaluation Results
### LM Evaluation Results
After training for 1 million steps, here are the evaluation results.
```
next_sentence_loss = 0.040997364
perplexity = numpy.exp(2.2406516) = 9.393331287442784
Loss for final step: 2.426227
```
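As the log shows, the reported perplexity is simply the exponential of the evaluation loss. A quick sketch reproducing that arithmetic (assuming 2.2406516 is the evaluation loss, as the log suggests):

```py
import numpy

eval_loss = 2.2406516              # loss value from the evaluation log above
perplexity = numpy.exp(eval_loss)  # perplexity = e^loss
print(perplexity)                  # 9.393331287442784
```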
### Downstream Task Evaluation Results
Huge thanks to [Nick Doiron](https://twitter.com/mapmeld) for providing the evaluation results for the classification task.
He used the [Bengali Classification Benchmark](https://github.com/rezacsedu/Classification_Benchmarks_Benglai_NLP) datasets for the classification task.
Compared to Nick's [Bengali electra](https://huggingface.co/monsoon-nlp/bangla-electra) and multilingual BERT, Bangla BERT Base achieves state-of-the-art results.
Here is the [evaluation script](https://github.com/sagorbrur/bangla-bert/blob/master/notebook/bangla-bert-evaluation-classification-task.ipynb).
| Model | Sentiment Analysis | Hate Speech Task | News Topic Task | Average |
| ----- | -------------------| ---------------- | --------------- | ------- |
| mBERT | 68.15 | 52.32 | 72.27 | 64.25 |
| Bengali Electra | 69.19 | 44.84 | 82.33 | 65.45 |
| Bangla BERT Base | 70.37 | 71.83 | 89.19 | 77.13 |
**NB: If you use this model for any NLP task, please share the evaluation results with us. We will add them here.**
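For reference, here is a minimal sketch of how such a downstream classification evaluation could be wired up with `transformers`; the `num_labels=2` binary setup and the example sentence are assumptions for illustration, not the exact configuration of the linked notebook:

```py
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Hypothetical binary classification head on top of Bangla BERT Base;
# the linked evaluation notebook defines the actual task setup.
tokenizer = BertTokenizer.from_pretrained("sagorsarker/bangla-bert-base")
model = BertForSequenceClassification.from_pretrained(
    "sagorsarker/bangla-bert-base", num_labels=2
)

inputs = tokenizer("আমি বাংলায় গান গাই।", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```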
You can use this model directly with a pipeline for masked language modeling:
```py
from transformers import BertForMaskedLM, BertTokenizer, pipeline

# Load the model and tokenizer from the Hugging Face model hub
model = BertForMaskedLM.from_pretrained("sagorsarker/bangla-bert-base")
tokenizer = BertTokenizer.from_pretrained("sagorsarker/bangla-bert-base")

nlp = pipeline('fill-mask', model=model, tokenizer=tokenizer)
for pred in nlp(f"আমি বাংলায় {nlp.tokenizer.mask_token} গাই।"):
    print(pred)
```
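Each prediction printed by the `fill-mask` pipeline is a dictionary containing the filled-in `sequence`, a confidence `score`, and the predicted `token_str`.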