README.md 1.61 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
---
language:
- bn
datasets:
- socian
- bangla-sentiment-benchmark
license: mit
tags:
- bengali
- bengali-sentiment
- sentiment-analysis
---

# bangla-bert-sentiment
`bangla-bert-sentiment` is a pretrained model for bengali **Sentiment Analysis** using [bangla-bert-base](https://huggingface.co/sagorsarker/bangla-bert-base) model.

## Datasets Details
This model was trained with two combined datasets
* [socian sentiment data](https://github.com/socian-ai/socian-bangla-sentiment-dataset-labeled)
* [bangla classification dataset](https://github.com/rezacsedu/Classification_Benchmarks_Benglai_NLP)

|||
|--|--|
|Data Size| 10889 |
|Positive| 4999 |
|Negative| 5890 |
|Train | 8711 |
| Test | 2178 |

## Training Details
Model trained with [simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers) binary classification script with total of **3 epochs** in `google colab gpu`.


## Evaluation Details
Model evaluate with 2178 sentences

Here is the evaluation result details in table


|Eval Loss | TP | TN | FP | FN | F1 Score |
| -------- | -- | -- | -- | -- | -------- |
| 0.3289 | 880 | 1158 | 59 | 81 | 92.63 |

## Usage

Calculate sentiment from given sentence

```py

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("sagorsarker/bangla-bert-sentiment")

model = AutoModelForSequenceClassification.from_pretrained("sagorsarker/bangla-bert-sentiment")

nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
sentence = "বাংলার ঘরে ঘরে আজ নবান্নের উৎসব"
nlp(sentence)

```