---
language: fr
widget:
- text: "Je m'appelle Hicham et je vis à Fès"
---

# MagBERT-NER: a state-of-the-art NER model for Moroccan French language (Maghreb)

## Introduction

MagBERT-NER is a state-of-the-art NER model for Moroccan French (Maghreb). It was fine-tuned for the NER task on top of CamemBERT, a French language model based on the RoBERTa architecture.
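
As a token-classification model, MagBERT-NER assigns one label (for example `B-DATE` or `I-PERSON`) to each sub-word token. Assuming it uses the standard Hugging Face token-classification head (dropout, which is inactive at inference, followed by a single linear layer over the encoder's per-token hidden states), the final labelling step can be sketched with NumPy as below. The hidden states, weights and the reduced label set are made-up toy values for illustration, not the model's actual parameters; the real label mapping is available as `model.config.id2label`.

```python
import numpy as np
from typing import Dict, List

# Hypothetical, reduced label set for illustration only;
# the real mapping is stored in model.config.id2label
ID2LABEL: Dict[int, str] = {0: "O", 1: "B-PERSON", 2: "I-PERSON", 3: "B-GPE"}

def classify_tokens(hidden_states: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> List[str]:
    """Assign one label per token with a linear token-classification head.

    hidden_states: (num_tokens, hidden_size) contextual vectors from the encoder
    weight:        (hidden_size, num_labels) learned projection
    bias:          (num_labels,) learned offset
    """
    logits = hidden_states @ weight + bias  # (num_tokens, num_labels) raw scores
    label_ids = logits.argmax(axis=-1)      # highest-scoring label for each token
    return [ID2LABEL[int(i)] for i in label_ids]

# Toy example: 3 tokens, hidden size 4, weights chosen by hand so the result is easy to follow
hidden = np.array([[0.0, 2.0, 0.1, 0.0],
                   [0.0, 0.3, 1.5, 0.0],
                   [1.0, 0.0, 0.0, 0.2]])
W = np.eye(4)    # with an identity projection, label j's score is hidden dimension j
b = np.zeros(4)
print(classify_tokens(hidden, W, b))  # ['B-PERSON', 'I-PERSON', 'O']
```

The `score` reported by the pipeline comes from a softmax over these logits; the argmax alone is enough to pick the label.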

For further information or requests, please go to [Typica.AI Website](https://typicasoft.io/)

## How to use MagBERT-NER with HuggingFace

##### Load MagBERT-NER and its sub-word tokenizer:
```python
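from transformers import AutoTokenizer, AutoModelForTokenClassification

# Download the fine-tuned model and its CamemBERT-based sub-word tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("TypicaAI/magbert-ner")
model = AutoModelForTokenClassification.from_pretrained("TypicaAI/magbert-ner")


# Process a text sample (from Wikipedia, about the then Prime Minister of Morocco) using the NER pipeline.
# grouped_entities=True merges adjacent tokens of the same predicted entity into a single span.
# Note: recent transformers releases deprecate grouped_entities in favour of aggregation_strategy="simple".
from transformers import pipeline

nlp = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
nlp("Saad Dine El Otmani, né le 16 janvier 1956 à Inezgane, est un homme d'État marocain, chef du gouvernement du Maroc depuis le 5 avril 2017")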


#[{'entity_group': 'I-PERSON',
#  'score': 0.8941445276141167,
#  'word': 'Saad Dine El Otmani'},
# {'entity_group': 'B-DATE',
#  'score': 0.5967703461647034,
#  'word': '16 janvier 1956'},
# {'entity_group': 'B-GPE', 'score': 0.7160899192094803, 'word': 'Inezgane'},
# {'entity_group': 'B-NORP', 'score': 0.7971733212471008, 'word': 'marocain'},
# {'entity_group': 'B-GPE', 'score': 0.8921478390693665, 'word': 'Maroc'},
# {'entity_group': 'B-DATE',
#  'score': 0.5760444005330404,
#  'word': '5 avril 2017'}]

```
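
The grouped output above still carries the IOB prefixes (`B-`, `I-`) in `entity_group`, and some spans (the dates here) have fairly low scores. A minimal post-processing sketch, assuming you only want bare entity types and are willing to drop low-confidence spans; the helper name `clean_entities` and the 0.7 threshold are hypothetical choices for illustration, not values recommended by the model authors.

```python
# Grouped output copied from the example above (older transformers output format)
raw = [
    {"entity_group": "I-PERSON", "score": 0.8941445276141167, "word": "Saad Dine El Otmani"},
    {"entity_group": "B-DATE", "score": 0.5967703461647034, "word": "16 janvier 1956"},
    {"entity_group": "B-GPE", "score": 0.7160899192094803, "word": "Inezgane"},
    {"entity_group": "B-NORP", "score": 0.7971733212471008, "word": "marocain"},
    {"entity_group": "B-GPE", "score": 0.8921478390693665, "word": "Maroc"},
    {"entity_group": "B-DATE", "score": 0.5760444005330404, "word": "5 avril 2017"},
]

def clean_entities(entities, min_score=0.7):
    """Strip IOB prefixes and drop spans scoring below min_score (hypothetical threshold)."""
    cleaned = []
    for ent in entities:
        label = ent["entity_group"]
        # "B-GPE" -> "GPE"; labels without a B-/I- prefix are kept unchanged
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        if ent["score"] >= min_score:
            cleaned.append((label, ent["word"]))
    return cleaned

print(clean_entities(raw))
# [('PERSON', 'Saad Dine El Otmani'), ('GPE', 'Inezgane'), ('NORP', 'marocain'), ('GPE', 'Maroc')]
```

With `aggregation_strategy="simple"` in recent transformers versions, the pipeline should already report `entity_group` without the prefix, so only the score filter would still be needed.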



## Authors 

MagBERT-NER was trained and evaluated by Hicham Assoudi, Ph.D.