Unverified commit ed5456da, authored May 20, 2020 by Manuel Romero, committed by GitHub on May 20, 2020
Model card for RuPERTa-base fine-tuned for NER (#4466)
parent c76450e2
Showing 1 changed file with 92 additions and 0 deletions (+92 -0)
model_cards/mrm8488/RuPERTa-base-finetuned-ner/README.md (new file, 0 → 100644)

---
language: spanish
thumbnail:
---
# RuPERTa-base (Spanish RoBERTa) + NER 🎃🏷
This model is a version of [RuPERTa-base](https://huggingface.co/mrm8488/RuPERTa-base) fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) for the **NER** downstream task.
## Details of the downstream task (NER) - Dataset
- [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 📚

| Dataset | # Examples |
| ---------------------- | ----- |
| Train | 329 K |
| Dev | 40 K |

- [Fine-tune on NER script provided by Huggingface](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner.py)
- Labels covered (see the sketch after this list):
```
B-LOC
B-MISC
B-ORG
B-PER
I-LOC
I-MISC
I-ORG
I-PER
O
```
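
The tag set above corresponds to the `id2label` mapping used in the usage example further down. The snippet below is a minimal sketch, not part of the original card: it rebuilds that mapping and compares it with whatever the uploaded checkpoint's config reports, which may only carry generic `LABEL_i` names.

```python
from transformers import AutoConfig

# Tag set listed above (assumption: label ids follow this listing order)
labels = ["B-LOC", "B-MISC", "B-ORG", "B-PER", "I-LOC", "I-MISC", "I-ORG", "I-PER", "O"]
id2label = {str(i): label for i, label in enumerate(labels)}

# Compare against the uploaded config; it may fall back to generic LABEL_i names
config = AutoConfig.from_pretrained("mrm8488/RuPERTa-base-finetuned-ner")
print(config.num_labels)  # expected: 9
print(config.id2label)
```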
## Metrics on evaluation set 🧾
| Metric    |   Score   |
| :-------: | :-------: |
| F1        | **77.55** |
| Precision | **75.53** |
| Recall    | **79.68** |
## Model in action 🔨
Example of usage:
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

id2label = {
    "0": "B-LOC",
    "1": "B-MISC",
    "2": "B-ORG",
    "3": "B-PER",
    "4": "I-LOC",
    "5": "I-MISC",
    "6": "I-ORG",
    "7": "I-PER",
    "8": "O"
}

# Load the fine-tuned checkpoint and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/RuPERTa-base-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("mrm8488/RuPERTa-base-finetuned-ner")

text = "Julien, CEO de HF, nació en Francia."
input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)

outputs = model(input_ids)
last_hidden_states = outputs[0]  # per-token classification logits, shape (1, seq_len, num_labels)

for m in last_hidden_states:
    for index, n in enumerate(m):
        # Skip the leading special token and naively align token positions with whitespace-split words
        if index > 0 and index <= len(text.split(" ")):
            print(text.split(" ")[index - 1] + ": " + id2label[str(torch.argmax(n).item())])

'''
Output:
--------
Julien,: I-PER
CEO: O
de: O
HF,: B-ORG
nació: I-PER
en: I-PER
Francia.: I-LOC
'''
```
Yeah! Not too bad 🎉
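
As a shorter alternative, the same checkpoint can also be tried through the generic `ner` `pipeline` from `transformers`. This is a minimal sketch rather than part of the original card; whether the entities come back with the tag names above or as generic `LABEL_i` ids depends on what the uploaded config stores.

```python
from transformers import pipeline

# Minimal sketch: NER pipeline over the same checkpoint (downloads it from the Hub)
nlp_ner = pipeline(
    "ner",
    model="mrm8488/RuPERTa-base-finetuned-ner",
    tokenizer="mrm8488/RuPERTa-base-finetuned-ner",
)

print(nlp_ner("Julien, CEO de HF, nació en Francia."))
```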
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">♥</span> in Spain