Commit ed5456da, authored May 20, 2020 by Manuel Romero and committed via GitHub: Model card for RuPERTa-base fine-tuned for NER (#4466)
model_cards/mrm8488/RuPERTa-base-finetuned-ner/README.md
---
language: spanish
thumbnail:
---
# RuPERTa-base (Spanish RoBERTa) + NER 🎃🏷
This model is a version of [RuPERTa-base](https://huggingface.co/mrm8488/RuPERTa-base) fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) for the **NER** downstream task.
## Details of the downstream task (NER) - Dataset
- [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 📚
| Dataset | # Examples |
| ---------------------- | ----- |
| Train | 329 K |
| Dev | 40 K |
- [Fine-tune on NER script provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner.py)
- Labels covered:
```
B-LOC
B-MISC
B-ORG
B-PER
I-LOC
I-MISC
I-ORG
I-PER
O
```
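The labels above follow the BIO (IOB) tagging scheme: `B-` marks the beginning of an entity, `I-` a continuation, and `O` a token outside any entity. As a minimal sketch of how per-token tags are grouped back into entity spans, here is an illustrative decoder (the helper `bio_to_entities` is not part of this repository; it uses lenient decoding, where a stray `I-` tag opens a new span, matching how raw model output often looks):

```python
def bio_to_entities(tagged_tokens):
    """Group (token, BIO-tag) pairs into (entity_text, entity_type) spans."""
    entities = []
    current_tokens, current_type = [], None

    def close():
        # Flush the currently open entity span, if any
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            close()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            if current_type == tag[2:]:
                current_tokens.append(token)
            else:
                # lenient: a stray or mismatched I- tag starts a new span
                close()
                current_tokens, current_type = [token], tag[2:]
        else:  # "O" closes any open entity
            close()
    close()
    return entities

print(bio_to_entities([("HF,", "B-ORG"), ("nació", "I-PER"), ("en", "I-PER")]))
# → [('HF,', 'ORG'), ('nació en', 'PER')]
```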
## Metrics on evaluation set 🧾
| Metric    |   Score   |
| :-------: | :-------: |
| F1        | **77.55** |
| Precision | **75.53** |
| Recall    | **79.68** |
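As a quick sanity check, the reported F1 is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
precision, recall = 75.53, 79.68  # scores reported in the table above

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # → 77.55
```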
## Model in action 🔨
Example of usage:
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

id2label = {
    "0": "B-LOC",
    "1": "B-MISC",
    "2": "B-ORG",
    "3": "B-PER",
    "4": "I-LOC",
    "5": "I-MISC",
    "6": "I-ORG",
    "7": "I-PER",
    "8": "O"
}

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/RuPERTa-base-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("mrm8488/RuPERTa-base-finetuned-ner")

text = "Julien, CEO de HF, nació en Francia."
input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)

outputs = model(input_ids)
last_hidden_states = outputs[0]

# Print the predicted label for each word of the input text
for m in last_hidden_states:
    for index, n in enumerate(m):
        if index > 0 and index <= len(text.split(" ")):
            print(text.split(" ")[index - 1] + ": " + id2label[str(torch.argmax(n).item())])

'''
Output:
--------
Julien,: I-PER
CEO: O
de: O
HF,: B-ORG
nació: I-PER
en: I-PER
Francia.: I-LOC
'''
```
Yeah! Not too bad 🎉
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">♥</span> in Spain