chenpangpang / transformers · Commits

Commit 72768b6b, authored Mar 12, 2020 by Julien Chaumond
[model_cards] polbert: simplify usage example with pipelines
Co-Authored-By: Darek Kłeczek <darek.kleczek@gmail.com>
parent a4c75f14
Showing 1 changed file with 13 additions and 20 deletions.
model_cards/dkleczek/bert-base-polish-uncased-v1/README.md (view file @ 72768b6b)

@@ -35,26 +35,19 @@ Polbert is released via [HuggingFace Transformers library](https://huggingface.co/transformers/)
For an example of use as a language model, see [this notebook](https://github.com/kldarek/polbert/blob/master/LM_testing.ipynb).
Removed (the previous raw PyTorch/NumPy masked-LM example):

```python
import numpy as np
import torch
import transformers as ppb

tokenizer = ppb.BertTokenizer.from_pretrained('dkleczek/bert-base-polish-uncased-v1')
bert_model = ppb.BertForMaskedLM.from_pretrained('dkleczek/bert-base-polish-uncased-v1')

string1 = 'Adam mickiewicz wielkim polskim [MASK] był .'
indices = tokenizer.encode(string1, add_special_tokens=True)
# Find the position of the masked token in the encoded input
masked_token = np.argwhere(np.array(indices) == 3).flatten()[0]  # 3 is the vocab id for the [MASK] token
input_ids = torch.tensor([indices])
with torch.no_grad():
    last_hidden_states = bert_model(input_ids)[0]
# Take the vocab ids of the four highest-scoring tokens at the masked position
more_words = np.argsort(np.asarray(last_hidden_states[0, masked_token, :]))[-4:]
print(more_words)
# Output (vocab ids of the top predictions; they decode to):
# poeta
# bohaterem
# człowiekiem
# pisarzem
```
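The removed example prints raw vocabulary ids. As a minimal sketch of turning those ids back into words, reusing the `tokenizer` and `more_words` from the block above (`convert_ids_to_tokens` is a standard `BertTokenizer` method):

```python
# Decode the top-scoring vocabulary ids into surface tokens.
# np.argsort sorts ascending, so reverse to list the best prediction first.
top_tokens = tokenizer.convert_ids_to_tokens(more_words[::-1].tolist())
print(top_tokens)
# Expected to contain the words listed above: poeta, bohaterem, człowiekiem, pisarzem
```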
Added (the simplified `fill-mask` pipeline example):

```python
from transformers import *

model = BertForMaskedLM.from_pretrained("dkleczek/bert-base-polish-uncased-v1")
tokenizer = BertTokenizer.from_pretrained("dkleczek/bert-base-polish-uncased-v1")
nlp = pipeline('fill-mask', model=model, tokenizer=tokenizer)
for pred in nlp(f"Adam Mickiewicz wielkim polskim {nlp.tokenizer.mask_token} był."):
    print(pred)
# Output:
# {'sequence': '[CLS] adam mickiewicz wielkim polskim poeta był. [SEP]', 'score': 0.47196975350379944, 'token': 26596}
# {'sequence': '[CLS] adam mickiewicz wielkim polskim bohaterem był. [SEP]', 'score': 0.09127858281135559, 'token': 10953}
# {'sequence': '[CLS] adam mickiewicz wielkim polskim człowiekiem był. [SEP]', 'score': 0.0647173821926117, 'token': 5182}
# {'sequence': '[CLS] adam mickiewicz wielkim polskim pisarzem był. [SEP]', 'score': 0.05232388526201248, 'token': 24293}
# {'sequence': '[CLS] adam mickiewicz wielkim polskim politykiem był. [SEP]', 'score': 0.04554257541894913, 'token': 44095}
```
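If only the predicted words are wanted rather than the full prediction dicts, a small sketch under the same setup (relying on the `token` id field visible in the output above):

```python
# Keep just the fill words predicted by the pipeline.
predictions = nlp(f"Adam Mickiewicz wielkim polskim {nlp.tokenizer.mask_token} był.")
words = [tokenizer.convert_ids_to_tokens(pred["token"]) for pred in predictions]
print(words)
# e.g. ['poeta', 'bohaterem', 'człowiekiem', 'pisarzem', 'politykiem']
```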
See the next section for an example usage of Polbert in downstream tasks.