[model_cards] polbert: simplify usage example with pipelines

Co-Authored-By: Darek Kłeczek <darek.kleczek@gmail.com>

72768b6b · Julien Chaumond · a4c75f14 · 72768b6b
Commit 72768b6b authored Mar 12, 2020 by Julien Chaumond

Showing with 13 additions and 20 deletions

model_cards/dkleczek/bert-base-polish-uncased-v1/README.md +13 -20

--- a/model_cards/dkleczek/bert-base-polish-uncased-v1/README.md
+++ b/model_cards/dkleczek/bert-base-polish-uncased-v1/README.md
@@ -35,26 +35,19 @@ Polbert is released via [HuggingFace Transformers library](https://huggingface.c
 For an example use as language model, see [this notebook](https://github.com/kldarek/polbert/blob/master/LM_testing.ipynb) file. 
 ```python
-import numpy as np
-import torch
-import transformers as ppb
-tokenizer = ppb.BertTokenizer.from_pretrained('dkleczek/bert-base-polish-uncased-v1')
-bert_model = ppb.BertForMaskedLM.from_pretrained('dkleczek/bert-base-polish-uncased-v1')
-string1 = 'Adam mickiewicz wielkim polskim [MASK] był .'
-indices = tokenizer.encode(string1, add_special_tokens=True)
-masked_token = np.argwhere(np.array(indices) == 3).flatten()[0] # 3 is the vocab id for [MASK] token
-input_ids = torch.tensor([indices])
-with torch.no_grad():
-    last_hidden_states = bert_model(input_ids)[0]
-more_words = np.argsort(np.asarray(last_hidden_states[0,masked_token,:]))[-4:]
-print(more_words)
+from transformers import *
+model = BertForMaskedLM.from_pretrained("dkleczek/bert-base-polish-uncased-v1")
+tokenizer = BertTokenizer.from_pretrained("dkleczek/bert-base-polish-uncased-v1")
+nlp = pipeline('fill-mask', model=model, tokenizer=tokenizer)
+for pred in nlp(f"Adam Mickiewicz wielkim polskim {nlp.tokenizer.mask_token} był."):
+  print(pred)
 # Output:
-# poeta
-# bohaterem
-# człowiekiem
-# pisarzem
+# {'sequence': '[CLS] adam mickiewicz wielkim polskim poeta był. [SEP]', 'score': 0.47196975350379944, 'token': 26596}
+# {'sequence': '[CLS] adam mickiewicz wielkim polskim bohaterem był. [SEP]', 'score': 0.09127858281135559, 'token': 10953}
+# {'sequence': '[CLS] adam mickiewicz wielkim polskim człowiekiem był. [SEP]', 'score': 0.0647173821926117, 'token': 5182}
+# {'sequence': '[CLS] adam mickiewicz wielkim polskim pisarzem był. [SEP]', 'score': 0.05232388526201248, 'token': 24293}
+# {'sequence': '[CLS] adam mickiewicz wielkim polskim politykiem był. [SEP]', 'score': 0.04554257541894913, 'token': 44095}
 ```
 See the next section for an example usage of Polbert in downstream tasks.
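
Note on the change: the removed lines located the `[MASK]` position by hand and ranked the model's scores at that position, which the `fill-mask` pipeline now does internally. The following is a minimal, standard-library-only sketch of that ranking step, not the library's actual implementation. The function name `top_k_for_mask`, the toy vocabulary, and the logits are all hypothetical; only the mask id 3 is taken from the removed comment.

```python
import math

def top_k_for_mask(token_ids, logits, vocab, mask_id, k=3):
    """Return the k most likely (token, probability) pairs at the first mask position."""
    pos = token_ids.index(mask_id)            # index of the mask token in the input
    row = logits[pos]                         # one raw score per vocabulary entry
    peak = max(row)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    probs = [e / total for e in exps]         # softmax over the vocabulary
    ranked = sorted(range(len(row)), key=lambda i: probs[i], reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]

# Hypothetical toy data: a 7-entry vocabulary and a 3-token input "[CLS] [MASK] [SEP]".
vocab = ["[CLS]", "[SEP]", "[PAD]", "[MASK]", "poeta", "bohaterem", "pisarzem"]
token_ids = [0, 3, 1]
logits = [
    [0.0] * 7,
    [0.0, 0.0, 0.0, 0.0, 3.0, 1.0, 2.0],      # scores at the mask position
    [0.0] * 7,
]

for token, prob in top_k_for_mask(token_ids, logits, vocab, mask_id=3):
    print(f"{token}: {prob:.3f}")
```

Unlike the removed `np.argsort(...)[-4:]` call, which yields vocabulary ids in ascending score order, this sketch returns decoded tokens with normalized scores, best first, which is closer in shape to the pipeline output shown in the diff.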