Adding doctest for `token-classification` pipeline. (#20265)

* Adding doctest for `token-classification` pipeline.
* Adding doctest to `token-classification` pipeline.
* Remove nested_simplify.

Commit 9ea1dbd2 (unverified), authored Nov 16, 2022 by Nicolas Patry, committed by GitHub Nov 16, 2022.

Showing with 23 additions and 0 deletions

src/transformers/pipelines/token_classification.py +23 -0

--- a/src/transformers/pipelines/token_classification.py
+++ b/src/transformers/pipelines/token_classification.py
@@ -88,6 +88,29 @@ class TokenClassificationPipeline(Pipeline):
    Named Entity Recognition pipeline using any `ModelForTokenClassification`. See the [named entity recognition
    examples](../task_summary#named-entity-recognition) for more information.
+    Example:
+    ```python
+    >>> from transformers import pipeline
+    >>> token_classifier = pipeline(model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")
+    >>> sentence = "Je m'appelle jean-baptiste et je vis à montréal"
+    >>> token_classifier(sentence)
+    [{'entity_group': 'PER', 'score': 0.9931, 'word': 'jean-baptiste', 'start': 12, 'end': 26}, {'entity_group': 'LOC', 'score': 0.998, 'word': 'montréal', 'start': 38, 'end': 47}]
+    >>> token = token_classifier(sentence)[0]
+    >>> # Start and end provide an easy way to highlight words in the original text.
+    >>> sentence[token["start"] : token["end"]]
+    ' jean-baptiste'
+    >>> # Some models use the same idea for part-of-speech tagging.
+    >>> syntaxer = pipeline(model="vblagoje/bert-english-uncased-finetuned-pos", aggregation_strategy="simple")
+    >>> syntaxer("My name is Sarah and I live in London")
+    [{'entity_group': 'PRON', 'score': 0.999, 'word': 'my', 'start': 0, 'end': 2}, {'entity_group': 'NOUN', 'score': 0.997, 'word': 'name', 'start': 3, 'end': 7}, {'entity_group': 'AUX', 'score': 0.994, 'word': 'is', 'start': 8, 'end': 10}, {'entity_group': 'PROPN', 'score': 0.999, 'word': 'sarah', 'start': 11, 'end': 16}, {'entity_group': 'CCONJ', 'score': 0.999, 'word': 'and', 'start': 17, 'end': 20}, {'entity_group': 'PRON', 'score': 0.999, 'word': 'i', 'start': 21, 'end': 22}, {'entity_group': 'VERB', 'score': 0.998, 'word': 'live', 'start': 23, 'end': 27}, {'entity_group': 'ADP', 'score': 0.999, 'word': 'in', 'start': 28, 'end': 30}, {'entity_group': 'PROPN', 'score': 0.999, 'word': 'london', 'start': 31, 'end': 37}]
+    ```
+    Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)
    This token recognition pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"ner"` (for predicting the classes of tokens in a sequence: person, organisation, location or miscellaneous).
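The `start` and `end` fields in these examples are character offsets into the input string, so they can drive highlighting without re-tokenizing. The following is a minimal sketch and not part of the pipeline API: `highlight` is a hypothetical helper, and `entities` is copied from the expected doctest output rather than computed by running `Jean-Baptiste/camembert-ner`. It assumes the spans do not overlap, which holds for the two entities here. Because the offsets in this output include the leading space (hence `' jean-baptiste'` above), the helper trims leading whitespace before wrapping each entity.

```python
# Entities copied from the expected doctest output above; nothing is downloaded
# and no model is run in this sketch.
sentence = "Je m'appelle jean-baptiste et je vis à montréal"
entities = [
    {"entity_group": "PER", "score": 0.9931, "word": "jean-baptiste", "start": 12, "end": 26},
    {"entity_group": "LOC", "score": 0.998, "word": "montréal", "start": 38, "end": 47},
]


def highlight(text, spans):
    """Hypothetical helper: wrap each span of `text` as [LABEL: span text].

    Assumes `spans` are dicts with "entity_group", "start" and "end" keys and
    that they do not overlap.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        start, end = span["start"], span["end"]
        # Skip whitespace folded into the span so it stays outside the brackets.
        while start < end and text[start].isspace():
            start += 1
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"[{span['entity_group']}: {text[start:end]}]")
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last entity
    return "".join(pieces)


print(highlight(sentence, entities))
```

With the values above this prints `Je m'appelle [PER: jean-baptiste] et je vis à [LOC: montréal]`. The same helper works on the part-of-speech output, since it also carries `entity_group`, `start` and `end`.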