"git@developer.sourcefind.cn:OpenDAS/torch-cluster.git" did not exist on "29a7267040c105972254315507c2b80844907333"
NerPipeline (TokenClassification) now outputs offsets of words (#8781)
* NerPipeline (TokenClassification) now outputs offsets of words
- The offsets are currently missing, which forces users to pattern-match
the "word" against their input, and that is not always feasible.
For instance, if a sentence contains the same word twice, there
is no way to know which is which.
- This PR fixes that by adding two new keys to this
pipeline's output, "start" and "end", which correspond to the string
offsets of the word. That means we should always have the
invariant:
```python
input[entity["start"]:entity["end"]] == entity["word"]
# the label stays in entity["entity_group"], or entity["entity"] if not grouped
```
* Fixing doc style
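
To illustrate why the "start" and "end" keys matter, here is a minimal sketch that does not call the pipeline at all. The sentence, the entity dictionaries, and the `surface_text` helper are hypothetical, and the sketch assumes each entity carries its matched text under a "word" key; scores are omitted for brevity.

```python
# Hypothetical sample data; no model or library call is involved.
text = "Paris Hilton visited Paris last spring."

# Entities shaped like the grouped output described above (illustrative values).
entities = [
    {"entity_group": "PER", "word": "Paris Hilton", "start": 0, "end": 12},
    {"entity_group": "LOC", "word": "Paris", "start": 21, "end": 26},
]


def surface_text(text, entity):
    """Slice the original input with the entity's character offsets."""
    return text[entity["start"]:entity["end"]]


# With offsets, each entity maps back to exactly one span of the input.
spans = [surface_text(text, e) for e in entities]

# Without offsets, searching for the word is ambiguous: find() returns the
# first occurrence, which here belongs to the person, not the location.
first_match = text.find("Paris")
```

Searching for "Paris" lands on the first occurrence, which is part of the person's name, while the location entity actually sits later in the sentence. The offsets remove that ambiguity.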