Added missing code in exemplary notebook - custom datasets fine-tuning (#15300)

* Added missing code in exemplary notebook - custom datasets fine-tuning

  Added missing code in tokenize_and_align_labels function in the exemplary notebook on custom datasets - token classification. The missing code concerns adding labels for all but first token in a single word. The added code was taken directly from huggingface official example - this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible

Unverified commit e79a0fae, authored Jan 25, 2022 by Maciej Pawłowski; committed by GitHub on Jan 25, 2022

Showing with 3 additions and 1 deletion

docs/source/custom_datasets.mdx +3 -1

--- a/docs/source/custom_datasets.mdx
+++ b/docs/source/custom_datasets.mdx
@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                 label_ids.append(-100)
             elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                 label_ids.append(label[word_idx])
-
+            else:
+                label_ids.append(-100)
+            previous_word_idx = word_idx
         labels.append(label_ids)

     tokenized_inputs["labels"] = labels
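
For reviewers who want to check the patched branch logic in isolation, here is a minimal sketch of the same loop as a standalone function. The function name `align_labels`, the `ignore_index` parameter, and the sample `word_ids` list are hypothetical and are not part of the commit. It assumes `word_ids` has the shape a fast tokenizer's `word_ids()` call returns: `None` for special tokens, otherwise the index of the source word.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Map word-level labels onto tokens, labelling only the first token of each word.

    Hypothetical helper mirroring the inner loop of tokenize_and_align_labels
    after this change; it is a sketch, not code from the notebook.
    """
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens get the ignore index.
            label_ids.append(ignore_index)
        elif word_idx != previous_word_idx:
            # First token of a word carries the word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Remaining tokens of the same word are ignored (the added branch).
            label_ids.append(ignore_index)
        # The added update: without it the comparison above never sees a repeat.
        previous_word_idx = word_idx
    return label_ids


# Made-up alignment for five tokens: special, word 0 split in two, word 1, special.
word_ids = [None, 0, 0, 1, None]
word_labels = [3, 4]
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 3, -100, 4, -100]
```

The output has one entry per token, so it lines up with the tokenized input. The value -100 matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so positions carrying it do not contribute to the loss.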