Unverified commit e79a0fae, authored by Maciej Pawłowski, committed by GitHub

Added missing code in exemplary notebook - custom datasets fine-tuning (#15300)

* Added missing code in exemplary notebook - custom datasets fine-tuning

Added the missing code to the tokenize_and_align_labels function in the example notebook on custom datasets (token classification).
The missing code adds a label for every token of a word except the first.
The added code was taken directly from the official Hugging Face example, this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
parent 0501beb8
@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                 label_ids.append(-100)
             elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                 label_ids.append(label[word_idx])
+            else:
+                label_ids.append(-100)
             previous_word_idx = word_idx
         labels.append(label_ids)
     tokenized_inputs["labels"] = labels
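
The effect of the added `else` branch can be checked in isolation. The following is a minimal standalone sketch, not the notebook's code: it assumes `word_ids` has the shape a fast tokenizer's `word_ids()` method returns (one entry per token, `None` for special tokens), and `align_labels`, `IGNORE_INDEX`, and the sample values are hypothetical names chosen for illustration. Without the `else` branch, later sub-tokens of a word would get no entry at all, so the label list would end up shorter than the token list.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's cross-entropy loss,
# so tokens labelled with it do not contribute to the loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Return one label per token: the word's label on its first token, -100 elsewhere."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens map to no word.
            label_ids.append(IGNORE_INDEX)
        elif word_idx != previous_word_idx:
            # Only label the first token of a given word.
            label_ids.append(word_labels[word_idx])
        else:
            # Remaining sub-tokens of the same word: the branch this commit adds.
            label_ids.append(IGNORE_INDEX)
        previous_word_idx = word_idx
    return label_ids


# Hypothetical input: three words, the second split into two sub-tokens,
# wrapped in two special tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [3, 0, 7]
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 3, 0, -100, 7, -100]
```

The result has exactly one label per token, which is what the model expects when `tokenized_inputs["labels"]` is set.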