Unverified commit e79a0fae authored by Maciej Pawłowski, committed by GitHub

Added missing code in example notebook - custom datasets fine-tuning (#15300)

* Added missing code in example notebook - custom datasets fine-tuning

Added the missing code in the tokenize_and_align_labels function in the example notebook on custom datasets (token classification).
The missing code assigns labels to all but the first token of each word.
The added code was taken directly from the official Hugging Face example, this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
parent 0501beb8
```diff
@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                 label_ids.append(-100)
             elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                 label_ids.append(label[word_idx])
+            else:
+                label_ids.append(-100)
             previous_word_idx = word_idx
         labels.append(label_ids)
     tokenized_inputs["labels"] = labels
```
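For context, the alignment logic the diff completes can be sketched in isolation, without the tokenizer or dataset. This is an illustrative stand-in, not the notebook's full function: `align_labels` and the example `word_ids`/`labels` values are assumptions made up for the demo. It shows why the `else` branch matters: without it, every subword of a multi-token word would receive the word's label instead of the `-100` mask that the loss function ignores.

```python
def align_labels(word_ids, label):
    """Map per-word labels onto subword tokens, masking all but the
    first token of each word with -100 (ignored by the loss)."""
    previous_word_idx = None
    label_ids = []
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens ([CLS], [SEP], padding) map to no word: mask them.
            label_ids.append(-100)
        elif word_idx != previous_word_idx:
            # Only label the first token of a given word.
            label_ids.append(label[word_idx])
        else:
            # The code this commit adds: mask the word's remaining subwords.
            label_ids.append(-100)
        previous_word_idx = word_idx
    return label_ids

# Hypothetical word_ids for a sentence where word 1 splits into two subwords:
# [CLS] w0 w1a w1b [SEP]
word_ids = [None, 0, 1, 1, None]
labels = [3, 7]  # one label per word
print(align_labels(word_ids, labels))  # → [-100, 3, 7, -100, -100]
```

The repeated word index (`1, 1`) is how `word_ids` signals that two tokens belong to one word; only the first of the pair keeps the label `7`.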