Unverified commit 29dada00, authored by Ibraheem Moosa, committed by GitHub

Use original key for label in DataCollatorForTokenClassification (#13057)

* Use original key for label in DataCollatorForTokenClassification

DataCollatorForTokenClassification accepts either `label` or `labels` as the key for labels in its input. However, after padding the labels it assigns the padded result to the key `labels`. If `label` was originally used as the key, then the original unpadded labels still remain in the batch. Then, at line 192, when we try to convert the batch elements to torch tensors, these original unpadded labels cannot be converted, because the labels for different samples have different lengths.

* Fixed style.
parent 95e2e14f
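
The failure described in the commit message can be illustrated without torch. The following is a minimal sketch, not the library's code: `pad_labels_old` and `is_rectangular` are hypothetical helpers, the pad value of -100 is an assumption, and the rectangularity check only stands in for the tensor conversion, which needs every row to have the same length.

```python
# Hypothetical stand-in for the pre-fix padding step; not the library's code.
LABEL_PAD_TOKEN_ID = -100  # assumed padding value for labels


def is_rectangular(rows):
    """Return True if all rows have equal length (stand-in for tensor conversion succeeding)."""
    return len({len(row) for row in rows}) <= 1


def pad_labels_old(batch, label_name, sequence_length):
    """Right-pad the labels but always store them under "labels", as before the fix."""
    labels = batch[label_name]
    batch["labels"] = [
        label + [LABEL_PAD_TOKEN_ID] * (sequence_length - len(label)) for label in labels
    ]
    return batch


batch = {
    "input_ids": [[101, 7, 8, 102], [101, 9, 102, 0]],  # already padded to length 4
    "label": [[0, 1, 2, 0], [0, 1, 0]],  # ragged: lengths 4 and 3
}
batch = pad_labels_old(batch, label_name="label", sequence_length=4)

# The padded copy under "labels" is fine, but the original "label" key is still ragged.
ragged_keys = [key for key, rows in batch.items() if not is_rectangular(rows)]
print(ragged_keys)  # ['label']
```

Any key that stays ragged here is one that a per-key tensor conversion over the whole batch would reject.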
@@ -185,9 +185,13 @@ class DataCollatorForTokenClassification:
         sequence_length = torch.tensor(batch["input_ids"]).shape[1]
         padding_side = self.tokenizer.padding_side
         if padding_side == "right":
-            batch["labels"] = [label + [self.label_pad_token_id] * (sequence_length - len(label)) for label in labels]
+            batch[label_name] = [
+                label + [self.label_pad_token_id] * (sequence_length - len(label)) for label in labels
+            ]
         else:
-            batch["labels"] = [[self.label_pad_token_id] * (sequence_length - len(label)) + label for label in labels]
+            batch[label_name] = [
+                [self.label_pad_token_id] * (sequence_length - len(label)) + label for label in labels
+            ]
 
         batch = {k: torch.tensor(v, dtype=torch.int64) for k, v in batch.items()}
         return batch
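
The effect of the change can be sketched as a standalone function. This is an illustration under stated assumptions rather than the collator itself: `pad_labels` is a hypothetical helper, `label_name` is passed in explicitly, and the default pad value of -100 is assumed. The point it shows is that the padded labels overwrite the key they were read from, so no unpadded copy is left behind.

```python
def pad_labels(batch, label_name, sequence_length, label_pad_token_id=-100, padding_side="right"):
    """Pad batch[label_name] to sequence_length and write it back under the same key."""
    labels = batch[label_name]
    if padding_side == "right":
        batch[label_name] = [
            label + [label_pad_token_id] * (sequence_length - len(label)) for label in labels
        ]
    else:
        batch[label_name] = [
            [label_pad_token_id] * (sequence_length - len(label)) + label for label in labels
        ]
    return batch


# Works the same whether the caller used "label" or "labels" as the key.
right = pad_labels({"label": [[1, 2, 3], [4]]}, "label", 3)
left = pad_labels({"labels": [[1, 2, 3], [4]]}, "labels", 3, padding_side="left")
print(right)  # {'label': [[1, 2, 3], [4, -100, -100]]}
print(left)   # {'labels': [[1, 2, 3], [-100, -100, 4]]}
```

Because the result is stored under the original key, every value in the batch has rows of equal length by the time the per-key tensor conversion runs.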