(It rarely makes sense to start from `bart-large` unless you are researching finetuning methods).
**Update 2018-07-18**
- Datasets: `Seq2SeqDataset` should be used for all tokenizers without a `prepare_translation_batch` method. For those who do (like Marian, MBart), `TranslationDataset` should be used.**
+ Datasets: `Seq2SeqDataset` should be used for all tokenizers without a `prepare_seq2seq_batch` method. For those who do (like Marian, MBart), `TranslationDataset` should be used.**
A new dataset is needed to support multilingual tasks.
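The dataset rule above amounts to a capability check on the tokenizer. The following is a minimal sketch of that check, not the repository's implementation: `pick_dataset_class`, `PlainTokenizer`, and `BatchingTokenizer` are hypothetical names, and the two dataset classes are empty stand-ins for the real ones.

```python
# Empty stand-ins for the real dataset classes (assumption: only their identity matters here).
class Seq2SeqDataset:
    pass


class TranslationDataset:
    pass


def pick_dataset_class(tokenizer):
    """Hypothetical helper: choose a dataset class from the tokenizer's capabilities."""
    # Tokenizers exposing prepare_seq2seq_batch (e.g. Marian, MBart) get TranslationDataset.
    if hasattr(tokenizer, "prepare_seq2seq_batch"):
        return TranslationDataset
    # Every other tokenizer falls back to Seq2SeqDataset.
    return Seq2SeqDataset


# Hypothetical tokenizer stubs used only to exercise the helper.
class PlainTokenizer:
    pass


class BatchingTokenizer:
    def prepare_seq2seq_batch(self, src_texts, tgt_texts=None):
        return {"src_texts": src_texts, "tgt_texts": tgt_texts}


print(pick_dataset_class(PlainTokenizer()).__name__)
print(pick_dataset_class(BatchingTokenizer()).__name__)
```

The same `hasattr` check is what the new test uses to skip tokenizers that do not implement the method.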
@@ -1522,3 +1522,37 @@ class TokenizerTesterMixin:
            if batch_encoded_sequence_fast is None:
                raise ValueError("Cannot convert list to numpy tensor on batch_encode_plus() (fast)")
    @require_torch
    def test_prepare_seq2seq_batch(self):
        tokenizer = self.get_tokenizer()
        if not hasattr(tokenizer, "prepare_seq2seq_batch"):
            return
        # Longer text that will definitely require truncation.
        src_text = [
            " UN Chief Says There Is No Military Solution in Syria",
            " Secretary-General Ban Ki-moon says his response to Russia's stepped up military support for Syria is that 'there is no military solution' to the nearly five-year conflict and more weapons will only worsen the violence and misery for millions of people.",
        ]
        tgt_text = [
            "Şeful ONU declară că nu există o soluţie militară în Siria",
            "Secretarul General Ban Ki-moon declară că răspunsul său la intensificarea sprijinului militar al Rusiei "
            'pentru Siria este că "nu există o soluţie militară" la conflictul de aproape cinci ani şi că noi arme nu '
            "vor face decât să înrăutăţească violenţele şi mizeria pentru milioane de oameni.",