Liang Wang authored
Summary:
When there are multiple valid subsets, say `valid`, `valid1` and `valid2`, and `combine=True`, loading the `valid` subset tries to locate and load `valid`, `valid1`, `valid2`... and then combines them into one dataset. Setting `combine` to `False` solves this issue.

In my experiment, I have 3 valid subsets with 3000, 5000 and 7801 examples. With the argument `--valid-subset valid,valid1,valid2`, the log is as follows:

```
......
| ./mix_data/bin valid src-trg 3000 examples
| ./mix_data/bin valid1 src-trg 5000 examples
| ./mix_data/bin valid2 src-trg 7801 examples
| ./mix_data/bin valid1 src-trg 5000 examples
| ./mix_data/bin valid2 src-trg 7801 examples
......
```

As shown above, the `valid1` and `valid2` subsets are incorrectly loaded twice: once while combining under `valid`, and once more when each is requested by name.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/835

Differential Revision: D16006343

Pulled By: myleott

fbshipit-source-id: ece7fee3a00f97a6b3409defbf7f7ffaf0a54fdc
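
The double loading can be reproduced with a small standalone sketch. This is not the fairseq code: `AVAILABLE`, `load_split` and `load_all` are hypothetical names, and the shard probing (`split`, `split1`, `split2`, ...) is assumed from the behaviour described in the summary.

```python
import itertools

# Hypothetical stand-in for the binarized subsets present on disk.
AVAILABLE = {"valid", "valid1", "valid2"}


def load_split(split, combine, available=AVAILABLE):
    """Return the subset names that would be loaded for one split."""
    loaded = []
    for k in itertools.count():
        # Assumed probing order: split, split1, split2, ...
        name = split + (str(k) if k > 0 else "")
        if name not in available:
            break
        loaded.append(name)
        if not combine:
            # Without combining, only the exact split is loaded.
            break
    return loaded


def load_all(valid_subset, combine):
    """Load every split named in a comma-separated --valid-subset value."""
    loaded = []
    for split in valid_subset.split(","):
        loaded.extend(load_split(split, combine))
    return loaded


if __name__ == "__main__":
    print(load_all("valid,valid1,valid2", combine=True))
    print(load_all("valid,valid1,valid2", combine=False))
```

With `combine=True` the sketch lists `valid1` and `valid2` twice, in the same order as the log; with `combine=False` each subset appears once.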
8b514b9f