  • Support dataset upsampling / relative ratio in PytorchTranslateTask (#494) · ff74ca94
    Ning Dong authored
    Summary:
    Pull Request resolved: https://github.com/pytorch/translate/pull/494
    
    Pull Request resolved: https://github.com/pytorch/fairseq/pull/657
    
    Library side change split from D14924942
    
    Added two arguments to load_dataset in PytorchTranslateTask:
    1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.
    
    2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how often the given dataset is sampled relative to the rest of the corpora map.
    
    At most one of them can be specified.
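
    The argument shapes and the "at most one" rule can be sketched as below. This is an illustration only, not the code in this change: the helper name check_sampling_args, the direction key "src-tgt", and the dataset names "bitext" and "synthetic" are hypothetical, and the sketch only validates the arguments without performing any sampling.

```python
from typing import Dict, Optional, Tuple

# Shapes described in the commit message (the concrete values are made up):
#   dataset_upsampling:     {direction: {dataset: upsampling_ratio}}
#   dataset_relative_ratio: (dataset, ratio)
UpsamplingMap = Dict[str, Dict[str, float]]
RelativeRatio = Tuple[str, float]


def check_sampling_args(
    dataset_upsampling: Optional[UpsamplingMap] = None,
    dataset_relative_ratio: Optional[RelativeRatio] = None,
) -> str:
    """Hypothetical helper: enforce that at most one option is given.

    Returns the name of the active mode so a caller can branch on it.
    """
    if dataset_upsampling is not None and dataset_relative_ratio is not None:
        raise ValueError(
            "Specify at most one of dataset_upsampling and "
            "dataset_relative_ratio."
        )
    if dataset_upsampling is not None:
        # Ratios must be positive to be meaningful as sampling weights
        # (an assumption of this sketch, not stated in the commit).
        for direction, ratios in dataset_upsampling.items():
            for dataset, ratio in ratios.items():
                if ratio <= 0:
                    raise ValueError(
                        f"Non-positive ratio for {direction}/{dataset}: {ratio}"
                    )
        return "upsampling"
    if dataset_relative_ratio is not None:
        dataset, ratio = dataset_relative_ratio
        if ratio <= 0:
            raise ValueError(f"Non-positive ratio for {dataset}: {ratio}")
        return "relative_ratio"
    return "none"


# Example values (hypothetical): the bitext is seen twice as often as it
# is actually present, while the synthetic data keeps its natural frequency.
example_upsampling = {"src-tgt": {"bitext": 2.0, "synthetic": 1.0}}
example_relative_ratio = ("bitext", 0.5)
```

    Calling check_sampling_args(dataset_upsampling=example_upsampling) selects the upsampling mode, while passing both example values together raises ValueError.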
    
    Reviewed By: liezl200
    
    Differential Revision: D15041293
    
    fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad
data_utils.py 6.41 KB