use numpy function for filter by size when possible (#845)
Summary:
For the general masked language modeling use case, this is much faster (3 minutes vs. 1 second). Let me know what you think about it, myleott. If you don't like all the special-case checking, we can think about reorganizing the dataset APIs so that `sizes` is always a property calculated in `__init__`.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/845
Reviewed By: myleott
Differential Revision: D16993769
Pulled By: myleott
fbshipit-source-id: 161bba62af2965190c07c47e838ee967cb886e88
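
The idea in the summary can be illustrated with a minimal sketch. This is not the code from the commit: `ToyDataset`, the `filter_by_size` signature shown here, and the assumption that `max_positions` is a single number are all hypothetical and chosen for illustration only. When the dataset exposes `sizes` as a NumPy array, the filter is one vectorized comparison; otherwise it falls back to a per-index Python loop.

```python
import numpy as np


class ToyDataset:
    """Hypothetical dataset that precomputes `sizes` in __init__."""

    def __init__(self, sizes):
        self.sizes = np.asarray(sizes)

    def size(self, index):
        return int(self.sizes[index])


def filter_by_size(indices, dataset, max_positions):
    """Keep the indices whose example size is <= max_positions.

    Assumes max_positions is a single number (illustrative simplification).
    """
    indices = np.asarray(indices, dtype=np.int64)
    sizes = getattr(dataset, "sizes", None)
    if isinstance(sizes, np.ndarray):
        # Fast path: one vectorized comparison and a boolean mask.
        return indices[sizes[indices] <= max_positions]
    # Slow path: ask the dataset for each size in a Python loop.
    kept = [i for i in indices if dataset.size(i) <= max_positions]
    return np.asarray(kept, dtype=np.int64)


dataset = ToyDataset([5, 12, 7, 30])
kept = filter_by_size(np.arange(4), dataset, max_positions=10)
```

The `isinstance` check is the kind of special-case handling the summary refers to; it would be unnecessary if every dataset were guaranteed to provide `sizes`.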