Extend mask_along_axis{,_iid} (#3289)
Summary:
(1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360), which I accidentally closed.)

The previous way of doing SpecAugment via the Frequency/TimeMasking transforms has the following problems:

- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors as a result of the above hard-coding.
- For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors as a result of the above hard-coding.
- It's not straightforward to apply multiple time/frequency masks with the current design.

To solve these issues, here we

- Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Both can now mask either of the last two dimensions of the input tensor (where the time and frequency dimensions live).

The introduction of the SpecAugment transform will be done in another PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/3289

Reviewed By: hwangjeff

Differential Revision: D45460357

Pulled By: xiaohui-zhang

fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
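For illustration only, the idea of masking one of the last two dimensions regardless of tensor rank can be sketched as below. This is not the torchaudio implementation: it uses NumPy to stay self-contained, the helper name `mask_one_axis` and its `rng` argument are hypothetical, and the integer sampling of the mask width and start is an assumption made for the sketch.

```python
import numpy as np


def mask_one_axis(specgram, mask_param, mask_value, axis, rng):
    """Hypothetical helper: fill one random span along the freq or time axis.

    The same span is applied to every leading (batch/channel) entry.
    """
    if specgram.ndim < 2:
        raise ValueError("expected at least 2 dimensions (..., freq, time)")
    # Convert to a negative index so the check is independent of rank.
    axis = axis if axis < 0 else axis - specgram.ndim
    if axis not in (-2, -1):
        raise ValueError("axis must be one of the last two dimensions")

    size = specgram.shape[axis]
    # Assumed sampling scheme: width in [0, mask_param], capped at the axis size.
    width = min(int(rng.integers(0, mask_param + 1)), size)
    start = int(rng.integers(0, size - width + 1))

    out = specgram.copy()
    index = [slice(None)] * out.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = mask_value
    return out


rng = np.random.default_rng(0)
spec_2d = np.ones((80, 100))        # (freq, time), no batch or channel dim
spec_3d = np.ones((4, 80, 100))     # (batch, freq, time)
time_masked = mask_one_axis(spec_2d, 10, 0.0, axis=1, rng=rng)
freq_masked = mask_one_axis(spec_3d, 10, 0.0, axis=-2, rng=rng)
```

Because the axis is resolved relative to the end of the shape, the same call works for 2D, 3D, and 4D inputs. Passing the spectrogram mean as `mask_value` would give mean masking instead of zero masking.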