- 23 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Fix https://github.com/pytorch/audio/issues/3361 When adding FunctionalCUDAOnlyTest, the class should inherit from `TestBaseMixin` instead of `Functional` Pull Request resolved: https://github.com/pytorch/audio/pull/3363 Reviewed By: atalman, osalpekar Differential Revision: D46112084 Pulled By: nateanl fbshipit-source-id: 67c6472fda98cb718e0fc53ab248beda745feab5
-
- 22 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3354 When `start == 0`, the first item, instead of the S-th item, of row `t` in `backPtr_a` should be 0. Reviewed By: xiaohui-zhang Differential Revision: D46059971 fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8
-
- 20 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3348 The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA devices. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame. Reviewed By: vineelpratap, xiaohui-zhang Differential Revision: D45867265 fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb
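To illustrate what "generates the corresponding labels for each frame" means, here is a minimal pure-Python Viterbi sketch of CTC forced alignment. It is not the code from the pull request: the function name `forced_align`, the list-of-lists input layout, and the choice of `blank=0` are assumptions made for this example only.

```python
import math

def forced_align(log_probs, targets, blank=0):
    """Return one label per frame for the best CTC path that spells `targets`.

    log_probs: list of T frames, each a list of per-class log-probabilities.
    targets:   list of class indices (without blanks).
    """
    num_frames = len(log_probs)
    # Interleave blanks around the targets: [blank, t1, blank, t2, ..., blank].
    ext = [blank]
    for label in targets:
        ext += [label, blank]
    num_states = len(ext)
    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first label.
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed predecessors: stay, advance by one state, or skip the
            # blank between two different labels.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == neg_inf:
                continue  # state s is unreachable at frame t
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the trailing blank or in the last label.
    ends = [num_states - 1] if num_states == 1 else [num_states - 1, num_states - 2]
    s = max(ends, key=lambda p: score[num_frames - 1][p])
    if score[num_frames - 1][s] == neg_inf:
        raise ValueError("targets cannot be aligned to this many frames")

    # Walk the back-pointers from the last frame to the first.
    path = []
    for t in range(num_frames - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]

# Toy emissions for 4 frames and 3 classes (class 0 is the blank).
probs = [
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
frame_labels = forced_align(log_probs, [1, 2])
```

The result has one entry per frame, where blank frames carry the blank index. A production implementation would work on tensors and batch the inner loop.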
-
- 17 May, 2023 1 commit
-
-
moto authored
Summary: This commit adds support for decoding the YUV420P010LE format. The image tensor returned for this format has - NCHW format (C == 3) - int16 type - value range [0, 2^10). Note that the value range is different from what the "hevc_cuvid" decoder returns. The "hevc_cuvid" decoder uses the full range of int16 (internally, it's uint16) to express the color (with some intervals), but the values returned by the CPU "hevc" decoder are within [0, 2^10). Address https://github.com/pytorch/audio/issues/3331 Pull Request resolved: https://github.com/pytorch/audio/pull/3332 Reviewed By: hwangjeff Differential Revision: D45925097 Pulled By: mthrok fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
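Because the decoded values lie in [0, 2^10), client code that wants floats in [0, 1] has to scale by the 10-bit maximum rather than the int16 maximum. The following is a small sketch of that scaling on plain Python lists; the helper name `normalize_10bit` is hypothetical and not part of the library.

```python
BIT_DEPTH = 10
MAX_VALUE = (1 << BIT_DEPTH) - 1  # largest value representable in 10 bits

def normalize_10bit(values):
    """Scale integer samples in [0, 2^10) to floats in [0.0, 1.0]."""
    for v in values:
        if not 0 <= v <= MAX_VALUE:
            raise ValueError(f"{v} is outside the 10-bit range")
    return [v / MAX_VALUE for v in values]

normalized = normalize_10bit([0, 512, 1023])
```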
-
- 10 May, 2023 2 commits
-
-
moto authored
Summary: This commit makes the code use the backend dispatcher by default. Enabling the backend dispatcher gives the FFmpeg-based I/O implementation higher priority (if the corresponding FFmpeg is available), and allows individual function calls to specify the backend. See also https://github.com/pytorch/audio/issues/2950 Pull Request resolved: https://github.com/pytorch/audio/pull/3241 Reviewed By: hwangjeff Differential Revision: D44709068 Pulled By: mthrok fbshipit-source-id: 43aac3433f78a681df6669e9ac46e8ecf3beb1be
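The dispatching idea can be sketched in a few lines of plain Python: try backends in priority order, and let the caller force a specific one per call. Everything here (`make_loader`, the `backends` mapping, the stand-in loader functions, and the priority order after FFmpeg) is a hypothetical illustration, not the library's implementation.

```python
def make_loader(backends, priority):
    """Build a `load` function that dispatches to the first available backend.

    backends: maps a backend name to a callable, or to None if unavailable.
    priority: backend names in the order they should be tried.
    """
    def load(uri, backend=None):
        if backend is not None:
            # The caller asked for a specific backend for this call only.
            impl = backends.get(backend)
            if impl is None:
                raise RuntimeError(f"backend {backend!r} is not available")
            return impl(uri)
        for name in priority:
            impl = backends.get(name)
            if impl is not None:
                return impl(uri)
        raise RuntimeError("no I/O backend is available")
    return load

# Stand-in implementations that only report which backend handled the call.
backends = {
    "ffmpeg": lambda uri: ("ffmpeg", uri),
    "sox": lambda uri: ("sox", uri),
    "soundfile": None,  # pretend this backend is not installed
}
load = make_loader(backends, priority=["ffmpeg", "sox", "soundfile"])
```

With this setup, `load("a.wav")` goes to the FFmpeg stand-in, while `load("a.wav", backend="sox")` overrides the priority for that one call.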
-
Zhaoheng Ni authored
Summary: Address https://github.com/pytorch/audio/issues/2643 - Replace `SGD` optimization with `torch.linalg.lstsq`, which is much faster. - Add autograd test for `InverseMelScale` - Update other tests Pull Request resolved: https://github.com/pytorch/audio/pull/3280 Reviewed By: hwangjeff Differential Revision: D45679988 Pulled By: nateanl fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59
-
- 09 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`. Pull Request resolved: https://github.com/pytorch/audio/pull/3322 Reviewed By: mthrok Differential Revision: D45691769 Pulled By: nateanl fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045
-
- 05 May, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: (2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed) The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems: - Only zero masking can be done; masking by mean value is not supported. - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor. - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding. - For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding. - It's not straightforward to apply multiple time/frequency masks with the current design. If we need N masks across the time/frequency axis, we need to sequentially apply N Frequency/TimeMasking transforms to the input tensors, and such an API is very inconvenient. We need to introduce a separate SpecAugment transform to handle this. To solve these issues, here we [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor. [done in this PR] Introduce the SpecAugment transform. Pull Request resolved: https://github.com/pytorch/audio/pull/3309 Reviewed By: nateanl Differential Revision: D45592926 Pulled By: xiaohui-zhang fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2
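The behaviour described above (N masks per axis, zero or mean fill, masking along either of the last two dimensions) can be sketched on a 2D spectrogram stored as nested lists. This is a simplified illustration, not the transform added by the PR: the function names mirror the ones mentioned above, but the signatures, the integer mask widths, and the `seed` argument are assumptions of this sketch.

```python
import random

def mask_along_axis(spec, mask_param, mask_value, axis, rng):
    """Mask one random band of a 2D spectrogram (rows = freq, cols = time).

    axis=0 masks frequency rows, axis=1 masks time columns.
    """
    n_freq, n_time = len(spec), len(spec[0])
    size = n_freq if axis == 0 else n_time
    # Sample the band width (at most mask_param) and its start position.
    width = rng.randint(0, min(mask_param, size))
    start = rng.randint(0, size - width)
    out = [row[:] for row in spec]  # copy so the input is left untouched
    for f in range(n_freq):
        for t in range(n_time):
            idx = f if axis == 0 else t
            if start <= idx < start + width:
                out[f][t] = mask_value
    return out

def spec_augment(spec, n_time_masks, time_mask_param,
                 n_freq_masks, freq_mask_param,
                 zero_masking=True, seed=None):
    """Apply several time masks and several frequency masks in one call."""
    rng = random.Random(seed)
    if zero_masking:
        mask_value = 0.0
    else:
        # Fill masked regions with the mean of the whole spectrogram.
        total = sum(sum(row) for row in spec)
        mask_value = total / (len(spec) * len(spec[0]))
    out = spec
    for _ in range(n_time_masks):
        out = mask_along_axis(out, time_mask_param, mask_value, 1, rng)
    for _ in range(n_freq_masks):
        out = mask_along_axis(out, freq_mask_param, mask_value, 0, rng)
    return out

spec = [[1.0] * 6 for _ in range(4)]  # 4 frequency bins, 6 time frames
zero_masked = spec_augment(spec, 2, 3, 1, 2, zero_masking=True, seed=0)
mean_masked = spec_augment(spec, 2, 3, 1, 2, zero_masking=False, seed=0)
```

Bundling the loop over masks inside one function is what removes the need to chain N separate masking transforms by hand.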
-
- 04 May, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed) The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems: - Only zero masking can be done; masking by mean value is not supported. - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor. - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding. - For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding. - It's not straightforward to apply multiple time/frequency masks with the current design. To solve these issues, here we - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor. The introduction of the SpecAugment transform will be done in another PR. Pull Request resolved: https://github.com/pytorch/audio/pull/3289 Reviewed By: hwangjeff Differential Revision: D45460357 Pulled By: xiaohui-zhang fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results using V100 are attached below:

| decoder type | model | datasets | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|----------|----------------------|-----------|------------|------------|-------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note: 1. The design parallelizes computation across the batch and vocab axes and loops over the frames axis, so compared with CPU implementations it is friendlier to shorter sequence lengths and larger vocab sizes. 2. WER is the same as CPU implementations. However, it can't decode with an LM for now. Resolves: https://github.com/pytorch/audio/issues/2957. Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
-
- 12 Apr, 2023 2 commits
-
-
moto authored
Summary: When `TORCHAUDIO_TEST_TEMP_DIR` is set, all the unit test temporary data are stored in the given directory. Running unit tests multiple times reuses the directory and the temporary files from the previous test runs are found there. FFmpeg save test writes reference data to the temporary directory, but it is not given the overwrite flag ("-y"), so it fails in such cases. This commit fixes that. Pull Request resolved: https://github.com/pytorch/audio/pull/3263 Reviewed By: hwangjeff Differential Revision: D44859003 Pulled By: mthrok fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b
-
moto authored
Summary: Preparation to land https://github.com/pytorch/audio/pull/3241. This commit applies a patch to make the sox_io TorchScript test pass when the dispatcher is enabled. Pull Request resolved: https://github.com/pytorch/audio/pull/3262 Reviewed By: hwangjeff Differential Revision: D44897513 Pulled By: mthrok fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32
-
- 05 Apr, 2023 1 commit
-
-
moto authored
Summary: In dispatcher mode, the FFmpeg backend does not handle file-like objects, and the C++ implementation raises an error. This commit fixes it by normalizing file-like objects to strings. Pull Request resolved: https://github.com/pytorch/audio/pull/3243 Reviewed By: nateanl Differential Revision: D44719280 Pulled By: mthrok fbshipit-source-id: 9dae459e2a5fb4992b4ef53fe4829fe8c35b2edd
-
- 03 Apr, 2023 1 commit
-
-
moto authored
Summary: Currently, creating a CTCDecoder object by passing a language model to the `lm` argument without assigning it to a variable elsewhere causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11 ( https://github.com/pybind/pybind11/discussions/4013 and https://github.com/pybind/pybind11/pull/2839 ), this is due to the Python object being garbage-collected by the time it's used by code implemented in C++. The C++ code attempts to call methods defined in Python, which override the base pure virtual function, but the object that provides this override gets deleted by the garbage collector, as the original object is not reference counted. This commit fixes this by simply assigning the given `lm` object as an attribute of the CTCDecoder class. Address https://github.com/pytorch/audio/issues/3218 Pull Request resolved: https://github.com/pytorch/audio/pull/3230 Reviewed By: hwangjeff Differential Revision: D44642989 Pulled By: mthrok fbshipit-source-id: a90af828c7c576bc0eb505164327365ebaadc471
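The failure mode and the fix can be imitated in pure Python by using a weak reference as a stand-in for a C++ side that does not own the Python object. All class names below are hypothetical; the real classes are bound through PyBind11 and are not shown here. The sketch relies on CPython freeing an object as soon as its last strong reference disappears.

```python
import weakref

class LanguageModel:
    """Stand-in for a Python subclass that overrides a C++ virtual method."""
    def start(self):
        return "started"

class NativeDecoder:
    """Stand-in for the C++ object: it only keeps a non-owning reference."""
    def __init__(self, lm):
        self._lm_ref = weakref.ref(lm)

    def decode(self):
        lm = self._lm_ref()
        if lm is None:
            # Plays the role of the "pure virtual function" RuntimeError.
            raise RuntimeError("language model was garbage-collected")
        return lm.start()

class BrokenDecoder:
    """Passes `lm` through without holding on to it."""
    def __init__(self, lm):
        self._native = NativeDecoder(lm)

    def decode(self):
        return self._native.decode()

class FixedDecoder:
    """Keeps `lm` alive by storing it as an attribute, as the commit does."""
    def __init__(self, lm):
        self.lm = lm  # strong reference owned by the decoder
        self._native = NativeDecoder(lm)

    def decode(self):
        return self._native.decode()

# The language model is created inline and never bound to a variable.
fixed = FixedDecoder(LanguageModel())
broken = BrokenDecoder(LanguageModel())
```

Calling `fixed.decode()` works because the decoder owns the model, while `broken.decode()` raises once the temporary model has been collected.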
-
- 01 Apr, 2023 1 commit
-
-
moto authored
Summary: This commit adds a new feature AudioEffector, which can be used to apply various effects and codecs to waveforms in Tensor. Under the hood it uses StreamWriter and StreamReader to apply filters and encode/decode. This is going to replace the deprecated `apply_codec` and `apply_sox_effect_tensor` functions. It can also perform online, chunk-by-chunk filtering. Tutorial to follow. closes https://github.com/pytorch/audio/issues/3161 Pull Request resolved: https://github.com/pytorch/audio/pull/3163 Reviewed By: hwangjeff Differential Revision: D44576660 Pulled By: mthrok fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
-
- 30 Mar, 2023 2 commits
-
-
moto authored
Summary: This commit adds support for changing the spec of media (such as sample rate, #channels, image size and frame rate) on-the-fly at encoding time. The motivation behind this addition is that certain media formats support only a limited set of specs, and it is cumbersome to require client code to change the spec every time. For example, OPUS supports only 48kHz sampling rate, and vorbis only supports stereo. To make it easy to work with media of different formats, this commit makes it so that anything that's not compatible with the format is automatically converted, and allows users to specify the override. A notable implementation detail is that, for sample format and pixel format, the default value of the encoder takes precedence over the source value, while for other attributes like sample rate and #channels, the source value takes precedence as long as it is supported. Pull Request resolved: https://github.com/pytorch/audio/pull/3207 Reviewed By: nateanl Differential Revision: D44439622 Pulled By: mthrok fbshipit-source-id: 09524f201d485d201150481884a3e9e4d2aab081
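The precedence rules in the last sentence can be written down as two small selection functions. This is a sketch under assumptions: the function names are hypothetical, and the fallback for an unsupported source rate (pick the closest supported rate) is chosen for this example, since the commit message does not say which value is used.

```python
def pick_sample_rate(src_rate, supported_rates, override=None):
    """Source value wins as long as the encoder supports it."""
    if override is not None:
        if supported_rates and override not in supported_rates:
            raise ValueError(f"{override} Hz is not supported by the encoder")
        return override
    if not supported_rates or src_rate in supported_rates:
        # An empty list means the encoder does not restrict the rate.
        return src_rate
    # Assumption of this sketch: fall back to the closest supported rate.
    return min(supported_rates, key=lambda rate: abs(rate - src_rate))

def pick_sample_format(src_format, encoder_default, override=None):
    """Encoder default wins over the source value."""
    if override is not None:
        return override
    return encoder_default if encoder_default is not None else src_format

# An encoder that only accepts 48 kHz (as the text says of OPUS).
opus_rate = pick_sample_rate(44100, [48000])
# An encoder with no restriction keeps the source rate.
free_rate = pick_sample_rate(16000, [])
# The encoder's default format is preferred over the source format.
chosen_format = pick_sample_format("s16", "flt")
```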
-
moto authored
Summary: This commit adds `num_channels` argument, which allows one to change the number of channels on-the-fly. Pull Request resolved: https://github.com/pytorch/audio/pull/3216 Reviewed By: hwangjeff Differential Revision: D44516925 Pulled By: mthrok fbshipit-source-id: 3e5a11b3fdbb19071f712a8148e27aff60341df3
-
- 29 Mar, 2023 1 commit
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3217 This commit removes some tests for file-like objects from the StreamWriter test. The rationale is that everything tested after the output file is opened is the same for file-like objects and regular files. Things like filter-graph and encoder format changes do not affect how the encoded binary is written. Reviewed By: hwangjeff Differential Revision: D44518626 fbshipit-source-id: 821ec20deca92e5e5c85bf4d47997eed51735374
-
- 28 Mar, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3194 Reviewed By: hwangjeff Differential Revision: D44283910 Pulled By: mthrok fbshipit-source-id: 49125724896bf7190ec27f056b6bfef260019f8e
-
- 27 Mar, 2023 1 commit
-
-
hwangjeff authored
Summary: For `StreamWriter`, * Renames arg `config` to `codec_config`. * Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`. * Adds docstrings for arg `codec_config`. * Updates `chunk` to `frames` in `write_*_chunk` methods. Pull Request resolved: https://github.com/pytorch/audio/pull/3203 Reviewed By: mthrok Differential Revision: D44350153 Pulled By: hwangjeff fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343
-
- 25 Mar, 2023 1 commit
-
-
moto authored
Summary: Some audio encoders expect a specific, exact number of samples, as described in `AVCodecContext.frame_size`. The `AVFrame.nb_samples` is set for the frames passed to `AVFilterGraph`, but frames coming out of the graph do not necessarily have the same number of frames. This causes issues with encoding OPUS (among others). This commit fixes it by inserting `asetnsamples` into the filter graph if a fixed number of samples is requested. Note: It turned out that FFmpeg 4.1 has an issue with OPUS encoding. It does not properly discard some samples. We should probably move the minimum required FFmpeg to 4.2, but I am not sure if we can enforce it via ABI. A workaround will be to issue a warning if encoding OPUS with 4.1. (follow-up) Pull Request resolved: https://github.com/pytorch/audio/pull/3204 Reviewed By: nateanl Differential Revision: D44374668 Pulled By: mthrok fbshipit-source-id: 10ef5333dc0677dfb83c8e40b78edd8ded1b21dc
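What a re-framing filter such as `asetnsamples` achieves can be shown with a small buffer that regroups variable-length frames into frames of exactly `frame_size` samples. This is a pure-Python imitation for illustration only; the function name is hypothetical, and zero-padding the final frame is an assumption of this sketch rather than a statement about what the commit configures.

```python
def regroup_samples(frames, frame_size, pad_last=True):
    """Regroup variable-length frames into frames of exactly `frame_size`."""
    buffer = []
    out = []
    for frame in frames:
        buffer.extend(frame)
        # Emit as many full frames as the buffered samples allow.
        while len(buffer) >= frame_size:
            out.append(buffer[:frame_size])
            buffer = buffer[frame_size:]
    if buffer:
        if pad_last:
            # Pad the trailing partial frame with zeros up to frame_size.
            buffer = buffer + [0] * (frame_size - len(buffer))
        out.append(buffer)
    return out

# Frames of 3, 2 and 4 samples leave the "filter graph"; the encoder wants 4.
fixed_frames = regroup_samples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], frame_size=4)
```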
-
- 23 Mar, 2023 3 commits
-
-
moto authored
Summary: With the support of CUDA filter in https://github.com/pytorch/audio/issues/3183, it is now possible to change the pixel format of CUDA frame. This commit adds conversion for YUV444P format. Pull Request resolved: https://github.com/pytorch/audio/pull/3199 Reviewed By: hwangjeff Differential Revision: D44323928 Pulled By: mthrok fbshipit-source-id: 6d9b205e7235df5f21e7d3e06166b3a169f1ae9f
-
Zhaoheng Ni authored
Summary: The PR adds the pre-trained pipeline for `SquimSubjective` model which predicts MOS score for speech enhancement task. Pull Request resolved: https://github.com/pytorch/audio/pull/3197 Reviewed By: mthrok Differential Revision: D44313244 Pulled By: nateanl fbshipit-source-id: 905095ff77006e9f441faa826fc25d9d8681e8aa
-
moto authored
Summary: The OPUS and VORBIS encoders require the "strict=experimental" flag. This commit enables it automatically. The rationale is that we typically care whether we can encode these formats at all, not how they are encoded. (This might become a concern when these encoders become more mature on the FFmpeg side and providing the flag results in weird behavior.) Also, when writing high-level functions that use StreamWriter, if we do not set this flag, these high-level functions have to add new options that are passed down to StreamWriter, which turned out to be very painful in https://github.com/pytorch/audio/issues/3163 Pull Request resolved: https://github.com/pytorch/audio/pull/3192 Reviewed By: nateanl Differential Revision: D44275089 Pulled By: mthrok fbshipit-source-id: 74a757b4b7fc8467c8c88ffcb54fbaf89d6e4384
-
- 22 Mar, 2023 1 commit
-
-
moto authored
Summary: Follow up of https://github.com/pytorch/audio/pull/3083 Pull Request resolved: https://github.com/pytorch/audio/pull/3196 Reviewed By: nateanl Differential Revision: D44308940 Pulled By: mthrok fbshipit-source-id: e3ef27656e74c28ae78b767517d8e0ba3a9ac4a6
-
- 21 Mar, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: Add model architecture and factory functions for `SquimSubjective` which predicts subjective evaluation metric scores (e.g. MOS) for speech enhancement task. Pull Request resolved: https://github.com/pytorch/audio/pull/3189 Reviewed By: mthrok Differential Revision: D44267255 Pulled By: nateanl fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db
-
Zhaoheng Ni authored
Summary: In librosa 0.10 release, positional arguments are deprecated (see https://github.com/librosa/librosa/pull/1521 for details). The PR fixes the HiFiGAN unit test by using keyword arguments for `librosa.filters.mel` function. Pull Request resolved: https://github.com/pytorch/audio/pull/3185 Reviewed By: mthrok Differential Revision: D44218852 Pulled By: nateanl fbshipit-source-id: 6171f7bec6a2144917697c1d640e701d95ec60d7
-
- 20 Mar, 2023 1 commit
-
-
moto authored
Summary: This commit adds CUDA frame support to FilterGraph. It initializes and attaches a CUDA frames context to FilterGraph, so that CUDA frames can be processed in FilterGraph. As a result, it enables 1. CUDA filter support such as `scale_cuda` 2. Proper retrieval of the pixel format coming out of FilterGraph when CUDA HW acceleration is enabled. (currently it is reported as "cuda") Resolves https://github.com/pytorch/audio/issues/3159 Pull Request resolved: https://github.com/pytorch/audio/pull/3183 Reviewed By: hwangjeff Differential Revision: D44183722 Pulled By: mthrok fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f
-
- 17 Mar, 2023 1 commit
-
-
moto authored
Summary: Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level. Pull Request resolved: https://github.com/pytorch/audio/pull/3179 Pull Request resolved: https://github.com/pytorch/audio/pull/3164 Reviewed By: mthrok Differential Revision: D43861413 Pulled By: hwangjeff fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161
-
- 16 Mar, 2023 1 commit
-
-
moto authored
Summary: Currently, when the Buffer converts AVFrame* to torch::Tensor, it checks the format each time a frame is passed and performs the conversion. This commit changes it so that the conversion operation is pre-instantiated at the time the output stream is configured. It introduces Converter implementations for various formats and uses templates to embed them in the Buffer class. This way, branching like if/switch is eliminated from the decoding path. Pull Request resolved: https://github.com/pytorch/audio/pull/3170 Reviewed By: xiaohui-zhang Differential Revision: D44048293 Pulled By: mthrok fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
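The design change can be illustrated with a Python analogy: instead of branching on the format for every frame, the buffer resolves its converter once at configuration time. The commit itself does this with C++ templates; the class names, format names, and scaling rules below are hypothetical and exist only to contrast the two styles.

```python
def convert_gray8(frame):
    """Hypothetical converter: 8-bit samples to floats in [0, 1]."""
    return [v / 255 for v in frame]

def convert_gray16(frame):
    """Hypothetical converter: 16-bit samples to floats in [0, 1]."""
    return [v / 65535 for v in frame]

CONVERTERS = {"gray8": convert_gray8, "gray16": convert_gray16}

class BranchingBuffer:
    """Old style: the format is inspected on every pushed frame."""
    def __init__(self):
        self.chunks = []

    def push(self, fmt, frame):
        if fmt == "gray8":
            self.chunks.append(convert_gray8(frame))
        elif fmt == "gray16":
            self.chunks.append(convert_gray16(frame))
        else:
            raise ValueError(f"unsupported format: {fmt}")

class PreconfiguredBuffer:
    """New style: the converter is chosen once, when the stream is set up."""
    def __init__(self, fmt):
        if fmt not in CONVERTERS:
            raise ValueError(f"unsupported format: {fmt}")
        self._convert = CONVERTERS[fmt]  # resolved at configuration time
        self.chunks = []

    def push(self, frame):
        # No format check here: the decoding path is branch-free.
        self.chunks.append(self._convert(frame))

old_buffer = BranchingBuffer()
old_buffer.push("gray8", [0, 255])

new_buffer = PreconfiguredBuffer("gray8")
new_buffer.push([0, 255])
```

Both buffers produce the same chunks; the difference is that unsupported formats are rejected up front and the per-frame path no longer inspects the format.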
-
- 15 Mar, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Autograd test randomly fails for MFCC transform. Fix it by increasing `nondet_tol` to `1e-10`. Pull Request resolved: https://github.com/pytorch/audio/pull/3169 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D44069673 Pulled By: nateanl fbshipit-source-id: addafefe381104e778b09bfbaafb322df1d9054c
-
- 08 Mar, 2023 2 commits
-
-
moto authored
Summary: This commit adds fields to OutputStream that show the result of filters, such as width and height after filtering.
Before
```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```
After
```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```
Pull Request resolved: https://github.com/pytorch/audio/pull/3155 Reviewed By: nateanl Differential Revision: D43882399 Pulled By: mthrok fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3135 Reviewed By: xiaohui-zhang Differential Revision: D43724273 Pulled By: mthrok fbshipit-source-id: 9b52823618948945a26e57d5b3deccbf5f9268c1
-
- 07 Mar, 2023 3 commits
-
-
Zhaoheng Ni authored
Summary: `filtfilt` function uses `lfilter`, which calls `conv_1d` operation internally. `conv_1d` is expected to have autograd test failures (see https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html). The PR uses deterministic algorithms in the autograd tests to make `filtfilt` related tests pass. Pull Request resolved: https://github.com/pytorch/audio/pull/3150 Reviewed By: mthrok Differential Revision: D43872977 Pulled By: nateanl fbshipit-source-id: c3d6ec281f34db8a7092526ccb245797bf2338da
-
Zhaoheng Ni authored
Summary: Autograd test randomly failed on gpu linux machine. Increase `nondet_tol` to make it pass. Pull Request resolved: https://github.com/pytorch/audio/pull/3154 Reviewed By: mthrok Differential Revision: D43873028 Pulled By: nateanl fbshipit-source-id: a6668c47967a085e5eafb00e2dd4e61b2b46412e
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3152 In StreamWriter, if the destination is not opened when attempting to write data, it causes a segmentation fault. This commit adds a guard so that, instead of segfaulting, it raises an error. Reviewed By: nateanl Differential Revision: D43852649 fbshipit-source-id: aef5db7c1508f8a7db5834c2ab6de3cad09f9d60
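The guard pattern is simple: check the "opened" state before touching the destination and fail with a clear error otherwise. The class below is a hypothetical stand-in written for this example, not the StreamWriter implementation, and the error message is invented.

```python
class SketchWriter:
    """Toy writer that refuses to write before the destination is opened."""
    def __init__(self):
        self._is_open = False
        self.written = []

    def open(self):
        self._is_open = True

    def close(self):
        self._is_open = False

    def write(self, chunk):
        # Guard: fail loudly instead of touching an unopened destination.
        if not self._is_open:
            raise RuntimeError("Output is not opened. Call open() before writing.")
        self.written.append(chunk)

writer = SketchWriter()
writer.open()
writer.write([0.0, 0.1])
writer.close()
```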
-
- 02 Mar, 2023 1 commit
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3131 In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable is removed. PTS can be incremented the same way, but the timing was wrong in #3122. This commit fixes it. Reviewed By: xiaohui-zhang Differential Revision: D43712046 fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9
-
- 01 Mar, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: `sox` is not available on Windows machines. Add skip decorators to the sox related tests to skip running tests on Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/3119 Reviewed By: mthrok Differential Revision: D43682754 Pulled By: nateanl fbshipit-source-id: f69987dac8232a3569be83f096b32389bd8bda81
-
- 27 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Add pre-trained pipeline support for `SquimObjective` model. The pre-trained model is trained on DNS 2020 challenge dataset. Pull Request resolved: https://github.com/pytorch/audio/pull/3103 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43611794 Pulled By: nateanl fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d
-
- 25 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3099 Reviewed By: mthrok Differential Revision: D43596866 Pulled By: nateanl fbshipit-source-id: 43a139bf8ebdf3261414e2855aefc3b53df298ac
-