- 06 May, 2022 2 commits
-
-
moto authored
Summary: This commit changes the way torchaudio binary distributions are built.
* For all the binary distributions (conda/pip on Linux/macOS/Windows), build custom FFmpeg libraries.
* The custom FFmpeg libraries do not use `--use-gpl` nor `--use-nonfree`, so that they stay LGPL.
* The custom FFmpeg libraries employ rpath so that the torchaudio binary distributions look for the corresponding FFmpeg libraries installed in the runtime environment.
* The torchaudio binary build process will use them to bootstrap its build process.
* The custom FFmpeg libraries are NOT shipped.
This commit also adds a disclaimer about FFmpeg to the README.
Pull Request resolved: https://github.com/pytorch/audio/pull/2355 Reviewed By: nateanl Differential Revision: D36202087 Pulled By: mthrok fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef
-
moto authored
Summary: The smoke test jobs simply perform `import torchaudio` to check if the package artifacts are sane. Originally, the CI was executing it in the root directory. This was fine as long as the source code was not checked out. When the source code is checked out, performing `import torchaudio` in the root directory imports the source torchaudio directory instead of the installed package. This error is difficult to notice, so this commit introduces a common script that performs the smoke test after moving out of the root directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2365 Reviewed By: carolineechen Differential Revision: D36202069 Pulled By: mthrok fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
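The shadowing problem and the "leave the root directory first" remedy can be sketched as follows. This is only an illustration, not the script added in the PR: the helper name `import_location` is hypothetical, and the standard-library module `json` stands in for `torchaudio` so that the sketch runs without the package installed.

```python
import subprocess
import sys
import tempfile


def import_location(module_name: str, cwd: str) -> str:
    """Import `module_name` in a fresh interpreter started in `cwd`.

    Returns the path of the file that was actually imported, which reveals
    whether a same-named directory in `cwd` shadowed the installed package.
    """
    code = f"import {module_name}; print({module_name}.__file__)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Run the import from an empty temporary directory so that nothing in the
# working directory can shadow the installed package.
with tempfile.TemporaryDirectory() as tmp:
    location = import_location("json", cwd=tmp)
```

Checking the reported location, rather than only checking that the import succeeded, is what makes the shadowing visible.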
-
- 05 May, 2022 2 commits
-
-
moto authored
Summary: Currently, smoke tests are only executed on nightly jobs. This is inconvenient, as PRs that change the build process do not get the signal naturally. This commit changes that by always executing smoke tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2364 Reviewed By: atalman Differential Revision: D36171267 Pulled By: mthrok fbshipit-source-id: e549965ba139b5992177b7a094d87c9ef4432a7f
-
Andrey Talman authored
Summary: This PR fixes Windows smoke tests. Tested via CircleCI: https://app.circleci.com/pipelines/github/pytorch/audio/10572/workflows/970fd791-25cc-4af4-8183-a7835e1891bf/jobs/637607 Pull Request resolved: https://github.com/pytorch/audio/pull/2361 Reviewed By: nateanl, mthrok Differential Revision: D36167317 Pulled By: atalman fbshipit-source-id: 1418ebffd74614cc1110dc032d16ee9502a7d571
-
- 28 Apr, 2022 2 commits
-
-
moto authored
Summary: libmad integration should be enabled only from source-build Pull Request resolved: https://github.com/pytorch/audio/pull/2354 Reviewed By: nateanl Differential Revision: D36012035 Pulled By: mthrok fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7
-
Andrey Talman authored
Summary: Fix audio win smoke test to use GPU hosts for CUDA builds Pull Request resolved: https://github.com/pytorch/audio/pull/2353 Reviewed By: mthrok Differential Revision: D36006928 Pulled By: atalman fbshipit-source-id: a27c4cc34093810c8cc08e01188e09b474478001
-
- 27 Apr, 2022 1 commit
-
-
Guo Liyong authored
Summary: This PR amends `RNNTBeamSearch`'s streaming decoding method to correctly unsqueeze `length` when its dimension is 0. Original comment: Is "input.dim() == 0" unreachable as it could only be 2 or 3 in assertion of Line 329? Pull Request resolved: https://github.com/pytorch/audio/pull/2344 Reviewed By: carolineechen, nateanl Differential Revision: D35899740 Pulled By: hwangjeff fbshipit-source-id: 84c1692b8cc9e5d35798d87f4a1bd052d94af9fb
-
- 26 Apr, 2022 5 commits
-
-
Caroline Chen authored
Summary: Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.
Follow-ups:
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test
Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
Zhaoheng Ni authored
Summary: In different pre-training and fine-tuning settings, the `mask_prob`, `mask_channel_prob`, and `mask_channel_length` are different. For example, the settings in [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) are different. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tune on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)). This PR adds the required arguments in the factory functions to make them tunable for pre-training and fine-tuning. `mask_length` is set to `10` by default for all cases, hence it's not included in the factory function. Pull Request resolved: https://github.com/pytorch/audio/pull/2345 Reviewed By: carolineechen, xiaohui-zhang Differential Revision: D35845117 Pulled By: nateanl fbshipit-source-id: 0cbb74d09535d189b8258aa8ee0f88779bdb77e7
-
Bingcheng Hu authored
Summary: fix false shape Pull Request resolved: https://github.com/pytorch/audio/pull/2347 Reviewed By: carolineechen Differential Revision: D35921047 Pulled By: nateanl fbshipit-source-id: 5b58820ee777920c68f13a15d80cd2bcc931af87
-
Zhaoheng Ni authored
Summary: The `LibriMix` dataset is missing on the [documentation webpage](https://pytorch.org/audio/stable/datasets.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2351 Reviewed By: carolineechen Differential Revision: D35926695 Pulled By: nateanl fbshipit-source-id: 168aed3bb15510d1b1ec57d77727932e481aca48
-
Andrey Talman authored
Summary: Fix for torchaudio windows tests Following is an example of such test failing: https://app.circleci.com/pipelines/github/pytorch/audio/9408/workflows/e6e5a05c-7080-4fdc-b478-2182aed5f234/jobs/531612 The following code is failing: `conda install -v -y $(ls ~/workspace/torchaudio*.tar.bz2)` This is because the install package is generated in the following directory: `/workspace/conda-bld/win-64/` Pull Request resolved: https://github.com/pytorch/audio/pull/2350 Reviewed By: mthrok Differential Revision: D35912424 Pulled By: atalman fbshipit-source-id: fc4f66ffca24061cc768a5f1010b448f065b9410
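The failure comes from looking for the package at the top level of the workspace while it is actually written to a nested directory. A minimal sketch of a lookup that does not depend on the exact sub-directory is shown below. It is not the fix applied in the PR; `find_conda_packages` and the example file name are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def find_conda_packages(workspace: Path, pattern: str = "torchaudio*.tar.bz2") -> List[Path]:
    """Search `workspace` recursively for built conda packages."""
    return sorted(workspace.rglob(pattern))


# Recreate the layout from the report: the package lives under conda-bld/win-64.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    package_dir = workspace / "conda-bld" / "win-64"
    package_dir.mkdir(parents=True)
    (package_dir / "torchaudio-0.0.0-example.tar.bz2").touch()

    # A top-level glob (the equivalent of `ls ~/workspace/torchaudio*.tar.bz2`)
    # finds nothing, while the recursive search finds the package.
    top_level = sorted(workspace.glob("torchaudio*.tar.bz2"))
    recursive = find_conda_packages(workspace)
```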
-
- 25 Apr, 2022 1 commit
-
-
Andrey Talman authored
Summary: Fix python 3.10 smoke tests Pull Request resolved: https://github.com/pytorch/audio/pull/2348 Reviewed By: mthrok Differential Revision: D35906343 Pulled By: atalman fbshipit-source-id: 6dbb39e69c9751da4b86d5da38a6d11816d527c5
-
- 22 Apr, 2022 3 commits
-
-
Andrey Talman authored
Summary: Remove CUDA 11.5, since CUDA 11.6 has been introduced. Pull Request resolved: https://github.com/pytorch/audio/pull/2346 Reviewed By: mthrok Differential Revision: D35856758 Pulled By: atalman fbshipit-source-id: d3c0cf7639fd20f9ccc52c0738f247b8598f1ed7
-
Andrey Talman authored
Summary: Same change as done in this vision [PR](https://github.com/pytorch/vision/pull/5802). As Ubuntu-1604 runners will no longer be available in early May, update ubuntu-1604-cuda-10.1:201909-23 to ubuntu-2004-cuda-11.4:202110-01, per the [CircleCI Configuration reference](https://circleci.com/docs/2.0/configuration-reference/). Resolves https://github.com/pytorch/audio/issues/2279 Pull Request resolved: https://github.com/pytorch/audio/pull/2343 Reviewed By: mthrok Differential Revision: D35844880 Pulled By: atalman fbshipit-source-id: 318a9fa42455e55664f3da6ab67625cb969f72e6
-
Zhaoheng Ni authored
Summary: When using a customized `batch_sampler`, pytorch_lightning can't wrap the distributed sampler onto it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode. The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` will shuffle the lists based on `seed` and the current `epoch`. The shuffle only happens at initialization and is not redone unless the user resets it. The reason is that a reshuffled `BucketizeBatchSampler` may have a different length than before, so shuffling in ``__iter__`` may result in a mismatch between ``__len__`` and the real length. Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer, so that the value of ``__len__`` and the real length are the same. Pull Request resolved: https://github.com/pytorch/audio/pull/2299 Reviewed By: hwangjeff Differential Revision: D35781538 Pulled By: nateanl fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a
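The idea of sharding a list of pre-built batches across workers, with a shuffle derived from `seed` and `epoch`, can be sketched in plain Python. This is not the `DistributedBatchSampler` code from the PR: the function `shard_batches` is hypothetical, and dropping the tail so that every rank gets the same number of batches is an assumption (an implementation could pad instead).

```python
import random
from typing import List


def shard_batches(
    batches: List[List[int]],
    rank: int,
    num_replicas: int,
    shuffle: bool = True,
    seed: int = 0,
    epoch: int = 0,
) -> List[List[int]]:
    """Return the batches assigned to `rank` out of `num_replicas` workers."""
    order = list(range(len(batches)))
    if shuffle:
        # Seeding with seed + epoch gives every rank the same permutation
        # while still changing the order from epoch to epoch.
        random.Random(seed + epoch).shuffle(order)
    # Assumption: drop the tail so every rank receives the same number of batches.
    usable = len(order) - len(order) % num_replicas
    return [batches[i] for i in order[rank:usable:num_replicas]]


batches = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
rank0 = shard_batches(batches, rank=0, num_replicas=2, seed=1, epoch=0)
rank1 = shard_batches(batches, rank=1, num_replicas=2, seed=1, epoch=0)
```

Because the permutation is computed once per call rather than inside an iterator, the number of batches a rank reports and the number it actually yields stay in agreement, which is the concern the summary raises about ``__len__``.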
-
- 21 Apr, 2022 2 commits
-
-
Andrey Talman authored
Summary: CUDA 11.6 for TorchAudio Pull Request resolved: https://github.com/pytorch/audio/pull/2328 Reviewed By: mthrok Differential Revision: D35826414 Pulled By: atalman fbshipit-source-id: 0a471f0566286d69c0c73191aea7fd5ac0647e5f
-
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
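The pattern of replacing a namedtuple with a plain tuple plus accessor functions can be sketched as below. The field layout here (tokens and a score) is a hypothetical simplification, and the accessor names are illustrative; the actual `Hypothesis` in TorchAudio may carry different fields.

```python
from typing import List, Tuple

# Hypothetical, simplified layout: (tokens, score). Only the pattern matters:
# a plain tuple alias instead of a custom class inside a container.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    """Positional access replaces the former attribute access."""
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


hypos: List[Hypothesis] = [([1, 5, 2], -3.2), ([1, 7], -1.4)]
best = max(hypos, key=get_score)
```

Accessor functions keep call sites readable once named fields are gone, at the cost of having to keep the positional layout in sync by hand.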
-
- 19 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Introduces prototype of convolution-augmented Emformer layer. At a high level, it incorporates Conformer's macaron feedforward network structure and convolution module with Emformer. Pull Request resolved: https://github.com/pytorch/audio/pull/2324 Reviewed By: mthrok Differential Revision: D35734252 Pulled By: hwangjeff fbshipit-source-id: c7ea0bdcfe53a948b00881a74f1f1e1928f5ac57
-
- 18 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Implementation adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py). Modifying the s3prl downstream expert to [this](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5) and using this dataset implementation produces the same results as using the original s3prl pipeline. Pull Request resolved: https://github.com/pytorch/audio/pull/2290 Reviewed By: nateanl Differential Revision: D35692551 Pulled By: carolineechen fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c
-
- 15 Apr, 2022 1 commit
-
-
Moto Hira authored
Summary: Disable clang-tidy's `modernize-use-trailing-return-type` suggestion. Trailing return type has no impact on performance. The lint warning shows up everywhere, and it's nothing but noise. Pull Request resolved: https://github.com/pytorch/audio/pull/2337 Reviewed By: hwangjeff Differential Revision: D35635718 Pulled By: mthrok fbshipit-source-id: beb2d3ec657f829493e08b2c159f215053b0e784
-
- 14 Apr, 2022 3 commits
-
-
moto authored
Summary: This commit adds support for specifying the decoder in Streamer's add stream method. This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options. It allows overriding the decoder codec and/or specifying decoder options. This change makes it possible to specify the Nvidia NVDEC codec for supported formats, which uses dedicated hardware for decoding the video. --- Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them all the way down to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`). Pull Request resolved: https://github.com/pytorch/audio/pull/2327 Reviewed By: carolineechen Differential Revision: D35626904 Pulled By: mthrok fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
-
moto authored
Summary: Support NV12 format in Streamer API. NV12 is a biplanar format with a full sized Y plane followed by a single chroma plane with weaved U and V values. https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21 The original UV plane is smaller than the Y plane, so in this implementation, the UV plane is upsampled to match the size of the Y plane. Pull Request resolved: https://github.com/pytorch/audio/pull/2330 Reviewed By: hwangjeff Differential Revision: D35632351 Pulled By: mthrok fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27
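The NV12 layout and the upsampling step can be sketched in plain Python. This is not the Streamer implementation: the function `nv12_to_yuv_planes` is hypothetical, the buffer is assumed to have even dimensions and no row padding, and nearest-neighbour repetition is an assumed upsampling method (the summary does not say which one is used).

```python
from typing import List, Tuple

Plane = List[List[int]]


def nv12_to_yuv_planes(data: bytes, width: int, height: int) -> Tuple[Plane, Plane, Plane]:
    """Split an NV12 buffer into Y, U and V planes, each of size height x width."""
    y_size = width * height
    # Full-resolution luma plane comes first.
    y = [list(data[r * width:(r + 1) * width]) for r in range(height)]
    # The chroma plane follows: height / 2 rows of interleaved U, V bytes.
    uv = data[y_size:y_size + y_size // 2]
    u: Plane = []
    v: Plane = []
    for r in range(height):
        # Each chroma row covers two luma rows.
        row = uv[(r // 2) * width:(r // 2 + 1) * width]
        # Bytes alternate U, V; each pair covers two luma columns.
        u.append([row[2 * (c // 2)] for c in range(width)])
        v.append([row[2 * (c // 2) + 1] for c in range(width)])
    return y, u, v


# A 4x2 frame: 8 luma bytes, then one chroma row holding two (U, V) pairs.
frame = bytes(range(8)) + bytes([100, 200, 110, 210])
y_plane, u_plane, v_plane = nv12_to_yuv_planes(frame, width=4, height=2)
```

After the split, all three planes share the same shape, so they can be stacked into a single three-channel image.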
-
moto authored
Summary: This commit adds YUV420P format support to Streamer API. When the native format of a video is YUV420P, the Streamer will output a Tensor with YUV color channels. Pull Request resolved: https://github.com/pytorch/audio/pull/2334 Reviewed By: hwangjeff Differential Revision: D35632916 Pulled By: mthrok fbshipit-source-id: a7a0078788433060266b8bd3e7cad023f41389f5
-
- 13 Apr, 2022 2 commits
-
-
hwangjeff authored
Summary: Adds Conformer RNN-T LibriSpeech training recipe to examples directory. Produces 30M-parameter model that achieves the following WER:

| | WER |
|:-------------------:|-------------:|
| test-clean | 0.0310 |
| test-other | 0.0805 |
| dev-clean | 0.0314 |
| dev-other | 0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329 Reviewed By: xiaohui-zhang Differential Revision: D35578727 Pulled By: hwangjeff fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
-
hwangjeff authored
Summary: Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab due to its runtime's not having nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly Pytorch and TorchAudio builds as a comment that users can copy and run locally. Pull Request resolved: https://github.com/pytorch/audio/pull/2325 Reviewed By: xiaohui-zhang Differential Revision: D35597753 Pulled By: hwangjeff fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
-
- 12 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds Conformer RNN-T model as prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, which instantiates a baseline version of the model. Also includes the following: - Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances. - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`. - Introduces tests for `conformer_rnnt_model`. - Adds docs. Pull Request resolved: https://github.com/pytorch/audio/pull/2322 Reviewed By: xiaohui-zhang Differential Revision: D35565987 Pulled By: hwangjeff fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
-
- 11 Apr, 2022 1 commit
-
-
moto authored
Summary: This commit makes the FFmpeg integration support FFmpeg 5.0. In FFmpeg 5, functions like `av_find_input_format` and `avformat_open_input` are changed, so that they deal with constant version of `AVInputFormat`.
> 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
> Constified the pointers to AVInputFormats and AVOutputFormats
> in AVFormatContext, avformat_alloc_output_context2(),
> av_find_input_format(), av_probe_input_format(),
> av_probe_input_format2(), av_probe_input_format3(),
> av_probe_input_buffer2(), av_probe_input_buffer(),
> avformat_open_input(), av_guess_format() and av_guess_codec().
> Furthermore, constified the AVProbeData in av_probe_input_format(),
> av_probe_input_format2() and av_probe_input_format3().

https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260
Pull Request resolved: https://github.com/pytorch/audio/pull/2326 Reviewed By: carolineechen Differential Revision: D35551380 Pulled By: mthrok fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666
-
- 08 Apr, 2022 1 commit
-
-
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to sphinx. APIs with these directives will have badges (based off of shields.io) which link to the page with description of these features. Continuation of https://github.com/pytorch/audio/issues/2316 Excluded dtypes for further improvement, and actually added badges to most of functional/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
-
- 06 Apr, 2022 2 commits
-
-
Xiaohui Zhang authored
Summary: Add an option to use GroupNorm rather than BatchNorm1d, and another option to re-order Convolution/MHA modules in Conformer model. Pull Request resolved: https://github.com/pytorch/audio/pull/2320 Reviewed By: hwangjeff Differential Revision: D35422112 Pulled By: xiaohui-zhang fbshipit-source-id: 360a8aaa37b883b0f656da2e4f654e86688ac270
-
Xiaohui Zhang authored
Summary: Add an option to use Tanh instead of ReLU in RNNT joiner, which enables better training performance sometimes. --- Pull Request resolved: https://github.com/pytorch/audio/pull/2319 Reviewed By: hwangjeff Differential Revision: D35422122 Pulled By: xiaohui-zhang fbshipit-source-id: c6a0f8b25936e47081110af046b57d0e8751f9a2
-
- 05 Apr, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: The multi-processing works well on MFCC features. However, it sometimes makes the script hang when dumping HuBERT features. Changing it to a for-loop resolves the issue. Pull Request resolved: https://github.com/pytorch/audio/pull/2311 Reviewed By: mthrok Differential Revision: D35393813 Pulled By: nateanl fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11
-
Caroline Chen authored
Summary: Resolves https://github.com/pytorch/audio/issues/2294 Raise an error if the waveform to be resampled is not of floating point type. The `conv1d` operation used in resampling and `nn.Module` used for the transforms don't support integer type. Pull Request resolved: https://github.com/pytorch/audio/pull/2318 Reviewed By: mthrok Differential Revision: D35379276 Pulled By: carolineechen fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0
-
- 04 Apr, 2022 2 commits
-
-
Caroline Chen authored
Summary: update example ASR pipeline to use the recently added pretrained LM API for decoding Pull Request resolved: https://github.com/pytorch/audio/pull/2317 Reviewed By: mthrok Differential Revision: D35361354 Pulled By: carolineechen fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46
-
Zhaoheng Ni authored
Summary: Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser. Pull Request resolved: https://github.com/pytorch/audio/pull/2315 Reviewed By: carolineechen Differential Revision: D35357678 Pulled By: nateanl fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92
-
- 01 Apr, 2022 5 commits
-
-
Zhaoheng Ni authored
Summary: When the checkpoint is on a GPU device and preprocessing is on CPU, the script throws an exception. Fix it to load the model state dictionary into CPU by default. Pull Request resolved: https://github.com/pytorch/audio/pull/2310 Reviewed By: mthrok Differential Revision: D35316903 Pulled By: nateanl fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed
-
moto authored
Summary: This commit 1. updates the config.guess and config.sub files and 2. applies them to all the third-party libraries that use them. This resolves the following build failure on M1 Macs with a newer SDK. On a MacBook Pro with an M1 chip, after a recent OS update, something in the development environment changed (probably a newer version of Xcode) and the build stopped working with the following error from third-party dependencies.
```
checking build system type... Invalid configuration ‘arm64-apple-darwin20.0.0': machine ‘arm64-apple' not recognized
```
Note: the config files are taken from https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html
Pull Request resolved: https://github.com/pytorch/audio/pull/2307 Reviewed By: nateanl Differential Revision: D35318273 Pulled By: mthrok fbshipit-source-id: 746ac51dd1816767aa78b88445f76a29acfd29e8
-
moto authored
Summary: Change the CMake logic to search CONDA_PREFIX before falling back to the other default paths and system paths.
1. FFMPEG_ROOT
2. CONDA_PREFIX
3. Other locations (package managers and system paths)

For users with a regular conda installation, ffmpeg from conda should be picked automatically. Anyone who wants to use a specific ffmpeg can set the FFMPEG_ROOT variable to the location of the desired installation.
Pull Request resolved: https://github.com/pytorch/audio/pull/2312 Reviewed By: hwangjeff Differential Revision: D35317383 Pulled By: mthrok fbshipit-source-id: 52aef8f3f7f0f8f1eaf7a89a2d1ccfb6265e2c50
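The search precedence can be modelled in a few lines of Python. This only illustrates the ordering and is not the CMake code from the PR; `ffmpeg_search_roots` is a hypothetical helper, and the fallback paths are placeholders for the "other locations".

```python
import os
from typing import List, Mapping, Optional, Sequence


def ffmpeg_search_roots(
    env: Optional[Mapping[str, str]] = None,
    fallback_roots: Sequence[str] = ("/usr/local", "/usr"),  # placeholder system paths
) -> List[str]:
    """Return candidate install roots in the order they would be searched."""
    env = os.environ if env is None else env
    roots: List[str] = []
    # An explicit FFMPEG_ROOT wins over the active conda environment.
    for name in ("FFMPEG_ROOT", "CONDA_PREFIX"):
        value = env.get(name)
        if value:
            roots.append(value)
    # Package-manager and system locations are tried last.
    roots.extend(fallback_roots)
    return roots


conda_only = ffmpeg_search_roots({"CONDA_PREFIX": "/opt/conda"})
with_override = ffmpeg_search_roots({"CONDA_PREFIX": "/opt/conda", "FFMPEG_ROOT": "/opt/ffmpeg"})
```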
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2309 For upcoming improved Kaldi features which are comprised of multiple classes / functions, put all the transforms implementations in dedicated directory. Reviewed By: nateanl Differential Revision: D35303682 fbshipit-source-id: 5bc8c07ef639683008c0f76ffe56e3941f772659
-
moto authored
Summary: The `transforms.batch_consistency_test.TestTransforms` test is failing on Windows. https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272
```
> self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 28 / 196608 (0.0%)
E Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
E Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
```
The value of atol==1e-08 seems very strict, but all the other batch consistency tests are passing. The violation affects a very small number of samples, which looks suspicious, but I think it is okay to relax it to `1e-06` for Windows. `1e-06` is still stricter than the majority of the comparison tests we have.
Pull Request resolved: https://github.com/pytorch/audio/pull/2305 Reviewed By: hwangjeff Differential Revision: D35298056 Pulled By: mthrok fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
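How `atol` and `rtol` interact can be seen in a scalar sketch. It uses the closeness criterion documented for `torch.testing.assert_close`, `|actual - expected| <= atol + rtol * |expected|`, and assumes the failing test applies the same rule. The helper `is_close` and the sample values are hypothetical; the values are chosen only so that the absolute difference is about the size reported in the log.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Scalar closeness test: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical small-magnitude value with an absolute error of roughly 2e-07.
expected = 3.95e-04
actual = expected + 2.0e-07

# With atol=1e-08 the allowed error is about 1.4e-08, so the check fails.
strict = is_close(actual, expected, rtol=1e-05, atol=1e-08)
# With atol=1e-06 the allowed error is about 1.0e-06, so the check passes.
relaxed = is_close(actual, expected, rtol=1e-05, atol=1e-06)
```

For values close to zero the `rtol` term contributes almost nothing, so `atol` alone decides the outcome, which is why only a handful of small elements trip the stricter setting.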
-