- 15 May, 2023 1 commit
-
-
atalman authored
Summary: Switch windows nightly builds to GHA Similar to: https://github.com/pytorch/vision/pull/7578 Pull Request resolved: https://github.com/pytorch/audio/pull/3330 Reviewed By: mthrok Differential Revision: D45871892 Pulled By: atalman fbshipit-source-id: 817490a2abcaffceec5174c624f9e7d0377bbc4a
-
- 11 May, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3328 Make the `AVIOContext`-based constructor protected for better encapsulation. AVFormatContext and optional AVIOContext are managed by StreamReader/Writer, so it's better that they are abstracted away from client code. Reviewed By: hwangjeff Differential Revision: D45779629 fbshipit-source-id: 44c31e8af785447cb47aad0c44bf4ecf1aeebeaa
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3326 Reviewed By: hwangjeff Differential Revision: D45760678 Pulled By: mthrok fbshipit-source-id: 79b5d846c93516ca90c9700279124a9a04470242
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3325 Reviewed By: hwangjeff Differential Revision: D45759434 Pulled By: mthrok fbshipit-source-id: f3b1127fcf3b23beeab61fb7ff18f1b89b11ddc6
-
- 10 May, 2023 4 commits
-
-
moto authored
Summary: This commit makes the code use the backend dispatcher by default. Enabling the backend dispatcher gives the FFmpeg-based I/O implementation higher priority (if the corresponding FFmpeg is available) and allows individual function calls to specify the backend. See also https://github.com/pytorch/audio/issues/2950 Pull Request resolved: https://github.com/pytorch/audio/pull/3241 Reviewed By: hwangjeff Differential Revision: D44709068 Pulled By: mthrok fbshipit-source-id: 43aac3433f78a681df6669e9ac46e8ecf3beb1be
-
moto authored
Summary: https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/3226 Reviewed By: nateanl Differential Revision: D45402724 Pulled By: mthrok fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262
-
moto authored
Summary: This commit prepares for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241. Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it. The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 lands to accommodate the change. Since the IO tutorial needs to mention the migration-related changes, I also updated the IO documentation to cover the migration work so that it's easy to redirect. Pull Request resolved: https://github.com/pytorch/audio/pull/3285 Reviewed By: nateanl Differential Revision: D45671237 Pulled By: mthrok fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133
-
Zhaoheng Ni authored
Summary: Address https://github.com/pytorch/audio/issues/2643 - Replace `SGD` optimization with `torch.linalg.lstsq`, which is much faster. - Add an autograd test for `InverseMelScale`. - Update other tests. Pull Request resolved: https://github.com/pytorch/audio/pull/3280 Reviewed By: hwangjeff Differential Revision: D45679988 Pulled By: nateanl fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59
-
- 09 May, 2023 6 commits
-
-
moto authored
Summary: NumPy is an optional runtime dependency of TorchAudio, and it is not required at build time. Pull Request resolved: https://github.com/pytorch/audio/pull/3315 Reviewed By: nateanl Differential Revision: D45702243 Pulled By: mthrok fbshipit-source-id: 6ca6598931764c46be6323868e8cce7c8adc5024
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3296 Reviewed By: hwangjeff Differential Revision: D45503774 fbshipit-source-id: 806c22bd0f54fd0cea43d61ef3dbedd67ffeb012
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3320 Add StreamReaderCustomIO, which is analogous to StreamWriterCustomIO and which takes custom read/seek functions to fetch media data. Reviewed By: hwangjeff Differential Revision: D45482843 fbshipit-source-id: 3ccf771c0fdce153aaa7551053e9a77facedc983
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3319 * Merge the source with StreamWriter * Add docstrings * Move CustomIO to detail::CustomOutput to prepare for adding CustomInput Reviewed By: hwangjeff Differential Revision: D45481807 fbshipit-source-id: 4a9ac8a57acda47b126f8ae18e607b72919f9988
-
Zhaoheng Ni authored
Summary: The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`. Pull Request resolved: https://github.com/pytorch/audio/pull/3322 Reviewed By: mthrok Differential Revision: D45691769 Pulled By: nateanl fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045
-
Nikita Shulga authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3321 Reviewed By: atalman, mthrok Differential Revision: D45673225 Pulled By: malfet fbshipit-source-id: f2b915f3307ba95445702e3018254ad254fe2bb3
-
- 05 May, 2023 6 commits
-
-
Xiaohui Zhang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3314 Reviewed By: nateanl Differential Revision: D45621958 Pulled By: xiaohui-zhang fbshipit-source-id: 17555a865790adadc2abd40a86571596386a12fc
-
Zhaoheng Ni authored
Summary: Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths. Pull Request resolved: https://github.com/pytorch/audio/pull/3313 Reviewed By: hwangjeff Differential Revision: D45620311 Pulled By: nateanl fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
-
Xiaohui Zhang authored
Summary: (2/2 of the previous https://github.com/pytorch/audio/pull/2360, which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
  - Only zero masking can be done; masking by mean value is not supported.
  - mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
  - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
  - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, because mask_along_axis only supports 3D tensors due to the above hard-coding.
  - Applying multiple time/frequency masks is not straightforward in the current design. To get N masks along the time/frequency axis, N Frequency/TimeMasking transforms have to be applied sequentially to the input tensors, which is an inconvenient API. A separate SpecAugment transform is needed to handle this.

To solve these issues, we:
  - [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Both can now mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
  - [done in this PR] Introduce the SpecAugment transform.

Pull Request resolved: https://github.com/pytorch/audio/pull/3309 Reviewed By: nateanl Differential Revision: D45592926 Pulled By: xiaohui-zhang fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2
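To make the masking behaviour described in this summary concrete, here is a minimal pure-Python sketch, not the torchaudio implementation. It works on a 2D spectrogram stored as nested lists, masks along either of the two dimensions, supports zero or mean-value masking, and applies several masks in one call. The function names, argument names, and the integer sampling of mask widths are all assumptions made for illustration.

```python
import random
from typing import List, Optional

Spec = List[List[float]]  # spec[freq][time]: a 2D spectrogram, no batch/channel dims


def mask_along_axis_2d(spec: Spec, start: int, width: int, axis: int,
                       mask_value: float = 0.0) -> Spec:
    """Return a copy of `spec` with `width` bins masked from `start` along `axis`.

    axis=0 masks frequency rows, axis=1 masks time columns (hypothetical helper).
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (frequency) or 1 (time)")
    out = [row[:] for row in spec]  # copy so the input is left untouched
    for f, row in enumerate(out):
        for t in range(len(row)):
            idx = f if axis == 0 else t
            if start <= idx < start + width:
                row[t] = mask_value
    return out


def spec_augment_2d(spec: Spec, n_time_masks: int, time_mask_param: int,
                    n_freq_masks: int, freq_mask_param: int,
                    zero_masking: bool = True,
                    rng: Optional[random.Random] = None) -> Spec:
    """Apply several random time and frequency masks in a single call."""
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    # The mean is taken over the unmasked input and reused for every mask.
    mean = sum(map(sum, spec)) / (n_freq * n_time)
    value = 0.0 if zero_masking else mean
    out = spec
    plan = ((1, n_time_masks, time_mask_param, n_time),
            (0, n_freq_masks, freq_mask_param, n_freq))
    for axis, n_masks, param, size in plan:
        for _ in range(n_masks):
            width = rng.randint(0, min(param, size))  # assumed sampling scheme
            start = rng.randint(0, size - width)
            out = mask_along_axis_2d(out, start, width, axis, value)
    return out
```

Calling `spec_augment_2d(spec, 2, 10, 2, 5, zero_masking=False)` would apply two time masks and two frequency masks filled with the mean value, which is the one-call usage that chaining N Frequency/TimeMasking transforms could not offer.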
-
huyao authored
Summary: Fix **Failed to write packet (Invalid argument)** error when encoding FLV video streams using NVIDIA hardware encoders. Resolve https://github.com/pytorch/audio/issues/3311 Pull Request resolved: https://github.com/pytorch/audio/pull/3312 Reviewed By: nateanl Differential Revision: D45611656 Pulled By: mthrok fbshipit-source-id: 531a83a27d3b19ed9e9aedd161769c60aa0bd175
-
moto authored
Summary: Fixes the regression caused by build_doc job GHA migration. The version number is not properly set. Pull Request resolved: https://github.com/pytorch/audio/pull/3310 Reviewed By: nateanl Differential Revision: D45607829 Pulled By: mthrok fbshipit-source-id: 3450a38fa6982fcc56676a80144e9eed1aad02ec
-
moto authored
Summary: * Remove MKL and NumPy from the Conda build env * Remove the `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel Mac. TorchAudio does not use BLAS libraries directly, so all mentions of MKL should be removed from the codebase. However, this was causing an issue on Intel Mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the linker on macOS keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but we can also modify the target temporarily to remove it. We also don't need NumPy at build or run time, so it is removed as well. Pull Request resolved: https://github.com/pytorch/audio/pull/3307 Reviewed By: atalman Differential Revision: D45606944 Pulled By: mthrok fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd
-
- 04 May, 2023 3 commits
-
-
atalman authored
Summary: Similar to what we used to have here: https://github.com/pytorch/test-infra/pull/3896/files Pull Request resolved: https://github.com/pytorch/audio/pull/3302 Reviewed By: nateanl Differential Revision: D45574845 Pulled By: atalman fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb
-
atalman authored
Summary: Add mkl dependency to torchaudio MacOS x86 builds Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137 Pull Request resolved: https://github.com/pytorch/audio/pull/3300 Reviewed By: jeanschmidt, mthrok Differential Revision: D45566352 Pulled By: atalman fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b
-
Xiaohui Zhang authored
Summary: (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360), which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
  - Only zero masking can be done; masking by mean value is not supported.
  - mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
  - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
  - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, because mask_along_axis only supports 3D tensors due to the above hard-coding.
  - Applying multiple time/frequency masks is not straightforward in the current design.

To solve these issues, we:
  - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Both can now mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.

The SpecAugment transform will be introduced in another PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/3289 Reviewed By: hwangjeff Differential Revision: D45460357 Pulled By: xiaohui-zhang fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
-
- 03 May, 2023 4 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3299 Reviewed By: xiaohui-zhang Differential Revision: D45530945 Pulled By: mthrok fbshipit-source-id: 3443e4de693898534687b26ee1a9376ff86651f9
-
generatedunixname89002005367269 authored
Reviewed By: adamjernst Differential Revision: D45522319 fbshipit-source-id: d73a137c8738a215cc711ad39461f5b2f9ba76da
-
moto authored
Summary: https://github.com/pytorch/audio/pull/3292 migrates the doc deployment to GHA. Pull Request resolved: https://github.com/pytorch/audio/pull/3293 Reviewed By: xiaohui-zhang Differential Revision: D45527256 Pulled By: mthrok fbshipit-source-id: 18eb2580243b6b842147caaac10b3d28aa3d6dd0
-
moto authored
Summary: Follow-up of https://github.com/pytorch/audio/issues/3292. Doc deployment is gated by branch_name == nightly, but the nightly branch fires both push and PR events, so two deployment jobs run. This commit additionally requires the push event. Pull Request resolved: https://github.com/pytorch/audio/pull/3294 Reviewed By: hwangjeff Differential Revision: D45501983 Pulled By: mthrok fbshipit-source-id: 8eb66b463800f6a30affafb27f5f2448a561cfe1
-
- 02 May, 2023 3 commits
-
-
atalman authored
Summary: [Nova] Add windows conda workflows Same as: https://github.com/pytorch/vision/pull/7547 Pull Request resolved: https://github.com/pytorch/audio/pull/3288 Reviewed By: osalpekar Differential Revision: D45456203 Pulled By: atalman fbshipit-source-id: 067fd3b9abaeb9b7b0cd45c05b7c72982dfbfe0f
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3292 Reviewed By: nateanl Differential Revision: D45492729 Pulled By: mthrok fbshipit-source-id: 11578166854c01deb50a6011550a91b87b426385
-
Xiaohui Zhang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3162 Reviewed By: mthrok Differential Revision: D43964995 Pulled By: xiaohui-zhang fbshipit-source-id: bba8fffe25f2f39f558f080fef319b1df4c6e440
-
- 01 May, 2023 2 commits
-
-
atalman authored
Summary: Adding win wheels builds Same as : https://github.com/pytorch/vision/pull/7540 Pull Request resolved: https://github.com/pytorch/audio/pull/3287 Reviewed By: osalpekar Differential Revision: D45452770 Pulled By: atalman fbshipit-source-id: e70ad3a8f456e805b46da3d1752c42208dadb8da
-
pbialecki authored
Summary: CC atalman malfet Pull Request resolved: https://github.com/pytorch/audio/pull/3284 Reviewed By: mthrok Differential Revision: D45444670 Pulled By: atalman fbshipit-source-id: d0cf8696a99000c2b9a7e41ceeb781f5a54daeda
-
- 29 Apr, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS). Pull Request resolved: https://github.com/pytorch/audio/pull/3279 Reviewed By: hwangjeff Differential Revision: D45415404 Pulled By: nateanl fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results on a V100 are attached below:

| decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|---------|----------------------|-----------|------------|------------|-------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation across the batch and vocab axes and loops over the frame axis, so compared with CPU implementations it favors shorter sequences and larger vocabularies.
2. WER is the same as with CPU implementations. However, it cannot decode with an LM yet.

Resolves: https://github.com/pytorch/audio/issues/2957. Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
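For readers unfamiliar with the algorithm this decoder accelerates, the following is a small CPU reference sketch of CTC prefix beam search without a language model. It is not the CUDA code from this PR: it handles one utterance at a time and loops over frames, prefixes, and tokens sequentially, whereas the PR parallelizes the batch and vocab axes. The function names and the choice of `blank=0` are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs: Sequence[Sequence[float]], beam_size: int,
                           blank: int = 0) -> List[Tuple[List[int], float]]:
    """Return up to `beam_size` (prefix, log-probability) pairs, best first.

    log_probs[t][v] is the log-probability of token v at frame t.
    """
    # Each prefix tracks two scores: paths ending in blank / in a non-blank.
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential loop over the frame axis
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):
                if token == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (token,)
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == token:
                    # A repeated token only extends paths that ended in blank...
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the same prefix.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Prune to the `beam_size` most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    results = [(list(p), logsumexp(*s)) for p, s in beams.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Because the inner loops over prefixes and tokens are independent within a frame, they are the natural candidates for parallel execution, while the frame loop has to stay sequential.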
-
- 25 Apr, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3277 Adds `StreamWriterCustomIO` to support encoding and writing media to arbitrary destinations. Reviewed By: mthrok Differential Revision: D44904807 fbshipit-source-id: 23a47531973a7dce0638feb825d38c81d46dc02f
-
- 19 Apr, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: The `master` branch of PyTorch has been updated to `main` recently. The url of `collect_env.py` in the new issue page should be updated as well. Pull Request resolved: https://github.com/pytorch/audio/pull/3271 Reviewed By: xiaohui-zhang Differential Revision: D45087038 Pulled By: nateanl fbshipit-source-id: 167262ae6ed179baabcf55064fc5f0f0ac3b0be9
-
hwangjeff authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3272 Reviewed By: mthrok Differential Revision: D45095440 Pulled By: hwangjeff fbshipit-source-id: 135eb0f5d9047bf172563a9a05a9d2e323796d4d
-
- 18 Apr, 2023 1 commit
-
-
nateanl authored
Summary: The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement. Pull Request resolved: https://github.com/pytorch/audio/pull/3036 Reviewed By: hwangjeff Differential Revision: D45061841 Pulled By: nateanl fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4
-
- 12 Apr, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: When `key_padding_mask` is not `None`, it needs to be combined with `attn_mask_rel_pos` as one mask for `scaled_dot_product_attention` function. Pull Request resolved: https://github.com/pytorch/audio/pull/3265 Reviewed By: hwangjeff Differential Revision: D44901093 Pulled By: nateanl fbshipit-source-id: 73ca7af48faf7f4eb36b35b603187a11e5582c70
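The idea of folding a padding mask into an additive attention mask can be sketched in plain Python for a single head and a single sequence. This is an illustration only, not the code from this PR: the convention that `True` marks a padded key, the nested-list shapes, and the helper names are assumptions.

```python
import math
from typing import List


def combine_masks(attn_bias: List[List[float]],
                  key_padding_mask: List[bool]) -> List[List[float]]:
    """Merge a padding mask into an additive attention bias.

    attn_bias[q][k] is added to the attention score of query q and key k.
    key_padding_mask[k] is True when key k is padding (assumed convention).
    """
    return [
        [-math.inf if key_padding_mask[k] else bias for k, bias in enumerate(row)]
        for row in attn_bias
    ]


def softmax(row: List[float]) -> List[float]:
    """Softmax over one row; requires at least one finite entry."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]
```

After combining, padded keys receive a score of negative infinity and therefore zero attention weight, so a single mask argument carries both the relative-position bias and the padding information.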
-
moto authored
Summary: When `TORCHAUDIO_TEST_TEMP_DIR` is set, all the unit test temporary data are stored in the given directory. Running unit tests multiple times reuses the directory and the temporary files from the previous test runs are found there. FFmpeg save test writes reference data to the temporary directory, but it is not given the overwrite flag ("-y"), so it fails in such cases. This commit fixes that. Pull Request resolved: https://github.com/pytorch/audio/pull/3263 Reviewed By: hwangjeff Differential Revision: D44859003 Pulled By: mthrok fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b
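As a rough illustration of the fix, the sketch below builds an `ffmpeg` command line that includes the `-y` flag, so an output file left over from an earlier run is overwritten instead of causing a failure. The helper name and file paths are hypothetical and are not taken from the test suite.

```python
from typing import List


def build_ffmpeg_save_cmd(src: str, dst: str, overwrite: bool = True) -> List[str]:
    """Build an ffmpeg command that converts `src` to `dst` (hypothetical helper)."""
    cmd = ["ffmpeg"]
    if overwrite:
        cmd.append("-y")  # overwrite existing output files without prompting
    cmd += ["-i", src, dst]
    return cmd


# Typical use, assuming ffmpeg is installed and on PATH:
#   import subprocess
#   subprocess.run(build_ffmpeg_save_cmd("in.wav", "out.wav"), check=True)
```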
-