- 05 May, 2023 1 commit
-
-
moto authored
Summary:
* Remove MKL and NumPy from the Conda build environment.
* Remove the `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel Mac.
TorchAudio does not use BLAS libraries directly, so all mentions of MKL should be removed from the codebase. However, this was causing an issue on Intel Mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the macOS linker keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but in the meantime we can modify the target and remove it. NumPy is not needed at build or run time either, so it is removed as well. Pull Request resolved: https://github.com/pytorch/audio/pull/3307 Reviewed By: atalman Differential Revision: D45606944 Pulled By: mthrok fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd
-
- 04 May, 2023 3 commits
-
-
atalman authored
Summary: Similar to what we used to have here: https://github.com/pytorch/test-infra/pull/3896/files Pull Request resolved: https://github.com/pytorch/audio/pull/3302 Reviewed By: nateanl Differential Revision: D45574845 Pulled By: atalman fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb
-
atalman authored
Summary: Add mkl dependency to torchaudio MacOS x86 builds Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137 Pull Request resolved: https://github.com/pytorch/audio/pull/3300 Reviewed By: jeanschmidt, mthrok Differential Revision: D45566352 Pulled By: atalman fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b
-
Xiaohui Zhang authored
Summary: (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360), which I accidentally closed)
The previous way of doing SpecAugment via the FrequencyMasking/TimeMasking transforms has the following problems:
- Only zero masking is possible; masking with the mean value is not supported.
- `mask_along_axis` is hard-coded to mask the 1st dimension, and `mask_along_axis_iid` is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors whose first dimension is batch or channel, features from the same batch or from different channels must share the same mask, because `mask_along_axis_iid` only supports 4D tensors due to the hard-coding above.
- For 2D spectrogram tensors without a batch or channel dimension, time/frequency masking cannot be applied at all, because `mask_along_axis` only supports 3D tensors due to the hard-coding above.
- Applying multiple time/frequency masks is not straightforward with the current design.
To solve these issues, this PR:
- Extends `mask_along_axis_iid` to support 3D tensors and `mask_along_axis` to support 2D tensors. Both can now mask either of the last two dimensions (where the time and frequency dimensions live) of the input tensor.
The SpecAugment transform itself will be introduced in another PR. Pull Request resolved: https://github.com/pytorch/audio/pull/3289 Reviewed By: hwangjeff Differential Revision: D45460357 Pulled By: xiaohui-zhang fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
-
- 03 May, 2023 4 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3299 Reviewed By: xiaohui-zhang Differential Revision: D45530945 Pulled By: mthrok fbshipit-source-id: 3443e4de693898534687b26ee1a9376ff86651f9
-
generatedunixname89002005367269 authored
Reviewed By: adamjernst Differential Revision: D45522319 fbshipit-source-id: d73a137c8738a215cc711ad39461f5b2f9ba76da
-
moto authored
Summary: https://github.com/pytorch/audio/pull/3292 migrates the doc deployment to GHA. Pull Request resolved: https://github.com/pytorch/audio/pull/3293 Reviewed By: xiaohui-zhang Differential Revision: D45527256 Pulled By: mthrok fbshipit-source-id: 18eb2580243b6b842147caaac10b3d28aa3d6dd0
-
moto authored
Summary: Follow-up of https://github.com/pytorch/audio/issues/3292 Doc deployment is gated on `branch_name == nightly`, but the nightly branch fires both push and PR events, so two deployment jobs would run. This commit restricts deployment to the push event. Pull Request resolved: https://github.com/pytorch/audio/pull/3294 Reviewed By: hwangjeff Differential Revision: D45501983 Pulled By: mthrok fbshipit-source-id: 8eb66b463800f6a30affafb27f5f2448a561cfe1
-
- 02 May, 2023 3 commits
-
-
atalman authored
Summary: [Nova] Add windows conda workflows Same as: https://github.com/pytorch/vision/pull/7547 Pull Request resolved: https://github.com/pytorch/audio/pull/3288 Reviewed By: osalpekar Differential Revision: D45456203 Pulled By: atalman fbshipit-source-id: 067fd3b9abaeb9b7b0cd45c05b7c72982dfbfe0f
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3292 Reviewed By: nateanl Differential Revision: D45492729 Pulled By: mthrok fbshipit-source-id: 11578166854c01deb50a6011550a91b87b426385
-
Xiaohui Zhang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3162 Reviewed By: mthrok Differential Revision: D43964995 Pulled By: xiaohui-zhang fbshipit-source-id: bba8fffe25f2f39f558f080fef319b1df4c6e440
-
- 01 May, 2023 2 commits
-
-
atalman authored
Summary: Add Windows wheel builds. Same as: https://github.com/pytorch/vision/pull/7540 Pull Request resolved: https://github.com/pytorch/audio/pull/3287 Reviewed By: osalpekar Differential Revision: D45452770 Pulled By: atalman fbshipit-source-id: e70ad3a8f456e805b46da3d1752c42208dadb8da
-
pbialecki authored
Summary: CC atalman malfet Pull Request resolved: https://github.com/pytorch/audio/pull/3284 Reviewed By: mthrok Differential Revision: D45444670 Pulled By: atalman fbshipit-source-id: d0cf8696a99000c2b9a7e41ceeb781f5a54daeda
-
- 29 Apr, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS). Pull Request resolved: https://github.com/pytorch/audio/pull/3279 Reviewed By: hwangjeff Differential Revision: D45415404 Pulled By: nateanl fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results on a V100 are attached below:

| decoder type | model | dataset | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
|---|---|---|---|---|---|---|---|---|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sorted by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Notes:
1. The design parallelizes computation over the batch and vocabulary axes and loops over the frame axis, so compared with CPU implementations it favors shorter sequences and larger vocabularies.
2. WER is the same as with the CPU implementations. However, decoding with an LM is not supported yet.
Resolves: https://github.com/pytorch/audio/issues/2957. Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
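For readers unfamiliar with the algorithm, the following is a minimal CPU sketch of CTC prefix beam search without a language model, written in probability space for readability. It follows the textbook formulation and is not the CUDA kernel from this PR; the function name and signature are hypothetical. The outer loop over frames is sequential, and the inner loop over the vocabulary is the per-token work that, according to the note in this commit, the CUDA version parallelizes (together with the batch).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]],  # [time][vocab], per-frame softmax outputs
    beam_size: int,
    blank: int = 0,
) -> List[Tuple[Tuple[int, ...], float]]:
    # Each prefix keeps two scores: p_b (paths ending in blank) and p_nb (ending in non-blank).
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (1.0, 0.0)}
    for frame in probs:  # sequential over time
        nxt: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(frame):  # per-token work (parallelized on GPU per the PR note)
                if c == blank:
                    # Blank keeps the prefix unchanged; the path now ends in blank.
                    nxt[prefix][0] += (p_b + p_nb) * p
                    continue
                last = prefix[-1] if prefix else None
                extended = prefix + (c,)
                if c == last:
                    # A repeat with no blank in between collapses onto the same prefix.
                    nxt[prefix][1] += p_nb * p
                    # A blank separated the repeat, so it is a genuinely new token.
                    nxt[extended][1] += p_b * p
                else:
                    nxt[extended][1] += (p_b + p_nb) * p
        # Keep the beam_size most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {k: (v[0], v[1]) for k, v in ranked[:beam_size]}
    results = [(k, pb + pnb) for k, (pb, pnb) in beams.items()]
    return sorted(results, key=lambda kv: kv[1], reverse=True)
```

Each frame depends on the beams from the previous frame, which is why the time axis stays a loop while the work across tokens and batch items can be spread across threads.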
-
- 25 Apr, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3277 Adds `StreamWriterCustomIO` to support encoding and writing media to arbitrary destinations. Reviewed By: mthrok Differential Revision: D44904807 fbshipit-source-id: 23a47531973a7dce0638feb825d38c81d46dc02f
-
- 19 Apr, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: The `master` branch of PyTorch has been updated to `main` recently. The url of `collect_env.py` in the new issue page should be updated as well. Pull Request resolved: https://github.com/pytorch/audio/pull/3271 Reviewed By: xiaohui-zhang Differential Revision: D45087038 Pulled By: nateanl fbshipit-source-id: 167262ae6ed179baabcf55064fc5f0f0ac3b0be9
-
hwangjeff authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3272 Reviewed By: mthrok Differential Revision: D45095440 Pulled By: hwangjeff fbshipit-source-id: 135eb0f5d9047bf172563a9a05a9d2e323796d4d
-
- 18 Apr, 2023 1 commit
-
-
nateanl authored
Summary: The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement. Pull Request resolved: https://github.com/pytorch/audio/pull/3036 Reviewed By: hwangjeff Differential Revision: D45061841 Pulled By: nateanl fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4
-
- 12 Apr, 2023 3 commits
-
-
Zhaoheng Ni authored
Summary: When `key_padding_mask` is not `None`, it needs to be combined with `attn_mask_rel_pos` as one mask for `scaled_dot_product_attention` function. Pull Request resolved: https://github.com/pytorch/audio/pull/3265 Reviewed By: hwangjeff Differential Revision: D44901093 Pulled By: nateanl fbshipit-source-id: 73ca7af48faf7f4eb36b35b603187a11e5582c70
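A pure-Python sketch of such a mask merge, assuming common PyTorch conventions: `key_padding_mask` is a boolean `(batch, src_len)` mask where `True` marks a padded key, and the relative-position mask is an additive float `(tgt_len, src_len)` bias (a simplification; the real tensors may carry extra head dimensions). Padded keys receive `-inf`, broadcast over the query axis, so a single `(batch, tgt_len, src_len)` additive mask can be handed to the attention call. The helper name is hypothetical.

```python
import math
from typing import List


def merge_masks(
    rel_pos_bias: List[List[float]],     # (tgt_len, src_len) additive float bias
    key_padding_mask: List[List[bool]],  # (batch, src_len), True = padded key
) -> List[List[List[float]]]:           # (batch, tgt_len, src_len) additive mask
    merged = []
    for pad_row in key_padding_mask:
        # Broadcast the per-key padding over every query row and add it to the bias.
        merged.append([
            [bias + (-math.inf if is_pad else 0.0) for bias, is_pad in zip(bias_row, pad_row)]
            for bias_row in rel_pos_bias
        ])
    return merged
```

After softmax, `-inf` entries get zero weight, so padded keys are ignored regardless of their positional bias.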
-
moto authored
Summary: When `TORCHAUDIO_TEST_TEMP_DIR` is set, all unit test temporary data is stored in the given directory. Running the unit tests multiple times reuses that directory, so temporary files from previous runs are still present. The FFmpeg save test writes reference data to the temporary directory without the overwrite flag (`-y`), so it fails in that case. This commit fixes that. Pull Request resolved: https://github.com/pytorch/audio/pull/3263 Reviewed By: hwangjeff Differential Revision: D44859003 Pulled By: mthrok fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b
-
moto authored
Summary: Preparation to land https://github.com/pytorch/audio/pull/3241 This commit applies a patch to make the sox_io TorchScript test pass when the dispatcher is enabled. Pull Request resolved: https://github.com/pytorch/audio/pull/3262 Reviewed By: hwangjeff Differential Revision: D44897513 Pulled By: mthrok fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32
-
- 11 Apr, 2023 2 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3258 Reviewed By: nateanl Differential Revision: D44859397 Pulled By: mthrok fbshipit-source-id: 361ac6a8c7092cc753f77d7745ec178760e8b9c3
-
moto authored
Summary: GCC should not be used when building FFmpeg for torchaudio, as torchaudio uses MSVC (cl.exe) Pull Request resolved: https://github.com/pytorch/audio/pull/3257 Reviewed By: nateanl Differential Revision: D44835169 Pulled By: mthrok fbshipit-source-id: 038c70caae58cec47dd2d6d08b8244c193104eda
-
- 10 Apr, 2023 4 commits
-
-
Zhaoheng Ni authored
Summary: Fix https://github.com/pytorch/audio/issues/3219. `torch.nn.MultiheadAttention` will throw an error if `torch.no_grad()` and mask are both given. The pull request fixes it by replacing the forward method with `torch.nn.functional.scaled_dot_product_attention`. Pull Request resolved: https://github.com/pytorch/audio/pull/3252 Reviewed By: mthrok Differential Revision: D44798634 Pulled By: nateanl fbshipit-source-id: abfa7fb84b7bd71848a92ab26da5a5f0f095c665
-
Zhaoheng Ni authored
Summary: Replace the attention computation with `torch.nn.functional.scaled_dot_product_attention` to improve running efficiency. Pull Request resolved: https://github.com/pytorch/audio/pull/3253 Reviewed By: mthrok Differential Revision: D44800353 Pulled By: nateanl fbshipit-source-id: 41550d868c809099aadbe812b0ebe2c38121efb8
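`torch.nn.functional.scaled_dot_product_attention` computes softmax(QKᵀ/√d + mask)·V in a fused kernel. The pure-Python reference below (single head, no dropout, float additive mask only; `sdpa_reference` is a hypothetical name) spells out that computation, which can be handy for numerically checking a replacement against the fused call on small inputs.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def sdpa_reference(q: Matrix, k: Matrix, v: Matrix, attn_mask: Optional[Matrix] = None) -> Matrix:
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)  # default scaling by 1/sqrt(head_dim)
    out = []
    for i, q_row in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if attn_mask is not None:
            scores = [s + m for s, m in zip(scores, attn_mask[i])]  # additive mask
        peak = max(scores)  # subtract the max for numerical stability (row must not be all -inf)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out
```

The fused function avoids materializing intermediate score lists like these, which is where the efficiency gain mentioned in this commit is expected to come from.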
-
Zhaoheng Ni authored
Summary: - Add citations of [`TorchAudio-Squim`](https://arxiv.org/abs/2304.01448) publication. - Update descriptions in the `SQUIM_OBJECTIVE` and `SQUIM_SUBJECTIVE` pipelines. Pull Request resolved: https://github.com/pytorch/audio/pull/3254 Reviewed By: hwangjeff Differential Revision: D44802015 Pulled By: nateanl fbshipit-source-id: ca08298ec1eafefdd671ff2e010ef18f7372f9f8
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3255 Prefixing variables that are always pointers with `p` does not improve readability. Reviewed By: hwangjeff Differential Revision: D44799531 fbshipit-source-id: bc2ce4e534009e2cb577719953207ddb82cf2d3d
-
- 07 Apr, 2023 5 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3251 Removes an unnecessary media type check in FilterGraph, which allows defining filters whose input and output media types differ. Reviewed By: nateanl Differential Revision: D44792340 fbshipit-source-id: e00497e0d30b5b3c3aacc66dd9b8c401757af288
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3249
- Make the `ptr` member private so that it is better protected and subclasses cannot mess with it.
- Remove the unused `reset` method.
- Do not default-construct the managed object.
- Introduce a helper function for default allocation (for AVFrame and AVPacket, as they are allocated in both the reader and the writer).
- For the other types, move the allocation logic to where it is used.
- Remove the unused `pHWBufferRef` attribute from `StreamWriter`.
Reviewed By: hwangjeff Differential Revision: D44775297 fbshipit-source-id: ff6db528152cd54c1ae398191110c30b9c1e238c
-
atalman authored
Summary: Remove temp channel for python 3.11, simplify logic around cuda Pull Request resolved: https://github.com/pytorch/audio/pull/3250 Reviewed By: mthrok Differential Revision: D44788219 Pulled By: atalman fbshipit-source-id: 421ff9e0bf1818b41e395708cc4589d4a9c865bd
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3220 Introduces methods to `StreamReader` and `StreamWriter` that allow for reading and writing `AVPacket` instances rather than tensors. Useful for efficiently remuxing data pulled as is from source. Reviewed By: mthrok Differential Revision: D44271536 fbshipit-source-id: 9b9d743c0119a5eb564fa628fd6a67806d120985
-
moto authored
Summary: Follow-up of https://github.com/pytorch/audio/issues/3243. The save compat module has different semantics from info and load, so it requires a different way of performing path normalization. Pull Request resolved: https://github.com/pytorch/audio/pull/3248 Reviewed By: hwangjeff Differential Revision: D44774997 Pulled By: mthrok fbshipit-source-id: 4b967ae3ca6b45850d455b8e95aaa31762c5457e
-
- 06 Apr, 2023 2 commits
-
-
moto authored
Summary: In https://github.com/pytorch/audio/pull/3232, the CTC decoder was excluded from the binary distribution. To use CTCDecoder, users need to install flashlight-text. Currently, if flashlight-text is not available, torchaudio still attempts to import the custom bundle. This commit cleans up this behavior by delaying the error until one of the components is actually used, and by providing a better error message. Pull Request resolved: https://github.com/pytorch/audio/pull/3246 Test Plan: Binary smoke tests import torchaudio without installing flashlight. Unit test CI jobs run the CTC decoder with flashlight installed. Reviewed By: jacobkahn Differential Revision: D44748413 Pulled By: mthrok fbshipit-source-id: 21d2cbd9961ed88405a739cc682071066712f5e4
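The "fail lazily" behavior described in this commit can be sketched as a generic Python pattern: attempt the optional import when the module loads and, if it fails, bind a placeholder that raises an informative error only when it is called. This is not torchaudio's actual code; `optional_component` and its message text are hypothetical.

```python
import importlib
from typing import Any


def optional_component(module_name: str, attr: str, install_hint: str) -> Any:
    """Return module_name.attr, or a stub that raises only when it is actually used."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        original = err  # keep the cause; `err` itself is unbound after the except block

        def _unavailable(*args: Any, **kwargs: Any) -> Any:
            raise RuntimeError(
                f"{attr} requires '{module_name}', which is not installed. {install_hint}"
            ) from original

        return _unavailable
    return getattr(module, attr)
```

For example, `optional_component("flashlight.lib.text.decoder", "LexiconDecoder", "Try `pip install flashlight-text`.")` would succeed at import time either way, and only a call to the returned object would raise when the package is missing. Importing the package itself therefore never fails, which matches the test plan of importing torchaudio without flashlight installed.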
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3244 Adds methods to `StreamWriter` that allow for passing in `AVFrame` instances rather than tensors. Reviewed By: mthrok Differential Revision: D44589256 fbshipit-source-id: f100e0d349708482b873a9a4bae1eaf5eb65301a
-
- 05 Apr, 2023 2 commits
-
-
moto authored
Summary: In dispatcher mode, the FFmpeg backend does not handle file-like objects, and the C++ implementation raises an error. This commit fixes it by normalizing file-like objects to strings. Pull Request resolved: https://github.com/pytorch/audio/pull/3243 Reviewed By: nateanl Differential Revision: D44719280 Pulled By: mthrok fbshipit-source-id: 9dae459e2a5fb4992b4ef53fe4829fe8c35b2edd
-
moto authored
Summary: Following https://github.com/pytorch/audio/pull/3232, the static build of flashlight-text has been disabled and removed from the nightly build. This commit removes the related source and build code from the torchaudio codebase. Pull Request resolved: https://github.com/pytorch/audio/pull/3236 Reviewed By: jacobkahn Differential Revision: D44712539 Pulled By: mthrok fbshipit-source-id: a201c89b5046f224526309cd4e17a5105e58a949
-
- 04 Apr, 2023 3 commits
-
-
moto authored
Summary: Recently, we added a number of options to make StreamReader/Writer more flexible. As a result, their methods take many arguments, some of which fall into semantic groups. For example, the arguments of ``StreamWriter.add_video_stream`` are roughly grouped as follows:
- Information about the input media format: `frame_rate`, `width`, `height`, `format`
- Information about the encoder: `encoder`, `encoder_option`
- Information about the codec configuration: `codec_config`
- Information about the encoded media format: `encoder_format`, `encoder_frame_rate`, `encoder_width`, `encoder_height`
- Information about additional processing: `filter_desc`
- Hardware acceleration: `hw_accel`
We do not know what arguments will be added in the future, but when we add them, we want to keep them roughly grouped by inserting each new argument somewhere in the middle without breaking backward compatibility. This commit makes most of them keyword-only arguments, so that we can rearrange them without breaking backward compatibility. Pull Request resolved: https://github.com/pytorch/audio/pull/3227 Reviewed By: hwangjeff Differential Revision: D44681620 Pulled By: mthrok fbshipit-source-id: b55f6168f4c2f3d0f59731b9bb0db4ae54e5a90f
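In Python, parameters after a bare `*` can only be passed by keyword, so their order in the signature can change without breaking callers. The trimmed-down signature below is hypothetical and is not the actual `StreamWriter.add_video_stream` signature; which parameters stay positional is an assumption made for illustration.

```python
from typing import Optional


def add_video_stream(
    frame_rate: float,
    width: int,
    height: int,
    format: str = "rgb24",
    *,  # everything after this must be passed by keyword
    encoder: Optional[str] = None,
    encoder_option: Optional[dict] = None,
    encoder_format: Optional[str] = None,
    filter_desc: Optional[str] = None,
    hw_accel: Optional[str] = None,
) -> dict:
    # Return the received configuration so the calling convention can be inspected.
    return dict(locals())
```

Because `encoder` and the options after it can only be named, a new encoder-related option could later be inserted next to `encoder_option` without shifting any existing caller's arguments; a positional call such as `add_video_stream(30, 640, 480, "yuv420p", "libx264")` is rejected with a `TypeError` instead of silently binding to the wrong parameter.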
-
moto authored
Summary: As we migrate to the upstream flashlight-text and KenLM, this PR disables building the CTC decoder by default. torchaudio binaries will no longer ship the flashlight-text and KenLM bundle. Ref: https://github.com/pytorch/audio/issues/3088 cc jacobkahn Pull Request resolved: https://github.com/pytorch/audio/pull/3232 Reviewed By: hwangjeff Differential Revision: D44650872 Pulled By: mthrok fbshipit-source-id: 2415623abaf3cafa181135db5112d3c711137cd7
-
hwangjeff authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3235 Reviewed By: mthrok Differential Revision: D44653654 Pulled By: hwangjeff fbshipit-source-id: f28a6068e826581d76ed4a216adb6019b6486e53
-