- 19 Sep, 2023 1 commit
-
-
moto authored
Changes in matplotlib 3.8.0 cause it to reject a torch.Tensor passed to the `plot` function.
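A minimal sketch of the kind of workaround this implies, not the actual diff of this commit: convert the tensor to a NumPy array before handing it to `plot`. The helper name `to_plottable` is hypothetical and used only for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import torch


def to_plottable(tensor: torch.Tensor):
    """Detach from autograd, move to CPU, and convert to a NumPy array."""
    return tensor.detach().cpu().numpy()


waveform = torch.sin(torch.linspace(0, 6.28, 100))
fig, ax = plt.subplots()
(line,) = ax.plot(to_plottable(waveform))  # pass an ndarray, not a Tensor
```

Calling `detach()` and `cpu()` first keeps the helper usable for tensors that require gradients or live on a GPU.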
-
- 13 Sep, 2023 1 commit
-
-
moto authored
-
- 08 Sep, 2023 1 commit
-
-
Pingchuan Ma authored
* Simplify training step in av-asr recipe * Run pre-commit
-
- 04 Sep, 2023 2 commits
-
-
hwangjeff authored
Summary: Fixes decoder calls and related code in Device ASR/AVSR tutorials to account for changes to RNN-T decoder introduced in https://github.com/pytorch/audio/issues/3295. Pull Request resolved: https://github.com/pytorch/audio/pull/3572 Reviewed By: mthrok Differential Revision: D48629428 Pulled By: hwangjeff fbshipit-source-id: 63ede307fb4412aa28f88972d56dca8405607b7a
-
moto authored
Summary: Add incremental decoding support to CTC decoder. Resolves https://github.com/pytorch/audio/issues/3574 Pull Request resolved: https://github.com/pytorch/audio/pull/3594 Reviewed By: nateanl Differential Revision: D48940584 Pulled By: mthrok fbshipit-source-id: 31871614008cf197cf3900f7183ec6cff34d2905
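A hedged sketch of how incremental decoding might be driven. It assumes the decoder exposes `decode_begin`, `decode_step`, `decode_end` and `get_final_hypothesis` methods; that interface is an assumption here, and the function name `decode_incrementally` is hypothetical. The decoder is passed in as a parameter so the sketch itself depends on nothing beyond the standard library.

```python
from typing import Any, Iterable, List


def decode_incrementally(decoder: Any, emission_chunks: Iterable[Any]) -> List[Any]:
    """Feed emission chunks to a decoder one at a time and return the final hypotheses.

    `decoder` is assumed (not verified here) to provide decode_begin,
    decode_step, decode_end and get_final_hypothesis.
    """
    decoder.decode_begin()  # reset the decoder's internal state
    for chunk in emission_chunks:  # each chunk is a slice of emission frames
        decoder.decode_step(chunk)
    decoder.decode_end()  # finalize the search
    return decoder.get_final_hypothesis()
```

In a streaming setting the chunks would arrive as audio is processed, so the full emission never has to be held in memory at once.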
-
- 21 Aug, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3569 Reviewed By: huangruizhe Differential Revision: D48508244 Pulled By: mthrok fbshipit-source-id: 6e14267e2dbdf08ea3c25a1dab480cb0e908e0c3
-
- 20 Aug, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3566 Reviewed By: huangruizhe Differential Revision: D48499338 Pulled By: mthrok fbshipit-source-id: 7f837e1a1f8116d7d82411607c91628b729077d8
-
- 15 Aug, 2023 1 commit
-
-
moto authored
Summary: In https://github.com/pytorch/audio/pull/3460, we switched the build process for the FFmpeg extension. Since it is complicated to install FFmpeg in some environments, pre-built binaries and their headers are downloaded at build time and used as scaffolding for the torchaudio build. Even though we did not change any code or the FFmpeg version, it turned out that this causes a segmentation fault on Ubuntu when using system Python and FFmpeg 4.4 installed via aptitude. While investigating the issue, I swapped the pre-built FFmpeg scaffolding with FFmpeg 4.4 from aptitude, and the segmentation fault did not happen. This indicates that it is a binary compatibility issue. Before https://github.com/pytorch/audio/issues/3460, each binary build job built FFmpeg 4.1.8 using the same compiler used to build torchaudio, but after https://github.com/pytorch/audio/issues/3460 the environments used to build FFmpeg 4.1.8 and torchaudio are different. My hypothesis is that this difference causes some ABI incompatibility when linking against FFmpeg 4.4. (Also, I don't remember well, but I read somewhere that 4.4 has a different ABI.) Through experiments, it turned out that upgrading the pre-built FFmpeg scaffolding to 4.4 resolves this, so this commit upgrades the pre-built FFmpeg 4 to 4.4. The potential (yet unconfirmed) downside is that torchaudio will no longer work with 4.1, 4.2, and 4.3. Since FFmpeg 4.4 is what Ubuntu 20.04 and 22.04 support by default, and Google Colab is also on 20.04, I think it is more important to support 4.4. Therefore we drop support for 4.1-4.3 from the normal build (and official distributions). Those who wish to use 4.1-4.3 can build torchaudio from source by linking to the specific FFmpeg version. Pull Request resolved: https://github.com/pytorch/audio/pull/3561 Reviewed By: hwangjeff Differential Revision: D48340201 Pulled By: mthrok fbshipit-source-id: 7ece82910f290c7cf83f58311c4cf6a384e8795e
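For readers who want to check which FFmpeg 4 release they have, here is a rough sketch that parses the first line of `ffmpeg -version` output. The function names are hypothetical, the banner strings in the usage note are illustrative inputs, and the check only encodes what the message above states (the normal build targets 4.4 among the 4.x releases). It says nothing about FFmpeg 5 or 6.

```python
import re
from typing import Optional, Tuple


def parse_ffmpeg_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the first line of `ffmpeg -version` output."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    if match is None:
        return None  # e.g. a development build without a numeric version
    return int(match.group(1)), int(match.group(2))


def matches_prebuilt_ffmpeg4(version: Tuple[int, int]) -> bool:
    """True only for FFmpeg 4.4, the 4.x release the pre-built scaffolding uses."""
    return version == (4, 4)
```

For example, `parse_ffmpeg_version("ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers")` yields `(4, 4)`, while a 4.2 banner yields `(4, 2)`, which per the message above would call for building torchaudio from source.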
-
- 10 Aug, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3546 Reviewed By: huangruizhe Differential Revision: D48219274 Pulled By: mthrok fbshipit-source-id: 6881f039bf70cf7240fbcfeb48443471ef457bd4
-
- 08 Aug, 2023 4 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3542 Reviewed By: huangruizhe Differential Revision: D48166025 Pulled By: mthrok fbshipit-source-id: 29fee7dbf08394993972ec2967f94ce9fcb1c853
-
Pingchuan Ma authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3532 Reviewed By: mthrok Differential Revision: D48165499 Pulled By: mpc001 fbshipit-source-id: c87b3361f0e6282684f218b32888df883d56682b
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3534 Reviewed By: huangruizhe Differential Revision: D48155817 Pulled By: mthrok fbshipit-source-id: a3d45fdfd360f9668063a3ecb3b00364290134c9
-
Ruizhe (Ray) Huang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3336 Reviewed By: mthrok Differential Revision: D47846814 Pulled By: huangruizhe fbshipit-source-id: dc12362bf243c52222dccadec3176e25e43dd652
-
- 04 Aug, 2023 1 commit
-
-
moto authored
Summary: - Simplify the step to generate token-level alignment Pull Request resolved: https://github.com/pytorch/audio/pull/3529 Reviewed By: huangruizhe Differential Revision: D48066787 Pulled By: mthrok fbshipit-source-id: 452c243d278e508926a59894928e280fea76dcc6
-
- 01 Aug, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: Add a separate tutorial for cuctc. Resolves https://github.com/pytorch/audio/issues/3096 Pull Request resolved: https://github.com/pytorch/audio/pull/3297 Reviewed By: huangruizhe Differential Revision: D47928400 Pulled By: mthrok fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec
-
- 31 Jul, 2023 2 commits
-
-
moto authored
Summary: torch.norm is now deprecated. The usages in torchaudio seem to be vector norms, so this replaces them with torch.linalg.vector_norm. Resolves https://github.com/pytorch/audio/issues/3484 Pull Request resolved: https://github.com/pytorch/audio/pull/3522 Reviewed By: huangruizhe Differential Revision: D47926659 Pulled By: mthrok fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084
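A small sketch of the replacement pattern, using a made-up tensor rather than any actual torchaudio call site:

```python
import torch

x = torch.tensor([[3.0, 4.0], [6.0, 8.0]])

# Deprecated spelling: torch.norm(x, p=2, dim=-1)
# Replacement for a vector 2-norm over the last dimension:
row_norms = torch.linalg.vector_norm(x, ord=2, dim=-1)

# With dim=None, vector_norm flattens the input and treats it as one vector.
total_norm = torch.linalg.vector_norm(x)
```

Here `row_norms` holds the 2-norm of each row, and `total_norm` is the 2-norm of all four elements taken together.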
-
moto authored
Summary: - Set global matplotlib rc params - Fix style check - Fix and update FA tutorial plots - Add av-asr index cards Pull Request resolved: https://github.com/pytorch/audio/pull/3515 Reviewed By: huangruizhe Differential Revision: D47894156 Pulled By: mthrok fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e
-
- 29 Jul, 2023 1 commit
-
-
moto authored
Summary: The I/O functions in the `_compat` module were introduced there so that everything related to FFmpeg lives in torchaudio.io and FFmpeg library initialization can be carried out in `torchaudio.io.__init__`. Now that this constraint is removed (all the initialization happens in `torchaudio._extension.__init__`) and `_compat` is only used by the FFmpeg dispatcher backend, we move the module to `torchaudio._backend` for better locality. Pull Request resolved: https://github.com/pytorch/audio/pull/3518 Reviewed By: huangruizhe Differential Revision: D47877412 Pulled By: mthrok fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f
-
- 28 Jul, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: This PR moves the `SquimObjective` and `SquimSubjective` models and the corresponding factory functions and pre-trained pipelines out of prototype and into the core directory. They will be included in the next official release. Pull Request resolved: https://github.com/pytorch/audio/pull/3512 Reviewed By: mthrok Differential Revision: D47837434 Pulled By: nateanl fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8
-
Pingchuan Ma authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511 Reviewed By: mthrok Differential Revision: D47852108 Pulled By: mpc001 fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa
-
- 26 Jul, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR moves video loading outside detector during pre-processing. Pull Request resolved: https://github.com/pytorch/audio/pull/3498 Reviewed By: mthrok Differential Revision: D47811044 Pulled By: mpc001 fbshipit-source-id: f17839b695b13d3cf2d9db343d7e9a0202eea7d5
-
- 25 Jul, 2023 2 commits
-
-
Pingchuan Ma authored
Summary: This PR makes a few changes to the AV-ASR recipe: better results, a faster face detector (Mediapipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes were made to improve the usability of the recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/3493 Reviewed By: mthrok Differential Revision: D47758072 Pulled By: mpc001 fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483 Differential Revision: D47725664 Pulled By: mthrok fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b
-
- 24 Jul, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489 Reviewed By: mthrok Differential Revision: D47726448 Pulled By: mpc001 fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841
-
- 18 Jul, 2023 1 commit
-
-
moto authored
Summary: Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders. Pull Request resolved: https://github.com/pytorch/audio/pull/3478 Differential Revision: D47519672 Pulled By: mthrok fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2
-
- 15 Jul, 2023 1 commit
-
-
moto authored
Summary: The nightly builds support FFmpeg version 4, 5 and 6. Pull Request resolved: https://github.com/pytorch/audio/pull/3480 Differential Revision: D47482841 Pulled By: mthrok fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9
-
- 05 Jul, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3433 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batched Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`). Reviewed By: mthrok Differential Revision: D46657526 fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89
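A hedged sketch of what the shape change means for a caller holding unbatched inputs. The sizes are arbitrary and the tensors are random placeholders. `forced_align` itself is not called, so the sketch needs only PyTorch.

```python
import torch

num_frames, num_classes, num_tokens = 50, 29, 7

# Unbatched inputs, as accepted by the earlier design.
log_probs = torch.randn(num_frames, num_classes).log_softmax(dim=-1)  # (T, C)
targets = torch.randint(1, num_classes, (num_tokens,))  # (L,)

# Add a leading batch dimension to match the batch-only interface.
batched_log_probs = log_probs.unsqueeze(0)  # (1, T, C)
batched_targets = targets.unsqueeze(0)  # (1, L)

# These batched tensors are what would then be passed as `log_probs` and
# `targets` to forced_align (not invoked in this sketch).
```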
-
- 28 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449 Differential Revision: D47094402 Pulled By: mthrok fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622
-
- 26 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442 Differential Revision: D46797481 Pulled By: mthrok fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe
-
- 21 Jun, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: Splitting the multilingual example part into another tutorial. Pull Request resolved: https://github.com/pytorch/audio/pull/3443 Reviewed By: mthrok Differential Revision: D46802844 Pulled By: xiaohui-zhang fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998
-
- 16 Jun, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset. This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability. Pull Request resolved: https://github.com/pytorch/audio/pull/3421 Reviewed By: mpc001 Differential Revision: D46799748 Pulled By: mthrok fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
-
- 15 Jun, 2023 1 commit
-
-
moto authored
Summary: * Fix backtrack visualization (the coordinate was off by one) * Add note about the simplification and the new align API * Explicitly handle SOS and EOS Pull Request resolved: https://github.com/pytorch/audio/pull/3440 Reviewed By: xiaohui-zhang Differential Revision: D46761282 Pulled By: mthrok fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8
-
- 07 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415 Differential Revision: D46526437 Pulled By: mthrok fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b
-
- 06 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410 Differential Revision: D46496786 Pulled By: mthrok fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059
-
- 04 Jun, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: There are some BC-breaking changes from pytorch_lightning to the lightning library. This PR adapts to those changes to support the latest lightning library. Pull Request resolved: https://github.com/pytorch/audio/pull/3396 Reviewed By: mthrok Differential Revision: D46345206 Pulled By: nateanl fbshipit-source-id: 59469c15dc5fe5466a99a5b5380eb4f98c2c633f
-
- 02 Jun, 2023 2 commits
-
-
moto authored
Summary: This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation instead of reimplementing it in PyTorch. The Kaldi integration employed a hack that replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio. Recently we have been making torchaudio leaner, and we don't see wide adoption of the kaldi_pitch feature, so we decided to remove it. See some of the discussion at https://github.com/pytorch/audio/issues/1269 Pull Request resolved: https://github.com/pytorch/audio/pull/3368 Differential Revision: D46406176 Pulled By: mthrok fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
-
moto authored
Summary: Replace sox_effects with `torchaudio.io.AudioEffector` 1. To showcase the new and better feature 2. To prepare for the upcoming removal of file-like object support Pull Request resolved: https://github.com/pytorch/audio/pull/3375 Reviewed By: nateanl Differential Revision: D46379016 Pulled By: mthrok fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315
-
- 31 May, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3379 Fixes `RNNTBeamSearch.infer`'s docstring and removes unused import from tutorial. Reviewed By: mthrok Differential Revision: D46227174 fbshipit-source-id: 7c1c3f05a6476cb0437622dea6f3ae6cb3ea9468
-
- 26 May, 2023 2 commits
-
-
atalman authored
Summary: This reverts commit d38a7854. This is a temporary revert to unblock the unit test migration from CircleCI to GitHub. Pull Request resolved: https://github.com/pytorch/audio/pull/3377 Reviewed By: mthrok Differential Revision: D46230498 Pulled By: atalman fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d
-
Lakshmi Krishnan authored
Summary: This commit fixes the following issues affecting streaming decoding quality: 1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided. 2. The decoder can now receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies. 3. Some minor errors in shape checking for length are fixed. This also means that the resulting output is the entire transcript up to that time step, instead of just the incremental change in the transcript. Pull Request resolved: https://github.com/pytorch/audio/pull/3295 Reviewed By: nateanl Differential Revision: D46216113 Pulled By: hwangjeff fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
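A rough sketch of the calling pattern described above, in which all hypotheses from one step are fed back into the next. It assumes the decoder exposes an `infer(features, length, beam_width, state=..., hypothesis=...)` method returning `(hypotheses, state)`; that signature is an assumption here, and `stream_decode` is a hypothetical helper name. The decoder is injected so the sketch stays independent of any model.

```python
from typing import Any, Iterable, List, Optional, Tuple


def stream_decode(
    decoder: Any, segments: Iterable[Tuple[Any, Any]], beam_width: int
) -> List[Any]:
    """Decode segments in order, carrying state and the full beam between calls."""
    state: Optional[Any] = None
    hypotheses: Optional[List[Any]] = None
    for features, length in segments:
        # Pass back every hypothesis from the previous step, not only the best one.
        hypotheses, state = decoder.infer(
            features, length, beam_width, state=state, hypothesis=hypotheses
        )
    # Each returned hypothesis covers the whole transcript so far, not a delta.
    return hypotheses or []
```

Because each hypothesis spans the entire transcript up to the current step, a caller that wants only the new text has to diff against the previous output itself.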
-