- 19 Sep, 2023 1 commit
-
-
moto authored
Extracted from #3604 Add Wall helper class and C++ unit test
-
- 05 Sep, 2023 1 commit
-
-
moto authored
Summary: The PR https://github.com/pytorch/audio/issues/3549 re-organized the backend implementations and deprecated direct access to torchaudio.backend. The change was supposed to be BC-compatible while issuing a warning to users, but the implementation of module-level `__getattr__` was not quite right. See the issue https://github.com/pyannote/pyannote-audio/pull/1456. This commit fixes it so that the following imports work:

```python
from torchaudio.backend.common import AudioMetaData
from torchaudio.backend import sox_io_backend
from torchaudio.backend.sox_io_backend import save, load, info
from torchaudio.backend import no_backend
from torchaudio.backend.no_backend import save, load, info
from torchaudio.backend import soundfile_backend
from torchaudio.backend.soundfile_backend import save, load, info
```

Pull Request resolved: https://github.com/pytorch/audio/pull/3595 Reviewed By: nateanl Differential Revision: D48957446 Pulled By: mthrok fbshipit-source-id: ebb256461dd3032025fd27d0455ce980888f7778
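For readers unfamiliar with module-level `__getattr__` (PEP 562), the following is a minimal sketch of the general pattern, not the torchaudio implementation. The module name `legacy_backend`, the `_load` stand-in, and the `_DEPRECATED` table are all hypothetical and exist only for illustration.

```python
import types
import warnings


def _load(path):
    """Stand-in for a real backend function (hypothetical)."""
    return f"loaded {path}"


# Names that used to be importable from the deprecated module (hypothetical).
_DEPRECATED = {"load": _load}


def _module_getattr(name):
    """Module-level __getattr__: called only when normal attribute lookup fails."""
    if name in _DEPRECATED:
        warnings.warn(
            f"legacy_backend.{name} is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        return _DEPRECATED[name]
    # Raising AttributeError keeps hasattr() and `from ... import` behaving normally.
    raise AttributeError(f"module 'legacy_backend' has no attribute {name!r}")


# Build a throwaway module so the sketch is self-contained; in a real package
# the function would simply be named __getattr__ at the top of the module file.
legacy_backend = types.ModuleType("legacy_backend")
legacy_backend.__getattr__ = _module_getattr
```

Accessing `legacy_backend.load` returns the stand-in function and emits a `DeprecationWarning`, while an unknown name still raises `AttributeError`.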
-
- 04 Sep, 2023 1 commit
-
-
moto authored
Summary: This PR removes the legacy backend switch mechanism. The implementation itself is still available. Merge after v2.1 release Pull Request resolved: https://github.com/pytorch/audio/pull/3559 Reviewed By: nateanl Differential Revision: D48353764 Pulled By: mthrok fbshipit-source-id: 4d3924dbe6f334ecebe2b12fcd4591c61c4aa656
-
- 20 Aug, 2023 1 commit
-
-
moto authored
Summary: It turned out that FFmpeg 5 installed via conda reports a video frame rate of -1. FFmpeg 4 and 6 are fine. This is either a regression in FFmpeg or in the underlying decoding library. Make the reference value adaptive. Pull Request resolved: https://github.com/pytorch/audio/pull/3568 Reviewed By: huangruizhe Differential Revision: D48499621 Pulled By: mthrok fbshipit-source-id: fb64187bcf0dc57b753cb6c05f04d436238f5c51
-
- 14 Aug, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3558 In the event that `use_tmp_hub_dir` isn't specified as an option, pytest shouldn't fail. To resolve such failures, this PR modifies function `temp_hub_dir` to fall back on a default value of `False` for `use_tmp_hub_dir`. Reviewed By: mthrok Differential Revision: D48318947 fbshipit-source-id: 5dd692f9202ef37ec3e2c9ea39896156f928d693
-
- 11 Aug, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3551 Restores VGGish pipeline test to be a function rather than class. Reviewed By: mthrok Differential Revision: D48236197 fbshipit-source-id: 25ac19d87a7a0964a9c3f7552037cd6c21dc38a9
-
- 10 Aug, 2023 2 commits
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3545 Adds function for computing the Fréchet distance between two multivariate normal distributions. Reviewed By: mthrok Differential Revision: D48126102 fbshipit-source-id: e4e122b831e1e752037c03f5baa9451e81ef1697
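As an illustration of the quantity involved, here is a minimal sketch restricted to diagonal covariance matrices, where the matrix square root reduces to element-wise square roots. It uses the commonly cited closed form `||mu_x - mu_y||^2 + tr(S_x + S_y - 2 (S_x S_y)^(1/2))`; the function name `frechet_distance_diag` and its list-based signature are assumptions for this example and are not the API added by the PR.

```python
import math


def frechet_distance_diag(mu_x, var_x, mu_y, var_y):
    """Fréchet distance between two Gaussians with diagonal covariances.

    mu_*  : sequences of per-dimension means
    var_* : sequences of per-dimension variances (the covariance diagonals)
    """
    # Squared Euclidean distance between the means.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))
    # For diagonal matrices, tr(S_x + S_y - 2 sqrt(S_x S_y)) is a per-dimension sum.
    cov_term = sum(
        vx + vy - 2.0 * math.sqrt(vx * vy) for vx, vy in zip(var_x, var_y)
    )
    return mean_term + cov_term
```

The general case with full covariance matrices needs a matrix square root, which this sketch deliberately leaves out.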
-
moto authored
Summary: The backend dispatcher is implemented in `torchaudio._backend`, while the legacy backend is implemented in `torchaudio.backend`. The initialization happens in `torchaudio._backend`. This commit moves it to `torchaudio.__init__`, so that `backend` and `_backend` are more independent. Pull Request resolved: https://github.com/pytorch/audio/pull/3548 Reviewed By: huangruizhe Differential Revision: D48219244 Pulled By: mthrok fbshipit-source-id: e694cb232794f90902a60ee51c7bf11b7f0548a0
-
- 09 Aug, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3544 Revises VGGish inference pipeline test to support internal testing. Reviewed By: mthrok Differential Revision: D48058409 fbshipit-source-id: 045140a0e9d50128d32ef6510bdb2f642a365c83
-
- 07 Aug, 2023 1 commit
-
-
moto authored
Summary: This commit adds the `merge_tokens` function, which removes repeated tokens from CTC token sequences returned from `forced_align`. Resolving repeated tokens is a necessary and almost universal step, so it makes sense to have such a helper function in torchaudio. Pull Request resolved: https://github.com/pytorch/audio/pull/3535 Reviewed By: huangruizhe Differential Revision: D48111202 Pulled By: mthrok fbshipit-source-id: 25354bfa210aa5c03f8c1d3e201f253ca3761b24
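To make the idea of resolving repeated tokens concrete, here is a minimal pure-Python sketch. It is not the `merge_tokens` implementation: the `Span` class, the `merge_repeated` name, and the choice of `0` as the blank index are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    token: int
    start: int  # first frame index (inclusive)
    end: int    # last frame index (exclusive)


def merge_repeated(tokens, blank=0):
    """Collapse runs of identical frame-level tokens and drop blank runs."""
    spans = []
    start = 0
    for i in range(1, len(tokens) + 1):
        # A run ends at the end of the sequence or when the token changes.
        if i == len(tokens) or tokens[i] != tokens[start]:
            if tokens[start] != blank:
                spans.append(Span(tokens[start], start, i))
            start = i
    return spans
```

For the frame-level sequence `[0, 1, 1, 0, 2, 2, 2, 0]` this yields one span for token `1` covering frames 1 to 3 and one for token `2` covering frames 4 to 7. The same token separated by a blank produces two separate spans, as CTC requires.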
-
- 03 Aug, 2023 1 commit
-
-
hwangjeff authored
Summary: Increases numerical tolerance on Conformer RNN-T TorchScript consistency tests to resolve CI test failures. Pull Request resolved: https://github.com/pytorch/audio/pull/3525 Reviewed By: mthrok Differential Revision: D48000613 Pulled By: hwangjeff fbshipit-source-id: 1d35ba58055a8346dc40e2b67f37ccfd2e015894
-
- 01 Aug, 2023 1 commit
-
-
hwangjeff authored
Summary: Adds pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset. Pull Request resolved: https://github.com/pytorch/audio/pull/3491 Reviewed By: mthrok Differential Revision: D47738130 Pulled By: hwangjeff fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67
-
- 31 Jul, 2023 1 commit
-
-
moto authored
Summary: torch.norm is now deprecated. The usages in torchaudio seem to be vector norms, so this replaces them with torch.linalg.vector_norm. Resolves https://github.com/pytorch/audio/issues/3484 Pull Request resolved: https://github.com/pytorch/audio/pull/3522 Reviewed By: huangruizhe Differential Revision: D47926659 Pulled By: mthrok fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084
-
- 29 Jul, 2023 1 commit
-
-
moto authored
Summary: The I/O functions in the `_compat` module were introduced there so that everything related to FFmpeg is in torchaudio.io and FFmpeg library initialization can be carried out in `torchaudio.io.__init__`. Now that this constraint is removed (all the initialization happens at `torchaudio._extension.__init__`) and `_compat` is only used by the FFmpeg dispatcher backend, we move the module to `torchaudio._backend` for better locality. Pull Request resolved: https://github.com/pytorch/audio/pull/3518 Reviewed By: huangruizhe Differential Revision: D47877412 Pulled By: mthrok fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f
-
- 28 Jul, 2023 2 commits
-
-
moto authored
Summary: In https://github.com/pytorch/audio/issues/2419, we added ffmpeg as a fallback for the sox_io backend. This was a workaround for the issue with libmad removal. Now that we have introduced the `backend` argument to I/O functions, and libsox integration has moved to dynamic binding where users can use libsox with libmad integration, we do not need the workaround. This commit is based on reverting https://github.com/pytorch/audio/issues/2416 (fd7ace17). Pull Request resolved: https://github.com/pytorch/audio/pull/3516 Reviewed By: huangruizhe Differential Revision: D47855272 Pulled By: mthrok fbshipit-source-id: 5af73af7865f6e545ccb052d478e86588ff2a014
-
Zhaoheng Ni authored
Summary: This PR moves the `SquimObjective` and `SquimSubjective` models and the corresponding factory functions and pre-trained pipelines out of prototype and into the core directory. They will be included in the next official release. Pull Request resolved: https://github.com/pytorch/audio/pull/3512 Reviewed By: mthrok Differential Revision: D47837434 Pulled By: nateanl fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8
-
- 26 Jul, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3499 Differential Revision: D47803654 Pulled By: mthrok fbshipit-source-id: 2b916fa66d84c91c01b4dfe6dd5ee3501159f451
-
- 25 Jul, 2023 1 commit
-
-
moto authored
Summary: In preparation for https://github.com/pytorch/audio/pull/3082 Disable those FFmpeg tests that depend on sox CLI. These tests need to be updated or removed so as not to use sox CLI. Auto-skip some sox tests if decoder/encoder are not available Pull Request resolved: https://github.com/pytorch/audio/pull/3494 Differential Revision: D47761948 Pulled By: mthrok fbshipit-source-id: 3a48d7f280f8376a48d223947dd41a7cdc8cbc30
-
- 17 Jul, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3467 Differential Revision: D47482388 Pulled By: mthrok fbshipit-source-id: abff36491dc28b83270673860d6457a084b1327d
-
- 12 Jul, 2023 1 commit
-
-
moto authored
Summary: This commit introduces support for multiple FFmpeg versions for OSS binary distributions. Currently torchaudio only works with FFmpeg 4. This is inconvenient, from installation to runtime linking. This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4. The way it works is that we compile the FFmpeg extension three times with different FFmpeg versions and ship them. At runtime, we look for libavutil of a specific version and, when one is found, load the corresponding FFmpeg extension. The order of preference is 6, 5, then 4. To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build. They are LGPL and downloaded from S3 at build time, instead of being built every time. The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces a single FFmpeg version support mode. Setting FFMPEG_ROOT during the build will change the way binaries are built so that only one specific version of FFmpeg is supported. Pull Request resolved: https://github.com/pytorch/audio/pull/3464 Differential Revision: D47300223 Pulled By: mthrok fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04
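The runtime selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the `pick_ffmpeg_version` name and the injected `has_libavutil` probe are hypothetical, and the version table relies on the libavutil major versions that ship with FFmpeg 4, 5 and 6 (56, 57 and 58).

```python
# Assumed mapping from FFmpeg major version to libavutil major version.
_LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(has_libavutil, preference=(6, 5, 4)):
    """Return the first FFmpeg major version whose libavutil can be found.

    has_libavutil: callable taking a libavutil major version and returning
    True if that library is loadable (a hypothetical probe; how the library
    is located is platform specific and out of scope for this sketch).
    """
    for ffmpeg_major in preference:
        if has_libavutil(_LIBAVUTIL_MAJOR[ffmpeg_major]):
            return ffmpeg_major
    # No supported FFmpeg found; the caller decides how to report this.
    return None
```

Passing the probe in as a callable keeps the preference logic separate from the platform-specific library lookup, which also makes the ordering easy to test.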
-
- 10 Jul, 2023 1 commit
-
-
moto authored
Summary: 1. Update the smoke test script to change directory so that there is no `torchaudio` directory in CWD when the smoke test is being executed. 2. Disable the part of the smoke test which requires FFmpeg for wheel. This is in preparation for https://github.com/pytorch/test-infra/pull/4358 Pull Request resolved: https://github.com/pytorch/audio/pull/3465 Reviewed By: nateanl Differential Revision: D47345117 Pulled By: mthrok fbshipit-source-id: 95aad0a22922d44ee9a24a05d9ece85166b8c17e
-
- 05 Jul, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3433 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To make the API simple, this PR changes it to only support batched Tensors (3D Tensor for `log_probs` and 2D Tensor for `targets`). Reviewed By: mthrok Differential Revision: D46657526 fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89
-
- 21 Jun, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3427 Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms. Reviewed By: mthrok Differential Revision: D46547418 fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960
-
- 14 Jun, 2023 1 commit
-
-
moto authored
Summary: Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with any sample rate. Pull Request resolved: https://github.com/pytorch/audio/pull/3374 Differential Revision: D46235358 Pulled By: mthrok fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4
-
- 09 Jun, 2023 1 commit
-
-
moto authored
Summary: The new version of transformers changed the format of pre-trained weights. Fixing it is low-priority for the maintenance team, so we disable the test. See https://github.com/pytorch/audio/issues/3430 Pull Request resolved: https://github.com/pytorch/audio/pull/3431 Differential Revision: D46592883 Pulled By: mthrok fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6
-
- 08 Jun, 2023 2 commits
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3395 Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`. Reviewed By: mthrok Differential Revision: D46307672 fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5
-
moto authored
Summary: The StreamReader decoding process is composed of three steps:
1. Decode the incoming AVPacket into AVFrame
2. Pass AVFrame through AVFilter to perform post-processing
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time the output stream is defined and the output stream shape can be retrieved. For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved. However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405. AVFilter is internally adaptive to changes in input frame size. This commit changes the conversion process to be similar, so that it waits until the first frame comes in to finalize the frame shape. Fix https://github.com/pytorch/audio/issues/3405 Pull Request resolved: https://github.com/pytorch/audio/pull/3419 Differential Revision: D46557505 Pulled By: mthrok fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
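The idea of deferring shape finalization until the first frame arrives can be illustrated with a small Python sketch. The real change lives in the C++ StreamReader internals; the `LazyFrameConverter` class and its nested-list frames are hypothetical stand-ins used only to show the pattern.

```python
class LazyFrameConverter:
    """Hypothetical converter whose output shape is unknown until a frame is seen."""

    def __init__(self):
        # No shape is assumed at construction time, so nothing upstream
        # (for example a decoder that resizes) has to report it in advance.
        self.shape = None

    def convert(self, frame):
        """Convert one frame, given as a list of rows, into a copied 2D list."""
        if self.shape is None:
            # Finalize the shape from the first frame actually received.
            self.shape = (len(frame), len(frame[0]))
        return [list(row) for row in frame]
```

Before any frame is converted, `shape` is `None`; after the first call it reflects whatever size the upstream step actually produced.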
-
- 07 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415 Differential Revision: D46526437 Pulled By: mthrok fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b
-
- 06 Jun, 2023 3 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410 Differential Revision: D46496786 Pulled By: mthrok fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059
-
Moto Hira authored
Differential Revision: D46126226 Original commit changeset: 42cb52b19d91 Original Phabricator Diff: D46126226 fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3365 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To make the API simple, this PR changes it to only support batched Tensors (3D Tensor for `log_probs` and 2D Tensor for `targets`). Reviewed By: vineelpratap Differential Revision: D46126226 fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2
-
- 02 Jun, 2023 1 commit
-
-
moto authored
Summary: This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio. We have recently been making torchaudio leaner, and we don't see wide adoption of the kaldi_pitch feature, so we decided to remove it. See some of the discussion at https://github.com/pytorch/audio/issues/1269 Pull Request resolved: https://github.com/pytorch/audio/pull/3368 Differential Revision: D46406176 Pulled By: mthrok fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
-
- 01 Jun, 2023 3 commits
-
-
moto authored
Summary: This commit removes file-like object support so that we can remove the custom patch. The motivation and plan are outlined in https://github.com/pytorch/audio/issues/2950. Pull Request resolved: https://github.com/pytorch/audio/pull/3035 Reviewed By: hwangjeff Differential Revision: D44695647 Pulled By: mthrok fbshipit-source-id: 13af0234e288c041bc7b490e1f967f85ce7eb8ec
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3398 Reviewed By: nateanl Differential Revision: D46354862 Pulled By: mthrok fbshipit-source-id: b86dcdfeff8ed9db87b0b78eca20f6f18117e97e
-
moto authored
Summary: The arguments of TorchAudio's save function ("format", "bits_per_sample" and "encoding") do not map one-to-one to the arguments of FFmpeg encoding. For example, to use the vorbis codec, FFmpeg expects the "ogg" container/extension with the "vorbis" encoder. It does not recognize the "vorbis" extension like TorchAudio (libsox) does. This commit refactors the logic to parse/map the arguments. As a result it now works properly with the vorbis and mp3 extensions. Pull Request resolved: https://github.com/pytorch/audio/pull/3387 Reviewed By: hwangjeff Differential Revision: D46328787 Pulled By: mthrok fbshipit-source-id: 36f993952a062bfec58a8b51be6aa86297571f90
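A minimal sketch of this kind of argument mapping is shown below. Only the vorbis case is taken from the commit message; the `parse_save_format` name, the return convention, and the pass-through fallback (with `None` meaning "no explicit encoder requested") are assumptions for illustration and do not reflect the actual refactored logic.

```python
def parse_save_format(ext):
    """Map a libsox-style extension to a (container, encoder) pair for FFmpeg."""
    ext = ext.lower()
    if ext == "vorbis":
        # FFmpeg does not know a "vorbis" container; it expects the "ogg"
        # container combined with the "vorbis" encoder.
        return "ogg", "vorbis"
    # Assumed fallback: use the extension as the container name and leave
    # the encoder unspecified.
    return ext, None
```

Keeping the mapping in one small function makes it easy to add further special cases without touching the code that drives the encoder.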
-
- 30 May, 2023 1 commit
-
-
atalman authored
Summary: Disable failing GPU unit test. See associated issue: https://github.com/pytorch/audio/issues/3376 Pull Request resolved: https://github.com/pytorch/audio/pull/3384 Reviewed By: mthrok Differential Revision: D46279324 Pulled By: atalman fbshipit-source-id: 3a606bb992e0261451f48d1fb458e054f7fd5583
-
- 27 May, 2023 1 commit
-
-
moto authored
Summary: When encoding audio with mulaw, the resulting data does not have a header, and the StreamReader defaults to 16k Hz, which can stretch/shrink the resulting waveform. Pull Request resolved: https://github.com/pytorch/audio/pull/3372 Reviewed By: hwangjeff Differential Revision: D46234772 Pulled By: mthrok fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552
-
- 26 May, 2023 3 commits
-
-
moto authored
Summary: g722 format only supports 16k Hz, but AVCodec does not list this. The implementation does not insert resampling and the resulting audio can be slowed down or sped up. Pull Request resolved: https://github.com/pytorch/audio/pull/3373 Reviewed By: hwangjeff Differential Revision: D46233181 Pulled By: mthrok fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c
-
Zhaoheng Ni authored
Summary: The tests failed for several bundles. Remove them; they will be re-added once the root cause is figured out. Pull Request resolved: https://github.com/pytorch/audio/pull/3378 Reviewed By: atalman Differential Revision: D46230884 Pulled By: nateanl fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92
-
Lakshmi Krishnan authored
Summary: This commit fixes the following issues affecting streaming decoding quality:
1. The `init_b` hypothesis is only regenerated from blank token if no initial hypotheses are provided.
2. Allows the decoder to receive top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript. Pull Request resolved: https://github.com/pytorch/audio/pull/3295 Reviewed By: nateanl Differential Revision: D46216113 Pulled By: hwangjeff fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
-