Commits · 05592dffe8a582db1cfdfe3ed46f351b935b3fad · OpenDAS / Torchaudio

22 Mar, 2022 1 commit

Add download utility specialized for torchaudio (#2283) · 64b98521

moto authored Mar 22, 2022

Summary:
In recent updates, torchaudio added features that download assets/models from
download.pytorch.org/torchaudio.

To reduce code duplication, the implementations use utilities from
``torch.hub``, but some patterns are still repeated in implementing
the fetch mechanism, notably cache and local file path handling.

This commit introduces a utility function that handles
download, cache, and local path management and can be used for
fetching pre-trained model data.
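The download/cache/local-path pattern described above can be sketched with only the Python standard library. The function `fetch_asset`, its parameters, the cache layout, and the injectable `download` callable are hypothetical illustrations, not the API added by this commit; only the base URL comes from the text above.

```python
import os
import shutil
import urllib.request
from typing import Callable, Optional

# Base location taken from the commit text; the path layout below it is assumed.
BASE_URL = "https://download.pytorch.org/torchaudio"


def _default_download(url: str, dst: str) -> None:
    # Stream the remote file to `dst`.
    with urllib.request.urlopen(url) as resp, open(dst, "wb") as f:
        shutil.copyfileobj(resp, f)


def fetch_asset(
    key: str,
    cache_dir: str,
    download: Optional[Callable[[str, str], None]] = None,
) -> str:
    """Hypothetical helper: return a local path for `key`.

    1. If `key` is an existing local file, use it as-is.
    2. Otherwise, look it up in `cache_dir`.
    3. On a cache miss, download it into the cache first.
    """
    if os.path.isfile(key):
        return key
    path = os.path.join(cache_dir, key)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Download to a temporary name, then rename, so an interrupted
        # download never leaves a partial file in the cache.
        tmp = path + ".part"
        (download or _default_download)(f"{BASE_URL}/{key}", tmp)
        os.replace(tmp, path)
    return path
```

Injecting the downloader keeps the cache logic testable offline; a second call with the same key is served from the cache without downloading again.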

Pull Request resolved: https://github.com/pytorch/audio/pull/2283

Reviewed By: carolineechen

Differential Revision: D35050469

Pulled By: mthrok

fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b

64b98521

04 Mar, 2022 2 commits

Flush and reset internal state after seek (#2264) · 7e1afc40

moto authored Mar 04, 2022

Summary:
This commit adds the following behavior to `seek` so that `seek`
works after a frame has been decoded.

1. Flush the decoder buffer.
2. Recreate the filter graphs (so that their internal state is re-initialized).
3. Discard the buffered tensors (decoded chunks).

It also disallows negative values for the seek timestamp.
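The reset sequence can be sketched with a toy Python class. `ToyStream`, its attributes, and the error message are hypothetical; they only model the bookkeeping, while the real implementation works on FFmpeg decoder and filter-graph objects.

```python
from typing import List


class ToyStream:
    """Hypothetical stand-in for a stateful decoder; not torchaudio's Streamer."""

    def __init__(self) -> None:
        self.decoder_buffer: List[int] = []  # frames held inside the "decoder"
        self.filter_state = 0                # stands in for filter-graph internal state
        self.chunks: List[int] = []          # decoded chunks waiting to be returned
        self.position = 0.0

    def decode(self, frame: int) -> None:
        self.decoder_buffer.append(frame)
        self.filter_state += 1
        self.chunks.append(frame)

    def seek(self, timestamp: float) -> None:
        if timestamp < 0:
            raise ValueError(f"timestamp must be non-negative, got {timestamp}")
        self.decoder_buffer.clear()  # 1. flush the decoder buffer
        self.filter_state = 0        # 2. recreate filter graphs (reset internal state)
        self.chunks.clear()          # 3. discard buffered, already-decoded chunks
        self.position = timestamp
```

Without steps 1 to 3, data decoded before the seek would leak into the output after it.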

Pull Request resolved: https://github.com/pytorch/audio/pull/2264

Reviewed By: carolineechen

Differential Revision: D34497826

Pulled By: mthrok

fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d

7e1afc40

Make Streamer fail if an invalid option is provided (#2263) · 04875eef

moto authored Mar 04, 2022

Summary:
The `torchaudio.prototype.io.Streamer` class takes context-dependent options
as the `option` argument, in the form of a mapping from strings to strings.

Currently there is no check if the provided options were valid for
the given input.

This commit adds the check and raises an error if an invalid option is given.

This is analogous to the error handling of the `ffmpeg` command.

```
$ ffmpeg -foo
...
Unrecognized option 'foo'.
```
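One way to implement such a check, sketched in Python: FFmpeg's `avformat_open_input` replaces its options dictionary with one containing the entries it did not recognize, so any leftovers can be reported as errors. `check_options` and the `recognized` list are hypothetical stand-ins for the FFmpeg calls, not torchaudio's implementation.

```python
from typing import Iterable, Mapping


def check_options(options: Mapping[str, str], recognized: Iterable[str]) -> None:
    """Raise if `options` contains any key that the input did not consume."""
    remaining = dict(options)
    # Simulate the demuxer/decoder consuming the options it understands.
    for key in recognized:
        remaining.pop(key, None)
    # Whatever is left over was not recognized, like `ffmpeg -foo`.
    if remaining:
        raise ValueError("Unrecognized option(s): " + ", ".join(sorted(remaining)))
```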

Pull Request resolved: https://github.com/pytorch/audio/pull/2263

Reviewed By: hwangjeff

Differential Revision: D34495111

Pulled By: mthrok

fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f

04875eef

26 Feb, 2022 2 commits

Enable ffmpeg prototype unit test (#2261) · 955ffb47

Moto Hira authored Feb 25, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2261

Enables prototype ffmpeg io tests in fbcode.

Reviewed By: nateanl

Differential Revision: D33698353

fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036

955ffb47

Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``apply_beamforming`` method to ``torchaudio.functional``.
The method applies the beamforming weights to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
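Per frequency bin f and frame t, applying beamforming weights amounts to a filter-and-sum over channels: Y(f, t) = sum over c of conj(w(f, c)) * X(c, f, t). The pure-Python sketch below uses nested lists instead of Tensors; the conjugation convention, the `[freq][channel]` / `[channel][freq][time]` layouts, and the function name are assumptions for illustration.

```python
from typing import List

Matrix = List[List[complex]]


def apply_beamforming_sketch(weights: Matrix, specgram: List[Matrix]) -> Matrix:
    """weights: [freq][channel]; specgram: [channel][freq][time] -> [freq][time]."""
    n_channel = len(specgram)
    n_freq = len(specgram[0])
    n_time = len(specgram[0][0])
    out = [[0j] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            # Filter-and-sum: Y(f, t) = sum_c conj(w(f, c)) * X(c, f, t)
            out[f][t] = sum(
                weights[f][c].conjugate() * specgram[c][f][t] for c in range(n_channel)
            )
    return out
```

With weights `[1, 0]` the output is simply channel 0; with `[0.5, 0.5]` it is the channel average.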

Pull Request resolved: https://github.com/pytorch/audio/pull/2232

Reviewed By: mthrok

Differential Revision: D34474561

Pulled By: nateanl

fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d

9c56ffb4

25 Feb, 2022 5 commits

Add rtf_power method to torchaudio.functional (#2231) · ea74813d

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``rtf_power`` method to ``torchaudio.functional``.
The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206).
[This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English.
The input arguments are, respectively, the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, an int or one-hot Tensor indicating the reference channel, and the number of iterations.
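The power-iteration idea can be sketched for a single frequency bin in pure Python. This is a generic sketch, assuming the RTF is obtained from the principal eigenvector of Phi_n^{-1} Phi_s (Phi_s: speech PSD, Phi_n: noise PSD), mapped back through Phi_n and normalized so the reference channel equals 1. It is not necessarily the exact sequence of operations used by `torchaudio.functional.rtf_power`, and the helper names are hypothetical.

```python
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r, c=col: abs(aug[r][c]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def rtf_power_sketch(psd_s: Matrix, psd_n: Matrix, reference_channel: int, n_iter: int = 3) -> Vector:
    phi = solve(psd_n, psd_s)            # Phi_n^{-1} Phi_s
    v: Vector = [0j] * len(phi)
    v[reference_channel] = 1.0           # start from the reference-channel unit vector
    for _ in range(n_iter):
        v = matvec(phi, v)               # one power-iteration step
        norm = max(abs(x) for x in v)
        v = [x / norm for x in v]        # rescale to avoid overflow
    # Map the principal generalized eigenvector back to a steering vector,
    # then normalize so that the reference channel equals 1.
    h = matvec(psd_n, v)
    return [x / h[reference_channel] for x in h]
```

For a rank-one speech PSD, a single iteration already lands on the dominant direction; extra iterations matter when Phi_s is full rank.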

Pull Request resolved: https://github.com/pytorch/audio/pull/2231

Reviewed By: mthrok

Differential Revision: D34474503

Pulled By: nateanl

fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb

ea74813d

Add rtf_evd method to torchaudio.functional (#2230) · 86fe4fa7

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds `rtf_evd` method to `torchaudio.functional`.
The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
The input argument is the power spectral density (PSD) matrix of the target speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2230

Reviewed By: mthrok

Differential Revision: D34474188

Pulled By: nateanl

fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134

86fe4fa7

Add mvdr_weights_rtf to torchaudio.functional (#2229) · 3566ffc5

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution that uses the relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for reference.
The input arguments are, respectively, the complex-valued RTF Tensor of the target speech, the power spectral density (PSD) matrix of noise, and an int or one-hot Tensor indicating the reference channel.
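The RTF-based MVDR solution is w = Phi_n^{-1} h / (h^H Phi_n^{-1} h), where h is the RTF and Phi_n the noise PSD matrix. The sketch below implements that textbook formula for one frequency bin with plain Python lists; the helper names are hypothetical, and any extra reference-channel scaling the library may apply is left out.

```python
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def solve_vec(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r, c=col: abs(m[r][c]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x: Vector = [0j] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mvdr_weights_rtf_sketch(rtf: Vector, psd_n: Matrix) -> Vector:
    # MVDR: w = Phi_n^{-1} h / (h^H Phi_n^{-1} h)
    num = solve_vec(psd_n, rtf)
    den = sum(h.conjugate() * x for h, x in zip(rtf, num))
    return [x / den for x in num]
```

The denominator enforces the distortionless constraint w^H h = 1, so the target component at the reference channel passes unchanged when h is normalized to 1 there.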

Pull Request resolved: https://github.com/pytorch/audio/pull/2229

Reviewed By: mthrok

Differential Revision: D34474119

Pulled By: nateanl

fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2

3566ffc5

Add mvdr_weights_souden to torchaudio.functional (#2228) · 5d06a369

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are, respectively, the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, and an int or one-hot Tensor indicating the reference channel.
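Souden's solution can be written as w = (Phi_n^{-1} Phi_s / trace(Phi_n^{-1} Phi_s)) u_ref, where u_ref is the one-hot vector selecting the reference channel. The pure-Python sketch below implements that formula for a single frequency bin; the helper names are hypothetical, and it omits the batching and numerical safeguards a real implementation may add.

```python
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def solve_vec(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r, c=col: abs(m[r][c]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x: Vector = [0j] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mvdr_weights_souden_sketch(psd_s: Matrix, psd_n: Matrix, reference_channel: int) -> Vector:
    n = len(psd_s)
    # Columns of Phi_n^{-1} Phi_s: one linear solve per column of Phi_s.
    cols = [solve_vec(psd_n, [psd_s[i][j] for i in range(n)]) for j in range(n)]
    trace = sum(cols[j][j] for j in range(n))
    # Selecting the reference column is the multiplication by u_ref.
    return [x / trace for x in cols[reference_channel]]
```

Unlike the RTF-based variant, this form needs only the two PSD matrices, not an explicit steering vector.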

Pull Request resolved: https://github.com/pytorch/audio/pull/2228

Reviewed By: mthrok

Differential Revision: D34474018

Pulled By: nateanl

fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb

5d06a369

Add psd method to torchaudio.functional (#2227) · 07bd1aa3

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``psd`` method to ``torchaudio.functional``.
It computes the power spectral density (PSD) matrix of the complex-valued spectrum.
The method also supports normalizing the time-frequency mask.
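For each frequency bin, the PSD matrix is a (mask-weighted) sum of outer products of the multi-channel STFT vectors, Phi(f) = sum over t of m(f, t) x(f, t) x(f, t)^H, where normalization divides the mask by its sum over time. The list-based sketch below assumes a `[channel][freq][time]` input, a `[freq][time]` mask, and a plain sum over frames when no mask is given; these conventions and the function name are assumptions, not a restatement of the torchaudio signature.

```python
from typing import List, Optional

Spec = List[List[List[complex]]]


def psd_sketch(specgram: Spec, mask: Optional[List[List[float]]] = None,
               normalize: bool = True, eps: float = 1e-10) -> Spec:
    """specgram: [channel][freq][time] -> PSD: [freq][channel][channel]."""
    n_ch, n_freq, n_time = len(specgram), len(specgram[0]), len(specgram[0][0])
    out = []
    for f in range(n_freq):
        w = [1.0] * n_time if mask is None else list(mask[f])
        if mask is not None and normalize:
            total = sum(w)
            w = [m / (total + eps) for m in w]  # weights sum to ~1 over time
        # Entry (i, j) = sum_t w(t) * X_i(f, t) * conj(X_j(f, t))
        out.append([
            [sum(w[t] * specgram[i][f][t] * specgram[j][f][t].conjugate()
                 for t in range(n_time))
             for j in range(n_ch)]
            for i in range(n_ch)
        ])
    return out
```

The result is Hermitian by construction, which is what the MVDR and RTF functions above expect.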

Pull Request resolved: https://github.com/pytorch/audio/pull/2227

Reviewed By: mthrok

Differential Revision: D34473908

Pulled By: nateanl

fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9

07bd1aa3

24 Feb, 2022 1 commit

Add Python 3.10 (build and test) (#2224) · 27dff6ba

Andrey Talman authored Feb 24, 2022

Summary:
Adds Python 3.10 to the builds and tests of pytorch/audio.

Pull Request resolved: https://github.com/pytorch/audio/pull/2224

Reviewed By: malfet, atalman, mthrok

Differential Revision: D34442377

Pulled By: seemethere

fbshipit-source-id: 2656de73427063958d609a74c01b526a476cb06a

27dff6ba

17 Feb, 2022 2 commits

Refactor batch consistency test in functional (#2245) · 9cf59e75

Zhaoheng Ni authored Feb 17, 2022

Summary:
In the batch_consistency tests, the `assert_batch_consistency` method only accepts a single Tensor, which does not work for some methods. For example, `lfilter` and `filtfilt` require three Tensors as arguments, so their tests do not use `assert_batch_consistency`.
This PR refactors the test to accept a tuple of Tensors that have a batch dimension. Other arguments, such as `int` or `str` values, are passed as `*args` after the tuple.
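A list-based sketch of the refactored helper, assuming each entry of the input tuple is indexable along a leading batch dimension. The real helper operates on Tensors inside torchaudio's test suite; this version is only illustrative.

```python
from typing import Any, Callable, Sequence, Tuple


def assert_batch_consistency(
    func: Callable[..., Any], inputs: Tuple[Sequence[Any], ...], *args: Any
) -> None:
    """Check that func(batch) equals [func(item) for item in batch].

    Every element of `inputs` carries a leading batch dimension; `args`
    (ints, strs, ...) are forwarded unchanged to every call.
    """
    batch_size = len(inputs[0])
    assert all(len(x) == batch_size for x in inputs), "inputs must share the batch size"
    expected = [func(*(x[i] for x in inputs), *args) for i in range(batch_size)]
    actual = func(*inputs, *args)
    assert list(actual) == expected, f"batched {actual} != per-item {expected}"
```

A function such as a three-Tensor filter can then be checked with `assert_batch_consistency(f, (x, a, b), some_int)` instead of needing its own ad hoc test.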

Pull Request resolved: https://github.com/pytorch/audio/pull/2245

Reviewed By: mthrok

Differential Revision: D34273035

Pulled By: nateanl

fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056

9cf59e75

Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15

Zhaoheng Ni authored Feb 17, 2022

Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
- Replace the lambda with `functools.partial` in `TEDLIUM3RNNTModule` to pass the Lightning unit test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2240

Reviewed By: mthrok

Differential Revision: D34285195

Pulled By: nateanl

fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f

b5d77b15

16 Feb, 2022 2 commits

Refactor torchscript consistency test in functional (#2246) · 87d79889

Zhaoheng Ni authored Feb 16, 2022

Summary:
In the torchscript_consistency tests, the `func` in each test method accepts only one `tensor` argument, so the other arguments of the `F.xyz` method must be defined inside `func`. If `F.xyz` has no `Tensor` argument, the tests pass a `dummy` tensor that is not used anywhere. This PR refactors ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of a single `tensor`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2246

Reviewed By: carolineechen

Differential Revision: D34273057

Pulled By: nateanl

fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b

87d79889

Add complex dtype support in functional autograd test (#2244) · eeba91dc

Zhaoheng Ni authored Feb 16, 2022

Summary:
In the autograd tests, real-valued Tensors are converted to `torch.float64` to guarantee precision. However, complex dtypes were not handled. This PR adds `self.complex_dtype` support for the inputs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2244

Reviewed By: mthrok

Differential Revision: D34272998

Pulled By: nateanl

fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5

eeba91dc

15 Feb, 2022 1 commit

Adjust Conformer args (#2223) · 411b5dcf

hwangjeff authored Feb 14, 2022

Summary:
Orders and names Conformer's initializer args to be more consistent with Emformer's.

Pull Request resolved: https://github.com/pytorch/audio/pull/2223

Reviewed By: mthrok

Differential Revision: D34226177

Pulled By: hwangjeff

fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829

411b5dcf

11 Feb, 2022 2 commits

Add fixed random seed for Emformer RNN-T recipe test (#2220) · bc0fcadb

hwangjeff authored Feb 11, 2022

Summary:
Adds fixed random seed to Emformer RNN-T training recipe test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2220

Reviewed By: nateanl

Differential Revision: D34180644

Pulled By: hwangjeff

fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9

bc0fcadb

Add unit tests for Emformer RNN-T LibriSpeech recipe (#2216) · bbdbd582

hwangjeff authored Feb 11, 2022

Summary:
Adds unit tests for the Emformer RNN-T LibriSpeech recipe. Also makes changes to the recipe to resolve errors with pickling lambda functions on Windows.
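Lambdas cannot be pickled by the standard `pickle` module because functions are pickled by reference to their qualified name, and a lambda has no importable name. This matters on Windows, where worker processes are started with the spawn method and receive their callables by pickling. A minimal illustration follows; `operator.mul` stands in for a real transform, and `is_picklable` is a hypothetical helper.

```python
import functools
import operator
import pickle


def is_picklable(obj) -> bool:
    """Return True if `obj` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


double_lambda = lambda x: x * 2                      # looked up as "<lambda>": fails
double_partial = functools.partial(operator.mul, 2)  # operator.mul is importable: works

print(is_picklable(double_lambda))   # False
print(is_picklable(double_partial))  # True
```

`functools.partial` over a module-level (or built-in) function keeps the callable's behavior while making it picklable, which is why it is a common replacement for lambdas in data pipelines.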

Pull Request resolved: https://github.com/pytorch/audio/pull/2216

Reviewed By: nateanl

Differential Revision: D34171480

Pulled By: hwangjeff

fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e

bbdbd582

09 Feb, 2022 2 commits

Clean up Emformer (#2207) · 87d7694d

hwangjeff authored Feb 09, 2022

Summary:
- Make `segment_length` a required argument rather than optional argument to force users to consciously choose input segment lengths for their use cases.
- Clarify expected input shapes in API documentation.
- Adjust `infer` tests to reflect expected usage.
- Add assertion for input shape for `infer`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2207

Reviewed By: mthrok

Differential Revision: D34101205

Pulled By: hwangjeff

fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304

87d7694d

Fix librosa calls (#2208) · e5d567c9

hwangjeff authored Feb 08, 2022

Summary:
Yesterday's release of librosa 0.9.0 made arguments keyword-only and changed the default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2208

Reviewed By: mthrok

Differential Revision: D34099793

Pulled By: hwangjeff

fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc

e5d567c9

02 Feb, 2022 1 commit

Add Streaming API (#2164) · 7a3e262d

moto authored Feb 01, 2022

Summary:
This PR adds the prototype streaming API.
The implementation is based on ffmpeg libraries.

For detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2164

Reviewed By: hwangjeff

Differential Revision: D33934457

Pulled By: mthrok

fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe

7a3e262d

01 Feb, 2022 2 commits

Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c

aca5591c

Add CTC decoder timesteps (#2184) · d43ce015

Caroline Chen authored Feb 01, 2022

Summary:
Adds a `timesteps` field to CTC decoder hypotheses, holding the time steps at which the non-blank tokens occur.
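To illustrate what such a field holds, the sketch below collapses a single frame-level CTC path (merge repeats, drop blanks) and records the frame at which each emitted token first appears. torchaudio's decoder is a beam-search lexicon decoder, so this greedy, single-path version and the "first frame" convention are assumptions made for illustration; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def collapse_with_timesteps(frame_ids: List[int], blank: int = 0) -> Tuple[List[int], List[int]]:
    """Collapse a frame-level CTC path and return (tokens, timesteps)."""
    tokens: List[int] = []
    timesteps: List[int] = []
    prev: Optional[int] = None
    for t, idx in enumerate(frame_ids):
        # A token is emitted when it is non-blank and differs from the previous
        # frame; a blank between two equal labels separates them into two tokens.
        if idx != blank and idx != prev:
            tokens.append(idx)
            timesteps.append(t)
        prev = idx
    return tokens, timesteps


print(collapse_with_timesteps([0, 3, 3, 0, 3, 5, 5, 0]))  # ([3, 3, 5], [1, 4, 5])
```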

Pull Request resolved: https://github.com/pytorch/audio/pull/2184

Reviewed By: mthrok

Differential Revision: D33905530

Pulled By: carolineechen

fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a

d43ce015

27 Jan, 2022 2 commits

Add no lm support for CTC decoder (#2174) · 4c3fa875

Caroline Chen authored Jan 27, 2022

Summary:
Adds support for a CTC lexicon decoder without a language model by adding a no-op language model, `ZeroLM`, that returns a score of 0 for everything. Generalizes the decoder class/API slightly to support this, adding it as an option for the KenLM decoder for now (it will likely be separated from KenLM when support for other kinds of LMs is added in the future).

Pull Request resolved: https://github.com/pytorch/audio/pull/2174

Reviewed By: hwangjeff, nateanl

Differential Revision: D33798674

Pulled By: carolineechen

fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde

4c3fa875

Add `is_ffmpeg_available` in test (#2170) · 39fe9df6

moto authored Jan 26, 2022

Summary:
Part of https://github.com/pytorch/audio/issues/2164.
To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable if ffmpeg features are not available,
this commit adds `is_ffmpeg_available`.

The availability of the features depends on two factors:
1. Whether it was enabled at build time.
2. Whether the ffmpeg libraries are found at runtime.

A simple way (for the OSS workflow) to detect both is to check whether
`libtorchaudio_ffmpeg` is present and can be loaded without failure.

To facilitate this, this commit changes
`torchaudio._extension._load_lib` to return a boolean result.
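The check described above boils down to "try to load the shared library and report success as a boolean". The sketch uses `ctypes.CDLL` as a stand-in loader, which is an assumption: torchaudio's `_load_lib` may load libraries differently, and the name `try_load_library` is hypothetical.

```python
import ctypes
from typing import Optional


def try_load_library(path: Optional[str]) -> bool:
    """Return True if the shared library at `path` can be loaded, False otherwise."""
    if not path:
        return False  # e.g. the feature was not enabled at build time
    try:
        ctypes.CDLL(path)
    except OSError:
        return False  # missing file or unresolved runtime dependencies
    return True
```

A test helper could then evaluate such a check once at import time and use the boolean to decide whether to skip ffmpeg-dependent tests.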

Pull Request resolved: https://github.com/pytorch/audio/pull/2170

Reviewed By: carolineechen

Differential Revision: D33797695

Pulled By: mthrok

fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97

39fe9df6

26 Jan, 2022 3 commits

Add tacotron2 unittest with different batch_size (#2176) · 691317a9

Zhaoheng Ni authored Jan 26, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2176

Reviewed By: carolineechen, mthrok

Differential Revision: D33794216

Pulled By: nateanl

fbshipit-source-id: e039c1fc03a89f1e8130a5c4dbc4beceff4081eb

691317a9

Add integration test for Emformer RNN-T LibriSpeech pipeline (#2172) · 0d6d0669

hwangjeff authored Jan 26, 2022

Summary:
Adds integration test for pretrained ASR pipeline `EMFORMER_RNNT_BASE_LIBRISPEECH`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2172

Reviewed By: carolineechen, nateanl

Differential Revision: D33793324

Pulled By: hwangjeff

fbshipit-source-id: d0613e2ab98fe5afa7b16ca39b67f0a0304d13fc

0d6d0669

Remove subsampling and positional embedding logic from Conformer (#2171) · b81f0b45

hwangjeff authored Jan 26, 2022

Summary:
To facilitate experimenting with different strategies, this PR removes the existing subsampling and positional embedding logic from `Conformer`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2171

Reviewed By: nateanl

Differential Revision: D33793338

Pulled By: hwangjeff

fbshipit-source-id: 9f97614b09964a101a891b9c840b61a26fc1541f

b81f0b45

21 Jan, 2022 1 commit

Add video test asset for streaming API (#2167) · db10bdfb

moto authored Jan 21, 2022

Summary:
Split from https://github.com/pytorch/audio/issues/2164
Adds new test assets. This commit is kept separate so that
its message, which documents the origin of the file, is easier to find.

The original video is in the public domain, per
- https://svs.gsfc.nasa.gov/13013
- https://www.nasa.gov/multimedia/guidelines/index.html
(The YouTube page directly says so)
- https://www.youtube.com/watch?v=6zNsc0e3Zns

The video was modified to fit the testing needs:
1. Multiple audio/video streams
2. Non-audio/video (subtitle) streams
3. Different FPS and sampling rates
4. Versions without audio or without video

```
#!/usr/bin/env bash
original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt

# Fetch the original video, embed the subtitle
ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y

# Extract, rescale video and resample audio
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000 tmp2.mp4 -y

# Merge them, retaining all the streams (6 in total)
ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y

# Make versions without audio / video
ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y
ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_audio.mp4 -y
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2167

Reviewed By: carolineechen

Differential Revision: D33712954

Pulled By: mthrok

fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211

db10bdfb

20 Jan, 2022 1 commit

Relax absolute tolerance for Kaldi compat tests (#2165) · 75ca501a

Nikita Shulga authored Jan 19, 2022

Summary:
Tests were found to be failing after the change of the tester GPU class; see https://github.com/pytorch/audio/pull/1791

Pull Request resolved: https://github.com/pytorch/audio/pull/2165

Reviewed By: mthrok

Differential Revision: D33674802

Pulled By: malfet

fbshipit-source-id: 2e39386c0f129cf44a30d5dfea67e9e2d0e875cf

75ca501a

05 Jan, 2022 1 commit

Do not auto-skip tests on CI (#2127) · 4f487c4a

moto authored Jan 05, 2022

Summary:
Updates the internals of the `skipIfXXX` decorators so that tests in CI are not automatically skipped.

Currently, some tests are automatically skipped based on the availability of related features/test tools.
This causes us to miss signals on certain important features (for example, CUDA on Windows): https://github.com/pytorch/audio/issues/1565

In CI, the new `skipIf` decorator fails the test instead of skipping it, unless skipping is explicitly allowed.
It does so by checking the `CI` and `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` environment variables.

For non-CI environments, the behavior is the same as before, but users can now set `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX=false` to disallow the automatic skip.

Results without `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` https://app.circleci.com/pipelines/github/pytorch/audio/9112/workflows/4e6db046-a1a2-4965-b0fe-d5baf4a1efac

Pull Request resolved: https://github.com/pytorch/audio/pull/2127

Reviewed By: hwangjeff

Differential Revision: D33430711

Pulled By: mthrok

fbshipit-source-id: d8954dd720469c5ab0f34ea062fd8cf04a8afa3e

4f487c4a

30 Dec, 2021 2 commits

Enforce lint checks and fix/mute lint errors (#2116) · 8ed14782

Joao Gomes authored Dec 30, 2021

Summary:
cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/2116

Reviewed By: mthrok

Differential Revision: D33368453

Pulled By: jdsgomes

fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c

8ed14782

Clean up Emformer module (#2091) · 4c8fd760

hwangjeff authored Dec 30, 2021

Summary:
* Removes redundant declaration `right_context_blocks = []`, as flagged by kobenaxie.
* Adds random seed to tests, as flagged by carolineechen in other PRs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2091

Reviewed By: mthrok

Differential Revision: D33340964

Pulled By: hwangjeff

fbshipit-source-id: a9de43e28d1bae7bd4806b280717b0d822bb42fc

4c8fd760

29 Dec, 2021 3 commits

Add parameter p to TimeMasking (#2090) · 1ec7ff73

hwangjeff authored Dec 29, 2021

Summary:
Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779).
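A simplified sketch of how the bound can be enforced on a `[freq][time]` list: the mask width is drawn uniformly from `[0, min(time_mask_param, int(p * n_time))]` and the start uniformly from the remaining positions. The integer sampling scheme and the function name are assumptions for illustration; `torchaudio.transforms.TimeMasking` works on Tensors and may differ in detail.

```python
import random
from typing import List, Optional


def time_mask(spec: List[List[float]], time_mask_param: int, p: float = 1.0,
              mask_value: float = 0.0, rng: Optional[random.Random] = None) -> List[List[float]]:
    """Mask one contiguous run of time steps whose width is at most
    min(time_mask_param, int(p * n_time)); `p` is assumed to be in (0, 1]."""
    rng = rng or random.Random()
    n_time = len(spec[0])
    max_width = min(time_mask_param, int(p * n_time))
    width = rng.randint(0, max_width)        # randint is inclusive on both ends
    start = rng.randint(0, n_time - width)
    return [[mask_value if start <= t < start + width else v for t, v in enumerate(row)]
            for row in spec]
```

With `p = 1.0` only `time_mask_param` limits the width, which matches the behavior before this parameter existed.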

Pull Request resolved: https://github.com/pytorch/audio/pull/2090

Reviewed By: carolineechen

Differential Revision: D33344772

Pulled By: hwangjeff

fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf

1ec7ff73

Allow token list as CTC decoder input (#2112) · 896ade04

Caroline Chen authored Dec 29, 2021

Summary:
Additionally accepts a list of tokens as CTC decoder input. This makes it possible to pass something like `bundles.get_labels()` directly into the decoder factory function instead of requiring a separate tokens file.
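Accepting either form can be sketched as a small normalization step. The helper name `load_tokens` and the one-token-per-line file format are assumptions for illustration, not the decoder's actual code.

```python
import os
from typing import List, Sequence, Union


def load_tokens(tokens: Union[str, "os.PathLike[str]", Sequence[str]]) -> List[str]:
    """Hypothetical helper: accept a path to a tokens file (assumed one token
    per line) or an in-memory sequence such as a bundle's labels."""
    # A str is itself a Sequence[str], so paths must be checked first.
    if isinstance(tokens, (str, os.PathLike)):
        with open(tokens, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f if line.rstrip("\n")]
    return list(tokens)
```

Normalizing both inputs to a list keeps the rest of the decoder construction independent of where the tokens came from.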

Pull Request resolved: https://github.com/pytorch/audio/pull/2112

Reviewed By: hwangjeff, nateanl, mthrok

Differential Revision: D33352909

Pulled By: carolineechen

fbshipit-source-id: 6d22072e34f6cd7c6f931ce4eaf294ae4cf0c5cc

896ade04

Reorganize RNN-T components in prototype module (#2110) · 67cdf882

hwangjeff authored Dec 29, 2021

Summary:
Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`.

Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2110

Reviewed By: carolineechen, mthrok

Differential Revision: D33354116

Pulled By: hwangjeff

fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb

67cdf882

23 Dec, 2021 3 commits

Add Python CTC decoder API (#2089) · a76b0066

Caroline Chen authored Dec 23, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/2072; the original PR is split up for easier review.

This PR adds the Python decoder API and a basic README.

Pull Request resolved: https://github.com/pytorch/audio/pull/2089

Reviewed By: mthrok

Differential Revision: D33299818

Pulled By: carolineechen

fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc

a76b0066

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

Introduce Conformer (#2068) · 1b17b011

hwangjeff authored Dec 22, 2021

Summary:
Adds implementation of Conformer module.

Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770.

Pull Request resolved: https://github.com/pytorch/audio/pull/2068

Reviewed By: mthrok

Differential Revision: D33236957

Pulled By: hwangjeff

fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6

1b17b011

21 Dec, 2021 1 commit

Fix load behavior for 24-bit input (#2084) · 4554d242

moto authored Dec 20, 2021

Summary:
## Bug description

When 24-bit-per-sample audio is loaded via a file-like object,
the loaded Tensor is wrong. Loading the same audio from a local
file works correctly.

## The cause of the bug

At the core of sox's decoding mechanism is the `sox_read` function,
one of whose parameters is the maximum number of samples to decode
from the given buffer.

https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f

The `sox_read` function is called inside the callback of the so-called
`drain` effect, and this callback receives the output buffer and its
size in bytes. The previous implementation passed this size value to
`sox_read` as the maximum number of samples to read. Since the buffer
size in bytes is larger than the number of samples that fit in the
buffer, `sox_read` always consumed the entire buffer. (This behavior
is harmless except when the input is 24 bits-per-sample and is read
from a file-like object.)

When the input is read from a file-like object, new data are fetched
inside the drain callback via Python's `read` method and loaded into a
fixed-size memory region. The size of this memory region can be
adjusted via `torchaudio.utils.sox_utils.set_buffer_size`, but the
default value is 8192.

If the input format is 24 bits-per-sample, the end of the memory region
does not necessarily coincide with the end of a complete sample.
When `sox_read` consumes all the data in the buffer region, the
incomplete sample at the end introduces unexpected values.
This causes the bug described above.

## Fix

Pass a proper (better-estimated) maximum number of decodable samples to
`sox_read`.
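The arithmetic behind the bug can be illustrated for headerless PCM data: a 24-bit sample takes 3 bytes, so a buffer whose size is not a multiple of 3 ends with a partial sample, and a byte count used as a sample limit always exceeds the number of whole samples. This is a simplified model with hypothetical helper names and an example buffer size of 8192 bytes, not the actual fix, which estimates the sample count inside the drain callback.

```python
def whole_samples(buffer_bytes: int, bytes_per_sample: int) -> int:
    """Number of complete samples contained in a buffer of encoded PCM data."""
    return buffer_bytes // bytes_per_sample


def trailing_bytes(buffer_bytes: int, bytes_per_sample: int) -> int:
    """Bytes at the end of the buffer that belong to an incomplete sample."""
    return buffer_bytes % bytes_per_sample


for bits in (16, 24, 32):
    width = bits // 8
    print(bits, whole_samples(8192, width), trailing_bytes(8192, width))
# 16 4096 0
# 24 2730 2
# 32 2048 0
```

Only the 24-bit case leaves trailing bytes, which is why 16- and 32-bit inputs were unaffected.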

Pull Request resolved: https://github.com/pytorch/audio/pull/2084

Reviewed By: carolineechen

Differential Revision: D33236947

Pulled By: mthrok

fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4

4554d242