"projects/vscode:/vscode.git/clone" did not exist on "335d439385bbb968f6cdf1c4b7da5d3c6959d7a5"
- 01 Jun, 2022 1 commit
-
-
Caroline Chen authored
Summary: Move the CTC beam search decoder out of prototype to the new `torchaudio.models.decoder` module. hwangjeff mthrok any thoughts on the new module + naming, and whether we should move the RNN-T beam search here as well? Pull Request resolved: https://github.com/pytorch/audio/pull/2410 Reviewed By: mthrok Differential Revision: D36784521 Pulled By: carolineechen fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
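After this move, the decoder lives under `torchaudio.models.decoder`. A minimal, hedged usage sketch; the file names and parameter values below are placeholders, not taken from the PR:

```python
from torchaudio.models.decoder import ctc_decoder

# "lexicon.txt" and "tokens.txt" are hypothetical asset files.
decoder = ctc_decoder(
    lexicon="lexicon.txt",
    tokens="tokens.txt",
    nbest=3,
    beam_size=50,
)
# emissions: CPU float tensor of CTC log-probs, shape (batch, frame, num_tokens)
# hypotheses = decoder(emissions)
```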
-
- 31 May, 2022 1 commit
-
-
moto authored
Summary: Extracted from https://github.com/pytorch/audio/issues/2419. Moves the sox_io availability failure from the C++ layer to the Python layer. Pull Request resolved: https://github.com/pytorch/audio/pull/2423 Reviewed By: carolineechen Differential Revision: D36766152 Pulled By: mthrok fbshipit-source-id: 53f897a608e97b81ebe5df29577374d88ce178f3
-
- 29 May, 2022 1 commit
-
-
moto authored
Summary: Add num_frames and bits_per_sample to match the current `torchaudio.info` capability. Pull Request resolved: https://github.com/pytorch/audio/pull/2418 Reviewed By: carolineechen Differential Revision: D36749077 Pulled By: mthrok fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713
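For reference, these fields mirror what `torchaudio.info` already exposes; a minimal sketch (the file path is a placeholder):

```python
import torchaudio

meta = torchaudio.info("sample.wav")
# AudioMetaData exposes, among other fields, the two this commit adds elsewhere.
print(meta.sample_rate, meta.num_channels, meta.num_frames, meta.bits_per_sample)
```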
-
- 23 May, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: - The multi-channel functions only support complex-valued tensors for spectrograms and PSD matrices. - The mask can be real-valued or complex-valued, hence there is no explicit assertion for the mask. - The shapes of input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc. - The autograd unittest of `apply_beamforming` has wrong dimensions for beamform_weights, detected by the assertion check. Fix it in this PR. Pull Request resolved: https://github.com/pytorch/audio/pull/2401 Reviewed By: carolineechen Differential Revision: D36597689 Pulled By: nateanl fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6
-
Zhaoheng Ni authored
Summary: The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is the supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the supervised subset from the unsupervised one, it is clearer to put it in a separate dataset class for fine-tuning purposes. It contains "10 min", "1 hour", and "10 hour" splits. Pull Request resolved: https://github.com/pytorch/audio/pull/2302 Reviewed By: mthrok Differential Revision: D36388188 Pulled By: nateanl fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249
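A hedged usage sketch; the root path is a placeholder and the exact subset identifier strings are assumptions rather than values quoted from the PR:

```python
from torchaudio.datasets import LibriLightLimited

dataset = LibriLightLimited("path/to/data", subset="10min", download=True)
# Items follow the LibriSpeech-style tuple layout.
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
```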
-
- 21 May, 2022 1 commit
-
-
moto authored
Summary: This commit adds file-like object support to the Streaming API. ## Features - File-like objects are expected to implement `read(self, n)`. - Additionally, `seek(self, offset, whence)` is used if available. - Without the `seek` method, some formats cannot be decoded properly. - To work around this, one can use the existing `decoder` option to tell it what decoder to use. - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`. - So as to keep the arguments common to both audio and video in front of the rest, the order of the arguments is changed. - The `dtype` and `format` arguments were also changed to make them consistent across audio/video methods. ## Code structure The approach is very similar to how file-like objects are supported in sox-based I/O. In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind; if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11. ## Refactoring involved - Extracted to https://github.com/pytorch/audio/issues/2402 - Some implementation in the original TorchBind surface layer is converted to a Wrapper class so that it can be re-used from the PyBind11 bindings. The wrapper class serves to simplify the binding. - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python. - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to the `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods deal with them directly. ## TODO: - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode it (with/without HW decoding). Pull Request resolved: https://github.com/pytorch/audio/pull/2400 Reviewed By: carolineechen Differential Revision: D36520073 Pulled By: mthrok fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
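A hedged sketch of the new capability; the file name below is a placeholder, and any object exposing `read(self, n)` (and optionally `seek`) should work the same way:

```python
from torchaudio.io import StreamReader

# Decode from a file-like object instead of a path/URL.
with open("input.mp3", "rb") as src:
    reader = StreamReader(src)
    reader.add_basic_audio_stream(frames_per_chunk=4096, sample_rate=16000)
    for (chunk,) in reader.stream():
        pass  # chunk: Tensor of shape (frames, channels)
```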
-
- 20 May, 2022 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2392 Refactors LibriSpeech tests to accommodate different dataset classes Reviewed By: xiaohui-zhang Differential Revision: D36387835 fbshipit-source-id: 73b4e7565b4a077b25f036f4bd854ac7f2194b28
-
- 19 May, 2022 1 commit
-
-
moto authored
Summary: * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used in PyBind11. * Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This will make the PyBind11-based binding simpler, as it does not need to deal with dtype. * Move the `add_[audio|video]_stream` wrapper signature to the Streamer core, so that Streamer directly deals with `c10::optional`.† † Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as the empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the filter expression that is actually being used. Ref https://github.com/pytorch/audio/issues/2400 Pull Request resolved: https://github.com/pytorch/audio/pull/2402 Reviewed By: nateanl Differential Revision: D36488808 Pulled By: mthrok fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402214 fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
-
- 13 May, 2022 1 commit
-
-
moto authored
Summary: This commit moves the Streaming API out of the prototype module. * The related classes are renamed as follows: - `Streamer` -> `StreamReader` - `SourceStream` -> `StreamReaderSourceStream` - `SourceAudioStream` -> `StreamReaderSourceAudioStream` - `SourceVideoStream` -> `StreamReaderSourceVideoStream` - `OutputStream` -> `StreamReaderOutputStream` This change is a preemptive measure for the possible addition of a `StreamWriter` API. * Replace the BUILD_FFMPEG build arg with USE_FFMPEG; we are not building FFmpeg, so USE_FFMPEG is more appropriate. --- After https://github.com/pytorch/audio/issues/2377 Remaining TODOs (different PRs): - [ ] Introduce `is_ffmpeg_binding_available` function. - [ ] Refactor C++ code: - Rename `Streamer` to `StreamReader`. - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`. - Rename `prototype.cpp` to `stream_reader_binding.cpp`. - Introduce `stream_reader` directory. - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381) Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
-
- 12 May, 2022 3 commits
-
-
moto authored
Summary: This commit updates the lazy module initialization logic for `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`. - The modules are importable regardless of optional dependencies, i.e. `import torchaudio.prototype.io` does not trigger the check for optional dependencies. - Optional dependencies are checked when the actual API is imported for the first time, i.e. `from torchaudio.prototype.io import Streamer` triggers the check for optional dependencies. The downside is that `import torchaudio.prototype.io.Streamer` no longer works. ## Details: Starting from Python 3.7, modules can have a `__getattr__` function, which serves as a fallback if the import mechanism cannot find the attribute. This can be used to implement lazy import.

```python
def __getattr__(name):
    global pi
    if name == 'pi':
        import math
        pi = math.pi
        return pi
    raise AttributeError(...)
```

Ref: https://twitter.com/raymondh/status/1094686528440168453 The implementation performs lazy import for the APIs that work with external/optional dependencies. In addition, it also ensures that the binding is initialized only once. ## Why is this the preferable approach? Previously, the optional dependencies were checked at the time the module was imported: https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5 As long as this module is in `prototype`, which we ask users to import explicitly, users had control over whether they want to install the optional dependencies. This approach only works for one optional dependency per module. Say we add a different I/O library as an optional dependency; we would need to put all the APIs in a dedicated submodule. This prevents us from having a flat namespace, i.e. the I/O modules with multiple optional dependencies would look like

```python
# Client code
from torchaudio.io.foo import FooFeature
from torchaudio.io.bar import BarFeature
```

whereas the new approach allows

```python
# Client code
from torchaudio.io import FooFeature, BarFeature
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2377 Reviewed By: nateanl Differential Revision: D36305603 Pulled By: mthrok fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae
-
Zhaoheng Ni authored
Summary: - When cropping the waveform and the corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which results in an empty label. Such a training example hurts performance after zero-padding (i.e., the labels are all zero for the input waveform). This PR fixes the bug by checking if `label_start` is negative, and changing it to zero if so. - If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/2296 Reviewed By: mthrok Differential Revision: D36323217 Pulled By: nateanl fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
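An illustrative sketch of the guard described above; the numeric values are placeholders, not the actual HuBERT pre-processing configuration:

```python
import torch

kernel_size, sample_rate, stride = 25, 16, 20
audio_start = torch.tensor(100)
label_start = torch.div(
    audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor"
)
# Guard against a negative start, which would otherwise yield an empty label slice.
label_start = torch.clamp(label_start, min=0)
```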
-
John Reese authored
Summary: Applies the black-fbsource codemod with the new build of pyfmt. paintitblack Reviewed By: lisroach Differential Revision: D36324783 fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
-
- 11 May, 2022 2 commits
-
-
moto authored
Summary: Conda package build performs a simple smoke test, which is different from the smoke_test jobs we define on our CI. Currently the Conda packaging smoke test verifies the importability of `torchaudio.prototype.io`, which requires FFmpeg 4. 1. We list FFmpeg 4 as a runtime requirement, but this means that conda's dependency resolver takes FFmpeg 4 into consideration. FFmpeg 5 was released this year, and we can expect that the user base will gradually move to it. If the user environment has some constraint on FFmpeg, torchaudio will conflict with it, which will prevent users from installing torchaudio. 2. In #2377 the way the optional dependency is checked/initialized is changed, so this Conda smoke test will no longer check the integrity of the FFmpeg libraries. To solve the issues above, this commit moves the part that tests integrity with the FFmpeg libraries to the smoke test we define on CircleCI. Pull Request resolved: https://github.com/pytorch/audio/pull/2381 Reviewed By: carolineechen Differential Revision: D36323706 Pulled By: mthrok fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3
-
moto authored
Summary: On CircleCI, Windows unittests are failing for Python 3.7 with `PermissionError` at the end of the test when it cleans up the temporary directory. According to the discussion https://github.com/python/cpython/issues/74168, this is caused by a known issue with `shutil.rmtree`. In the above thread it is advised to simply ignore the error, as it is not guaranteed that temp directories are cleaned up. This commit follows the same path and simply ignores the error so that our CI gets back to green. Pull Request resolved: https://github.com/pytorch/audio/pull/2379 Reviewed By: carolineechen Differential Revision: D36305595 Pulled By: mthrok fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5
-
- 10 May, 2022 5 commits
-
-
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
-
Zhaoheng Ni authored
Summary: Address https://github.com/pytorch/audio/issues/2375 The MVDR module internally transforms the dtype of complex tensors to `torch.complex128` for computation and transforms it back to the original dtype before returning the Tensor. However, it didn't convert back successfully due to `specgram_enhanced.to(dtype)`, which should be `specgram_enhanced = specgram_enhanced.to(dtype)`. Fix it to make the output dtype consistent with the original input. Pull Request resolved: https://github.com/pytorch/audio/pull/2376 Reviewed By: hwangjeff Differential Revision: D36280851 Pulled By: nateanl fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112
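The root cause is that `Tensor.to` returns a new tensor rather than converting in place; a minimal illustration:

```python
import torch

specgram_enhanced = torch.zeros(3, dtype=torch.complex128)
specgram_enhanced.to(torch.complex64)                      # result discarded, dtype unchanged
specgram_enhanced = specgram_enhanced.to(torch.complex64)  # reassignment actually converts
print(specgram_enhanced.dtype)  # torch.complex64
```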
-
Zhaoheng Ni authored
Summary: Add a new design of the MVDR module. The RTFMVDR module supports the method based on the relative transfer function (RTF) and the power spectral density (PSD) matrix of noise. The input arguments are: - the multi-channel spectrum; - the RTF vector of the target speech; - the PSD matrix of noise; - the reference channel in the microphone array; - the diagonal_loading option to enable or disable diagonal loading in the matrix inverse computation; - diag_eps for computing the inverse of the matrix; - eps for computing the beamforming weight. The output of the module is the single-channel complex-valued spectrum of the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2368 Reviewed By: carolineechen Differential Revision: D36214940 Pulled By: nateanl fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
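A hedged usage sketch with random placeholder tensors; the shapes follow the argument list above, and the exact forward signature is an assumption rather than a quote from the PR:

```python
import torch
import torchaudio

channel, freq, time = 4, 257, 100
specgram = torch.rand(channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
rtf = torch.rand(freq, channel, dtype=torch.cfloat)             # RTF vector of target speech
psd_n = torch.rand(freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise

mvdr = torchaudio.transforms.RTFMVDR()
enhanced = mvdr(specgram, rtf, psd_n, reference_channel=0)      # single-channel (freq, time)
```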
-
Zhaoheng Ni authored
Summary: When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. This is also applicable to computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). This also aligns with the current `torchaudio.transforms.MVDR` module to keep consistency. This PR adds the `diagonal_loading` argument, with `True` as the default value, to `torchaudio.functional.rtf_power`. Pull Request resolved: https://github.com/pytorch/audio/pull/2369 Reviewed By: carolineechen Differential Revision: D36204130 Pulled By: nateanl fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
-
Zhaoheng Ni authored
Summary: Add a new design of the MVDR module. The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are: - the multi-channel spectrum; - the PSD matrix of the target speech; - the PSD matrix of noise; - the reference channel in the microphone array; - the diagonal_loading option to enable or disable diagonal loading in the matrix inverse computation; - diag_eps for computing the inverse of the matrix; - eps for computing the beamforming weight. The output of the module is the single-channel complex-valued spectrum of the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2367 Reviewed By: hwangjeff Differential Revision: D36198015 Pulled By: nateanl fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
-
- 06 May, 2022 1 commit
-
-
moto authored
Summary: The smoke test jobs simply perform `import torchaudio` to check that the package artifacts are sane. Originally, the CI was executing it in the root directory. This was fine unless the source code was checked out. When the source code is checked out, performing `import torchaudio` in the root directory imports the source torchaudio directory instead of the installed package. This error is difficult to notice, so this commit introduces a common script to perform the smoke test while moving out of the root directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2365 Reviewed By: carolineechen Differential Revision: D36202069 Pulled By: mthrok fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation. Follow-ups: - Add pretrained LM support for lexicon-free decoding - Add an example in the tutorial - Replace the flashlight C++ source code with the flashlight text submodule - [optional] fairseq compatibility test Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
- 21 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
-
- 18 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Implementation adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py). After modifying the s3prl downstream expert as in [this commit](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5), using this dataset implementation produces the same results as the original s3prl pipeline. Pull Request resolved: https://github.com/pytorch/audio/pull/2290 Reviewed By: nateanl Differential Revision: D35692551 Pulled By: carolineechen fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c
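A hedged usage sketch for the new dataset; the root path is a placeholder, and the subset identifier and returned tuple layout are assumptions, not values quoted from the PR:

```python
from torchaudio.datasets import QUESST14

dataset = QUESST14("path/to/data", subset="docs", download=True)
waveform, sample_rate, file_name = dataset[0]
```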
-
- 12 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds Conformer RNN-T model as prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, which instantiates a baseline version of the model. Also includes the following: - Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances. - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`. - Introduces tests for `conformer_rnnt_model`. - Adds docs. Pull Request resolved: https://github.com/pytorch/audio/pull/2322 Reviewed By: xiaohui-zhang Differential Revision: D35565987 Pulled By: hwangjeff fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
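A hedged sketch of the two factory functions mentioned above; only instantiation is shown, with the parameter count as a quick sanity check:

```python
from torchaudio.prototype.models import conformer_rnnt_base

# The baseline factory takes no arguments; conformer_rnnt_model exposes the
# full set of hyperparameters for custom configurations.
model = conformer_rnnt_base()
print(sum(p.numel() for p in model.parameters()))
```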
-
- 08 Apr, 2022 1 commit
-
-
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to Sphinx. APIs with these directives will have badges (based off of shields.io) which link to the page describing these features. Continuation of https://github.com/pytorch/audio/issues/2316 Dtypes are excluded for further improvement, and badges are added to most functionals/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
-
- 01 Apr, 2022 1 commit
-
-
moto authored
Summary: The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows. https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272

```
>       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 28 / 196608 (0.0%)
E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
```

The value of atol==1e-08 seems very strict, but all the other batch consistency tests are passing. The violation is for a very small number of samples, which looks suspicious, but I think it is okay to reduce it to `1e-06` for Windows. `1e-06` is still stricter than the majority of the comparison tests we have. Pull Request resolved: https://github.com/pytorch/audio/pull/2305 Reviewed By: hwangjeff Differential Revision: D35298056 Pulled By: mthrok fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
-
- 31 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit updates the `get_sinusoid` function in the test utility so that when multiple channels are requested, non-primal channels have a randomized initial phase. This adds some variety in test data, which should not break the tests. Currently `get_sinusoid` returns identical waveforms for all the channels. This multi-channel support was added just to mock the input data so that it is easy to test features with multi-channel inputs, so tests should not be expecting all channels to be identical. When working on numerical parity, it is more useful if the raw waveforms are somewhat different. Image: waveforms generated by `get_sinusoid` after the change (left: 1st channel, right: 2nd channel): https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png Pull Request resolved: https://github.com/pytorch/audio/pull/2301 Reviewed By: hwangjeff Differential Revision: D35291689 Pulled By: mthrok fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
-
moto authored
Summary: Tests on `torchaudio.compliance.kaldi` were scattered across different places. This commit puts all of them in the dedicated `test/torchaudio_unittest/compliance/kaldi/` directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2303 Reviewed By: nateanl Differential Revision: D35288400 Pulled By: mthrok fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46
-
- 25 Mar, 2022 1 commit
-
-
Caroline Chen authored
Summary: Adds a function to download pretrained files for the LibriSpeech 3-gram/4-gram KenLM, along with tests and an updated tutorial. Pull Request resolved: https://github.com/pytorch/audio/pull/2275 Reviewed By: mthrok Differential Revision: D35115418 Pulled By: carolineechen fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0
-
- 22 Mar, 2022 1 commit
-
-
moto authored
Summary: In recent updates, torchaudio added features that download assets/models from download.pytorch.org/torchaudio. To reduce code duplication, the implementation uses utilities from ``torch.hub``, but still, there are patterns repeated in implementing the fetch mechanism, notably cache and local file path handling. This commit introduces a utility function that handles download/cache/local path management and can be used for fetching pre-trained model data. Pull Request resolved: https://github.com/pytorch/audio/pull/2283 Reviewed By: carolineechen Differential Revision: D35050469 Pulled By: mthrok fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
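An illustrative sketch of the download/cache pattern described above; the helper name and cache location are assumptions, not torchaudio's actual internal API:

```python
import os
import torch

def _fetch_asset(url: str, filename: str) -> str:
    # Hypothetical helper: cache under the torch.hub directory, download on miss.
    cache_dir = os.path.join(torch.hub.get_dir(), "torchaudio")
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        torch.hub.download_url_to_file(url, path)
    return path
```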
-
- 04 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame is decoded: 1. Flush the decoder buffer. 2. Recreate the filter graphs (so that internal state is re-initialized). 3. Discard the buffered tensors (decoded chunks). It also disallows negative values for the seek timestamp. Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
-
moto authored
Summary: The `torchaudio.prototype.io.Streamer` class takes context-dependent options as the `option` argument in the form of mappings of strings. Currently there is no check whether the provided options are valid for the given input. This commit adds the check and raises an error if an invalid option is given. This is analogous to `ffmpeg` command error handling:

```
$ ffmpeg -foo ...
Unrecognized option 'foo'.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2263 Reviewed By: hwangjeff Differential Revision: D34495111 Pulled By: mthrok fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
-
- 26 Feb, 2022 2 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2261 Enables prototype ffmpeg io tests in fbcode. Reviewed By: nateanl Differential Revision: D33698353 fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036
-
Zhaoheng Ni authored
Summary: This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``. The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum. Pull Request resolved: https://github.com/pytorch/audio/pull/2232 Reviewed By: mthrok Differential Revision: D34474561 Pulled By: nateanl fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
-
- 25 Feb, 2022 5 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``rtf_power`` method to ``torchaudio.functional``. The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206). [This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English. The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, number of iterations, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2231 Reviewed By: mthrok Differential Revision: D34474503 Pulled By: nateanl fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb
-
Zhaoheng Ni authored
Summary: This PR adds `rtf_evd` method to `torchaudio.functional`. The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. The input argument is the power spectral density (PSD) matrix of the target speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2230 Reviewed By: mthrok Differential Revision: D34474188 Pulled By: nateanl fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134
-
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_rtf`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution that applies the relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for reference. The input arguments are the complex-valued RTF Tensor of the target speech, the power spectral density (PSD) matrix of noise, and an int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2229 Reviewed By: mthrok Differential Revision: D34474119 Pulled By: nateanl fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2
-
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, and an int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
-
Zhaoheng Ni authored
Summary: This PR adds the ``psd`` method to ``torchaudio.functional``. It computes the power spectral density (PSD) matrix of the complex-valued spectrum. The method also supports normalization of the Time-Frequency mask. Pull Request resolved: https://github.com/pytorch/audio/pull/2227 Reviewed By: mthrok Differential Revision: D34473908 Pulled By: nateanl fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9
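A hedged end-to-end sketch combining the functionals introduced in this batch of commits (``psd``, ``mvdr_weights_souden``, and ``apply_beamforming``); the tensors are random placeholders, and in practice the masks would come from a mask estimation network:

```python
import torch
import torchaudio.functional as F

channel, freq, time = 4, 257, 100
specgram = torch.rand(channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
mask_speech = torch.rand(freq, time)                            # T-F mask of target speech
mask_noise = torch.rand(freq, time)                             # T-F mask of noise

psd_s = F.psd(specgram, mask_speech)                 # (freq, channel, channel)
psd_n = F.psd(specgram, mask_noise)                  # (freq, channel, channel)
weights = F.mvdr_weights_souden(psd_s, psd_n, reference_channel=0)
enhanced = F.apply_beamforming(weights, specgram)    # single-channel (freq, time)
```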
-