Commits · bd319959cd504b718685e720cbd4f47408ee1835 · OpenDAS / Torchaudio

11 Apr, 2022 1 commit

Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959

moto authored Apr 11, 2022

Summary:
This commit makes the FFmpeg integration support FFmpeg 5.0.

In FFmpeg 5, functions such as `av_find_input_format` and `avformat_open_input` were changed
so that they work with const-qualified `AVInputFormat` pointers.

> 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
>  Constified the pointers to AVInputFormats and AVOutputFormats
>  in AVFormatContext, avformat_alloc_output_context2(),
>  av_find_input_format(), av_probe_input_format(),
>  av_probe_input_format2(), av_probe_input_format3(),
>  av_probe_input_buffer2(), av_probe_input_buffer(),
>  avformat_open_input(), av_guess_format() and av_guess_codec().
>  Furthermore, constified the AVProbeData in av_probe_input_format(),
>  av_probe_input_format2() and av_probe_input_format3().

https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260

Pull Request resolved: https://github.com/pytorch/audio/pull/2326

Reviewed By: carolineechen

Differential Revision: D35551380

Pulled By: mthrok

fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666

bd319959

08 Apr, 2022 1 commit

Add devices/properties badges (#2321) · 72ae755a

moto authored Apr 07, 2022

Summary:
Add badges of supported properties and devices to functionals and transforms.

This commit adds `.. devices::` and `.. properties::` directives to sphinx.

APIs with these directives will have badges (based off of shields.io) which link to the
page with description of these features.

Continuation of https://github.com/pytorch/audio/issues/2316
Dtypes are excluded for now, pending further improvement, and badges are added to most functionals and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2321

Reviewed By: hwangjeff

Differential Revision: D35489063

Pulled By: mthrok

fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762

72ae755a

06 Apr, 2022 2 commits

Support GroupNorm and re-ordering Convolution/MHA in Conformer (#2320) · eb23a242

Xiaohui Zhang authored Apr 06, 2022

Summary:
Add an option to use GroupNorm rather than BatchNorm1d, and another option to re-order Convolution/MHA modules in Conformer model.

Pull Request resolved: https://github.com/pytorch/audio/pull/2320

Reviewed By: hwangjeff

Differential Revision: D35422112

Pulled By: xiaohui-zhang

fbshipit-source-id: 360a8aaa37b883b0f656da2e4f654e86688ac270

eb23a242

Add an option to use Tanh instead of ReLU in RNNT joiner (#2319) · 16958d5b

Xiaohui Zhang authored Apr 06, 2022

Summary:
Add an option to use Tanh instead of ReLU in the RNNT joiner, which sometimes yields better training performance.

 ---

Pull Request resolved: https://github.com/pytorch/audio/pull/2319

Reviewed By: hwangjeff

Differential Revision: D35422122

Pulled By: xiaohui-zhang

fbshipit-source-id: c6a0f8b25936e47081110af046b57d0e8751f9a2

16958d5b

05 Apr, 2022 2 commits

Disable multiprocessing when dumping features in hubert preprocessing (#2311) · f7afe29e

Zhaoheng Ni authored Apr 05, 2022

Summary:
Multiprocessing works well on MFCC features. However, it sometimes makes the script hang when dumping HuBERT features. Changing it to a for-loop resolves the issue.

Pull Request resolved: https://github.com/pytorch/audio/pull/2311

Reviewed By: mthrok

Differential Revision: D35393813

Pulled By: nateanl

fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11

f7afe29e

Raise error for resampling int waveform (#2318) · 11328d23

Caroline Chen authored Apr 05, 2022

Summary:
Resolves https://github.com/pytorch/audio/issues/2294

Raise an error if the waveform to be resampled is not of floating point type. The `conv1d` operation used in resampling and `nn.Module` used for the transforms don't support integer type.

Pull Request resolved: https://github.com/pytorch/audio/pull/2318

Reviewed By: mthrok

Differential Revision: D35379276

Pulled By: carolineechen

fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0

11328d23

04 Apr, 2022 2 commits

Use pretrained LM API for decoder example (#2317) · 66185e00

Caroline Chen authored Apr 04, 2022

Summary:
Update the example ASR pipeline to use the recently added pretrained LM API for decoding.

Pull Request resolved: https://github.com/pytorch/audio/pull/2317

Reviewed By: mthrok

Differential Revision: D35361354

Pulled By: carolineechen

fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46

66185e00

Fix arguments in CTC decoding script (#2315) · 4a749e2d

Zhaoheng Ni authored Apr 04, 2022

Summary:
Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser.

Pull Request resolved: https://github.com/pytorch/audio/pull/2315

Reviewed By: carolineechen

Differential Revision: D35357678

Pulled By: nateanl

fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92

4a749e2d

01 Apr, 2022 5 commits

Fix loading checkpoint in hubert preprocessing (#2310) · 87f0d198

Zhaoheng Ni authored Apr 01, 2022

Summary:
When the checkpoint is on a GPU device and preprocessing runs on CPU, the script throws an exception. Fix it by loading the model state dictionary onto CPU by default.

Pull Request resolved: https://github.com/pytorch/audio/pull/2310

Reviewed By: mthrok

Differential Revision: D35316903

Pulled By: nateanl

fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed

87f0d198

Update GNU config files to support `arm64-apple` system (#2307) · 3ed39e15

moto authored Apr 01, 2022

Summary:
This commit
1. Updates the config.guess and config.sub files and
2. applies them to all the third party libraries that use them.

This resolves the following build failure on M1 mac with newer SDK.

On a MacBook Pro with an M1 chip, a recent OS update changed something
in the development environment (probably a newer version of Xcode),
and the build stopped working with the following error from
third-party dependencies.

```
checking build system type... Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not recognized
```

note: config files are taken from https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2307

Reviewed By: nateanl

Differential Revision: D35318273

Pulled By: mthrok

fbshipit-source-id: 746ac51dd1816767aa78b88445f76a29acfd29e8

3ed39e15

Put CONDA_PREFIX second priority of ffmpeg search path (#2312) · 6a418a89

moto authored Apr 01, 2022

Summary:
Change the cmake logic to search CONDA_PREFIX before falling back
to the other default paths and system paths.

1. FFMPEG_ROOT
2. CONDA_PREFIX
3. Other locations (Package managers and system paths)

For users with a regular conda installation, ffmpeg from conda should
be picked up automatically.
Anyone who wants to use a specific ffmpeg can set the FFMPEG_ROOT
variable to the location of the desired installation.
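
The priority order above can be illustrated with a small Python sketch. This is not the CMake code from this commit: the function name `ffmpeg_search_roots` and the fallback list are hypothetical and only mirror the three priorities listed in the summary.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Hypothetical fallback locations; the real CMake logic may use different ones.
DEFAULT_ROOTS = ("/usr/local", "/usr")


def ffmpeg_search_roots(
    env: Optional[Mapping[str, str]] = None,
    fallbacks: Sequence[str] = DEFAULT_ROOTS,
) -> List[str]:
    """Return candidate FFmpeg roots in priority order."""
    env = os.environ if env is None else env
    roots: List[str] = []
    # 1. An explicit FFMPEG_ROOT wins; 2. CONDA_PREFIX comes second.
    for name in ("FFMPEG_ROOT", "CONDA_PREFIX"):
        value = env.get(name)
        if value and value not in roots:
            roots.append(value)
    # 3. Everything else (package managers, system paths) is tried last.
    roots.extend(path for path in fallbacks if path not in roots)
    return roots


print(ffmpeg_search_roots({"CONDA_PREFIX": "/opt/conda", "FFMPEG_ROOT": "/opt/ffmpeg"}))
```

Passing the environment as a mapping keeps the sketch easy to exercise without touching the real process environment.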

Pull Request resolved: https://github.com/pytorch/audio/pull/2312

Reviewed By: hwangjeff

Differential Revision: D35317383

Pulled By: mthrok

fbshipit-source-id: 52aef8f3f7f0f8f1eaf7a89a2d1ccfb6265e2c50

6a418a89

Refactor the internal of transforms module (#2309) · 72f9a4e3

Moto Hira authored Apr 01, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2309

For upcoming improved Kaldi features which are comprised of multiple classes / functions, put all the transforms implementations in dedicated directory.

Reviewed By: nateanl

Differential Revision: D35303682

fbshipit-source-id: 5bc8c07ef639683008c0f76ffe56e3941f772659

72f9a4e3

Loosen atol for melscale batch test for Windows (#2305) · d65a0f3e

moto authored Mar 31, 2022

Summary:
The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows.

https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272

```
>       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 28 / 196608 (0.0%)
E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
```

The value of atol==1e-08 seems very strict, but all the other batch
consistency tests are passing.

The violation affects only a very small number of samples, which looks
suspicious, but I think it is okay to loosen it to `1e-06` for Windows.

`1e-06` is still stricter than the majority of the comparison tests we have.
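
To see why an absolute error of about 2e-07 trips the default tolerance, here is a minimal sketch of the usual closeness criterion, `|actual - expected| <= atol + rtol * |expected|`, which is how `torch.testing.assert_close` documents its check. The helper `is_close` and the sample values are illustrative assumptions, not taken from the failing test.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Closeness test: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical element: a small expected value with an absolute error of
# about 2e-07, similar in magnitude to the failure reported above.
expected = 1e-3
actual = expected + 2e-7

strict = is_close(actual, expected, rtol=1e-5, atol=1e-8)
loosened = is_close(actual, expected, rtol=1e-5, atol=1e-6)
print(strict, loosened)
```

For small expected values the `rtol` term contributes almost nothing, so `atol` alone decides whether the comparison passes.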

Pull Request resolved: https://github.com/pytorch/audio/pull/2305

Reviewed By: hwangjeff

Differential Revision: D35298056

Pulled By: mthrok

fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7

d65a0f3e

31 Mar, 2022 2 commits

Randomize initial phase of sinusoid data in test (#2301) · c6c6b689

moto authored Mar 31, 2022

Summary:
This commit updates the `get_sinusoid` function in the test utility so that
when multiple channels are requested, the non-primary channels have a randomized
initial phase.

This adds some variety to the test data, which should not break the tests.
Currently `get_sinusoid` returns identical waveforms for all the channels.
The multi-channel support was added just to mock the input data so that
it is easy to test features with multi-channel inputs, so tests should not
expect all the channels to be identical.

When working on numerical parity, it is more useful if the raw waveforms
are somewhat different.
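
The idea can be sketched in plain Python as follows. This is not the actual `get_sinusoid` implementation: the function name `sinusoid_channels`, its parameters, and the seeding scheme are assumptions made for illustration.

```python
import math
import random
from typing import List


def sinusoid_channels(
    frequency: float = 300.0,
    sample_rate: int = 16000,
    num_frames: int = 16000,
    num_channels: int = 2,
    seed: int = 0,
) -> List[List[float]]:
    """Return one sine wave per channel; only channel 0 starts at phase 0."""
    rng = random.Random(seed)
    waveform = []
    for channel in range(num_channels):
        # The first channel keeps phase 0; the others get a random offset.
        phase = 0.0 if channel == 0 else rng.uniform(0.0, 2.0 * math.pi)
        waveform.append(
            [
                math.sin(2.0 * math.pi * frequency * n / sample_rate + phase)
                for n in range(num_frames)
            ]
        )
    return waveform


wave = sinusoid_channels(num_frames=8)
print(wave[0][:3])
print(wave[1][:3])
```

Seeding the generator keeps the test data reproducible while still making the channels differ.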

Image: waveforms generated by `get_sinusoid` after the change. left: 1st channel, right: 2nd channel
<img width="524" alt="Screen Shot 2022-03-31 at 10 06 17 AM" src="https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2301

Reviewed By: hwangjeff

Differential Revision: D35291689

Pulled By: mthrok

fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd

c6c6b689

Move Kaldi comp tests to corresponding module (#2303) · ec552b69

moto authored Mar 31, 2022

Summary:
Tests on `torchaudio.compliance.kaldi` were scattered across different places.
This commit puts all of them in the dedicated `test/torchaudio_unittest/compliance/kaldi/`
directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2303

Reviewed By: nateanl

Differential Revision: D35288400

Pulled By: mthrok

fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46

ec552b69

30 Mar, 2022 3 commits

Use zlib v1.2.12 with GitHub source (#2300) · 050b2fb4

Zhaoheng Ni authored Mar 30, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2300

Reviewed By: xiaohui-zhang

Differential Revision: D35258323

Pulled By: nateanl

fbshipit-source-id: 4b9f86600399ba0f5ec47f1c402968a812aa557d

050b2fb4

make sure inputs live on CPU for ctc decoder (#2289) · cfa5a383

Xiaohui Zhang authored Mar 30, 2022

Summary:
Addressing the issue https://github.com/pytorch/audio/issues/2274:
Raise runtime errors when the input tensors to the CTC decoder are GPU tensors, since the CTC decoder only runs on CPU. Also update the data type check to use `raise` rather than `assert`.

 ---
Pull Request resolved: https://github.com/pytorch/audio/pull/2289

Reviewed By: mthrok

Differential Revision: D35255630

Pulled By: xiaohui-zhang

fbshipit-source-id: d6c6e88d9ad4b9690bb741557fa9a9504e60872e

cfa5a383

Use sourceforge url to fetch zlib (#2297) · 03badcd3

Zhaoheng Ni authored Mar 30, 2022

Summary:
This PR addresses https://github.com/pytorch/audio/issues/2295 by updating `zlib`'s URL to the one on sourceforge.net.
The `zlib` 1.2.11 source code has been removed from the official site. According to https://zlib.net, "Due to the bug fixes, any installations of 1.2.11 should be replaced with 1.2.12."
SourceForge preserves older versions and is therefore more stable. This PR keeps 1.2.11 because there is currently no 1.2.12 on SourceForge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2297

Reviewed By: mthrok

Differential Revision: D35251361

Pulled By: nateanl

fbshipit-source-id: 174c2c2e1c34bef9799bbacfd1e12c8ff13ff15d

03badcd3

26 Mar, 2022 1 commit

Update decoder pretrained lm docs (#2291) · 46ed2b98

Caroline Chen authored Mar 26, 2022

Summary:
`build_docs` test is failing on CI with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`, but with local build:

<img width="902" alt="Screen Shot 2022-03-25 at 4 02 53 PM" src="https://user-images.githubusercontent.com/16568633/160157472-c91ff9b2-a2be-4c5d-959e-53b9f45425c6.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2291

Reviewed By: mthrok

Differential Revision: D35147098

Pulled By: carolineechen

fbshipit-source-id: 682b3800d0ed5c56b402d83f221136725051ba7e

46ed2b98

25 Mar, 2022 3 commits

Update README around version compatibility matrix (#2293) · 21c1ab7e

moto authored Mar 25, 2022

Summary:
Following the issue https://github.com/pytorch/text/issues/1662, add more clarification on LTS.
Also tidy up a bit by moving older versions into a details section.

cc Nayef211

 ---

<img width="794" alt="Screen Shot 2022-03-25 at 2 30 49 PM" src="https://user-images.githubusercontent.com/855818/160203327-acc5cbcb-ca86-43ee-b59f-48795b9e676c.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2293

Reviewed By: hwangjeff

Differential Revision: D35159211

Pulled By: mthrok

fbshipit-source-id: 18908c62440fc02773634c2700020fc407893dd3

21c1ab7e

Pin jinja2 version for build_docs (#2292) · d484516e

Caroline Chen authored Mar 25, 2022

Summary:
The `build_docs` CircleCI job is currently failing with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`. Pin `Jinja2<3.1` to resolve this issue; see https://github.com/sphinx-doc/sphinx/issues/10291#issuecomment-1078046986

Pull Request resolved: https://github.com/pytorch/audio/pull/2292

Reviewed By: mthrok

Differential Revision: D35148397

Pulled By: carolineechen

fbshipit-source-id: 963efe2fcdee13dead4a4d542c903913c6eaa505

d484516e

Add Pretrained LM Support for Decoder (#2275) · 34c0d115

Caroline Chen authored Mar 24, 2022

Summary:
Add a function to download pretrained files for the LibriSpeech 3-gram/4-gram KenLM, along with tests and an updated tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/2275

Reviewed By: mthrok

Differential Revision: D35115418

Pulled By: carolineechen

fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0

34c0d115

24 Mar, 2022 2 commits

Update CTC decoder docs and add citation (#2278) · 05592dff

Caroline Chen authored Mar 24, 2022

Summary:
rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2278

Reviewed By: mthrok

Differential Revision: D35097734

Pulled By: carolineechen

fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c

05592dff

Add notes about prototype features in tutorials (#2288) · 8844fbb7

moto authored Mar 23, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288

Reviewed By: hwangjeff

Differential Revision: D35099492

Pulled By: mthrok

fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f

8844fbb7

22 Mar, 2022 3 commits

Revise the parameterization of third party libraries (#2282) · 7444f568

moto authored Mar 22, 2022

Summary:
Originally, the global property TORCHAUDIO_THIRD_PARTIES was introduced
to handle the optional third party dependencies that can change based on
the build config.

After revising the CMake files, it turned out that this is not really necessary,
as our torchaudio/csrc/CMakeLists.txt properly branches out for
conditional dependencies. Rather, we should leave the global scope untouched.

Pull Request resolved: https://github.com/pytorch/audio/pull/2282

Reviewed By: hwangjeff

Differential Revision: D35059838

Pulled By: mthrok

fbshipit-source-id: ed3557eaa9a669e4466d64893beab5089eca78b8

7444f568

Add download utility specialized for torchaudio (#2283) · 64b98521

moto authored Mar 22, 2022

Summary:
In recent updates, torchaudio added features that download assets/models from
download.pytorch.org/torchaudio.

To reduce code duplication, the implementation uses utilities from
``torch.hub``, but there are still patterns repeated in implementing
the fetch mechanism, notably cache and local file path handling.

This commit introduces a utility function that handles
download/cache/local path management and can be used for
fetching pre-trained model data.
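
A minimal sketch of the download/cache/local-path pattern is shown below. It is not the utility added by this commit: `fetch_asset`, its signature, and the injected `download` callable are hypothetical. A real implementation could pass a wrapper around `torch.hub.download_url_to_file` as the downloader.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_asset(key: str, cache_dir: Path, download: Callable[[str, Path], None]) -> Path:
    """Return a local path for `key`, downloading it only when it is not cached."""
    local_path = cache_dir / key
    if local_path.exists():
        return local_path  # cache hit: no download needed
    local_path.parent.mkdir(parents=True, exist_ok=True)
    download(key, local_path)
    return local_path


# Stand-in downloader that writes a placeholder file and records each call.
calls: List[str] = []


def fake_download(key: str, destination: Path) -> None:
    calls.append(key)
    destination.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_asset("models/example.pt", Path(tmp), fake_download)
    second = fetch_asset("models/example.pt", Path(tmp), fake_download)
    print(first == second, calls)
```

Injecting the downloader keeps the cache logic testable without network access.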

Pull Request resolved: https://github.com/pytorch/audio/pull/2283

Reviewed By: carolineechen

Differential Revision: D35050469

Pulled By: mthrok

fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b

64b98521

Fix calculation of SNR value in tutorial (#2285) · 8395fe65

Hagen Wierstorf authored Mar 22, 2022

Summary:
The calculation of the SNR in the data augmentation examples seems wrong to me:

![image](https://user-images.githubusercontent.com/173624/159487032-c60470c6-ef8e-48a0-ad5e-a117fcb8d606.png)

If we start from the definition of the signal-to-noise ratio using the root mean square value we get:

```
SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
```
this can be transformed to
```
scale = 10^(SNR/20) rms(noise) / rms(speech)
```
The example uses `lambda x: x.norm(p=2)` rather than `rms`, but because the speech and noise signals have the same length, we have
```
rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
```
For the tutorial's formula to be correct, we would therefore need:
```
10^(SNR/20) = e^(SNR / 10)
```
which is not true.

Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`.

For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.
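
The derivation can be checked numerically with a short sketch. The helper names `rms` and `noise_mix_scale` and the toy signals are assumptions for illustration; the tutorial itself operates on tensors.

```python
import math
from typing import Sequence


def rms(signal: Sequence[float]) -> float:
    """Root mean square of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def noise_mix_scale(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> float:
    """Scale for speech so that 20*log10(rms(scale*speech)/rms(noise)) == snr_db."""
    return 10 ** (snr_db / 20) * rms(noise) / rms(speech)


# Toy signals with equal RMS, so the scale depends only on the SNR.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [-1.0, 1.0, 1.0, -1.0]

for snr_db in (20, 10, 3):
    corrected = noise_mix_scale(speech, noise, snr_db)
    previous = math.exp(snr_db / 10) * rms(noise) / rms(speech)
    print(f"{snr_db} dB: previous={previous:.2f} corrected={corrected:.2f}")
```

Scaling the speech by the corrected factor and recomputing `20 * log10(rms(scaled) / rms(noise))` returns the requested SNR, which is not the case for the `e^(SNR / 10)` variant.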

Pull Request resolved: https://github.com/pytorch/audio/pull/2285

Reviewed By: nateanl

Differential Revision: D35047737

Pulled By: mthrok

fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3

8395fe65

17 Mar, 2022 1 commit

[Doc] fix typo and backlink (#2281) · 1c3403ea

moto authored Mar 17, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281

Reviewed By: carolineechen

Differential Revision: D34939494

Pulled By: mthrok

fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d

1c3403ea

10 Mar, 2022 3 commits

Fix typos and remove comments (#2270) · 4b47412e

moto authored Mar 10, 2022

Summary:
Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202

Pull Request resolved: https://github.com/pytorch/audio/pull/2270

Reviewed By: hwangjeff

Differential Revision: D34793460

Pulled By: mthrok

fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723

4b47412e

Fix type for lm parameter in decoder (#2273) · 8a885191

Caroline Chen authored Mar 10, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2273

Reviewed By: mthrok

Differential Revision: D34799335

Pulled By: carolineechen

fbshipit-source-id: d0eea79448efdbd84758a3f433ab9350b4c94e91

8a885191

Update version table in README (#2272) · ee68fcc5

Zhaoheng Ni authored Mar 10, 2022

Summary:
Add torchaudio 0.11.0 version to the table.

Pull Request resolved: https://github.com/pytorch/audio/pull/2272

Reviewed By: carolineechen

Differential Revision: D34790836

Pulled By: nateanl

fbshipit-source-id: af9ec1a4b470b04b793f39d12dbf722d67c62fce

ee68fcc5

08 Mar, 2022 1 commit

Add HuBERT-feature support in preprocessing of HuBERT training (#2143) · c4f12526

Zhaoheng Ni authored Mar 08, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2143

Reviewed By: carolineechen

Differential Revision: D34722238

Pulled By: nateanl

fbshipit-source-id: 72809c9db91c94d8e853c80ed8522eeffe5ff136

c4f12526

06 Mar, 2022 1 commit

Fix Kaldi submodule integration (#2269) · a92ae368

moto authored Mar 06, 2022

Summary:
When building the Kaldi submodule, `get_version.sh` must be run so that the version header is available.
It was pointed out that the script should run with `bash` instead of `sh`.

Fixes https://github.com/pytorch/audio/issues/2268

Pull Request resolved: https://github.com/pytorch/audio/pull/2269

Reviewed By: carolineechen

Differential Revision: D34667726

Pulled By: mthrok

fbshipit-source-id: 761b82c54b58af2bfb2836cbe18c9708f853f1e1

a92ae368

04 Mar, 2022 2 commits

Flush and reset internal state after seek (#2264) · 7e1afc40

moto authored Mar 04, 2022

Summary:
This commit adds the following behavior to `seek` so that `seek`
works after a frame is decoded.

1. Flush the decoder buffer.
2. Recreate filter graphs (so that internal state is re-initialized)
3. Discard the buffered tensor. (decoded chunks)

It also disallows negative values for the seek timestamp.
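
The three steps and the negative-timestamp check can be sketched with a toy class. Everything here is hypothetical: `SeekableDecoder` and its attributes only stand in for the decoder buffer, filter graph, and chunk buffer of the actual C++ implementation.

```python
from typing import List


class SeekableDecoder:
    """Toy stand-in for a streaming decoder; all names are hypothetical."""

    def __init__(self) -> None:
        self.decoder_buffer: List[bytes] = []  # packets not yet emitted by the decoder
        self.filter_graph = object()           # stateful filter graph
        self.chunks: List[List[float]] = []    # decoded chunks waiting to be popped
        self.position = 0.0

    def seek(self, timestamp: float) -> None:
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative.")
        self.position = timestamp
        self.decoder_buffer.clear()    # 1. flush the decoder buffer
        self.filter_graph = object()   # 2. recreate the filter graph
        self.chunks.clear()            # 3. discard buffered chunks


decoder = SeekableDecoder()
decoder.decoder_buffer.append(b"packet")
decoder.chunks.append([0.1, 0.2])
old_graph = decoder.filter_graph
decoder.seek(1.5)
print(decoder.decoder_buffer, decoder.chunks, decoder.filter_graph is old_graph)
```

Resetting all three pieces of state means frames decoded before the seek cannot leak into the output that follows it.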

Pull Request resolved: https://github.com/pytorch/audio/pull/2264

Reviewed By: carolineechen

Differential Revision: D34497826

Pulled By: mthrok

fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d

7e1afc40

Make Streamer fail if an invalid option is provided (#2263) · 04875eef

moto authored Mar 04, 2022

Summary:
The `torchaudio.prototype.io.Streamer` class takes context-dependent options
as the `option` argument, in the form of a mapping of strings.

Currently there is no check on whether the provided options are valid for
the given input.

This commit adds the check and raises an error if an invalid option is given.

This is analogous to `ffmpeg` command error handling.

```
$ ffmpeg -foo
...
Unrecognized option 'foo'.
```
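
One way to implement such a check is to consume the options that are recognized and fail if anything is left over. The sketch below assumes that approach; `open_with_options`, the option names, and the error message are hypothetical and do not reproduce the actual implementation.

```python
from typing import Dict, Set


def open_with_options(options: Dict[str, str], supported: Set[str]) -> Dict[str, str]:
    """Apply supported options and fail loudly if any option is left unconsumed."""
    remaining = dict(options)
    # Move every recognized option out of `remaining`.
    applied = {key: remaining.pop(key) for key in list(remaining) if key in supported}
    if remaining:
        unknown = ", ".join(sorted(remaining))
        raise RuntimeError(f"Unexpected options: {unknown}")
    return applied


# Hypothetical option names for illustration.
print(open_with_options({"framerate": "30"}, supported={"framerate", "pixel_format"}))
try:
    open_with_options({"foo": "1"}, supported={"framerate", "pixel_format"})
except RuntimeError as err:
    print(err)
```

Working on a copy of the mapping leaves the caller's dictionary untouched.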

Pull Request resolved: https://github.com/pytorch/audio/pull/2263

Reviewed By: hwangjeff

Differential Revision: D34495111

Pulled By: mthrok

fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f

04875eef

27 Feb, 2022 1 commit

Simplify setup_env.sh (#2265) · 17c6af7f

Nikita Shulga authored Feb 27, 2022

Summary:
Make them more aligned with ones in
https://github.com/pytorch/vision/blob/main/.circleci/unittest/linux/scripts/setup_env.sh

This is a preliminary step towards eradicating unneeded conda-forge dependencies, see https://github.com/pytorch/audio/pull/2260

Pull Request resolved: https://github.com/pytorch/audio/pull/2265

Reviewed By: mthrok

Differential Revision: D34499635

Pulled By: malfet

fbshipit-source-id: f87a3e4568aeeab9c6787a777c3231153c4539f0

17c6af7f

26 Feb, 2022 3 commits

Enable ffmpeg prototype unit test (#2261) · 955ffb47

Moto Hira authored Feb 25, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2261

Enables prototype ffmpeg io tests in fbcode.

Reviewed By: nateanl

Differential Revision: D33698353

fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036

955ffb47

Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``apply_beamforming`` method to ``torchaudio.functional``.
The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
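
As a rough illustration, the operation can be written as a per-bin weighted sum over channels. The sketch assumes the conventional form in which the weights are conjugated, and it uses nested lists with the layouts `[freq][channel]` and `[channel][freq][time]`; the name `apply_beamforming_sketch` is hypothetical and this is not the library code.

```python
from typing import List

Complex2D = List[List[complex]]


def apply_beamforming_sketch(weights: Complex2D, specgram: List[Complex2D]) -> Complex2D:
    """Combine channels per bin: out[f][t] = sum_c conj(w[f][c]) * x[c][f][t].

    weights:  [freq][channel]
    specgram: [channel][freq][time]
    """
    num_channels = len(specgram)
    num_freqs = len(specgram[0])
    num_frames = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t] for c in range(num_channels))
            for t in range(num_frames)
        ]
        for f in range(num_freqs)
    ]


weights = [[1 + 0j, 1j]]             # 1 frequency bin, 2 channels
specgram = [[[1 + 1j]], [[2 + 0j]]]  # 2 channels, 1 bin, 1 frame
enhanced = apply_beamforming_sketch(weights, specgram)
print(enhanced)
```

The output drops the channel dimension, leaving one enhanced value per frequency bin and frame.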

Pull Request resolved: https://github.com/pytorch/audio/pull/2232

Reviewed By: mthrok

Differential Revision: D34474561

Pulled By: nateanl

fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d

9c56ffb4

Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The changes to the interface are:
1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
2. Make the `fill_buffer` method private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
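
The retry behaviour can be sketched in Python as follows. The real retry happens in the C++ layer; `process_with_retry`, the use of seconds as the unit, and the meaning of `timeout=None` are assumptions made for this illustration. `BlockingIOError` is the exception Python raises for `EAGAIN`.

```python
import time
from typing import Callable, Optional


def process_with_retry(
    read_packet: Callable[[], bool],
    timeout: Optional[float] = None,
    backoff: float = 0.01,
) -> bool:
    """Retry `read_packet` while the device is not ready, up to `timeout` seconds.

    `read_packet` raises BlockingIOError when the device buffer is not ready.
    In this sketch, `timeout=None` retries indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_packet()
        except BlockingIOError:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("device buffer was not ready in time")
            time.sleep(backoff)  # wait before the next attempt


# Stand-in device that is busy for the first two reads.
attempts = {"count": 0}


def flaky_read() -> bool:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise BlockingIOError("resource temporarily unavailable")
    return True


print(process_with_retry(flaky_read, timeout=1.0, backoff=0.001), attempts["count"])
```

Retrying inside the blocking call spares the caller from catching and re-issuing the request on every `EAGAIN`.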

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59

365313ed

25 Feb, 2022 1 commit

Add rtf_power method to torchaudio.functional (#2231) · ea74813d

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``rtf_power`` method to ``torchaudio.functional``.
The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206).
[This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English.
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of the noise, an int or one-hot Tensor indicating the reference channel, and the number of iterations.

Pull Request resolved: https://github.com/pytorch/audio/pull/2231

Reviewed By: mthrok

Differential Revision: D34474503

Pulled By: nateanl

fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb

ea74813d