Commits · b7d2d92895c20fb8f24e09ef6dbf0709a83d99fc · OpenDAS / Torchaudio

28 Jul, 2023 1 commit

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
The PR move `SquimObjective` and `SquimSubjective` models and corresponding factory functions and pre-trained pipelines out of prototype and to the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

b7d2d928

26 Jul, 2023 1 commit

Move env util (#3499) · da212020

moto authored Jul 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3499

Differential Revision: D47803654

Pulled By: mthrok

fbshipit-source-id: 2b916fa66d84c91c01b4dfe6dd5ee3501159f451

da212020

25 Jul, 2023 1 commit

Disable some tests that need libsox (#3494) · 49e9ed94

moto authored Jul 25, 2023

Summary:
In preparation for https://github.com/pytorch/audio/pull/3082

Disable those FFmpeg tests that depend on sox CLI. These tests need to be updated or removed so as not to use sox CLI.

Auto-skip some sox tests if decoder/encoder are not available

Pull Request resolved: https://github.com/pytorch/audio/pull/3494

Differential Revision: D47761948

Pulled By: mthrok

fbshipit-source-id: 3a48d7f280f8376a48d223947dd41a7cdc8cbc30

49e9ed94

17 Jul, 2023 1 commit

Ensure StreamReader returns tensors with requires_grad is False (#3467) · 44b92062

moto authored Jul 17, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3467

Differential Revision: D47482388

Pulled By: mthrok

fbshipit-source-id: abff36491dc28b83270673860d6457a084b1327d

44b92062

12 Jul, 2023 1 commit

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4. This is inconvenient from installing to runtime linking.
This commit allows to pick FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4.

The way it works is that we compile the FFmpeg extension three times with different FFmpeg and ship them.
At runtime, we look for libavutil of specific version and when one is found, load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL and downloaded from S3 at build time, instead of building every time.

The use of pre-built binaries as scaffolding limits the system that can build torchaudio, so it also introduces
single FFmpeg version support mode. setting FFMPEG_ROOT during the build will change the way binaries are built
so that it will only support one specific version of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04

786066b4

10 Jul, 2023 1 commit

Update package smoke test (#3465) · 589de109

moto authored Jul 10, 2023

Summary:
1. Update smoke test script to change directory so that there is no `torchaudio` directory in CWD when smoke test is being executed.
2. Disable the part of smoke test which requires FFmpeg for wheel. The preparation for https://github.com/pytorch/test-infra/pull/4358

Pull Request resolved: https://github.com/pytorch/audio/pull/3465

Reviewed By: nateanl

Differential Revision: D47345117

Pulled By: mthrok

fbshipit-source-id: 95aad0a22922d44ee9a24a05d9ece85166b8c17e

589de109

05 Jul, 2023 1 commit

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

Current design of forced_align accept 2D Tensor for `log_probs` and 1D Tensor for `targets`. To make the API simple, the PR make changes to only support batch Tensors (3D Tensor for `log_probs` and 2D Tensor for `targets`).

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

cc164478

21 Jun, 2023 1 commit

Introduce chroma spectrogram transform (#3427) · 70968293

Jeff Hwang authored Jun 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3427

Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms.

Reviewed By: mthrok

Differential Revision: D46547418

fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960

70968293

14 Jun, 2023 1 commit

Add resample option to AudioEffector (#3374) · 406e9c8d

moto authored Jun 14, 2023

Summary:
Currently, AudioEffector always resample to the original sample rate. It is more flexible to allow overriding this to any sample rate.

Pull Request resolved: https://github.com/pytorch/audio/pull/3374

Differential Revision: D46235358

Pulled By: mthrok

fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4

406e9c8d

09 Jun, 2023 1 commit

Disable HF integration test (#3431) · f5d7635e

moto authored Jun 09, 2023

Summary:
The new version of transformers changed the format of pre-trained weight. Fixing it is low-priority for the maintanance team so we disable the test.

See https://github.com/pytorch/audio/issues/3430

Pull Request resolved: https://github.com/pytorch/audio/pull/3431

Differential Revision: D46592883

Pulled By: mthrok

fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6

f5d7635e

08 Jun, 2023 2 commits

Introduce chroma filter bank function (#3395) · dfd0c5fd

Jeff Hwang authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3395

Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.

Reviewed By: mthrok

Differential Revision: D46307672

fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5

dfd0c5fd

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
StreamReader decoding process is composed of the three steps;

1. Decode the incoming AVPacket into AVFrame
2. Pass AVFrame through AVFilter to perform post process
3. Convert the resulgint AVFrame

The internal of StreamReader was refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time output stream is defined and output stream shape can be retrieved.

For CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrievable.
However, this is problematic for GPU decoder, as resizing is currently done using GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally is adoptive to the change of input frame size. This commit changes the conversion process to be similar, so that it will wait until the first frame comes in to finalize the frame shape.

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6

7dff24ca

07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

47716772

06 Jun, 2023 3 commits

Fix style issue (#3410) · 27aa52fb

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059

27aa52fb

Revert D46126226: Update forced_align method to only support batch Tensors · bbc13b9a

Moto Hira authored Jun 06, 2023

Differential Revision:
D46126226

Original commit changeset: 42cb52b19d91

Original Phabricator Diff: D46126226

fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6

bbc13b9a

Update forced_align method to only support batch Tensors (#3365) · 5f17d81c

Zhaoheng Ni authored Jun 06, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3365

Reviewed By: vineelpratap

Differential Revision: D46126226

fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2

5f17d81c

02 Jun, 2023 1 commit

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio.

Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them.

See some of the discussion https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

5bbbb1d5

01 Jun, 2023 3 commits

[BC-breaking] Remove file-like object support from sox_io backend (#3035) · bc54ac8a

moto authored Jun 01, 2023

Summary:
This commit removes file-like obejct support so that we can remove custom patch

The motivation and plan is outlined in https://github.com/pytorch/audio/issues/2950.

Pull Request resolved: https://github.com/pytorch/audio/pull/3035

Reviewed By: hwangjeff

Differential Revision: D44695647

Pulled By: mthrok

fbshipit-source-id: 13af0234e288c041bc7b490e1f967f85ce7eb8ec

bc54ac8a

Fix style issue (#3398) · c7ac1aff

moto authored Jun 01, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3398

Reviewed By: nateanl

Differential Revision: D46354862

Pulled By: mthrok

fbshipit-source-id: b86dcdfeff8ed9db87b0b78eca20f6f18117e97e

c7ac1aff

Refactor arg mapping in ffmpeg save function (#3387) · b99e5f46

moto authored May 31, 2023

Summary:
The arguments of TorchAudio's save function ("format", "bits_per_sample" and "encoding")
are not one-to-one mapping to the arguments of FFmpeg encoding.

For example, to use vorbis codec, FFmpeg expects "ogg" container/extension with "vorbis"
encoder. It does not recognize "vorbis" extension like TorchAudio (libsox) does.

This commit refactors the logic to parse/map the arguments.

As a result it now properly works with vorbis and mp3 extension.

Pull Request resolved: https://github.com/pytorch/audio/pull/3387

Reviewed By: hwangjeff

Differential Revision: D46328787

Pulled By: mthrok

fbshipit-source-id: 36f993952a062bfec58a8b51be6aa86297571f90

b99e5f46

30 May, 2023 1 commit

Disable failing GPU unit test (#3384) · caf3ac07

atalman authored May 30, 2023

Summary:
Disable failing GPU unit test.
See associated issue: https://github.com/pytorch/audio/issues/3376

Pull Request resolved: https://github.com/pytorch/audio/pull/3384

Reviewed By: mthrok

Differential Revision: D46279324

Pulled By: atalman

fbshipit-source-id: 3a606bb992e0261451f48d1fb458e054f7fd5583

caf3ac07

27 May, 2023 1 commit

Fix AudioEffector for mulaw (#3372) · af932cc7

moto authored May 26, 2023

Summary:
When encoding audio with mulaw, the resulting data does not have header, and the StreamReader defaults to 16k Hz, which can strech/shrink the resulting waveform.

Pull Request resolved: https://github.com/pytorch/audio/pull/3372

Reviewed By: hwangjeff

Differential Revision: D46234772

Pulled By: mthrok

fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552

af932cc7

26 May, 2023 3 commits

Fix encoding g722 format (#3373) · 1b05ca7e

moto authored May 26, 2023

Summary:
g722 format only supports 16k Hz, but AVCodec does not list this. The implementation does not insert resampling and the resulting audio can be slowed down or sped up.

Pull Request resolved: https://github.com/pytorch/audio/pull/3373

Reviewed By: hwangjeff

Differential Revision: D46233181

Pulled By: mthrok

fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c

1b05ca7e

Temporarily remove test for extract_features (#3378) · 05649ca3

Zhaoheng Ni authored May 26, 2023

Summary:
The tests failed for several bundles. Remove them and will re-add once the root cause is figured out.

Pull Request resolved: https://github.com/pytorch/audio/pull/3378

Reviewed By: atalman

Differential Revision: D46230884

Pulled By: nateanl

fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92

05649ca3

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality
1. The `init_b` hypothesis is only regenerated from blank token if no initial hypotheses are provided.
2. Allows the decoder to receive top-K hypothesis to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa

24 May, 2023 1 commit

Update smoke test (#3346) · 71b2634b

moto authored May 24, 2023

Summary:
* Delay the import of torchaudio until the CLI options are parsed.
* Add option to set log level to DEBUG so that it's easy to see the issue with external libraries.

Pull Request resolved: https://github.com/pytorch/audio/pull/3346

Reviewed By: nateanl

Differential Revision: D46022546

Pulled By: mthrok

fbshipit-source-id: 9f988bbd770c2fd2bb260c3cfe02b238a9da2808

71b2634b

23 May, 2023 3 commits

[BugFix] Fix extract_features method for WavLM models (#3350) · 7d0f3369

Zhaoheng Ni authored May 23, 2023

Summary:
resolve https://github.com/pytorch/audio/issues/3347

`position_bias` is ignored in `extract_features` method, this doesn't affect Wav2Vec2 or HuBERT models, but it changes the output of transformer layers (except the first layer) in WavLM model. This PR fixes it by adding `position_bias` to the method.

Pull Request resolved: https://github.com/pytorch/audio/pull/3350

Reviewed By: mthrok

Differential Revision: D46112148

Pulled By: nateanl

fbshipit-source-id: 3d21aa4b32b22da437b440097fd9b00238152596

7d0f3369

[Nova] MacOS Unittests on Nova (#3324) · fce54fd1

Omkar Salpekar authored May 23, 2023

Summary:
As discussed in the [Torchaudio Migration Proposal](https://docs.google.com/document/d/1PF8biwiGzsjzfEBM78mlLiRrkcsGsvuYkeqkI66Ym8A/edit), this PR moves MacOS unittest job to Nova tooling. Note that this does not touch anything within the existing CircleCI job at the moment.

Passing job: https://github.com/pytorch/audio/actions/runs/4932497525/jobs/8815581251?pr=3324

Pull Request resolved: https://github.com/pytorch/audio/pull/3324

Reviewed By: atalman, mthrok

Differential Revision: D46113524

Pulled By: osalpekar

fbshipit-source-id: d048d300489f992fa187628cb6744d95ab4fb68a

fce54fd1

Fix cuda test failure (#3363) · fa59855f

Zhaoheng Ni authored May 23, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3361

When adding FunctionalCUDAOnlyTest, the class should inherit from `TestBaseMixin` instead of `Functional`

Pull Request resolved: https://github.com/pytorch/audio/pull/3363

Reviewed By: atalman, osalpekar

Differential Revision: D46112084

Pulled By: nateanl

fbshipit-source-id: 67c6472fda98cb718e0fc53ab248beda745feab5

fa59855f

22 May, 2023 1 commit

Fix CPU kernel of forced_align function (#3354) · 8a893fb3

Zhaoheng Ni authored May 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3354

when start ==0, the first item instead of Sth item of t row in backPtr_a should be 0.

Reviewed By: xiaohui-zhang

Differential Revision: D46059971

fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8

8a893fb3

20 May, 2023 1 commit

[audio][PR] Add forced_align function to torchaudio (#3348) · e7935cff

Zhaoheng Ni authored May 19, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3348

The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA deviced. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame.

Reviewed By: vineelpratap, xiaohui-zhang

Differential Revision: D45867265

fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb

e7935cff

17 May, 2023 1 commit

Add 420p10le CPU support to StreamReader (#3332) · c12f4734

moto authored May 16, 2023

Summary:
This commit add support to decode YUV420P010LE format.

The image tensor returned by this format
- NCHW format (C == 3)
- int16 type
- value range [0, 2^10).

Note that the value range is different from what "hevc_cuvid" decoder
returns. "hevc_cuvid" decoder uses full range of int16 (internally,
it's uint16) to express the color (with some intervals), but the values
returned by CPU "hevc" decoder are with in [0, 2^10).

Address https://github.com/pytorch/audio/issues/3331

Pull Request resolved: https://github.com/pytorch/audio/pull/3332

Reviewed By: hwangjeff

Differential Revision: D45925097

Pulled By: mthrok

fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508

c12f4734

10 May, 2023 2 commits

[BC-Breaking] Switch to the backend dispatcher (#3241) · 4463fbdf

moto authored May 10, 2023

Summary:
This commit makes the code defaults to the backend dispatcher by default. Enabling backend dispatcher puts the FFmpeg-based I/O implementation on higher priority (if the corresponding FFmpeg is available), and allows individual function call to specify the backend.

See also https://github.com/pytorch/audio/issues/2950

Pull Request resolved: https://github.com/pytorch/audio/pull/3241

Reviewed By: hwangjeff

Differential Revision: D44709068

Pulled By: mthrok

fbshipit-source-id: 43aac3433f78a681df6669e9ac46e8ecf3beb1be

4463fbdf

[BC-Breaking] Update InverseMelScale solution (#3280) · 5a85a461

Zhaoheng Ni authored May 09, 2023

Summary:
Address https://github.com/pytorch/audio/issues/2643

- replace `SGD` optimization with `torch.linalg.lstsq` which is much faster.
- Add autograd test for `InverseMelScale`
- update other tests

Pull Request resolved: https://github.com/pytorch/audio/pull/3280

Reviewed By: hwangjeff

Differential Revision: D45679988

Pulled By: nateanl

fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59

5a85a461

09 May, 2023 1 commit

Fix batch consistency test for InverseBarkScale (#3322) · 51cc1cbf

Zhaoheng Ni authored May 09, 2023

Summary:
The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3322

Reviewed By: mthrok

Differential Revision: D45691769

Pulled By: nateanl

fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045

51cc1cbf

05 May, 2023 1 commit

Add SpecAugment transform (#3309) · 82febc59

Xiaohui Zhang authored May 05, 2023

Summary:
(2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-code to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only support 4D tensors, because of the above hard-coding
- For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only support 3D tensors, because of the above hard-coding.
- It's not straightforward to apply multiple time/frequency masks by the current design. If we need N masks across time/frequency axis, we need to sequentially apply N Frequency/TimeMasking transforms to input tensors, and such API looks very inconvenient. We need to introduce a separate SpecAugment transform to handle this.

To solve these issues, here we
[done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
[done in this PR] Introducing SpecAugment transform.

Pull Request resolved: https://github.com/pytorch/audio/pull/3309

Reviewed By: nateanl

Differential Revision: D45592926

Pulled By: xiaohui-zhang

fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2

82febc59

04 May, 2023 1 commit

Extend mask_along_axis{,_iid} (#3289) · 74bd971a

Xiaohui Zhang authored May 04, 2023

Summary:
(1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-code to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only support 4D tensors, because of the above hard-coding
- For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only support 3D tensors, because of the above hard-coding.
- It's not straightforward to apply multiple time/frequency masks by the current design.

To solve these issues, here we
- Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.

The introduction of SpecAugment transform will be done in another PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/3289

Reviewed By: hwangjeff

Differential Revision: D45460357

Pulled By: xiaohui-zhang

fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3

74bd971a

28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA based ctc prefix beam search decoder.

Attach serveral benchmark results using V100 below:
|decoder type| model |datasets       | decoding time (secs)| beam size | batch size | model unit | subsampling times | vocab size |
|--------------|---------|------|-----------------|------------|-------------|------------|-----------------------|------------|
| cuctc |  conformer nemo    |dev clean        |7.68s | 8           |  32       | bpe         |    4  | 1000|
| cuctc |  conformer nemo   |dev clean  (sort by length)      |1.6s | 8           |  32       | bpe         |    4  | 1000|
| cuctc |  wav2vec2.0 torchaudio |dev clean                                |22s | 10           |  1       | char         |    2  | 29|
| cuctc |   conformer espnet   |aishell1 test                             | 5s | 10           |  24       | char         |    4  | 4233|

Note:
1.  The design is to parallel computation through batch and vocab axis, for loop the frames axis. So it's more friendly with smaller sequence lengths, larger vocab size comparing with CPU implementations.
2. WER is the same as CPU implementations. However, it can't decode with LM now.

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155

0a1801ed

12 Apr, 2023 2 commits

Allow overwrite temp data in ffmpeg test (#3263) · cc7b8bd4

moto authored Apr 11, 2023

Summary:
When `TORCHAUDIO_TEST_TEMP_DIR` is set,
all the unit test temporary data are stored in the  given directory.
Running unit tests multiple times reuses the
directory and the temporary files from the
previous test runs are found there.

FFmpeg save test writes reference data to the
temporary directory, but it is not given the
overwrite flag ("-y"), so it fails in such cases.

This commit fixes that.

Pull Request resolved: https://github.com/pytorch/audio/pull/3263

Reviewed By: hwangjeff

Differential Revision: D44859003

Pulled By: mthrok

fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b

cc7b8bd4

Specify backend directly in test (#3262) · 563e409c

moto authored Apr 11, 2023

Summary:
Preparation to land https://github.com/pytorch/audio/pull/3241

This commit applies patch to make the sox_io TorchScript test pass when dispatcher is enabled.

Pull Request resolved: https://github.com/pytorch/audio/pull/3262

Reviewed By: hwangjeff

Differential Revision: D44897513

Pulled By: mthrok

fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32

563e409c