Commits · cea12eaf988cb4f14b88f057e87a5b1fcfe4c28c · OpenDAS / Torchaudio

07 Mar, 2023 1 commit

Fix Adam and AdamW initializers in wav2letter example (#3145) · cea12eaf

Maciej Torhan authored Mar 06, 2023

Summary:
The wav2letter example passes `momentum` to the `Adam` and `AdamW` initializers, which do not accept that parameter. To fix this, add `beta_1` and `beta_2` arguments and use them in place of `momentum`. I also added `eps`, similar to the `Adadelta` initializer.
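
A minimal sketch of the argument mapping this fix describes, assuming command-line options named `beta_1`, `beta_2`, and `eps` as in the summary. The helper `adam_kwargs`, the `learning_rate` and `weight_decay` option names, and the default values are hypothetical. The resulting dictionary uses the keyword arguments that `torch.optim.Adam` and `torch.optim.AdamW` accept (`lr`, `betas`, `eps`, `weight_decay`); neither optimizer takes `momentum`.

```python
import argparse


def adam_kwargs(args: argparse.Namespace) -> dict:
    """Build keyword arguments for torch.optim.Adam / torch.optim.AdamW.

    Hypothetical helper: Adam-style optimizers take a `betas` pair
    rather than the `momentum` argument used by SGD.
    """
    return {
        "lr": args.learning_rate,
        "betas": (args.beta_1, args.beta_2),
        "eps": args.eps,
        "weight_decay": args.weight_decay,
    }


parser = argparse.ArgumentParser()
# Option names and defaults are illustrative assumptions.
parser.add_argument("--learning-rate", type=float, default=1e-3)
parser.add_argument("--beta-1", type=float, default=0.9)
parser.add_argument("--beta-2", type=float, default=0.999)
parser.add_argument("--eps", type=float, default=1e-8)
parser.add_argument("--weight-decay", type=float, default=0.0)

args = parser.parse_args(["--beta-1", "0.8"])
kwargs = adam_kwargs(args)
# With PyTorch installed: torch.optim.Adam(model.parameters(), **kwargs)
```

Keeping the mapping in one function means the `Adam` and `AdamW` branches can share it, so the two initializers cannot drift apart again.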

Pull Request resolved: https://github.com/pytorch/audio/pull/3145

Reviewed By: mthrok

Differential Revision: D43847713

Pulled By: nateanl

fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6

cea12eaf

06 Mar, 2023 1 commit

Refactor encoding process (#3146) · 8a9ab2a4

Moto Hira authored Mar 06, 2023

Summary:
After the series of simplifications, the audio and video encoding processes
can be merged, which gets rid of the boilerplate code.

Pull Request resolved: https://github.com/pytorch/audio/pull/3146

(Note: this ignores all push blocking failures!)

Reviewed By: xiaohui-zhang

Differential Revision: D43815640

fbshipit-source-id: 2a14e372b2cc75db7eeabc27d855a24c3f7d5063

8a9ab2a4

04 Mar, 2023 2 commits

Fix linux gpu tests (#3144) · b96a7ebb

Zhaoheng Ni authored Mar 04, 2023

Summary:
The environment variable `TORCHAUDIO_TEST_ALLOW_SKIP_IF_NO_MACOS` needs to be set when running the bash script.
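
The summary does not say how the variable is consumed. One plausible reading, sketched here with a hypothetical `skip_if` helper, is that a test whose requirement is missing may only be skipped when the matching `TORCHAUDIO_TEST_ALLOW_SKIP_IF_<KEY>` variable is set, and fails loudly otherwise. The accepted truthy values are an assumption.

```python
import os
import unittest
from typing import Callable, Mapping


def skip_if(condition: bool, reason: str, key: str,
            environ: Mapping[str, str] = os.environ) -> Callable:
    """Return a decorator that keeps, skips, or fails a test (hypothetical)."""
    if not condition:
        # Requirement is satisfied: leave the test untouched.
        return lambda test_item: test_item

    var = f"TORCHAUDIO_TEST_ALLOW_SKIP_IF_{key}"
    if environ.get(var, "0").lower() in ("1", "true", "yes"):
        return unittest.skip(reason)

    def fail(test_item: Callable) -> Callable:
        def wrapper(*args, **kwargs):
            raise RuntimeError(f"{reason} Set {var}=1 to allow skipping.")
        return wrapper

    return fail


allowed = {"TORCHAUDIO_TEST_ALLOW_SKIP_IF_NO_MACOS": "1"}
skipped_test = skip_if(True, "Requires macOS.", "NO_MACOS", allowed)(lambda: None)
failing_test = skip_if(True, "Requires macOS.", "NO_MACOS", {})(lambda: None)
```

Under this reading, forgetting the variable on a Linux machine turns the macOS-only tests into failures rather than silent skips, which would explain why the script needs it.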

Pull Request resolved: https://github.com/pytorch/audio/pull/3144

Reviewed By: mthrok

Differential Revision: D43807178

Pulled By: nateanl

fbshipit-source-id: 27c57d2efaed5519a12aa027967968895f357c67

b96a7ebb

Refactor audio conversion (#3143) · db4898f3

Moto Hira authored Mar 03, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3143

Similar to https://github.com/pytorch/audio/pull/3140,
only provide objects which are semantically related to the
operation performed by AudioConverter.

Reviewed By: xiaohui-zhang

Differential Revision: D43781012

fbshipit-source-id: 4795e20f56272af5cfda8a5f46083e60d1890c3e

db4898f3

03 Mar, 2023 3 commits

Simplify HW encoder object handling (#3138) · 26acdbff

moto authored Mar 03, 2023

Summary:
hw_device_ctx and hw_frames_ctx assigned to an AVCodecContext
object are owned by libavcodec, and get freed in [avcodec_free_context](https://ffmpeg.org/doxygen/4.1/group__lavc__core.html#gaf869d0829ed607cec3a4a02a1c7026b3)
(actually in [avcodec_close](https://ffmpeg.org/doxygen/4.1/libavcodec_2utils_8c_source.html#l01069)),
so we do not need to keep the reference around.

Pull Request resolved: https://github.com/pytorch/audio/pull/3138

Reviewed By: nateanl

Differential Revision: D43738009

Pulled By: mthrok

fbshipit-source-id: 8c1f4217fa7b21dce872d12be9245056f3fc7537

26acdbff

Fix HW accelerated encoder (#3140) · 41e3b93d

Moto Hira authored Mar 03, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3140

https://github.com/pytorch/audio/pull/3120 introduced a regression in the GPU encoder.

This happened because the source AVPixelFormat (the expected channel order of
the input tensor) and the AVCodecContext (the encoding format) were previously
both handled in the converter (the module that copies the input tensor to a
buffer), even though the converter does not need to know about the
encoding format.

This commit fixes the issue and makes sure that the converter does not receive
the codec context.

Reviewed By: nateanl

Differential Revision: D43759162

fbshipit-source-id: f5f191cb54ecc82bd882aececdcae16921250261

41e3b93d

Skip playback tests on linux gpu machine (#3141) · d359f887

Zhaoheng Ni authored Mar 03, 2023

Summary:
The `playback` function was added in https://github.com/pytorch/audio/issues/3026. It only supports macOS, so the tests should be skipped on other operating systems. This PR skips the tests on Linux GPU machines on CircleCI.

Pull Request resolved: https://github.com/pytorch/audio/pull/3141

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D43760546

Pulled By: nateanl

fbshipit-source-id: 606907127feee28a66f61baca000a8ef708f8086

d359f887

02 Mar, 2023 5 commits

Fix build (#3136) · 5fac7173

moto authored Mar 02, 2023

Summary:
Follow-up to https://github.com/pytorch/audio/issues/3130

Pull Request resolved: https://github.com/pytorch/audio/pull/3136

Reviewed By: hwangjeff

Differential Revision: D43732991

Pulled By: mthrok

fbshipit-source-id: 2e8cb56d96e22546645c82eca362b3c4dcf9c78f

5fac7173

Fix doc build (#3125) · 1ed38095

moto authored Mar 01, 2023

Summary:
Fix build_doc job

https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8

- build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL.
- Fix bash cell syntax in HW tutorial
- Fix C++ doc
- Fix duplicated target name in streamwriter tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/3125

Reviewed By: xiaohui-zhang

Differential Revision: D43724078

Pulled By: mthrok

fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c

1ed38095

Extract audio conversion into separate class (#3130) · 9133f2a0

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3130

Similar to https://github.com/pytorch/audio/pull/3120,
adopt the generator-style slicing conversion for the audio encoding
process.

Reviewed By: nateanl

Differential Revision: D43685380

fbshipit-source-id: 3e95655783e5c5d768486f8af6e6b47b0072999b

9133f2a0

Fix PTS regression (#3131) · fbf05f28

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3131

In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable
was removed.

PTS can be incremented the same way, but the timing was wrong in #3122.
This commit fixes it.

Reviewed By: xiaohui-zhang

Differential Revision: D43712046

fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9

fbf05f28

Update slicing conversion code (#3129) · 898db8c7

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3129

- Add step parameter to support audio slicing
- Rename to `SlicingTensorConverter` (`Generator` is too generic.)
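
As a rough illustration of slicing with a step, the generator below yields consecutive chunks of a sequence. It is a Python sketch, not the C++ `SlicingTensorConverter` itself; the name `slice_frames` and the use of a plain list in place of a tensor are assumptions.

```python
from typing import Iterator, Sequence


def slice_frames(samples: Sequence, step: int) -> Iterator[Sequence]:
    """Yield consecutive chunks of at most `step` items (hypothetical helper)."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, len(samples), step):
        # The last chunk may be shorter than `step`.
        yield samples[start:start + step]


chunks = list(slice_frames(list(range(10)), step=4))
```

With `step=1` the same loop hands out one item at a time, so a single slicing routine could plausibly serve both per-image and per-audio-frame conversion.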

Reviewed By: xiaohui-zhang

Differential Revision: D43704926

fbshipit-source-id: c4bf0ff766e0ae1b5d46b159a6367492ef68f9cd

898db8c7

01 Mar, 2023 6 commits

Fix stylecheck in io (#3126) · b0faecb2

Zhaoheng Ni authored Mar 01, 2023

Summary:
`Dict` is not used. Fix the style check by removing the import of `Dict`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3126

Reviewed By: mthrok

Differential Revision: D43699410

Pulled By: nateanl

fbshipit-source-id: 8d6b5335124903453387c488f96f297d6fe3c819

b0faecb2

Tweak OutputStream implementation (#3122) · fce6180c

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3122

- Remove manual tracking of num_frames
- Remove unnecessary dispatch in AudioOutputStream

Reviewed By: nateanl

Differential Revision: D43685746

fbshipit-source-id: a7e62a81549fb62ad0caa3b741655eba3bc5e250

fce6180c

Extract image conversions into separate class (#3120) · 0bf00d20

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3120

This commit extracts the image conversion ops into an ImageTensorConverter class and makes it independent of the OutputStream class.

The ImageTensorConverter class implements a range-based for-loop interface, like

```
for (auto const& frame : ImageTensorConverter::convert(...)) {
    post_process_with_avframe(frame);
}
```

This allows the encoder to be decoupled from image conversion.
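
A Python analogue of the C++ loop above, sketched to show why an iterable converter decouples the two sides. The classes and functions here (`ImageConverter`, `encode_all`) are hypothetical stand-ins, and lists of integers take the place of tensors and AVFrame objects.

```python
from typing import Callable, Iterable, Iterator, List


class ImageConverter:
    """Hypothetical converter: iterating over it yields one frame at a time."""

    def __init__(self, batch: List[List[int]]) -> None:
        self._batch = batch

    def __iter__(self) -> Iterator[List[int]]:
        for image in self._batch:
            # Placeholder for the real tensor-to-frame conversion.
            yield list(image)


def encode_all(frames: Iterable[List[int]],
               encode: Callable[[List[int]], None]) -> int:
    """Consume frames from any iterable; knows nothing about conversion."""
    count = 0
    for frame in frames:
        encode(frame)
        count += 1
    return count


encoded: List[List[int]] = []
num_encoded = encode_all(ImageConverter([[1, 2], [3, 4]]), encoded.append)
```

Because `encode_all` only requires something it can loop over, the converter can be replaced or tested on its own without touching the encoding side.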

Reviewed By: nateanl

Differential Revision: D43666296

fbshipit-source-id: 754efe677bc7695b3f138a6d076be2106e186b79

0bf00d20

Move I/O logging to C++ (#3123) · c9c8c7e1

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3123

Moving the I/O usage logging to C++, so that C++ usages are also covered.

Reviewed By: nateanl

Differential Revision: D43686567

fbshipit-source-id: ad357028dd69eedb8bc2a2482fe07e95757a3a62

c9c8c7e1

Fix windows tests (#3119) · 6a4a8200

Zhaoheng Ni authored Mar 01, 2023

Summary:
`sox` is not available on Windows machines. Add skip decorators to the sox-related tests so that they are skipped on Windows.

Pull Request resolved: https://github.com/pytorch/audio/pull/3119

Reviewed By: mthrok

Differential Revision: D43682754

Pulled By: nateanl

fbshipit-source-id: f69987dac8232a3569be83f096b32389bd8bda81

6a4a8200

Remove redundant device arg from VideoOutputStream constructor (#3121) · af493e4e

Moto Hira authored Feb 28, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3121

After careful review, it turned out that the device arg in the VideoOutputStream
constructor and related helper functions can be replaced with
AVCodecContext::pix_fmt == AV_PIX_FMT_CUDA.

Reviewed By: xiaohui-zhang

Differential Revision: D43677801

fbshipit-source-id: f8f34f1aed46e223b44250d39cccc4cd26ecb458

af493e4e

28 Feb, 2023 3 commits

Decouple image conversion and OutputStream class (#3113) · 2381beec

Moto Hira authored Feb 28, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3113

Decouple the Tensor-to-AVFrame conversion process from the encoding process.

Reviewed By: nateanl

Differential Revision: D43628942

fbshipit-source-id: e698f3150292567dbc23e7d6795ad58265f24780

2381beec

Use null filter in case no filter is used (#3109) · fd24af00

Moto Hira authored Feb 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3109

Change the logic around StreamWriter preprocessing.
Currently, the absence of preprocessing is expressed as a `nullptr` value of `unique_ptr<FilterGraph>`.

This commit changes it to the `[a]null` filter, which is just a pass-through.
This makes the code a bit simpler and better prepares for adding
filters for CUDA processing.
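
The idea can be sketched in Python as follows: instead of representing "no filter" with a missing value that every caller must check, a pass-through filter is always installed. All names here (`null_filter`, `make_filter`, `process`) and the gain filter are hypothetical; only the pass-through behaviour of FFmpeg's `null` and `anull` filters comes from the commit.

```python
from typing import Callable, List, Optional

Frame = List[float]
Filter = Callable[[Frame], Frame]


def null_filter(frame: Frame) -> Frame:
    """Pass-through filter, analogous to FFmpeg's `null` / `anull` filters."""
    return frame


def make_filter(gain: Optional[float]) -> Filter:
    """Always return a callable filter, never None."""
    if gain is None:
        return null_filter
    return lambda frame: [gain * x for x in frame]


def process(frame: Frame, filt: Filter) -> Frame:
    # No `if filt is not None` branch is needed on this path.
    return filt(frame)


unchanged = process([1.0, 2.0], make_filter(None))
scaled = process([1.0, 2.0], make_filter(2.0))
```

The processing path then has a single shape regardless of whether preprocessing was requested, which is the simplification the summary refers to.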

Reviewed By: xiaohui-zhang

Differential Revision: D43593321

fbshipit-source-id: 9ca71c2c8bf652384a0f56b4c41b32d908f61201

fd24af00

Reduce code duplication in VideoOutputStream (#3108) · be3bd1ac

Moto Hira authored Feb 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3108

- Introduce process_frame method
- De-dupe validation logic

Reviewed By: xiaohui-zhang

Differential Revision: D43632390

fbshipit-source-id: 76b7ca0beb725acf686269c877a62e1256921b28

be3bd1ac

27 Feb, 2023 5 commits

Add SquimObjectiveBundle to prototype (#3103) · 46fae2fe

Zhaoheng Ni authored Feb 27, 2023

Summary:
Add pre-trained pipeline support for the `SquimObjective` model. The pre-trained model is trained on the DNS 2020 challenge dataset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3103

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D43611794

Pulled By: nateanl

fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d

46fae2fe

Move OutputStream init logic and simplify interface (#3105) · bc61f109

Moto Hira authored Feb 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3105

Refactor the construction of Audio/VideoOutputStream

Reviewed By: nateanl

Differential Revision: D43613013

fbshipit-source-id: 0e112cb1bab2658be68a368099ed00ef318ea4f1

bc61f109

Split Audio/VideoOutputStream source (#3106) · 5b0580ae

Moto Hira authored Feb 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3106

Refactor Audio/VideoOutputStream.

Reviewed By: nateanl

Differential Revision: D43613008

fbshipit-source-id: 36c62fe00903066982573866d07de4e79b34240d

5b0580ae

Extract Encoder from OutputStream (#3104) · 5cac8de3

Moto Hira authored Feb 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3104

Continuation of StreamWriter refactoring

This commit extracts Encoder (+muxer) from OutputStream.

Reviewed By: nateanl

Differential Revision: D43610887

fbshipit-source-id: 30a9862b1aabd5af331ce3f33a5815df1decbad1

5cac8de3

Refactor StreamWriter and extract encoding process (#3100) · 23231033

Moto Hira authored Feb 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3100

Refactor StreamWriter and move OutputStream to a dedicated source, then
split it into separate audio/video classes.

Reviewed By: nateanl

Differential Revision: D43587337

fbshipit-source-id: 0fdbd1f56a7200dc6849e95eb9678854f5d933b8

23231033

25 Feb, 2023 1 commit

Fix unit tests for griffinlim and Spectrogram (#3099) · 75fc9a46

Zhaoheng Ni authored Feb 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3099

Reviewed By: mthrok

Differential Revision: D43596866

Pulled By: nateanl

fbshipit-source-id: 43a139bf8ebdf3261414e2855aefc3b53df298ac

75fc9a46

24 Feb, 2023 5 commits

Add Wav2Vec2DataModule in self_supervised_learning training recipe (#3081) · fd778091

Vladislav Agafonov authored Feb 24, 2023

Summary:
Add `Wav2Vec2DataModule` in self_supervised_learning training recipe to support Wav2Vec2 pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/3081

Reviewed By: mthrok

Differential Revision: D43579239

Pulled By: nateanl

fbshipit-source-id: 3e935eb9a18ef0259a58940ae466cbdc3baf8494

fd778091

Add wav2vec2 loss function in self_supervised_learning training recipe (#3090) · c532f35c

Vladislav Agafonov authored Feb 24, 2023

Summary:
Add wav2vec2 loss function in the self_supervised_learning training recipe to support Wav2Vec2 pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/3090

Reviewed By: mthrok

Differential Revision: D43579220

Pulled By: nateanl

fbshipit-source-id: 4b52792b518ddc5b01c9660c90ceb3c4ad1f0237

c532f35c

Cleanup ffmpeg bindings (#3095) · b46628ba

moto authored Feb 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3095

Reviewed By: nateanl

Differential Revision: D43544998

Pulled By: mthrok

fbshipit-source-id: 4359cdbbdbee53084016a84129cb3d65900b0457

b46628ba

Bind StreamReader/Writer with PyBind11 (#3091) · b012b452

moto authored Feb 24, 2023

Summary:
This commit is a cleanup and preparation for future
development.

We plan to pass more complicated objects between
StreamReader and StreamWriter, and TorchBind is not expressive enough
for defining intermediate objects, so we use PyBind11 for binding
StreamWriter.

Pull Request resolved: https://github.com/pytorch/audio/pull/3091

Reviewed By: xiaohui-zhang

Differential Revision: D43515714

Pulled By: mthrok

fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2

b012b452

Use autosummary for torchaudio.prototype.models documentation (#3084) · f6d1bc96

Zhaoheng Ni authored Feb 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3084

Reviewed By: mthrok

Differential Revision: D43550150

Pulled By: nateanl

fbshipit-source-id: 5c5e3d9461e375be202493e3399ff38ce5cd7690

f6d1bc96

23 Feb, 2023 5 commits

Replace c10::Dict with std::map in StreamReader/Writer (#3092) · c3310018

moto authored Feb 23, 2023

Summary:
This commit is a cleanup and preparation for future development.

We plan to pass more complicated objects between StreamReader and StreamWriter, and TorchBind is not expressive enough for defining intermediate objects, so we want to use PyBind11 for binding StreamReader/Writer.

PyBind11 converts a Python dict into std::map, while TorchBind converts it into c10::Dict. Because of this discrepancy, conversion from c10::Dict to std::map has to happen in multiple places, which makes the binding code thicker because it requires wrapper methods.

Using std::map reduces the number of wrapper methods and conversions, because the same method can be bound for file-like objects and the others.

Pull Request resolved: https://github.com/pytorch/audio/pull/3092

Reviewed By: nateanl

Differential Revision: D43524808

Pulled By: mthrok

fbshipit-source-id: f7467c66ccd37dbf4abc337bbb18ffaac21a0058

c3310018

Add TCPGen context-biasing Conformer RNN-T (#2890) · 1ed330b5

G. Sun authored Feb 23, 2023

Summary:
This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing.

An example for Librispeech can be found in audio/examples/asr/librispeech_biasing.

Maintainer's note (mthrok):
It seems that TrieNode would be better typed as a tuple, but changing the implementation from list to tuple
could cause issues that cannot be ruled out without running the code, so the code is not changed, though the annotation uses tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2890

Reviewed By: nateanl

Differential Revision: D43171447

Pulled By: mthrok

fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e

1ed330b5

Remove Tensor binding from StreamReader (#3093) · d3c9295c

mthrok authored Feb 23, 2023

Summary:
Remove the Tensor input support from StreamReader

Follow up of https://github.com/pytorch/audio/pull/3086

Pull Request resolved: https://github.com/pytorch/audio/pull/3093

Reviewed By: xiaohui-zhang

Differential Revision: D43526066

Pulled By: mthrok

fbshipit-source-id: 57ba4866c413649173e1c2c3b23ba7de3231b7bc

d3c9295c

Deprecate the use of Tensor as a means of passing a byte string (#3086) · a26c2f27

moto authored Feb 22, 2023

Summary:
The same functionality can be achieved by passing io.BytesIO to the constructor.
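
A small sketch of the replacement pattern: the byte string is wrapped in `io.BytesIO`, and the consumer only relies on the file-like `read` method. The `read_chunks` function is a hypothetical stand-in for whatever consumes the object, and the payload is placeholder data rather than a valid media file.

```python
import io
from typing import BinaryIO, Iterator


def read_chunks(src: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Read a file-like object chunk by chunk until it is exhausted."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        yield chunk


payload = b"RIFF" + bytes(10)  # placeholder bytes, not a valid media file
src = io.BytesIO(payload)      # wrap the byte string instead of a uint8 Tensor
chunks = list(read_chunks(src, chunk_size=4))
```

Any object with a compatible `read` method works the same way, so in-memory data and opened files can share one code path.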

Pull Request resolved: https://github.com/pytorch/audio/pull/3086

Reviewed By: nateanl

Differential Revision: D43500360

Pulled By: mthrok

fbshipit-source-id: 2c6f37d100f50553b283c75c04fe57c8f9c07dc9

a26c2f27

Update CTCDecoder static build deprecation message (#3089) · 3b75b74f

moto authored Feb 22, 2023

Summary:
1. Fix spacing.
2. Move it to after successful import
3. Add link to the announcement issue

Pull Request resolved: https://github.com/pytorch/audio/pull/3089

Reviewed By: nateanl, xiaohui-zhang

Differential Revision: D43514075

Pulled By: mthrok

fbshipit-source-id: 3b2a24c65c63dab8c12c9c6aa1942a8354b2c0f1

3b75b74f

22 Feb, 2023 3 commits

Rename SQUIM_OBJECTIVE model to SquimObjective (#3087) · b0155938

Zhaoheng Ni authored Feb 22, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3087

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D43509865

Pulled By: nateanl

fbshipit-source-id: 569cc2ee8edd9de0b7d255a1e1075ac812b26cc8

b0155938

Fix ConformerWav2Vec2PretrainModel (#3085) · b35a5fcf

Zhaoheng Ni authored Feb 22, 2023

Summary:
Negative sampling should be applied to the unmasked features at the masked indices. This PR fixes the logic in ConformerWav2Vec2PretrainModel.
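
One way to read this, sketched in plain Python under stated assumptions: for every masked time step, the negatives are drawn from the original (pre-masking) features at other masked time steps. The function name, the scalar features, and sampling with replacement are all assumptions for illustration; this is not the model's actual code.

```python
import random
from typing import Dict, List, Sequence


def sample_negatives(
    unmasked_features: Sequence[float],
    mask: Sequence[bool],
    num_negatives: int,
    rng: random.Random,
) -> Dict[int, List[float]]:
    """Map each masked index to negatives taken from other masked indices."""
    masked = [t for t, is_masked in enumerate(mask) if is_masked]
    if len(masked) < 2:
        raise ValueError("need at least two masked positions to draw negatives")

    negatives: Dict[int, List[float]] = {}
    for t in masked:
        # Exclude the position itself so the positive is never a negative.
        candidates = [u for u in masked if u != t]
        picks = [rng.choice(candidates) for _ in range(num_negatives)]
        # Values come from the features before masking was applied.
        negatives[t] = [unmasked_features[u] for u in picks]
    return negatives


features = [0.0, 1.0, 2.0, 3.0, 4.0]
mask = [False, True, True, False, True]
negatives = sample_negatives(features, mask, num_negatives=3, rng=random.Random(0))
```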

Pull Request resolved: https://github.com/pytorch/audio/pull/3085

Reviewed By: mthrok

Differential Revision: D43488570

Pulled By: nateanl

fbshipit-source-id: 3820400d50b74216bb98ca6a40dc6a7acca01564

b35a5fcf

Add objective metric estimation model for speech enhancement (#3042) · 3267c7ed

Zhaoheng Ni authored Feb 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3042

Reviewed By: mthrok

Differential Revision: D43405932

Pulled By: nateanl

fbshipit-source-id: 88f6dabae35565b699230e9909b8f68f4a57f5c7

3267c7ed