Commits · 33485b8c2284d2b69766d9404421b2a884223d31 · OpenDAS / Torchaudio

05 Aug, 2022 3 commits

Add note for lexicon free decoder output (#2603) · 33485b8c

Caroline Chen authored Aug 05, 2022

Summary:
The ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon-free usage. This PR adds a note in both the docs and the tutorial.

Followup: determine if we want to modify the behavior of ``words`` in the lexicon-free case. One option is to merge and then split the generated tokens by the input silent token to populate the words field, but this is tricky since the meaning of a "word" in the lexicon-free case can be vague, not all languages have whitespace between words, etc.
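
The merge-and-split option described in the followup could look roughly like the sketch below. The helper name `tokens_to_words` and the `|` silence token are illustrative assumptions, not part of the decoder API, and the sketch ignores the language-specific difficulties noted above.

```python
def tokens_to_words(tokens, sil_token="|"):
    """Merge decoded tokens and split them on the silence token.

    Hypothetical helper: `tokens` is a list of token strings, and
    `sil_token` is assumed to mark word boundaries.
    """
    merged = "".join(tokens)
    # Drop empty pieces produced by leading, trailing, or repeated silence tokens.
    return [word for word in merged.split(sil_token) if word]


words = tokens_to_words(["|", "h", "i", "|", "y", "o", "u", "|"])
print(words)
```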

Pull Request resolved: https://github.com/pytorch/audio/pull/2603

Reviewed By: mthrok

Differential Revision: D38459709

Pulled By: carolineechen

fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934

Added example for SlidingWindowCmn transform (#2600) · 50bba1df

Ravi Makhija authored Aug 05, 2022

Summary:
Added example for `SlidingWindowCmn` transform as mentioned in issue https://github.com/pytorch/audio/issues/1564

Pull Request resolved: https://github.com/pytorch/audio/pull/2600

Reviewed By: mthrok

Differential Revision: D38395579

Pulled By: carolineechen

fbshipit-source-id: 44c5b7181789eedcaaa1d80149d5a1ab8de4c0ba

Added example for Vad transform (#2598) · bcf958f6

Ravi Makhija authored Aug 05, 2022

Summary:
Added example for Vad transform as mentioned in issue https://github.com/pytorch/audio/issues/1564

Pull Request resolved: https://github.com/pytorch/audio/pull/2598

Reviewed By: mthrok

Differential Revision: D38432103

Pulled By: carolineechen

fbshipit-source-id: 8f7e26c48d4ffb6bfe55bba6f9c7ee915e6edaef

04 Aug, 2022 1 commit

Replace assert statements with raise in transforms (#2599) · 77a2baa8

Omkar Vichare authored Aug 03, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2599

Bootcamp task T127107566.
Replacing assert statements with `if ... then raise` so the checks still run in optimized mode.
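
A small sketch of why this matters: Python drops `assert` statements when optimization is enabled (`python -O`), while an explicit `raise` always runs. The function names and the `n_fft` argument are made up for illustration; `compile(..., optimize=1)` is used here only to mimic `-O` inside one script.

```python
def check_with_raise(n_fft):
    # Explicit validation: executed regardless of the optimization level.
    if n_fft <= 0:
        raise ValueError(f"n_fft must be positive, got {n_fft}")
    return n_fft


# The same check written with assert, compiled the way `python -O` would.
SOURCE = (
    "def check_with_assert(n_fft):\n"
    "    assert n_fft > 0, 'n_fft must be positive'\n"
    "    return n_fft\n"
)
namespace = {}
exec(compile(SOURCE, "<sketch>", "exec", optimize=1), namespace)
check_with_assert = namespace["check_with_assert"]

# The assert was compiled away, so the invalid value passes through silently.
silently_accepted = check_with_assert(-1)

try:
    check_with_raise(-1)
    raised = False
except ValueError:
    raised = True
```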

Reviewed By: mthrok

Differential Revision: D38370108

fbshipit-source-id: 74eaf5b72c511b62ddbb8e0e3b0ed638ad49e4f2

03 Aug, 2022 2 commits

Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2

Sean Kim authored Aug 03, 2022

Summary:
Add new model pretrained weights and tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2601

Reviewed By: carolineechen, nateanl

Differential Revision: D38396673

Pulled By: skim0514

fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1

An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a

bshall authored Aug 03, 2022

Summary:
I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
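
For readers unfamiliar with the recommendation, the gating step that those internal constants control can be sketched as follows. This is a simplified illustration rather than the code from this PR: it assumes the signal has already been K-weighted and cut into overlapping blocks, and it takes per-channel mean-square energies as input. The function and constant names are invented for the sketch; the numeric values (the -0.691 offset, the -70 LKFS absolute gate, the -10 LU relative gate, and the 1.0/1.41 channel weights) follow BS.1770.

```python
import math

ABSOLUTE_GATE = -70.0   # LKFS
RELATIVE_GATE = -10.0   # LU below the absolute-gated loudness
CHANNEL_WEIGHTS = [1.0, 1.0, 1.0, 1.41, 1.41]  # L, R, C, Ls, Rs


def _loudness(weighted_energy):
    if weighted_energy <= 0.0:
        return -math.inf
    return -0.691 + 10.0 * math.log10(weighted_energy)


def integrated_loudness(blocks):
    """`blocks` is a list of per-channel mean-square energies, one list per block."""
    weighted = [sum(g * z for g, z in zip(CHANNEL_WEIGHTS, block)) for block in blocks]
    # Stage 1: drop blocks below the absolute threshold.
    kept = [w for w in weighted if _loudness(w) > ABSOLUTE_GATE]
    if not kept:
        return -math.inf
    # Stage 2: drop blocks more than 10 LU below the stage-1 loudness.
    relative_threshold = _loudness(sum(kept) / len(kept)) + RELATIVE_GATE
    kept = [w for w in kept if _loudness(w) > relative_threshold]
    return _loudness(sum(kept) / len(kept))


# Two full-scale mono blocks plus one near-silent block that gets gated out.
example = integrated_loudness([[1.0], [1.0], [1e-10]])
print(example)
```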

I hope this is helpful! Looking forward to hearing from you.

Pull Request resolved: https://github.com/pytorch/audio/pull/2472

Reviewed By: hwangjeff

Differential Revision: D38389155

Pulled By: carolineechen

fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904

02 Aug, 2022 1 commit

ci: Fix anaconda uploading (#2581) · 8e0c2a3b

Eli Uriegas authored Aug 02, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2581



Also removes spurious lines of code that were erroring out silently
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: carolineechen

Differential Revision: D38336705

Pulled By: seemethere

fbshipit-source-id: 700a969a4bace7d9ca94a9db908b29f383b7d94e

01 Aug, 2022 3 commits

Added example for Vol transform (#2597) · ccb2d6f2

Ravi Makhija authored Aug 01, 2022

Summary:
Added example for [Vol transform](https://pytorch.org/audio/stable/transforms.html#torchaudio.transforms.Vol) as mentioned in this issue https://github.com/pytorch/audio/issues/1564.

Also made a minor edit to the docstring for `class Vol` to fix a grammar typo and use more common verbiage.

Pull Request resolved: https://github.com/pytorch/audio/pull/2597

Reviewed By: nateanl, mthrok

Differential Revision: D38316433

Pulled By: carolineechen

fbshipit-source-id: 0be8fc505800a59acdab843813767acfdeac8243

Fix typo - "dimension" (#2596) · e646de72

Ravi Makhija authored Aug 01, 2022

Summary:
Fixed minor typo in `Contributing.md`: "diemension" -> "dimension"

Pull Request resolved: https://github.com/pytorch/audio/pull/2596

Reviewed By: mthrok

Differential Revision: D38315517

Pulled By: carolineechen

fbshipit-source-id: 5e771f22a5be008d3be30b4699fb5cc5637c627d

Update data augmentation tutorial (#2595) · f1443b8f

moto authored Aug 01, 2022

Summary:
In https://github.com/pytorch/audio/pull/2285, the SNR calculation was fixed,
but there was still one that was not fixed. This commit fixes it.

Also following the feedback https://github.com/pytorch/tutorials/issues/1930#issuecomment-1199741336, update the variable name.
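
As background, mixing noise into speech at a target SNR comes down to one gain computation. The sketch below assumes the usual power-based definition, SNR_dB = 10 * log10(P_speech / P_noise). It is not the tutorial's code, and the function name and sample values are made up.

```python
import math


def mean_power(samples):
    return sum(x * x for x in samples) / len(samples)


def scale_noise_for_snr(speech, noise, snr_db):
    """Return `noise` rescaled so that speech-to-noise power ratio equals `snr_db`."""
    # Solve 10*log10(P_s / (g^2 * P_n)) = snr_db for the gain g.
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [gain * x for x in noise]


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
scaled = scale_noise_for_snr(speech, noise, snr_db=10.0)
achieved = 10 * math.log10(mean_power(speech) / mean_power(scaled))
mixture = [s + n for s, n in zip(speech, scaled)]
```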

Pull Request resolved: https://github.com/pytorch/audio/pull/2595

Reviewed By: carolineechen

Differential Revision: D38314672

Pulled By: mthrok

fbshipit-source-id: b2015e2709729190d97264aa191651b3af4ba856

30 Jul, 2022 1 commit

Replace assert with raise in torchaudio.models (#2590) · e502df01

Ansh Nanda authored Jul 29, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2590

Converted assert checks for argument validation to if-else checks so that they are executed in optimized mode as well.

Reviewed By: mthrok

Differential Revision: D38211246

fbshipit-source-id: 922b5bcafe8214980e535527dd94c3345c1ff3e2

29 Jul, 2022 4 commits

Update forced alignment tutorial (#2544) · c26b38b2

moto authored Jul 29, 2022

Summary:
1. Fix initialization.
Previously, the SOS token score was initialized to 0 across the time axis.
This was biasing the alignment to delay the start.
The proper way to delay the SOS is via blank token.
The new initialization takes the cumulative sum of blank scores.
2. Fill the end of trellis with Inf
Similar to the start, at the end, where the number of remaining time frames is less
than the number of tokens, it is no longer possible to align the text, so
we fill with Inf for better visualization.
3. Clean up asset management code.
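
Items 1 and 2 can be pictured with a pure-Python sketch of just the SOS column of the trellis. It assumes a trellis with one extra row for time and one extra column for SOS, and log-probability blank scores. The function name and layout are illustrative, the tutorial itself works on tensors, and the remaining cells are left as placeholders for the alignment recursion.

```python
import math
from itertools import accumulate


def init_trellis(blank_scores, num_tokens):
    """Build a (num_frames + 1) x (num_tokens + 1) trellis with column 0 initialized."""
    num_frames = len(blank_scores)
    trellis = [[0.0] * (num_tokens + 1) for _ in range(num_frames + 1)]
    # Item 1: staying on SOS for t frames costs the cumulative sum of blank scores.
    for t, score in enumerate(accumulate(blank_scores), start=1):
        trellis[t][0] = score
    # Item 2: once fewer frames remain than tokens, alignment is impossible.
    for t in range(max(0, num_frames + 1 - num_tokens), num_frames + 1):
        trellis[t][0] = math.inf
    return trellis


trellis = init_trellis([-0.1, -0.2, -0.3, -0.4, -0.5], num_tokens=2)
sos_column = [row[0] for row in trellis]
print(sos_column)
```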

Pull Request resolved: https://github.com/pytorch/audio/pull/2544

Reviewed By: nateanl

Differential Revision: D38276478

Pulled By: mthrok

fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299

Enable CTC decoder in Windows (#2587) · 67cb420d

moto authored Jul 29, 2022

Summary:
This commit enables CTC decoder on Windows.

The functionality seems to work fine.
The tests are passing, the decoding tutorial runs fine.

The only difference to the Linux/macOS version is that
loading model in XZ compression format is not supported.

![289961785_399620772041679_7768117002438616376_n](https://user-images.githubusercontent.com/855818/181420923-cfbd8402-20de-4e63-b9e4-e39f9aa9fc50.png)

Pull Request resolved: https://github.com/pytorch/audio/pull/2587

Reviewed By: carolineechen, nateanl

Differential Revision: D38276490

Pulled By: mthrok

fbshipit-source-id: f2203b2235c5bbb0220fe560aaaf0e1d5530347a

Replace 'runtime_error' exception with 'TORCH_CHECK' in TorchAudio sox (#2592) · f234e51f

Javier Cardenete Morales authored Jul 29, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2592

std::runtime_error does not preserve the C++ stack trace, so it is unclear to users what went wrong internally.

PyTorch's TORCH_CHECK macro allows printing the C++ stack trace when the TORCH_SHOW_CPP_STACKTRACES environment variable is set to 1.

Reviewed By: mthrok

Differential Revision: D38219331

fbshipit-source-id: f51c27111077e927f97127f73f83a31b8e74f61f

Improve speech enhancement tutorial (#2527) · d6267031

Zhaoheng Ni authored Jul 29, 2022

Summary:
- The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of mixture speech.
- Show the Si-SNR score of mixture speech when visualizing the mixture spectrogram.
- Fix the figure in `rtf_power` subsection.
    - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
- Print PESQ, STOI, and SDR metric scores.

Pull Request resolved: https://github.com/pytorch/audio/pull/2527

Reviewed By: mthrok

Differential Revision: D38190218

Pulled By: nateanl

fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de

28 Jul, 2022 7 commits

Add Union normalization parameter on spectrogram and inverse spectrogram (#2554) · 0fde7c57

Sean Kim authored Jul 28, 2022

Summary:
Add str to normalized parameter to enable frame_length based normalization to align with torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104

Pull Request resolved: https://github.com/pytorch/audio/pull/2554

Reviewed By: carolineechen, mthrok

Differential Revision: D38247554

Pulled By: skim0514

fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602

Change docstring for easier understanding (#2570) · 338e3104

Sean Kim authored Jul 28, 2022

Summary:
Edit factory function's docstrings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2570

Reviewed By: carolineechen

Differential Revision: D38250369

Pulled By: skim0514

fbshipit-source-id: fa777e37d7cc517cf4ff1842d5585bf36558f50a

Migrate CTC decoder code (#2580) · 39b6343d

moto authored Jul 28, 2022

Summary:
This commit gets rid of our copy of the CTC decoder code and
replaces it with the upstream Flashlight-Text repo.

Pull Request resolved: https://github.com/pytorch/audio/pull/2580

Reviewed By: carolineechen

Differential Revision: D38244906

Pulled By: mthrok

fbshipit-source-id: d274240fc67675552d19ff35e9a363b9b9048721

Create tutorial for HDemucs (#2572) · 919fd0c4

Sean Kim authored Jul 28, 2022

Summary:
Add tutorial Python file. This is a draft PR and will continue to be modified according to feedback.

Future plan: modify spectrogram and bottom audio design and work on finding best audio track and segments

Pull Request resolved: https://github.com/pytorch/audio/pull/2572

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D38234001

Pulled By: skim0514

fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5

Remove deprecated prototype alias (#2583) · 08395ba6

Vamsi Desu authored Jul 28, 2022

Summary:
CTC decoder and StreamReader are now in the main library.
This commit removes their aliases in `torchaudio.prototype`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2583

Reviewed By: mthrok

Differential Revision: D38189314

fbshipit-source-id: c62209f2ad4f7052c6756a537b6fc509064e428c

Fix hubert fine-tuning recipe bugs (#2588) · 0092aa3c

Zhaoheng Ni authored Jul 28, 2022

Summary:
- The optimizer in fine-tuning recipe should also be `AdamW`. See https://github.com/pytorch/audio/pull/2412
- Fix the import of `DistributedBatchSampler` in hubert dataset
- Fix `dataset_path` in fine-tuning module.

Pull Request resolved: https://github.com/pytorch/audio/pull/2588

Reviewed By: carolineechen

Differential Revision: D38243423

Pulled By: nateanl

fbshipit-source-id: badc88ce9eddfd71270201a65ae89433fae2733f

Refactor cmake (#2585) · d84ce3b2

moto authored Jul 28, 2022

Summary:
Extract the helper functions for defining library and extension so that they can be reused for building flashlight library and binding in https://github.com/pytorch/audio/issues/2580.

Pull Request resolved: https://github.com/pytorch/audio/pull/2585

Reviewed By: carolineechen

Differential Revision: D38233407

Pulled By: mthrok

fbshipit-source-id: 96f7c62a8b70bb3ff5caede9730165d54a55272f

27 Jul, 2022 3 commits

Replaced CHECK_ by TORCH_CHECK_ (#2582) · 04057fa6

Eli Uriegas authored Jul 27, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2582

The CHECK_ macros were deprecated upstream, so we should replace them here as
well.

Similar to https://github.com/pytorch/vision/pull/6322, relates to https://github.com/pytorch/pytorch/pull/82032

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, mthrok

Differential Revision: D38208356

Pulled By: seemethere

fbshipit-source-id: 6f42d517362f415e0775803514eee2628402918f

Replace assert with raise in prototypes.models (#2578) · 34ef7e9c

Son Dinh authored Jul 27, 2022

Summary:
This commit replaces the use of assert with the `if ~ then raise` idiom,
so that the checks are executed even when Python is running in optimized mode.

Pull Request resolved: https://github.com/pytorch/audio/pull/2578

Reviewed By: mthrok

Differential Revision: D38158122

fbshipit-source-id: da561145a6e021238e9e9df10ab8d2d3a751fb69

Replace assert with raise (#2579) · 0f4e1e8c

Piyush Soni authored Jul 27, 2022

Summary:
`assert` is not executed when running in optimized mode.

This commit replaces all instances of "assert" in /fbcode/pytorch/audio/torchaudio/functional/functional.py

Pull Request resolved: https://github.com/pytorch/audio/pull/2579

Reviewed By: mthrok

Differential Revision: D38158280

fbshipit-source-id: f8d7fca1c8f9b3955c6ca312b16947eb12894d81

26 Jul, 2022 5 commits

Fix argument validation in TorchAudio datasets (#2571) · 5bf73b59

Yu Shi authored Jul 26, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2571

Per T127106783, replace `assert` statement with `if _ then raise` statement to enforce the assertion even in optimized mode

Reviewed By: mthrok

Differential Revision: D38123481

fbshipit-source-id: 19321f7467bfd993b38bd9e44fcd01e5f5e64b87

Dataset docstring change (#2575) · 379487de

Sean Kim authored Jul 25, 2022

Summary:
Quick docstring change, adding extra line to properly parse

Pull Request resolved: https://github.com/pytorch/audio/pull/2575

Reviewed By: mthrok

Differential Revision: D38138566

Pulled By: skim0514

fbshipit-source-id: fc1ed68ed0050e194944714c753fb35adc85b27e

Switch to flashlight decoder from upstream (#2557) · 075a7706

Moto Hira authored Jul 25, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2557

Allow the use of flashlight-decoder from upstream

Reviewed By: carolineechen

Differential Revision: D37983846

fbshipit-source-id: edb1b701bd18718b3b10cf51cc63d3924d4cc073

New Pipeline edits for HDemucs (#2565) · 4c4da32c

Sean Kim authored Jul 25, 2022

Summary:
Created new branch and brought in commits due to rebasing issues, resolved conflicts on new branch, close old branch.

Pull Request resolved: https://github.com/pytorch/audio/pull/2565

Reviewed By: nateanl, mthrok

Differential Revision: D38131189

Pulled By: skim0514

fbshipit-source-id: 96531480cf50562944abb28d70879f21b4609f15

Delay the import of kaldi_io (#2573) · 45f512f6

Abhinav Gupta authored Jul 25, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2573

Moved the import of kaldi_io into each function (instead of the top of the module) to delay it.
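
The general pattern of a delayed import is shown below. The standard-library `json` module stands in for `kaldi_io`, and `dumps_compact` is a made-up function, so this sketch shows only the shape of the change, not the actual diff.

```python
def dumps_compact(obj):
    # Imported on first call rather than when the enclosing module is imported,
    # so users who never call this function never pay for (or need) the dependency.
    import json

    return json.dumps(obj, separators=(",", ":"))


print(dumps_compact({"a": 1}))
```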

Reviewed By: mthrok

Differential Revision: D38108022

fbshipit-source-id: 4ba8cc6a942a00de83668bbb7e361d5ae8b773eb

25 Jul, 2022 3 commits

[BC-breaking] Fix momentum in transforms.GriffinLim (#2568) · 1634ed01

proxyphi authored Jul 25, 2022

Summary:
The momentum in the GriffinLim transform is modified before being passed
to the functional, causing inconsistency between the functional and the transform.

Fix this by making it pass through in transform.

Fixes https://github.com/pytorch/audio/issues/2567

Pull Request resolved: https://github.com/pytorch/audio/pull/2568

Reviewed By: nateanl

Differential Revision: D38117632

Pulled By: mthrok

fbshipit-source-id: 99754be4b3b6dea45ba115aaea9fb6d7285bc2c9

Integration test fix deleting temporary directory (#2569) · 8dcf06ac

Sean Kim authored Jul 25, 2022

Summary:
Previous issue: `--use-tmp-hub-dir` expected the temp directories used to store large files to be deleted after each test case, but pytest only erases directories after 3 full test sessions. This commit fixes this by manually deleting a new subdirectory created in each test case. https://github.com/pytorch/audio/pull/2565#discussion_r929007101
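
One way to get per-test cleanup like this is a small context manager that creates a fresh subdirectory and removes it on exit. The sketch below uses only the standard library; the name `scratch_subdir` and the file written into it are hypothetical, and the actual test fixture in the repository may be structured differently.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scratch_subdir(parent):
    """Create a fresh subdirectory under `parent` and delete it on exit."""
    path = Path(tempfile.mkdtemp(dir=parent))
    try:
        yield path
    finally:
        # Remove the directory even if the test body raised.
        shutil.rmtree(path, ignore_errors=True)


parent = tempfile.mkdtemp()
with scratch_subdir(parent) as workdir:
    (workdir / "weights.bin").write_bytes(b"\0" * 16)
    existed_during = workdir.exists()
existed_after = workdir.exists()
shutil.rmtree(parent)
```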

Pull Request resolved: https://github.com/pytorch/audio/pull/2569

Reviewed By: nateanl

Differential Revision: D38117848

Pulled By: skim0514

fbshipit-source-id: 3767cb8df1238fd6218f6aaa58d5d583cea72699

Fix build_docs job (#2543) · 81780c95

moto authored Jul 25, 2022

Summary:
This commit fixes the build_docs job timeout by pinning `resampy=0.2.2`.

For some mysterious reason, `resampy=0.3.1` causes slowdown of unrelated code. https://github.com/bmcfee/resampy/issues/106

Pull Request resolved: https://github.com/pytorch/audio/pull/2543

Reviewed By: carolineechen

Differential Revision: D38115003

Pulled By: mthrok

fbshipit-source-id: 67cd1c73dd4adb3091e0b88aaf5c31de0dd4b87e

22 Jul, 2022 2 commits

Add dimension and shape check (#2563) · b1f510fa

Sean Kim authored Jul 22, 2022

Summary:
Don't allow users to input incorrect dimensions

Pull Request resolved: https://github.com/pytorch/audio/pull/2563

Reviewed By: carolineechen

Differential Revision: D38074360

Pulled By: skim0514

fbshipit-source-id: 7bcae515706eb358ca6f68c50c7c0ccace1c3f95

Add documents for SourceSeparationBundle (#2559) · 6cee56ab

Zhaoheng Ni authored Jul 22, 2022

Summary:
- Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
- Add citation of Libri2Mix dataset in the bundle documentation.
- The URL in the integration test should use a slash instead of `os.path.join`, which fails on Windows. Change it to an f-string.
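
The Windows failure in the last item can be reproduced on any platform with `ntpath`, the Windows flavour of `os.path`. The URL and file name below are placeholders.

```python
import ntpath  # what os.path resolves to on Windows; importable everywhere

base = "https://example.com/models"  # hypothetical URL
name = "model.pt"

# On Windows, os.path.join inserts a backslash, which is not a valid URL separator.
joined_windows = ntpath.join(base, name)

# An f-string keeps the forward slash regardless of platform.
joined_fstring = f"{base}/{name}"

print(joined_windows)
print(joined_fstring)
```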

Pull Request resolved: https://github.com/pytorch/audio/pull/2559

Reviewed By: carolineechen

Differential Revision: D38036116

Pulled By: nateanl

fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836

21 Jul, 2022 4 commits

fix resample (#2561) · c18a103b

Sean Kim authored Jul 21, 2022

Summary:
Added back device in case of tensor creation

Pull Request resolved: https://github.com/pytorch/audio/pull/2561

Reviewed By: mthrok

Differential Revision: D38035351

Pulled By: skim0514

fbshipit-source-id: bdea07cbb34d0aa487187cded1a5636da6623d96

Fix fall back failure in sox_io backend (#2560) · 4778c2e5

Jumon Nozaki authored Jul 21, 2022

Summary:
Fix the fallback function of load fileobj function in sox_io backend.

The typo in the fallback function prevents showing the intended error message.

Pull Request resolved: https://github.com/pytorch/audio/pull/2560

Reviewed By: carolineechen, nateanl

Differential Revision: D38035077

Pulled By: mthrok

fbshipit-source-id: 53c91c0569c7e7bba611aed6ea748dbd2f323221

ci: Update macos runners to AWS self hosted (#2556) · f0088599

Eli Uriegas authored Jul 21, 2022

Summary:
Updates the runner to the latest apple silicon machines we have that
also run on macOS 12.4

Similar to https://github.com/pytorch/vision/pull/6290

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/audio/pull/2556

Reviewed By: atalman, mthrok

Differential Revision: D37999959

Pulled By: seemethere

fbshipit-source-id: 01d2ff01e48dcc0c4e33ed81758886fa19642aa3

Add SourceSeparationBundle to prototype (#2440) · 83362580

Zhaoheng Ni authored Jul 20, 2022

Summary:
- Add SourceSeparationBundle class for source separation pipeline
- Add `CONVTASNET_BASE_LIBRI2MIX` that is trained on Libri2Mix dataset.
- Add integration test with example mixture audio and expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with the permutation-invariant training (PIT) criterion for all permutations of sources and uses the highest value as the final output. The test verifies that the score is equal to or larger than the expected value.
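
The scoring described in the last item can be sketched in plain Python as below. This is an illustration under stated assumptions, not the test's code: signals are plain lists of floats, the per-permutation score is taken as the mean Si-SDR over sources, and a small `eps` guards against division by zero. The function names and example signals are made up.

```python
import math
from itertools import permutations


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)
    # Project the estimate onto the reference; the residual counts as distortion.
    target = [scale * r for r in reference]
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def pit_si_sdr(estimates, references):
    """Best mean Si-SDR over all assignments of estimates to references."""
    best = -math.inf
    for perm in permutations(range(len(references))):
        scores = [si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)]
        best = max(best, sum(scores) / len(scores))
    return best


references = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
# Estimates come out in swapped order, each with a small error.
estimates = [[0.1, 1.0, 0.0, -1.0], [1.0, 0.1, -1.0, 0.0]]
score = pit_si_sdr(estimates, references)
print(round(score, 2))
```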

Pull Request resolved: https://github.com/pytorch/audio/pull/2440

Reviewed By: mthrok

Differential Revision: D37997646

Pulled By: nateanl

fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe

20 Jul, 2022 1 commit

Speed up resample with kernel generation modification (#2553) · 5c6e602c

Sean Kim authored Jul 20, 2022

Summary:
Modification from pull request https://github.com/pytorch/audio/issues/2415 to improve resample.

Benchmarked for an 89% time reduction, tested in comparison to the original resample method.

Pull Request resolved: https://github.com/pytorch/audio/pull/2553

Reviewed By: carolineechen

Differential Revision: D37997533

Pulled By: skim0514

fbshipit-source-id: ef4b719450ac26794db6ea01f9882509f4fda5cf
