- 18 Aug, 2022 5 commits
-
-
Ravi Makhija authored
Summary: Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. Pull Request resolved: https://github.com/pytorch/audio/pull/2635 Reviewed By: carolineechen Differential Revision: D38830318 Pulled By: nateanl fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83
-
moto authored
Summary: Google Colab now has torchaudio 0.12 pre-installed. This commit removes the note about nightly build. Pull Request resolved: https://github.com/pytorch/audio/pull/2632 Reviewed By: carolineechen Differential Revision: D38827632 Pulled By: mthrok fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb
-
moto authored
Summary: Resolves the following warnings ``` /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation. /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation. /torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found. /torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent. ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2630 Reviewed By: nateanl Differential Revision: D38816632 Pulled By: mthrok fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635
-
moto authored
Summary: This commit fixes the issue with the recent Sphinx-Gallery update. Also it pins the versions of Sphinx-related packages. Before: <img width="256" alt="Screen Shot 2022-08-17 at 10 02 23 PM" src="https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png"> After: https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2629 Reviewed By: carolineechen Differential Revision: D38816417 Pulled By: mthrok fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852
-
moto authored
Summary: Resolves the following warning ``` /torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short. :hidden:`Loudness` ----------------- ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2627 Reviewed By: carolineechen Differential Revision: D38814802 Pulled By: mthrok fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749
-
- 16 Aug, 2022 4 commits
-
-
Zhaoheng Ni authored
Summary: To make the code consistent, we should use double quotation marks for all strings. This PR makes these changes in functional and transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2618 Reviewed By: carolineechen Differential Revision: D38744137 Pulled By: nateanl fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd
-
Ravi Makhija authored
Summary: Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. Pull Request resolved: https://github.com/pytorch/audio/pull/2615 Reviewed By: carolineechen Differential Revision: D38743117 Pulled By: nateanl fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207
-
Ravi Makhija authored
Summary: Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. Pull Request resolved: https://github.com/pytorch/audio/pull/2616 Reviewed By: carolineechen Differential Revision: D38743145 Pulled By: nateanl fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486
-
Andrey Talman authored
Summary: Similar to https://github.com/pytorch/vision/pull/6218 Fixing MacOS builds Pull Request resolved: https://github.com/pytorch/audio/pull/2622 Reviewed By: weiwangmeta Differential Revision: D38722983 Pulled By: atalman fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85
-
- 15 Aug, 2022 3 commits
-
-
Andrey Talman authored
Summary: Same as: https://github.com/pytorch/vision/pull/6422 Testing: ``` export ANACONDA_PATH=$(conda info --base)/bin echo $ANACONDA_PATH /opt/homebrew/Caskroom/miniconda/base/bin $ANACONDA_PATH/anaconda -V anaconda Command line client (version 1.10.0) ``` Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/audio/pull/2621 Reviewed By: weiwangmeta, seemethere Differential Revision: D38714324 Pulled By: atalman fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e
-
moto authored
Summary: The link to the version selector was an absolute link, which was a trap when reviewing gh-pages deployments from a fork. This commit changes it to a relative link. Pull Request resolved: https://github.com/pytorch/audio/pull/2605 Test Plan: - https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html - https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html Reviewed By: carolineechen, nateanl Differential Revision: D38695645 Pulled By: mthrok fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2
-
Zhaoheng Ni authored
Summary: `ctc_decoder` has become beta, so this removes it from the prototype documentation. Pull Request resolved: https://github.com/pytorch/audio/pull/2617 Reviewed By: hwangjeff Differential Revision: D38706869 Pulled By: nateanl fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02
-
- 12 Aug, 2022 1 commit
-
-
Andrey Talman authored
Summary: Introducing pytorch-cuda metapackage. Same as: https://github.com/pytorch/vision/pull/6371 Following PR: https://github.com/pytorch/builder/pull/1094 Adds a CUDA metapackage called pytorch-cuda. This way we can make sure to install the correct version of the CUDA dependencies without depending on conda-forge. Pull Request resolved: https://github.com/pytorch/audio/pull/2612 Reviewed By: hwangjeff, seemethere, nateanl Differential Revision: D38633332 Pulled By: atalman fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501
-
- 11 Aug, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise. Pull Request resolved: https://github.com/pytorch/audio/pull/2608 Reviewed By: nateanl Differential Revision: D38557141 Pulled By: hwangjeff fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0
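The "sum of a waveform and scaled noise" idea can be sketched in plain Python. This is only an illustration, not the torchaudio implementation: the function name `add_noise_sketch`, the `snr_db` parameter, and the use of flat lists instead of tensors are all assumptions made for the example.

```python
import math


def add_noise_sketch(waveform, noise, snr_db):
    """Return waveform + g * noise, with g chosen so the mixture has the requested SNR.

    Hypothetical helper for illustration only; it works on plain lists of floats.
    """
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(n * n for n in noise)
    if signal_energy == 0.0 or noise_energy == 0.0:
        raise ValueError("waveform and noise must both have non-zero energy")
    # SNR of the unscaled pair, in dB.
    original_snr_db = 10.0 * math.log10(signal_energy / noise_energy)
    # Scaling the noise amplitude by g lowers the SNR by 20 * log10(g) dB.
    gain = 10.0 ** ((original_snr_db - snr_db) / 20.0)
    return [x + gain * n for x, n in zip(waveform, noise)]
```

Subtracting the original waveform from the result recovers the scaled noise, which is a convenient way to check that the achieved SNR matches the requested one.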
-
- 10 Aug, 2022 3 commits
-
-
hwangjeff authored
Summary: https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script. Pull Request resolved: https://github.com/pytorch/audio/pull/2611 Reviewed By: carolineechen Differential Revision: D38578892 Pulled By: hwangjeff fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac
-
Kunal Upadya authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2609 Converted argument validations in torchaudio/functional/filtering from assert based validation to the preferred if-then raise validation. Added specific error messages in all cases. Reviewed By: mthrok Differential Revision: D38515029 fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9
-
Sean Kim authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607 Reviewed By: carolineechen, nateanl Differential Revision: D38522606 Pulled By: skim0514 fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668
-
- 09 Aug, 2022 1 commit
-
-
Caroline Chen authored
Summary: Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs. The `ctc_decoder` API is as follows - To decode with KenLM, pass in KenLM language model path to `lm` variable - To decode with custom LM, create Python class with `CTCDecoderLM` subclass, and pass in the class to `lm` variable. Additionally create a file of LM words listed in order of the LM index, with a word per line, and pass in the file to `lm_path`. - To decode without a language model, set `lm` to `None` (default) Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM. Follow ups: - Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory cc jacobkahn Pull Request resolved: https://github.com/pytorch/audio/pull/2528 Reviewed By: mthrok Differential Revision: D38243802 Pulled By: carolineechen fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
-
- 08 Aug, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2606 Reviewed By: nateanl Differential Revision: D38502666 Pulled By: carolineechen fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a
-
- 05 Aug, 2022 4 commits
-
-
hwangjeff authored
Summary: Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT. Pull Request resolved: https://github.com/pytorch/audio/pull/2602 Reviewed By: nateanl, mthrok Differential Revision: D38450771 Pulled By: hwangjeff fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b
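The difference between the two approaches can be shown with a small pure-Python sketch. It is not the torchaudio code: the names `direct_convolve` and `dft_convolve` are hypothetical, the inputs are plain lists, and a naive O(n²) DFT stands in for a real FFT so the example stays dependency-free.

```python
import cmath


def direct_convolve(x, y):
    """Full convolution computed directly from the definition."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def _dft(values, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        result.append(acc / n if inverse else acc)
    return result


def dft_convolve(x, y):
    """Full convolution via the convolution theorem: multiply spectra, transform back."""
    n = len(x) + len(y) - 1
    # Zero-pad both inputs to the output length so circular convolution equals linear convolution.
    x_padded = list(x) + [0.0] * (n - len(x))
    y_padded = list(y) + [0.0] * (n - len(y))
    spectrum = [a * b for a, b in zip(_dft(x_padded), _dft(y_padded))]
    return [v.real for v in _dft(spectrum, inverse=True)]
```

Both functions return the same values up to floating-point error. The frequency-domain route pays off for long inputs once the naive DFT is replaced with a real FFT.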
-
Caroline Chen authored
Summary: ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon free usage. This PR adds a note in both docs and tutorial. Followup: determine if we want to modify the behavior of ``words`` in the lexicon free case. One option is to merge and then split the generated tokens by the input silent token to populate the words field, but this is tricky since the meaning of a "word" in the lexicon free case can be vague and not all languages have whitespaces between words, etc Pull Request resolved: https://github.com/pytorch/audio/pull/2603 Reviewed By: mthrok Differential Revision: D38459709 Pulled By: carolineechen fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934
-
Ravi Makhija authored
Summary: Added example for `SlidingWindowCmn` transform as mentioned in issue https://github.com/pytorch/audio/issues/1564 Pull Request resolved: https://github.com/pytorch/audio/pull/2600 Reviewed By: mthrok Differential Revision: D38395579 Pulled By: carolineechen fbshipit-source-id: 44c5b7181789eedcaaa1d80149d5a1ab8de4c0ba
-
Ravi Makhija authored
Summary: Added example for Vad transform as mentioned in issue https://github.com/pytorch/audio/issues/1564 Pull Request resolved: https://github.com/pytorch/audio/pull/2598 Reviewed By: mthrok Differential Revision: D38432103 Pulled By: carolineechen fbshipit-source-id: 8f7e26c48d4ffb6bfe55bba6f9c7ee915e6edaef
-
- 04 Aug, 2022 1 commit
-
-
Omkar Vichare authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2599 Bootcamp task T127107566. Replaces assert statements with if ... then raise so that the checks also run in optimized mode. Reviewed By: mthrok Differential Revision: D38370108 fbshipit-source-id: 74eaf5b72c511b62ddbb8e0e3b0ed638ad49e4f2
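For context, Python drops `assert` statements when run with `-O`, so argument checks written as asserts vanish in optimized mode. A minimal sketch of the if-then-raise pattern follows; the function `check_cutoff` and its parameters are made up for illustration and are not taken from torchaudio.

```python
def check_cutoff(cutoff_freq, sample_rate):
    """Validate a filter cutoff with an explicit raise instead of an assert.

    Hypothetical example. The assert-based form would be:
        assert 0 < cutoff_freq < sample_rate / 2
    which is skipped entirely under `python -O`.
    """
    if not 0 < cutoff_freq < sample_rate / 2:
        raise ValueError(
            f"cutoff_freq must be in (0, sample_rate / 2). "
            f"Found: cutoff_freq={cutoff_freq}, sample_rate={sample_rate}"
        )
    return cutoff_freq
```

The explicit raise also lets the error message name the offending values, which a bare assert does not do.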
-
- 03 Aug, 2022 2 commits
-
-
Sean Kim authored
Summary: Add new model pretrained weights and tests Pull Request resolved: https://github.com/pytorch/audio/pull/2601 Reviewed By: carolineechen, nateanl Differential Revision: D38396673 Pulled By: skim0514 fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1
-
bshall authored
Summary: I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details: - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`). - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything. - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature. - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support? I hope this is helpful! Looking forward to hearing from you. Pull Request resolved: https://github.com/pytorch/audio/pull/2472 Reviewed By: hwangjeff Differential Revision: D38389155 Pulled By: carolineechen fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
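The gating step of BS.1770 mentioned above (absolute threshold, then a relative threshold) can be sketched in a few lines. This is not the code from the PR. It assumes the K-weighting and channel weighting have already been applied, so each entry of `block_energies` is the weighted mean-square energy of one gating block; the function names are hypothetical.

```python
import math

ABSOLUTE_GATE_LKFS = -70.0  # absolute gating threshold from the recommendation
RELATIVE_GATE_LU = -10.0    # relative gate, measured from the absolute-gated loudness


def block_loudness(energy):
    """Loudness of one block from its weighted mean-square energy."""
    return -0.691 + 10.0 * math.log10(energy)


def gated_loudness(block_energies):
    """Two-stage gated loudness over pre-computed block energies (illustrative sketch)."""
    # Stage 1: drop blocks at or below the absolute threshold (and silent blocks).
    above_abs = [z for z in block_energies
                 if z > 0.0 and block_loudness(z) > ABSOLUTE_GATE_LKFS]
    if not above_abs:
        return float("-inf")
    # Stage 2: the relative threshold sits 10 LU below the loudness of the surviving blocks.
    relative_threshold = block_loudness(sum(above_abs) / len(above_abs)) + RELATIVE_GATE_LU
    gated = [z for z in above_abs if block_loudness(z) > relative_threshold]
    return block_loudness(sum(gated) / len(gated))
```

Quiet passages and silence fall below the gates and do not pull the integrated loudness down.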
-
- 02 Aug, 2022 1 commit
-
-
Eli Uriegas authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2581 Also removes spurious lines of code that were erroring out silently Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: carolineechen Differential Revision: D38336705 Pulled By: seemethere fbshipit-source-id: 700a969a4bace7d9ca94a9db908b29f383b7d94e
-
- 01 Aug, 2022 3 commits
-
-
Ravi Makhija authored
Summary: Added example for [Vol transform](https://pytorch.org/audio/stable/transforms.html#torchaudio.transforms.Vol) as mentioned in this issue https://github.com/pytorch/audio/issues/1564. Also made a minor edit to the docstring for `class Vol` to fix a grammar typo and use more common verbiage. Pull Request resolved: https://github.com/pytorch/audio/pull/2597 Reviewed By: nateanl, mthrok Differential Revision: D38316433 Pulled By: carolineechen fbshipit-source-id: 0be8fc505800a59acdab843813767acfdeac8243
-
Ravi Makhija authored
Summary: Fixed minor typo in `Contributing.md`: "diemension" -> "dimension" Pull Request resolved: https://github.com/pytorch/audio/pull/2596 Reviewed By: mthrok Differential Revision: D38315517 Pulled By: carolineechen fbshipit-source-id: 5e771f22a5be008d3be30b4699fb5cc5637c627d
-
moto authored
Summary: In https://github.com/pytorch/audio/pull/2285, the SNR calculation was fixed, but there was still one that was not fixed. This commit fixes it. Also following the feedback https://github.com/pytorch/tutorials/issues/1930#issuecomment-1199741336, update the variable name. Pull Request resolved: https://github.com/pytorch/audio/pull/2595 Reviewed By: carolineechen Differential Revision: D38314672 Pulled By: mthrok fbshipit-source-id: b2015e2709729190d97264aa191651b3af4ba856
-
- 30 Jul, 2022 1 commit
-
-
Ansh Nanda authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2590 Converted assert checks for argument validation to if-else checks so that they are executed in optimized mode as well. Reviewed By: mthrok Differential Revision: D38211246 fbshipit-source-id: 922b5bcafe8214980e535527dd94c3345c1ff3e2
-
- 29 Jul, 2022 4 commits
-
-
moto authored
Summary: 1. Fix initialization. Previously, the SOS token score was initialized to 0 across the time axis. This was biasing the alignment to delay the start. The proper way to delay the SOS is via the blank token. The new initialization takes the cumulative sum of blank scores. 2. Fill the end of the trellis with Inf. Similar to the start, at the end, where the number of remaining time frames is less than the number of tokens, it is no longer possible to align the text, so we fill with Inf for better visualization. 3. Clean up asset management code. Pull Request resolved: https://github.com/pytorch/audio/pull/2544 Reviewed By: nateanl Differential Revision: D38276478 Pulled By: mthrok fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
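A rough sketch of the two initialization changes, in plain Python rather than tensors. The function name `init_trellis`, the list-of-lists layout (frames × tokens), and the exact index offsets are assumptions for illustration; the tutorial's actual code may use different conventions.

```python
def init_trellis(blank_scores, num_tokens):
    """Build an initialized trellis; blank_scores[t] is the blank log-probability at frame t.

    Illustrative only: cells not set here stay 0.0 and would be filled by the
    usual forward recurrence.
    """
    num_frames = len(blank_scores)
    if num_tokens < 1 or num_frames < num_tokens:
        raise ValueError("need at least one token and at least as many frames as tokens")
    inf = float("inf")
    trellis = [[0.0] * num_tokens for _ in range(num_frames)]
    # First column: staying on the start position means emitting blanks,
    # so its score is the cumulative sum of blank scores (not a constant 0).
    running = 0.0
    for t in range(num_frames):
        running += blank_scores[t]
        trellis[t][0] = running
    # First row: no later token can already be reached.
    for j in range(1, num_tokens):
        trellis[0][j] = -inf
    # Tail of the first column: fewer frames remain than tokens to align,
    # so the cells are marked with Inf.
    for t in range(num_frames - num_tokens + 1, num_frames):
        trellis[t][0] = inf
    return trellis
```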
-
moto authored
Summary: This commit enables CTC decoder on Windows. The functionality seems to work fine. The tests are passing, the decoding tutorial runs fine. The only difference to the Linux/macOS version is that loading model in XZ compression format is not supported.  Pull Request resolved: https://github.com/pytorch/audio/pull/2587 Reviewed By: carolineechen, nateanl Differential Revision: D38276490 Pulled By: mthrok fbshipit-source-id: f2203b2235c5bbb0220fe560aaaf0e1d5530347a
-
Javier Cardenete Morales authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2592 std::runtime_error does not preserve the C++ stack trace, so it is unclear to users what went wrong internally. PyTorch's TORCH_CHECK macro allows printing the C++ stack trace when the TORCH_SHOW_CPP_STACKTRACES environment variable is set to 1. Reviewed By: mthrok Differential Revision: D38219331 fbshipit-source-id: f51c27111077e927f97127f73f83a31b8e74f61f
-
Zhaoheng Ni authored
Summary: - The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of the mixture speech. - Show the Si-SNR score of the mixture speech when visualizing the mixture spectrogram. - Fix the figure in the `rtf_power` subsection. - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`. - Print PESQ, STOI, and SDR metric scores. Pull Request resolved: https://github.com/pytorch/audio/pull/2527 Reviewed By: mthrok Differential Revision: D38190218 Pulled By: nateanl fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de
-
- 28 Jul, 2022 5 commits
-
-
Sean Kim authored
Summary: Add str to normalized parameter to enable frame_length based normalization to align with torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104 Pull Request resolved: https://github.com/pytorch/audio/pull/2554 Reviewed By: carolineechen, mthrok Differential Revision: D38247554 Pulled By: skim0514 fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602
-
Sean Kim authored
Summary: Edit factory function's docstrings. Pull Request resolved: https://github.com/pytorch/audio/pull/2570 Reviewed By: carolineechen Differential Revision: D38250369 Pulled By: skim0514 fbshipit-source-id: fa777e37d7cc517cf4ff1842d5585bf36558f50a
-
moto authored
Summary: This commit gets rid of our copy of the CTC decoder code and replaces it with the upstream Flashlight-Text repo. Pull Request resolved: https://github.com/pytorch/audio/pull/2580 Reviewed By: carolineechen Differential Revision: D38244906 Pulled By: mthrok fbshipit-source-id: d274240fc67675552d19ff35e9a363b9b9048721
-
Sean Kim authored
Summary: Add tutorial Python file. Draft PR; will continue to modify according to feedback. Future plan: modify spectrogram and bottom audio design and work on finding best audio track and segments Pull Request resolved: https://github.com/pytorch/audio/pull/2572 Reviewed By: carolineechen, nateanl, mthrok Differential Revision: D38234001 Pulled By: skim0514 fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5
-
Vamsi Desu authored
Summary: CTC decoder and StreamReader are now in the main library. This commit removes their aliases in `torchaudio.prototypes` Pull Request resolved: https://github.com/pytorch/audio/pull/2583 Reviewed By: mthrok Differential Revision: D38189314 fbshipit-source-id: c62209f2ad4f7052c6756a537b6fc509064e428c
-