"csrc/vscode:/vscode.git/clone" did not exist on "333351c7c85760df23f5009b60c8b5b198f00d4c"
- 05 Aug, 2022 1 commit
hwangjeff authored
Summary: Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT. Pull Request resolved: https://github.com/pytorch/audio/pull/2602 Reviewed By: nateanl, mthrok Differential Revision: D38450771 Pulled By: hwangjeff fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b
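For readers unfamiliar with the FFT-based variant, here is a minimal sketch of the underlying technique (not the torchaudio implementation itself): a full linear convolution along the trailing dimension computed with `torch.fft`.

```python
import torch

def fftconvolve_1d(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Full linear convolution of two real tensors along their trailing dimension via FFT."""
    n = x.shape[-1] + y.shape[-1] - 1      # output length of a full convolution
    fx = torch.fft.rfft(x, n=n)            # zero-padded FFTs of both inputs...
    fy = torch.fft.rfft(y, n=n)
    return torch.fft.irfft(fx * fy, n=n)   # ...multiplied and inverse-transformed

x = torch.randn(3, 1024)    # batch of signals
y = torch.randn(3, 128)     # batch of kernels
out = fftconvolve_1d(x, y)  # shape: (3, 1151)
```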
- 03 Aug, 2022 2 commits
Sean Kim authored
Summary: Add new model pretrained weights and tests Pull Request resolved: https://github.com/pytorch/audio/pull/2601 Reviewed By: carolineechen, nateanl Differential Revision: D38396673 Pulled By: skim0514 fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1
bshall authored
Summary: I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan, since it fit well with torchaudio's already-implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there, but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
I hope this is helpful! Looking forward to hearing from you. Pull Request resolved: https://github.com/pytorch/audio/pull/2472 Reviewed By: hwangjeff Differential Revision: D38389155 Pulled By: carolineechen fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
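A hedged usage sketch of the new functionality: it assumes the implementation is exposed as `torchaudio.functional.loudness(waveform, sample_rate)` and returns integrated loudness in LUFS; the audio file name is hypothetical.

```python
import torchaudio
import torchaudio.functional as F

# Hypothetical input file; loudness() is assumed to return integrated loudness in LUFS.
waveform, sample_rate = torchaudio.load("speech.wav")
lufs = F.loudness(waveform, sample_rate)
print(f"Integrated loudness: {lufs.item():.2f} LUFS")
```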
- 29 Jul, 2022 1 commit
Zhaoheng Ni authored
Summary:
- The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of the mixture speech.
- Show the Si-SNR score of the mixture speech when visualizing the mixture spectrogram.
- Fix the figure in the `rtf_power` subsection.
- The description of the spectrogram enhanced by `rtf_power` is wrong; correct it to `rtf_power`.
- Print PESQ, STOI, and SDR metric scores.
Pull Request resolved: https://github.com/pytorch/audio/pull/2527 Reviewed By: mthrok Differential Revision: D38190218 Pulled By: nateanl fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de
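For context, amplifying the noise to hit a target SNR takes only a few lines of tensor code; this is a generic sketch of the technique, not the tutorial's exact code.

```python
import torch

def mix_at_snr(speech: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Scale the noise so the resulting speech-to-noise ratio equals snr_db, then mix."""
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# e.g. a harder mixture at 3 dB instead of a high-SNR one:
# mixture = mix_at_snr(speech, noise, snr_db=3.0)
```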
- 28 Jul, 2022 1 commit
Sean Kim authored
Summary: Add the tutorial Python file (draft PR); will continue to modify according to feedback. Future plan: modify the spectrogram and bottom audio design, and work on finding the best audio track and segments. Pull Request resolved: https://github.com/pytorch/audio/pull/2572 Reviewed By: carolineechen, nateanl, mthrok Differential Revision: D38234001 Pulled By: skim0514 fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5
- 26 Jul, 2022 1 commit
Sean Kim authored
Summary: Created a new branch and brought in commits due to rebasing issues, resolved conflicts on the new branch, and closed the old branch. Pull Request resolved: https://github.com/pytorch/audio/pull/2565 Reviewed By: nateanl, mthrok Differential Revision: D38131189 Pulled By: skim0514 fbshipit-source-id: 96531480cf50562944abb28d70879f21b4609f15
- 25 Jul, 2022 1 commit
moto authored
Summary: This commit fixes the build_docs job timeout by pinning `resampy=0.2.2`. For some mysterious reason, `resampy=0.3.1` causes a slowdown of unrelated code. https://github.com/bmcfee/resampy/issues/106 Pull Request resolved: https://github.com/pytorch/audio/pull/2543 Reviewed By: carolineechen Differential Revision: D38115003 Pulled By: mthrok fbshipit-source-id: 67cd1c73dd4adb3091e0b88aaf5c31de0dd4b87e
- 22 Jul, 2022 1 commit
Zhaoheng Ni authored
Summary:
- Add a documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
- Add a citation of the Libri2Mix dataset in the bundle documentation.
- The URL in the integration test should use slashes instead of `os.path.join`, as the latter fails on Windows. Change it to an f-string.
Pull Request resolved: https://github.com/pytorch/audio/pull/2559 Reviewed By: carolineechen Differential Revision: D38036116 Pulled By: nateanl fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836
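The slash-vs-`os.path.join` point is easy to see with a small example (the URL and file name below are hypothetical, not taken from the test):

```python
import os

filename = "mix.wav"

# On Windows, os.path.join inserts backslashes, which breaks URLs:
#   "https://example.com/assets\\mix.wav"
bad_url = os.path.join("https://example.com/assets", filename)

# An f-string keeps forward slashes on every platform.
good_url = f"https://example.com/assets/{filename}"
```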
- 19 Jul, 2022 1 commit
Sean Kim authored
Summary: Factory functions have been added to the HDemucs class, and the implementation is tested within the testing files. Pull Request resolved: https://github.com/pytorch/audio/pull/2547 Reviewed By: carolineechen Differential Revision: D37948600 Pulled By: skim0514 fbshipit-source-id: 7ac4e4a71519450cfbbc24ff7d7e70521f676040
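A hedged usage sketch of the factory-function idea. The factory name (`hdemucs_high`), its module location (prototype at this point), and the `sources` argument are assumptions based on torchaudio's usual conventions, not details taken from this commit.

```python
import torch
from torchaudio.prototype.models import hdemucs_high  # assumed name and location

# Hypothetical: build the high-sample-rate variant for 4-stem source separation.
model = hdemucs_high(sources=["drums", "bass", "other", "vocals"])
mixture = torch.randn(1, 2, 44100 * 5)  # (batch, stereo channels, samples)
separated = model(mixture)              # assumed shape: (batch, sources, channels, samples)
```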
- 12 Jul, 2022 2 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2313 Reviewed By: carolineechen, nateanl Differential Revision: D37799552 Pulled By: mthrok fbshipit-source-id: 12e27fccb7098f3142e9ca0b748c71325cd324ee
Sean Kim authored
Summary: Draft PR with initial model implementation with minor changes from previous implementation Pull Request resolved: https://github.com/pytorch/audio/pull/2506 Reviewed By: nateanl Differential Revision: D37762671 Pulled By: skim0514 fbshipit-source-id: b7dc0a6ef725d6ae6d76c23c882623f7d339977c
- 07 Jul, 2022 1 commit
moto authored
Summary: Following the formatter changes that happened in fbcode, this commit updates the linter config. Pull Request resolved: https://github.com/pytorch/audio/pull/2389 Reviewed By: hwangjeff Differential Revision: D37659649 Pulled By: mthrok fbshipit-source-id: 1c52ff93f0b10cb2e7303d2ad13b2d65ffccfcb0
- 27 Jun, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR adds two dataset classes for the VoxCeleb1 corpus.
- `VoxCeleb1Identification`: each data sample contains the waveform, sample rate, speaker ID, and file ID.
- `VoxCeleb1Verification`: each data sample contains a pair of waveforms, the sample rate, a label indicating whether they are from the same speaker, and the file IDs.
Pull Request resolved: https://github.com/pytorch/audio/pull/2349 Reviewed By: carolineechen Differential Revision: D35927921 Pulled By: nateanl fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449
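A hedged sketch of how the two classes are used; the root directory is hypothetical, and the constructor arguments and exact sample layout are assumed from the summary above rather than taken from the code.

```python
from torchaudio.datasets import VoxCeleb1Identification, VoxCeleb1Verification

ident = VoxCeleb1Identification(root="data/voxceleb1", subset="train")
waveform, sample_rate, speaker_id, file_id = ident[0]

verif = VoxCeleb1Verification(root="data/voxceleb1", subset="test")
wave_spk1, wave_spk2, sample_rate, same_speaker, file_id_1, file_id_2 = verif[0]
```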
- 21 Jun, 2022 1 commit
Sean Kim authored
Summary: Create a dataset handler and tests for the new dataset. Manually tested and unit tested to verify validity. Pre-commit run for style checks. Pull Request resolved: https://github.com/pytorch/audio/pull/2484 Reviewed By: carolineechen, nateanl Differential Revision: D37250556 Pulled By: skim0514 fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51
- 20 Jun, 2022 1 commit
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2480 Reviewed By: nateanl Differential Revision: D37249571 Pulled By: carolineechen fbshipit-source-id: caefeec4253c91f2579655a0c1735edaeed51be9
- 08 Jun, 2022 2 commits
moto authored
Summary: https://output.circle-artifacts.com/output/job/75187a52-b0d8-4cac-89f3-24e10889a36a/artifacts/0/docs/hw_acceleration_tutorial.html
1. Update the HW decoding tutorial to include file-like objects.
2. Add a note about unseekable objects in the streaming API tutorial.
Pull Request resolved: https://github.com/pytorch/audio/pull/2408 Reviewed By: hwangjeff Differential Revision: D36632702 Pulled By: mthrok fbshipit-source-id: 17be2fb8528cb1d2d1ee11901b6a95c512466feb
moto authored
Summary: The Streaming API tutorial has gotten long, so this commit splits it into two. Pull Request resolved: https://github.com/pytorch/audio/pull/2446 Reviewed By: hwangjeff Differential Revision: D36987513 Pulled By: mthrok fbshipit-source-id: 13e3aad74c0d0e654c39c0eeceffca1a00b0dac4
- 04 Jun, 2022 1 commit
moto authored
Summary: Undesired logs are one of the loudest UX complaints we get. Yet, loading media files involves uncertainty that is difficult to debug without a debug log. This commit introduces utility functions to configure the logging level, so that we can ask users to enable it when they encounter an issue while defaulting to the non-verbose option. Pull Request resolved: https://github.com/pytorch/audio/pull/2439 Reviewed By: hwangjeff, xiaohui-zhang Differential Revision: D36903763 Pulled By: mthrok fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987
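A hedged sketch, assuming the utilities end up as `get_log_level`/`set_log_level` under `torchaudio.utils.ffmpeg_utils`; the module path and the numeric values (FFmpeg's log-level constants) are assumptions, not details taken from this commit.

```python
from torchaudio.utils import ffmpeg_utils  # assumed location of the new utilities

print(ffmpeg_utils.get_log_level())  # non-verbose by default
ffmpeg_utils.set_log_level(40)       # FFmpeg "verbose" level while debugging a loading issue
ffmpeg_utils.set_log_level(8)        # back to fatal-only once done
```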
- 01 Jun, 2022 2 commits
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2411 Reviewed By: carolineechen Differential Revision: D36663904 Pulled By: nateanl fbshipit-source-id: c6a7dd530c9cfbb58b7121ebe02db6ae293cc2d0
Caroline Chen authored
Summary: Move the CTC beam search decoder out of prototype into the new `torchaudio.models.decoder` module. hwangjeff, mthrok: any thoughts on the new module + naming, and whether we should move the RNN-T beam search here as well? Pull Request resolved: https://github.com/pytorch/audio/pull/2410 Reviewed By: mthrok Differential Revision: D36784521 Pulled By: carolineechen fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
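A hedged usage sketch of the relocated decoder; the file paths are hypothetical, the parameters shown are a subset of the documented options, and the values are illustrative only.

```python
import torch
from torchaudio.models.decoder import ctc_decoder

decoder = ctc_decoder(
    lexicon="lexicon.txt",   # maps words to token spellings
    tokens="tokens.txt",     # labels of the acoustic model outputs
    lm="kenlm.bin",          # optional KenLM language model
    beam_size=50,
    lm_weight=3.23,
    word_score=-0.26,
)

emissions = torch.randn(1, 100, 32)  # (batch, frames, num_tokens), CPU float32
hypotheses = decoder(emissions)      # n-best hypotheses per batch item, ranked by score
```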
- 24 May, 2022 2 commits
moto authored
Summary: Follow-up of https://github.com/pytorch/audio/issues/2407: the `<script>` tag was not properly closed on pages other than tutorials. Pull Request resolved: https://github.com/pytorch/audio/pull/2409 Reviewed By: carolineechen Differential Revision: D36632668 Pulled By: mthrok fbshipit-source-id: 9c0409a8011d77f8689e2dcdc1bd9844d3d31f79
moto authored
Summary: This commit fixes multiple issues with the documentation. https://output.circle-artifacts.com/output/job/23245537-e57b-4b9d-9b81-b3df20996d1f/artifacts/0/docs/tutorials/audio_resampling_tutorial.html
1. Duplicated requirejs: the nbsphinx extension introduced in https://github.com/pytorch/audio/pull/2393 pulled in a requirejs that caused the initialization script to halt. As a result, the right sidebar was left uninitialized.
2. Undefined variable error: it turned out that PyTorch's theme expects downstream projects to define the `collapsedSections` variable. Currently the console log shows `collapsedSections is not defined`. As a result of this fix, we start to see the + symbol on the left side.
3. Fix the behavior of default expand: tweak the right-sidebar initialization so that expand-all only happens once, not at every resize.
4. Overwrite the link to GitHub: the `GitHub` tab in the main menu always linked to PyTorch core. This commit overrides it to point to the torchaudio page.
Pull Request resolved: https://github.com/pytorch/audio/pull/2407 Reviewed By: carolineechen Differential Revision: D36612904 Pulled By: mthrok fbshipit-source-id: 56aa7623a8925a241cf4790ac77a87424ad9237c
- 23 May, 2022 1 commit
Zhaoheng Ni authored
Summary: The `LibriLightLimited` dataset is created for fine-tuning SSL models such as Wav2Vec2 and HuBERT. It is a supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the unsupervised subset from the supervised one, it's clearer to put the latter in a separate dataset class for fine-tuning purposes. It contains "10 min", "1 hour", and "10 hour" splits. Pull Request resolved: https://github.com/pytorch/audio/pull/2302 Reviewed By: mthrok Differential Revision: D36388188 Pulled By: nateanl fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249
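A hedged sketch; the root path is hypothetical, and the subset strings and sample layout (matching `LIBRISPEECH`) are assumptions rather than details confirmed by this log.

```python
from torchaudio.datasets import LibriLightLimited

# Assumed subset names: "10min", "1h", "10h".
dataset = LibriLightLimited(root="data", subset="10min", download=True)
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
```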
- 20 May, 2022 1 commit
moto authored
Summary: This commit adds a tutorial on enabling and using NVDEC with the Stream API. https://output.circle-artifacts.com/output/job/19e66a4b-1819-4804-8834-d38e6c80c4fd/artifacts/0/docs/hw_acceleration_tutorial.html Because using NVDEC requires building and installing FFmpeg from source, this tutorial was authored on Google Colab and tailored to its environment. The tutorial here is the result of the notebook execution, with a link to the publicly accessible Google Colab notebook. Pull Request resolved: https://github.com/pytorch/audio/pull/2393 Reviewed By: hwangjeff Differential Revision: D36404408 Pulled By: mthrok fbshipit-source-id: 9c820d3db4d06c5b343ecad0708489125ca06948
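A hedged sketch of the kind of usage the tutorial covers, assuming an FFmpeg build with NVDEC support so that the `h264_cuvid` decoder is available; the input file is hypothetical.

```python
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")
reader.add_video_stream(frames_per_chunk=10, decoder="h264_cuvid")  # NVDEC-backed H.264 decoder
for (chunk,) in reader.stream():
    pass  # chunk: tensor of decoded video frames for further processing
```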
- 17 May, 2022 1 commit
moto authored
Summary: This commit updates `window.sideMenus.handleRightMenu` so that subsections are expanded in tutorials by default. https://output.circle-artifacts.com/output/job/98508917-87df-4666-9958-c70683b3245d/artifacts/0/docs/tutorials/audio_io_tutorial.html Tutorial subsections are important because they have anchors that allow us to link to specific figures and audio samples. When responding to issues/questions where there is a corresponding code snippet in a tutorial, it is often easiest to answer with links to the tutorial. However, by default the tutorial page collapses the right sidebar, and I have to click the small "+" symbols to navigate to the subsection, and the expansion state does not persist across page refreshes. This has been a pain point since we updated the Sphinx version to 3 in https://github.com/pytorch/audio/pull/1685. Pull Request resolved: https://github.com/pytorch/audio/pull/2397 Reviewed By: xiaohui-zhang Differential Revision: D36429745 Pulled By: mthrok fbshipit-source-id: 97a5ae9270e68f8e88f0bca766d5a2c1839634e3
- 13 May, 2022 1 commit
moto authored
Summary: This commit moves the Streaming API out of the prototype module.
* The related classes are renamed as follows:
  - `Streamer` -> `StreamReader`
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`
  This change is a preemptive measure for the possibility of adding a `StreamWriter` API.
* Replace the BUILD_FFMPEG build arg with USE_FFMPEG. We are not building FFmpeg, so USE_FFMPEG is more appropriate.
Remaining TODOs after https://github.com/pytorch/audio/issues/2377 (different PRs):
- [ ] Introduce an `is_ffmpeg_binding_available` function.
- [ ] Refactor the C++ code:
  - Rename `Streamer` to `StreamReader`.
  - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
  - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
  - Introduce a `stream_reader` directory.
- [x] Enable FFmpeg in the smoke test (https://github.com/pytorch/audio/issues/2381)
Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
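A minimal sketch of the post-rename, non-prototype import path; the media file name is hypothetical.

```python
from torchaudio.io import StreamReader  # previously the prototype `Streamer`

reader = StreamReader("example.wav")
print(reader.num_src_streams)         # number of source streams FFmpeg found
print(reader.get_src_stream_info(0))  # per-stream metadata (e.g. StreamReaderSourceAudioStream)
```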
- 10 May, 2022 4 commits
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with a convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
Zhaoheng Ni authored
Summary: Add a new design of the MVDR module. The `RTFMVDR` module supports the method based on the relative transfer function (RTF) and the power spectral density (PSD) matrix of noise. The input arguments are:
- the multi-channel spectrum;
- the RTF vector of the target speech;
- the PSD matrix of noise;
- the reference channel in the microphone array;
- the `diagonal_loading` option to enable or disable diagonal loading in the matrix inverse computation;
- `diag_eps` for computing the inverse of the matrix;
- `eps` for computing the beamforming weight.
The output of the module is the single-channel complex-valued spectrum of the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2368 Reviewed By: carolineechen Differential Revision: D36214940 Pulled By: nateanl fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
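A hedged shape sketch of the new module, assuming it is exposed as `torchaudio.transforms.RTFMVDR`; the tensors are random and only illustrate the expected shapes, not a meaningful beamforming result.

```python
import torch
from torchaudio.transforms import RTFMVDR  # assumed module location

channel, freq, time = 6, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)  # multi-channel noisy STFT
rtf = torch.randn(freq, channel, dtype=torch.cfloat)             # RTF vector of the target speech
psd_n = torch.randn(freq, channel, channel, dtype=torch.cfloat)  # noise PSD matrix

mvdr = RTFMVDR()
enhanced = mvdr(specgram, rtf, psd_n, reference_channel=0)       # (freq, time) complex spectrum
```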
Zhaoheng Ni authored
Summary: Add a new design of the MVDR module. The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are:
- the multi-channel spectrum;
- the PSD matrix of the target speech;
- the PSD matrix of noise;
- the reference channel in the microphone array;
- the `diagonal_loading` option to enable or disable diagonal loading in the matrix inverse computation;
- `diag_eps` for computing the inverse of the matrix;
- `eps` for computing the beamforming weight.
The output of the module is the single-channel complex-valued spectrum of the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2367 Reviewed By: hwangjeff Differential Revision: D36198015 Pulled By: nateanl fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
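A hedged sketch combining `SoudenMVDR` with the `PSD` transform to build the speech/noise PSD matrices from time-frequency masks; the module locations and mask source are assumptions, and the random tensors only illustrate shapes.

```python
import torch
from torchaudio.transforms import PSD, SoudenMVDR  # assumed module locations

channel, freq, time = 6, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)  # multi-channel noisy STFT
mask_speech = torch.rand(freq, time)   # hypothetical T-F masks, e.g. predicted by a network
mask_noise = 1.0 - mask_speech

psd = PSD()
psd_s = psd(specgram, mask_speech)     # (freq, channel, channel) speech PSD
psd_n = psd(specgram, mask_noise)      # (freq, channel, channel) noise PSD

mvdr = SoudenMVDR()
enhanced = mvdr(specgram, psd_s, psd_n, reference_channel=0)  # (freq, time) complex spectrum
```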
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371 Reviewed By: xiaohui-zhang Differential Revision: D36246167 Pulled By: carolineechen fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
- 26 Apr, 2022 2 commits
Caroline Chen authored
Summary: Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.
Follow-ups:
- Add pretrained LM support for lexicon-free decoding
- Add an example in the tutorial
- Replace the flashlight C++ source code with the flashlight text submodule
- [optional] fairseq compatibility test
Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
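A hedged sketch of the lexicon-free configuration, written against the `torchaudio.models.decoder` path the decoder later moved to (see the 01 Jun, 2022 entry above); at the time of this commit it still lived under prototype, and the token file path is hypothetical.

```python
from torchaudio.models.decoder import ctc_decoder

lexicon_free_decoder = ctc_decoder(
    lexicon=None,        # None switches the decoder to lexicon-free mode
    tokens="tokens.txt",
    lm=None,             # a character-level LM could be plugged in here later
    beam_size=50,
)
```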
Zhaoheng Ni authored
Summary: The `LibriMix` dataset is missing from the [documentation webpage](https://pytorch.org/audio/stable/datasets.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2351 Reviewed By: carolineechen Differential Revision: D35926695 Pulled By: nateanl fbshipit-source-id: 168aed3bb15510d1b1ec57d77727932e481aca48
- 21 Apr, 2022 1 commit
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
- 18 Apr, 2022 1 commit
Caroline Chen authored
Summary: The implementation is adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py). Modifying the s3prl downstream expert to [this](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5) and using this dataset implementation produces the same results as using the original s3prl pipeline. Pull Request resolved: https://github.com/pytorch/audio/pull/2290 Reviewed By: nateanl Differential Revision: D35692551 Pulled By: carolineechen fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c
- 12 Apr, 2022 1 commit
hwangjeff authored
Summary: Adds the Conformer RNN-T model as a prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, the latter of which instantiates a baseline version of the model. Also includes the following:
- Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
- Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
- Introduces tests for `conformer_rnnt_model`.
- Adds docs.
Pull Request resolved: https://github.com/pytorch/audio/pull/2322 Reviewed By: xiaohui-zhang Differential Revision: D35565987 Pulled By: hwangjeff fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
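A hedged sketch of the baseline factory; the prototype module location, the 80-dimensional feature assumption, and the use of `transcribe` follow torchaudio's RNN-T conventions and are not confirmed by this log.

```python
import torch
from torchaudio.prototype.models import conformer_rnnt_base  # assumed location

model = conformer_rnnt_base()
features = torch.randn(2, 120, 80)    # (batch, frames, mel bins); input_dim of 80 is assumed
lengths = torch.tensor([120, 100])
encodings, enc_lengths = model.transcribe(features, lengths)  # encoder output for beam search
```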
- 08 Apr, 2022 1 commit
moto authored
Summary: Add badges for supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to Sphinx. APIs with these directives will have badges (based on shields.io) that link to a page describing these features. Continuation of https://github.com/pytorch/audio/issues/2316. dtypes are excluded for further improvement, and badges are actually added to most functionals/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
- 26 Mar, 2022 1 commit
Caroline Chen authored
Summary: The `build_docs` test is failing on CI with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`, but the docs build fine locally (screenshot: https://user-images.githubusercontent.com/16568633/160157472-c91ff9b2-a2be-4c5d-959e-53b9f45425c6.png). Pull Request resolved: https://github.com/pytorch/audio/pull/2291 Reviewed By: mthrok Differential Revision: D35147098 Pulled By: carolineechen fbshipit-source-id: 682b3800d0ed5c56b402d83f221136725051ba7e
- 25 Mar, 2022 1 commit
Caroline Chen authored
Summary: `build_docs` CircleCI currently failing with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`. Pin Jinja2<3.1 to resolve this issue, see https://github.com/sphinx-doc/sphinx/issues/10291#issuecomment-1078046986 Pull Request resolved: https://github.com/pytorch/audio/pull/2292 Reviewed By: mthrok Differential Revision: D35148397 Pulled By: carolineechen fbshipit-source-id: 963efe2fcdee13dead4a4d542c903913c6eaa505
- 24 Mar, 2022 1 commit
Caroline Chen authored
Summary: Rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)
Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
- 26 Feb, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR adds the `apply_beamforming` method to `torchaudio.functional`. The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum. Pull Request resolved: https://github.com/pytorch/audio/pull/2232 Reviewed By: mthrok Differential Revision: D34474561 Pulled By: nateanl fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
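A hedged shape sketch of the new functional, assuming the signature `apply_beamforming(beamform_weights, specgram)`; the random tensors only illustrate the expected shapes, not a meaningful result.

```python
import torch
import torchaudio.functional as F

channel, freq, time = 6, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)  # multi-channel noisy STFT
weights = torch.randn(freq, channel, dtype=torch.cfloat)         # beamforming weights, e.g. from MVDR

enhanced = F.apply_beamforming(weights, specgram)                # (freq, time) single-channel spectrum
```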