- 08 Nov, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add a `fused_log_softmax` argument (default `True`, the current behavior) to the RNN-T loss. When it is set to `False`, call `log_softmax` on the logits before passing them to the RNN-T loss function. The following two calls should produce the same output:
```
rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
```
```
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
```
Testing: unit tests, plus the same results on the conformer rnnt recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/2798 Reviewed By: xiaohui-zhang Differential Revision: D41083523 Pulled By: carolineechen fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61
-
hwangjeff authored
Summary: Adds `torch.nn.Module`-based implementations for convolution and FFT convolution. Pull Request resolved: https://github.com/pytorch/audio/pull/2811 Reviewed By: carolineechen Differential Revision: D40881937 Pulled By: hwangjeff fbshipit-source-id: bfe8969e6178ad4f58981efd4b2720ac006be8de
-
- 04 Nov, 2022 1 commit
-
-
moto authored
Summary: StreamWriter assumed that frame rate is always expressed as 1/something, which does not always hold. This commit fixes it by properly computing time_base from the frame rate. Addresses https://github.com/pytorch/audio/issues/2830 Pull Request resolved: https://github.com/pytorch/audio/pull/2831 Reviewed By: carolineechen Differential Revision: D41036084 Pulled By: mthrok fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609
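As an illustration of deriving a time base from a frame rate as an exact rational, here is a minimal sketch using Python's `fractions` module. It is not the StreamWriter code itself; the helper name `time_base_from_frame_rate` is hypothetical.

```python
from fractions import Fraction


def time_base_from_frame_rate(frame_rate: Fraction) -> Fraction:
    """Return the duration of one frame, in seconds, as an exact rational.

    Hypothetical helper for illustration only.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return 1 / frame_rate


# An integer frame rate gives a time base of the form 1/something.
print(time_base_from_frame_rate(Fraction(25)))  # 1/25

# A non-integer rate such as 30000/1001 does not.
print(time_base_from_frame_rate(Fraction(30000, 1001)))  # 1001/30000
```

Keeping the value as a numerator/denominator pair avoids the rounding that a float frame rate would introduce.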
-
- 31 Oct, 2022 1 commit
-
-
Joao Gomes authored
Summary: cc mthrok Implements precise seek and seek to any frame in torchaudio Pull Request resolved: https://github.com/pytorch/audio/pull/2737 Reviewed By: mthrok Differential Revision: D40546716 Pulled By: jdsgomes fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e
-
- 28 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Introduces argument 'mode' for convolution functions, following SciPy's convention. Pull Request resolved: https://github.com/pytorch/audio/pull/2801 Reviewed By: nateanl Differential Revision: D40805405 Pulled By: hwangjeff fbshipit-source-id: 8f0006ffe9e3945b4b17f44c4cfa1adb265c20ef
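For readers unfamiliar with SciPy's `mode` convention, the sketch below shows how `"full"`, `"same"`, and `"valid"` relate to each other on plain Python lists. It is an assumption-laden illustration, not the torchaudio implementation; the function names are hypothetical and `"valid"` is only handled for the case where the first input is at least as long as the second.

```python
from typing import List


def convolve_full(x: List[float], y: List[float]) -> List[float]:
    """Direct 1-D convolution; output length is len(x) + len(y) - 1."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def convolve_with_mode(x: List[float], y: List[float], mode: str = "full") -> List[float]:
    """Crop the full convolution according to SciPy-style modes."""
    full = convolve_full(x, y)
    n, m = len(x), len(y)
    if mode == "full":
        return full
    if mode == "same":
        # Centered slice with the same length as the first input.
        start = (m - 1) // 2
        return full[start:start + n]
    if mode == "valid":
        # Only positions where the two inputs overlap completely.
        if n < m:
            raise ValueError("this sketch expects len(x) >= len(y) for 'valid'")
        return full[m - 1:n]
    raise ValueError(f"unknown mode: {mode}")


x, y = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]
print(convolve_with_mode(x, y, "full"))   # [1.0, 3.0, 6.0, 9.0, 7.0, 4.0]
print(convolve_with_mode(x, y, "same"))   # [3.0, 6.0, 9.0, 7.0]
print(convolve_with_mode(x, y, "valid"))  # [6.0, 9.0]
```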
-
- 26 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Initializer parameter `onesided` isn't relevant to `MelSpectrogram` — it should always be `True`. In fact, the module already assumes `onesided == True` in the filterbank it generates and fails in its forward pass when `onesided == False`. Accordingly, this PR makes param `onesided` optional and adds a deprecation warning that's fired when the param is provided. Pull Request resolved: https://github.com/pytorch/audio/pull/2797 Reviewed By: carolineechen, xiaohui-zhang Differential Revision: D40731238 Pulled By: hwangjeff fbshipit-source-id: 6eea8eb9d4a85a805162e03ad91682a1946f92cd
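The pattern described here (make the parameter optional and warn only when the caller supplies it) can be sketched as follows. The class name, the warning category, and the message text are assumptions for illustration and are not taken from the PR.

```python
import warnings
from typing import Optional


class MelSpectrogramSketch:
    """Hypothetical stand-in showing an optional, deprecated parameter."""

    def __init__(self, n_fft: int = 400, onesided: Optional[bool] = None) -> None:
        # Warn only when the caller explicitly passes the parameter.
        if onesided is not None:
            warnings.warn(
                "Argument 'onesided' is deprecated and has no effect.",
                DeprecationWarning,  # assumed category
                stacklevel=2,
            )
        self.n_fft = n_fft
        self.onesided = True  # always one-sided, regardless of the argument


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    MelSpectrogramSketch()               # no warning
    MelSpectrogramSketch(onesided=True)  # one warning

print(len(caught))  # 1
```

Using `None` as the default is what lets the constructor tell "not provided" apart from an explicit `True`.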
-
- 25 Oct, 2022 1 commit
-
-
moto authored
Summary: Addresses https://github.com/pytorch/audio/issues/2790. Previously, AVPacket objects had duration == 0. The `av_interleaved_write_frame` function inferred the duration of each packet by comparing it against the next one, but it could not infer the duration of the last packet because there is no subsequent frame, and so it omitted that packet from the final data. This commit fixes the issue by explicitly setting packet duration = 1 (one frame), for video only. (An audio AVPacket contains multiple samples, so it is handled differently. Tests were added to ensure correctness for audio.) Pull Request resolved: https://github.com/pytorch/audio/pull/2789 Reviewed By: xiaohui-zhang Differential Revision: D40627439 Pulled By: mthrok fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320
-
- 19 Oct, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add the ability to load only improvised or only scripted utterances. Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775 Reviewed By: carolineechen Differential Revision: D40481144 Pulled By: nateanl fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e
-
- 12 Oct, 2022 1 commit
-
-
Caroline Chen authored
Summary: A couple of CircleCI unit tests are failing during the HuBERT XLarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI. cc atalman Pull Request resolved: https://github.com/pytorch/audio/pull/2758 Reviewed By: mthrok Differential Revision: D40290535 Pulled By: carolineechen fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57
-
- 11 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738 Reviewed By: carolineechen Differential Revision: D40238099 Pulled By: nateanl fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e
-
- 10 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Besides adding the unit test, the PR also addresses these issues: - The original `LibriMix` dataset only supports "min" mode, in which the audio length is the minimum over all clean sources. This is the default for the source separation task. Users may also want "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users choose which version of the dataset to use. - If the task is ``"enh_both"``, the target should be the audio in ``mix_clean`` rather than the separate clean sources. The PR fixes the dataset to use ``mix_clean`` as the target. Pull Request resolved: https://github.com/pytorch/audio/pull/2659 Reviewed By: carolineechen Differential Revision: D40229227 Pulled By: nateanl fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235
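To make the difference between the two modes concrete, the sketch below aligns a set of clean sources either to the shortest length ("min") or to the longest ("max"). Zero-padding in "max" mode is an assumption based on the usual LibriMix convention; the dataset class itself reads pre-generated files, so this is an illustration of the concept rather than of the dataset code. The function name is hypothetical.

```python
from typing import List


def align_sources(sources: List[List[float]], mode: str = "min") -> List[List[float]]:
    """Make all sources the same length (hypothetical helper)."""
    lengths = [len(s) for s in sources]
    if mode == "min":
        # Truncate every source to the shortest one.
        n = min(lengths)
        return [s[:n] for s in sources]
    if mode == "max":
        # Assumed convention: zero-pad shorter sources to the longest one.
        n = max(lengths)
        return [s + [0.0] * (n - len(s)) for s in sources]
    raise ValueError(f"unknown mode: {mode}")


sources = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6]]
print(align_sources(sources, "min"))  # [[0.1, 0.2], [0.5, 0.6]]
print(align_sources(sources, "max"))  # [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.0, 0.0]]
```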
-
- 09 Oct, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732 Reviewed By: nateanl Differential Revision: D40186996 Pulled By: nateanl fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4
-
- 07 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524. Pull Request resolved: https://github.com/pytorch/audio/pull/2740 Reviewed By: nateanl Differential Revision: D40168639 Pulled By: nateanl fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24
-
- 21 Sep, 2022 1 commit
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2694 This commit adds Tensor type as input to `StreamReader`. The Tensor is interpreted as byte string buffer. Reviewed By: hwangjeff Differential Revision: D39467630 fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e
-
- 14 Sep, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673 Reviewed By: mthrok Differential Revision: D39507612 Pulled By: carolineechen fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53
-
- 13 Sep, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669 Reviewed By: carolineechen, mthrok Differential Revision: D39433560 Pulled By: nateanl fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb
-
- 12 Sep, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668 Reviewed By: nateanl, mthrok Differential Revision: D39433671 Pulled By: carolineechen fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c
-
- 01 Sep, 2022 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648 Reviewed By: nateanl Differential Revision: D38976874 Pulled By: mthrok fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864
-
- 24 Aug, 2022 1 commit
-
-
moto authored
Summary: This commit adds an FFmpeg-based encoder, the StreamWriter class. StreamWriter is pretty much the opposite of the StreamReader class, and it supports: * Encoding audio / still image / video * Exporting to local file / streaming protocol / devices etc... * File-like object support (in a later commit) * HW video encoding (in a later commit) See also: https://fburl.com/gslide/z85kn5a9 (Meta internal) Pull Request resolved: https://github.com/pytorch/audio/pull/2628 Reviewed By: nateanl Differential Revision: D38816650 Pulled By: mthrok fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8
-
- 11 Aug, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise. Pull Request resolved: https://github.com/pytorch/audio/pull/2608 Reviewed By: nateanl Differential Revision: D38557141 Pulled By: hwangjeff fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0
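One common way to choose the scale is from a target signal-to-noise ratio in dB. The sketch below assumes that parameterization, which this summary does not state, and uses plain Python lists; it is not the library function, and the name `add_noise_sketch` is hypothetical.

```python
import math
from typing import List


def add_noise_sketch(waveform: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Return waveform + scale * noise, with scale chosen to hit snr_db.

    Assumes both inputs have the same length and non-zero power.
    """
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    signal_power = sum(w * w for w in waveform) / len(waveform)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR = 10 * log10(signal_power / (scale**2 * noise_power))
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [w + scale * n for w, n in zip(waveform, noise)]


waveform = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]
# Both inputs have unit power, so 20 dB SNR scales the noise by 0.1.
print(add_noise_sketch(waveform, noise, snr_db=20.0))
```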
-
- 09 Aug, 2022 1 commit
-
-
Caroline Chen authored
Summary: Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs. The `ctc_decoder` API is as follows: - To decode with KenLM, pass the KenLM language model path to the `lm` variable. - To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass the class to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, with one word per line, and pass the file to `lm_path`. - To decode without a language model, set `lm` to `None` (default). Validated against the fairseq w2l decoder on a sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and decoding with a biased LM. Follow-ups: - Train a simple LM on LibriSpeech and demonstrate usage in a tutorial or the examples directory. cc jacobkahn Pull Request resolved: https://github.com/pytorch/audio/pull/2528 Reviewed By: mthrok Differential Revision: D38243802 Pulled By: carolineechen fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
-
- 05 Aug, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT. Pull Request resolved: https://github.com/pytorch/audio/pull/2602 Reviewed By: nateanl, mthrok Differential Revision: D38450771 Pulled By: hwangjeff fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b
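The equivalence between the two approaches can be checked on a small example. The sketch below uses plain Python lists and a naive O(n^2) DFT as a stand-in for a real FFT, so it illustrates the idea only and is not the torchaudio code; all function names are hypothetical.

```python
import cmath
from typing import List


def direct_convolve(x: List[float], y: List[float]) -> List[float]:
    """Convolution computed directly from the definition."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def dft(values: List[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def fourier_convolve(x: List[float], y: List[float]) -> List[float]:
    """Zero-pad to the full output length, multiply spectra, transform back."""
    n = len(x) + len(y) - 1
    xp = list(x) + [0.0] * (n - len(x))
    yp = list(y) + [0.0] * (n - len(y))
    spectrum = [a * b for a, b in zip(dft(xp), dft(yp))]
    return [v.real for v in dft(spectrum, inverse=True)]


x, y = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]
print(direct_convolve(x, y))  # [1.0, 3.0, 6.0, 9.0, 7.0, 4.0]
print([round(v, 6) for v in fourier_convolve(x, y)])
```

Zero-padding both inputs to the full output length is what turns the circular convolution implied by the DFT into a linear one.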
-
- 03 Aug, 2022 2 commits
-
-
Sean Kim authored
Summary: Add new model pretrained weights and tests Pull Request resolved: https://github.com/pytorch/audio/pull/2601 Reviewed By: carolineechen, nateanl Differential Revision: D38396673 Pulled By: skim0514 fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1
-
bshall authored
Summary: I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details: - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan, since it fits well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`). - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there, but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything. - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature. - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support? I hope this is helpful! Looking forward to hearing from you. Pull Request resolved: https://github.com/pytorch/audio/pull/2472 Reviewed By: hwangjeff Differential Revision: D38389155 Pulled By: carolineechen fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
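For orientation, the gating stage of BS.1770 can be sketched on its own. The code below assumes the input has already been K-weighted and split into overlapping blocks, and takes per-block, per-channel mean-square values; it is a simplified illustration with hypothetical names, not the implementation in this PR.

```python
import math
from typing import List

CHANNEL_WEIGHTS = [1.0, 1.0, 1.0, 1.41, 1.41]  # L, R, C, Ls, Rs
ABSOLUTE_THRESHOLD = -70.0  # LKFS
RELATIVE_OFFSET = -10.0     # LU below the ungated-above-absolute loudness


def block_loudness(mean_squares: List[float]) -> float:
    """Loudness of one block from per-channel mean-square values."""
    power = sum(g * z for g, z in zip(CHANNEL_WEIGHTS, mean_squares))
    return -0.691 + 10 * math.log10(power) if power > 0 else float("-inf")


def mean_loudness(blocks: List[List[float]]) -> float:
    """Loudness of the per-channel average over a set of blocks."""
    n_channels = len(blocks[0])
    averaged = [sum(b[c] for b in blocks) / len(blocks) for c in range(n_channels)]
    return block_loudness(averaged)


def gated_loudness(blocks: List[List[float]]) -> float:
    """Apply the absolute gate, then the relative gate, then average."""
    kept = [b for b in blocks if block_loudness(b) > ABSOLUTE_THRESHOLD]
    if not kept:
        return float("-inf")
    relative_threshold = mean_loudness(kept) + RELATIVE_OFFSET
    kept = [b for b in kept if block_loudness(b) > relative_threshold]
    return mean_loudness(kept)


# Mono example: two full-scale blocks and one near-silent block that is gated out.
print(round(gated_loudness([[1.0], [1.0], [1e-9]]), 3))  # -0.691
```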
-
- 28 Jul, 2022 2 commits
-
-
Sean Kim authored
Summary: Allow the `normalized` parameter to take a str, enabling frame_length-based normalization that aligns with the torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104 Pull Request resolved: https://github.com/pytorch/audio/pull/2554 Reviewed By: carolineechen, mthrok Differential Revision: D38247554 Pulled By: skim0514 fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602
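The two normalization conventions differ only in the scale factor applied to the STFT output. The sketch below computes that factor under the assumption that the string choices are `"window"` and `"frame_length"` and that a bare `True` maps to the window-based variant; the helper name and signature are hypothetical.

```python
import math
from typing import List, Union


def stft_scale(window: List[float], frame_length: int, normalized: Union[bool, str]) -> float:
    """Return the factor the STFT output is multiplied by (illustrative only)."""
    if normalized is False:
        return 1.0
    if normalized is True:
        normalized = "window"  # assumed mapping for the boolean form
    if normalized == "window":
        # Divide by the L2 norm of the window.
        return 1.0 / math.sqrt(sum(w * w for w in window))
    if normalized == "frame_length":
        # Same convention as torch.stft: multiply by frame_length ** -0.5.
        return frame_length ** -0.5
    raise ValueError(f"unknown normalization: {normalized}")


window = [0.0, 1.0, 1.0, 0.0]
print(stft_scale(window, 4, "window"))        # 1 / sqrt(2)
print(stft_scale(window, 4, "frame_length"))  # 0.5
```

The two factors coincide only for a rectangular window of ones whose length equals the frame length.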
-
Sean Kim authored
Summary: Edit factory function's docstrings. Pull Request resolved: https://github.com/pytorch/audio/pull/2570 Reviewed By: carolineechen Differential Revision: D38250369 Pulled By: skim0514 fbshipit-source-id: fa777e37d7cc517cf4ff1842d5585bf36558f50a
-
- 26 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Created new branch and brought in commits due to rebasing issues, resolved conflicts on new branch, close old branch. Pull Request resolved: https://github.com/pytorch/audio/pull/2565 Reviewed By: nateanl, mthrok Differential Revision: D38131189 Pulled By: skim0514 fbshipit-source-id: 96531480cf50562944abb28d70879f21b4609f15
-
- 25 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Previous issue: `--use-tmp-hub-dir` expected the temp directories used to store large files to be deleted after each test case, but pytest only erases such directories after 3 full test sessions. This commit fixes the issue by creating a new subdirectory in each test case and deleting it manually. https://github.com/pytorch/audio/pull/2565#discussion_r929007101 Pull Request resolved: https://github.com/pytorch/audio/pull/2569 Reviewed By: nateanl Differential Revision: D38117848 Pulled By: skim0514 fbshipit-source-id: 3767cb8df1238fd6218f6aaa58d5d583cea72699
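The approach of creating a subdirectory per test case and removing it explicitly can be sketched with the standard library alone. The context manager name and the file written below are hypothetical; the actual test fixture may look different.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def per_test_hub_dir(base_dir: str):
    """Create a fresh subdirectory and delete it when the test case ends."""
    path = tempfile.mkdtemp(dir=base_dir)
    try:
        yield path
    finally:
        # Remove large downloads right away instead of waiting for the
        # test runner's own cleanup.
        shutil.rmtree(path, ignore_errors=True)


base = tempfile.mkdtemp()
with per_test_hub_dir(base) as hub_dir:
    with open(os.path.join(hub_dir, "weights.bin"), "wb") as f:
        f.write(b"\0" * 1024)  # stand-in for a large downloaded file
    existed_during_test = os.path.isdir(hub_dir)

print(existed_during_test, os.path.exists(hub_dir))  # True False
shutil.rmtree(base)
```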
-
- 22 Jul, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`. - Add citation of the Libri2Mix dataset in the bundle documentation. - The URL in the integration test should be built with a slash instead of `os.path.join`, because `os.path.join` fails on Windows. Change it to an f-string. Pull Request resolved: https://github.com/pytorch/audio/pull/2559 Reviewed By: carolineechen Differential Revision: D38036116 Pulled By: nateanl fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836
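The Windows failure mode can be reproduced on any platform with `ntpath`, which is the module `os.path` resolves to on Windows. The URL and file name below are placeholders, not the ones used in the test.

```python
import ntpath

base_url = "https://example.com/models"  # placeholder URL
filename = "checkpoint.pt"               # placeholder file name

# What os.path.join produces on Windows: a backslash separator.
windows_joined = ntpath.join(base_url, filename)
print(windows_joined)  # https://example.com/models\checkpoint.pt

# An f-string always uses the forward slash that URLs require.
url = f"{base_url}/{filename}"
print(url)  # https://example.com/models/checkpoint.pt
```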
-
- 21 Jul, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - Add the `SourceSeparationBundle` class for the source separation pipeline. - Add `CONVTASNET_BASE_LIBRI2MIX`, which is trained on the Libri2Mix dataset. - Add an integration test with an example mixture audio and the expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with the permutation-invariant training (PIT) criterion over all permutations of sources and uses the highest value as the final output. The test verifies that the score is equal to or larger than the expected value. Pull Request resolved: https://github.com/pytorch/audio/pull/2440 Reviewed By: mthrok Differential Revision: D37997646 Pulled By: nateanl fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe
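The scoring described in the last bullet can be sketched in plain Python. This is a simplified illustration rather than the test's code: it skips mean removal, adds a small epsilon for numerical safety, and uses hypothetical function names.

```python
import math
from itertools import permutations
from typing import List, Tuple

EPS = 1e-12  # guards against log10(0) and division by zero


def si_sdr(estimate: List[float], reference: List[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + EPS)  # optimal scaling of the reference
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10 * math.log10((target_energy + EPS) / (residual_energy + EPS))


def pit_si_sdr(estimates: List[List[float]], references: List[List[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Try every assignment of references to estimates and keep the best mean score."""
    best_score, best_perm = float("-inf"), ()
    for perm in permutations(range(len(references))):
        score = sum(si_sdr(estimates[i], references[p]) for i, p in enumerate(perm)) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


references = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
# The estimates come out in swapped order, with a small error in each.
estimates = [[0.1, 1.0, 0.0, -1.0], [1.0, 0.1, -1.0, 0.0]]
score, perm = pit_si_sdr(estimates, references)
print(perm)  # (1, 0)
```

Here `perm[i]` is the index of the reference matched to estimate `i`, so `(1, 0)` means the two outputs were swapped relative to the references.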
-
- 19 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Add factory functions for the HDemucs class, and test them in the test files. Pull Request resolved: https://github.com/pytorch/audio/pull/2547 Reviewed By: carolineechen Differential Revision: D37948600 Pulled By: skim0514 fbshipit-source-id: 7ac4e4a71519450cfbbc24ff7d7e70521f676040
-
- 12 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Draft PR with initial model implementation with minor changes from previous implementation Pull Request resolved: https://github.com/pytorch/audio/pull/2506 Reviewed By: nateanl Differential Revision: D37762671 Pulled By: skim0514 fbshipit-source-id: b7dc0a6ef725d6ae6d76c23c882623f7d339977c
-
- 07 Jul, 2022 1 commit
-
-
moto authored
Summary: This commit adds support for the `"yuv444p"` type as an output format of StreamReader. Pull Request resolved: https://github.com/pytorch/audio/pull/2516 Reviewed By: hwangjeff Differential Revision: D37659715 Pulled By: mthrok fbshipit-source-id: eae9b5590d8f138a6ebf3808c08adfe068f11a2b
-
- 06 Jul, 2022 1 commit
-
-
Caroline Chen authored
Summary: The fluent dataset test currently fails on Windows, due to newline generation by the csv writer in the test and incorrect path parsing in the dataset implementation. Pull Request resolved: https://github.com/pytorch/audio/pull/2510 Reviewed By: carolineechen Differential Revision: D37573203 Pulled By: mthrok fbshipit-source-id: 4868bc649690c7e596b002686c6128ce735d3564
-
- 28 Jun, 2022 1 commit
-
-
moto authored
Summary: Small clean up in ffmpeg binding code. 1. Make `get_option_dict` and `clean_up_dict` public utility 2. Merge the exception into `clean_up_dict` 3. Get rid of custom string join function and use `c10::Join`. Pull Request resolved: https://github.com/pytorch/audio/pull/2507 Reviewed By: hwangjeff Differential Revision: D37466022 Pulled By: mthrok fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969
-
- 27 Jun, 2022 4 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2511 Reviewed By: nateanl Differential Revision: D37461021 Pulled By: mthrok fbshipit-source-id: 6f894c02bbefc5afda0f9584d26ad785f7c71ee4
-
Zhaoheng Ni authored
Summary: In https://github.com/pytorch/audio/issues/2283, torchaudio's downloading function is updated to reduce code duplication. The links in `EMFORMER_RNNT_BASE_LIBRISPEECH` are updated, but the ones in prototype pipelines are not. This PR addresses it by updating the download links of `EMFORMER_RNNT_BASE_MUSTC` and `EMFORMER_RNNT_BASE_TEDLIUM3` in prototype. Corresponding integration tests are added as well. Pull Request resolved: https://github.com/pytorch/audio/pull/2444 Reviewed By: mthrok Differential Revision: D37389178 Pulled By: nateanl fbshipit-source-id: 46598dd71c95be47d1e1b54cef89ea51d280e17a
-
moto authored
Summary: Follow-up of https://github.com/pytorch/audio/issues/2464. Add utility function to fetch the versions of FFmpeg. Pull Request resolved: https://github.com/pytorch/audio/pull/2467 Reviewed By: carolineechen Differential Revision: D37028006 Pulled By: mthrok fbshipit-source-id: 72adce1e6b43985760ce55b715b0e59af5244fdb
-
Zhaoheng Ni authored
Summary: This PR adds two dataset classes of VoxCeleb1 corpus. - `VoxCeleb1Identification` Each data sample contains the waveform, sample rate, speaker id, and the file id. - `VoxCeleb1Verification` Each data sample contains a pair of waveforms, sample rate, the label indicating if they are from the same speaker, and the file ids. Pull Request resolved: https://github.com/pytorch/audio/pull/2349 Reviewed By: carolineechen Differential Revision: D35927921 Pulled By: nateanl fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449
-