- 29 Apr, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS). Pull Request resolved: https://github.com/pytorch/audio/pull/3279 Reviewed By: hwangjeff Differential Revision: D45415404 Pulled By: nateanl fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results on a V100 are attached below:

| decoder type | model | datasets | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|----------|----------------------|-----------|------------|------------|-------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation over the batch and vocab axes and loops over the frame axis, so compared with CPU implementations it favors shorter sequences and larger vocab sizes.
2. WER is the same as with CPU implementations. However, it can't decode with an LM yet.

Resolves: https://github.com/pytorch/audio/issues/2957.
Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
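For readers unfamiliar with the algorithm, here is a minimal CPU sketch of CTC prefix beam search in plain Python. It is not the CUDA decoder from this PR; the function name `ctc_prefix_beam_search` and its arguments are hypothetical, it works in probability space rather than log space, and it uses no LM. The outer loop runs over frames, and the inner loop over labels corresponds to the vocab axis that the note above says is parallelized.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size=8, blank_id=0):
    """Illustrative CTC prefix beam search (hypothetical helper, no LM).

    probs: list of frames, each a list of per-label probabilities.
    Returns the best label sequence and its total probability.
    """
    # Each prefix tracks (prob. of paths ending in blank, prob. of paths ending in non-blank).
    beams = {(): (1.0, 0.0)}
    for frame in probs:  # sequential loop over the frame axis
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for label, p in enumerate(frame):  # the vocab axis
                if label == blank_id:
                    # Blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and label == prefix[-1]:
                    # Repeated label: merges into the prefix unless a blank came in between.
                    next_beams[prefix][1] += p_nb * p
                    next_beams[prefix + (label,)][1] += p_b * p
                else:
                    next_beams[prefix + (label,)][1] += (p_b + p_nb) * p
        # Keep only the `beam_size` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_size]}
    best_prefix, best_score = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best_prefix), sum(best_score)
```

With two frames of `[0.6, 0.4]` (blank first), greedy decoding would pick blank twice, while the prefix search sums all paths that collapse to the same output and prefers the single non-blank label.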
-
- 18 Apr, 2023 1 commit
-
-
nateanl authored
Summary: The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement. Pull Request resolved: https://github.com/pytorch/audio/pull/3036 Reviewed By: hwangjeff Differential Revision: D45061841 Pulled By: nateanl fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4
-
- 31 Mar, 2023 1 commit
-
-
Nouran Ali authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3222 Reviewed By: nateanl Differential Revision: D44539424 Pulled By: mthrok fbshipit-source-id: 8fbcb5f9918c9930c939bcd448493fa5cf604545
-
- 29 Mar, 2023 1 commit
-
-
moto authored
Summary: A part of the StreamWriter tutorial warns about corrupted AAC audio output, but this is no longer relevant, so this commit deletes it. Pull Request resolved: https://github.com/pytorch/audio/pull/3214 Reviewed By: nateanl Differential Revision: D44504030 Pulled By: mthrok fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef
-
- 28 Mar, 2023 1 commit
-
-
nateanl authored
Summary: Fix https://github.com/pytorch/audio/issues/3211 Pull Request resolved: https://github.com/pytorch/audio/pull/3212 Reviewed By: mthrok Differential Revision: D44472523 Pulled By: nateanl fbshipit-source-id: eb519b0045e7518ad13863a53271745a80d89a21
-
- 16 Mar, 2023 1 commit
-
-
jiyuntu-eero authored
Summary: Fix https://github.com/pytorch/audio/issues/3166. In `get_trellis` method, the index of blank symbol is regarded as 0 by default. It should be changed to `blank_id`. Pull Request resolved: https://github.com/pytorch/audio/pull/3172 Reviewed By: mthrok Differential Revision: D44090889 Pulled By: nateanl fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264
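To illustrate why the blank index matters, here is a simplified trellis computation in plain Python. It is a sketch under assumptions, not the tutorial's `get_trellis` code: it uses nested lists instead of tensors, and the exact recurrence is an illustrative simplification in which staying on the same token is modeled as emitting the blank symbol.

```python
import math


def get_trellis(emission, tokens, blank_id=0):
    """Simplified alignment trellis (illustrative, not the tutorial's code).

    emission: list of frames, each a list of log-probabilities per label.
    tokens: label indices of the transcript.
    trellis[t][j] is the best log-probability of having emitted the first
    j tokens after consuming t frames.
    """
    num_frames, num_tokens = len(emission), len(tokens)
    trellis = [[-math.inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        for j in range(num_tokens + 1):
            # Staying on the same token consumes a blank, so use `blank_id`,
            # not a hard-coded index 0.
            stay = trellis[t][j] + emission[t][blank_id]
            if j > 0:
                advance = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            else:
                advance = -math.inf
            trellis[t + 1][j] = max(stay, advance)
    return trellis
```

If the vocabulary places the blank symbol at an index other than 0, passing the wrong `blank_id` scores the "stay" transition with a real label's probability and changes the resulting alignment scores.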
-
- 07 Mar, 2023 1 commit
-
-
Maciej Torhan authored
Summary: In the wav2letter example, `momentum` is passed to the `Adam` and `AdamW` initializers, which is not a valid parameter for them. To fix that, this PR adds `beta_1` and `beta_2` to the arguments and uses them in place of `momentum`. It also adds `eps`, similar to the `Adadelta` initializer. Pull Request resolved: https://github.com/pytorch/audio/pull/3145 Reviewed By: mthrok Differential Revision: D43847713 Pulled By: nateanl fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6
-
- 02 Mar, 2023 1 commit
-
-
moto authored
Summary: Fix build_doc job https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8
- build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL.
- Fix bash cell syntax in HW tutorial
- Fix C++ doc
- Fix duplicated target name in streamwriter tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/3125 Reviewed By: xiaohui-zhang Differential Revision: D43724078 Pulled By: mthrok fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c
-
- 24 Feb, 2023 2 commits
-
-
Vladislav Agafonov authored
Summary: Add `Wav2Vec2DataModule` in self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3081 Reviewed By: mthrok Differential Revision: D43579239 Pulled By: nateanl fbshipit-source-id: 3e935eb9a18ef0259a58940ae466cbdc3baf8494
-
Vladislav Agafonov authored
Summary: Add wav2vec2 loss function in the self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3090 Reviewed By: mthrok Differential Revision: D43579220 Pulled By: nateanl fbshipit-source-id: 4b52792b518ddc5b01c9660c90ceb3c4ad1f0237
-
- 23 Feb, 2023 1 commit
-
-
G. Sun authored
Summary: This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing. An example for Librispeech can be found in audio/examples/asr/librispeech_biasing. Maintainer's note (mthrok): It seems that TrieNode should be better typed as tuple, but changing the implementation from list to tuple could cause some issue without running the code, so the code is not changed, though the annotation uses tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2890 Reviewed By: nateanl Differential Revision: D43171447 Pulled By: mthrok fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e
-
- 16 Feb, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: The `BucketizeBatchSampler` may return a different iter_list on different nodes if `shuffle` is `True`, which causes DDP training to hang forever. In addition, `shuffle` in `DistributedSampler` only happens at initialization, which means it assigns the same subset to each replica in all training epochs. The PR fixes both issues. cc arlofaria Pull Request resolved: https://github.com/pytorch/audio/pull/3068 Reviewed By: mthrok Differential Revision: D43372110 Pulled By: nateanl fbshipit-source-id: a162728406ae995e05d2a07cfc2444fb76cf345e
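One common way to avoid both problems is to derive the shuffle order from a shared seed plus the epoch number, so every rank computes the same order while the order still changes between epochs. The sketch below is plain Python with hypothetical helper names (`epoch_batches`, `shard`); it is not the code from this PR.

```python
import random


def epoch_batches(batches, seed, epoch):
    """Shuffle the batch list identically on every rank for a given epoch."""
    # Seeding from (seed + epoch) makes the order deterministic across ranks
    # while allowing it to vary from one epoch to the next.
    rng = random.Random(seed + epoch)
    order = list(batches)
    rng.shuffle(order)
    return order


def shard(batches, rank, world_size):
    """Give each rank a disjoint slice of the shared, already-shuffled order."""
    return batches[rank::world_size]
```

Because every rank starts from the same shuffled list, the shards stay disjoint and no rank waits on batches that another rank never sees.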
-
Zhaoheng Ni authored
Summary: In https://github.com/pytorch/audio/issues/2873, layer normalization is applied to waveforms for SSL models trained on large scale datasets. The word error rate is significantly reduced after the change. The PR updates the results for the affected models.

Without the change in https://github.com/pytorch/audio/issues/2873, here is the WER result table:

| Model | dev-clean | dev-other | test-clean | test-other |
|:------|----------:|----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) | 10.59 | 15.62 | 9.58 | 16.33 |
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) | 2.80 | 6.01 | 2.82 | 6.34 |
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) | 2.36 | 4.43 | 2.41 | 4.96 |
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) | 1.85 | 3.46 | 2.09 | 3.89 |
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) | 2.21 | 3.40 | 2.26 | 4.05 |

After applying layer normalization, here is the updated result:

| Model | dev-clean | dev-other | test-clean | test-other |
|:------|----------:|----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) | 6.77 | 10.03 | 6.87 | 10.51 |
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) | 2.19 | 4.55 | 2.32 | 4.64 |
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) | 1.78 | 3.51 | 2.03 | 3.68 |
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) | 1.77 | 3.32 | 2.03 | 3.68 |
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) | 1.73 | 2.72 | 1.90 | 3.16 |

Pull Request resolved: https://github.com/pytorch/audio/pull/3070 Reviewed By: mthrok Differential Revision: D43365313 Pulled By: nateanl fbshipit-source-id: 34a60ad2e5eb1299da64ef88ff0208ec8ec76e91
-
- 15 Feb, 2023 1 commit
-
-
hwangjeff authored
Summary: Updates tutorial "Audio Data Augmentation" to use two of the newly introduced data augmentation operators in beta: `torchaudio.functional.fftconvolve` and `torchaudio.functional.add_noise`. Pull Request resolved: https://github.com/pytorch/audio/pull/3062 Reviewed By: mthrok Differential Revision: D43298120 Pulled By: hwangjeff fbshipit-source-id: 09ca736a5c67242568515d600b7d31eab32c2df1
-
- 14 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary:
- Rename the current `ssl` example to `self_supervised_learning`
- Add README to demonstrate how to run the recipe with hubert task

Pull Request resolved: https://github.com/pytorch/audio/pull/3060 Reviewed By: mthrok Differential Revision: D43287868 Pulled By: nateanl fbshipit-source-id: 10352682485ef147ca32f4c4c9f9cde995444aa0
-
- 30 Jan, 2023 1 commit
-
-
Yan Li authored
Summary: Currently, this tutorial raises a few errors when it is run with a CUDA device. The reasons are:
- The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
- When further analyzing and displaying the output audio, we need to move it back from the GPU to the CPU. This is because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).

Pull Request resolved: https://github.com/pytorch/audio/pull/3017 Reviewed By: mthrok Differential Revision: D42828526 Pulled By: nateanl fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
-
- 19 Jan, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: TorchAudio currently has one training recipe for HuBERT + LibriSpeech pre-training. It may not be a good fit when users want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). The PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, lr scheduler, and datamodule for training an SSL model. Pull Request resolved: https://github.com/pytorch/audio/pull/2876 Reviewed By: hwangjeff Differential Revision: D42617414 Pulled By: nateanl fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
-
hwangjeff authored
Summary: In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead. Pull Request resolved: https://github.com/pytorch/audio/pull/2981 Reviewed By: mthrok Differential Revision: D42507228 Pulled By: hwangjeff fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
-
- 17 Jan, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The mel spectrograms in the TTS tutorial are upside down. The PR fixes it by using `origin="lower"` in imshow. Pull Request resolved: https://github.com/pytorch/audio/pull/2989 Reviewed By: mthrok Differential Revision: D42538349 Pulled By: nateanl fbshipit-source-id: 4388103a49bdfabf1705c1f979d44ecedd5c910a
-
- 16 Jan, 2023 1 commit
-
-
Robin Scheibler authored
Summary: The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following.
1. Use `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset
2. Correct `args.data_dir` to `args.root_dir` in eval.py
3. Modify the parameters of `pytorch_lightning.Trainer` according to the latest version (use `accelerator="gpu"` and `devices=args.num_devices`, instead of just `gpus=args.num_devices`)

Pull Request resolved: https://github.com/pytorch/audio/pull/2987 Reviewed By: xiaohui-zhang Differential Revision: D42536992 Pulled By: nateanl fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
-
- 13 Jan, 2023 1 commit
-
-
moto authored
Summary: Per the suggestion by nateanl, adding the visualization of feature fed to ASR. <img width="688" alt="Screen Shot 2023-01-12 at 8 19 59 PM" src="https://user-images.githubusercontent.com/855818/212215190-23be7553-4c04-40d9-944e-3ee2ff69c49b.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2974 Reviewed By: nateanl Differential Revision: D42484088 Pulled By: mthrok fbshipit-source-id: 2c839492869416554eac04aa06cd12078db21bd7
-
- 30 Dec, 2022 1 commit
-
-
moto authored
Summary: Artifact: [subtractive_synthesis_tutorial](https://output.circle-artifacts.com/output/job/4c1ce33f-834d-48e0-ba89-2e91acdcb572/artifacts/0/docs/tutorials/subtractive_synthesis_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2934 Reviewed By: carolineechen Differential Revision: D42284945 Pulled By: mthrok fbshipit-source-id: d255b8e8e2a601a19bc879f9e1c38edbeebaf9b3
-
- 17 Dec, 2022 1 commit
-
-
moto authored
Summary: Adds filter design tutorial, which demonstrates `sinc_impulse_response` and `frequency_impulse_response`. Example: - [filter_design_tutorial](https://output.circle-artifacts.com/output/job/bd22c615-9215-4b17-a52c-b171a47f646c/artifacts/0/docs/tutorials/filter_design_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2894 Reviewed By: xiaohui-zhang Differential Revision: D42117658 Pulled By: mthrok fbshipit-source-id: f7dd04980e8557bb6f0e0ec26ac2c7f53314ea16
-
- 16 Dec, 2022 1 commit
-
-
Caroline Chen authored
Summary: Resolves https://github.com/pytorch/audio/issues/2891. Rename `resampling_method` options to more accurately describe what is happening. Previously the methods were set to `sinc_interpolation` and `kaiser_window`, which can be confusing as both options actually use sinc interpolation but differ in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will raise a warning, and those options will be deprecated in 2 releases. The numerical behavior is unchanged. Pull Request resolved: https://github.com/pytorch/audio/pull/2922 Reviewed By: mthrok Differential Revision: D42083619 Pulled By: carolineechen fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
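A rename like this is often handled with a small mapping from old names to new ones, plus a warning for callers who still pass the old name. The sketch below shows that general pattern only; the helper `normalize_resampling_method`, the warning category, and the message text are assumptions and are not taken from the torchaudio source.

```python
import warnings

# Old option name -> new option name, as listed in the commit message.
_RENAMED = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name):
    """Map a possibly outdated option name to its current name (hypothetical helper)."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        # The warning category and wording here are illustrative assumptions.
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if name not in _RENAMED.values():
        raise ValueError(f"Unknown resampling method: {name!r}")
    return name
```

Keeping the translation in one place means the numerical code only ever sees the new names, which is consistent with the statement that the numerical behavior is unchanged.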
-
- 29 Nov, 2022 1 commit
-
-
moto authored
Summary: This commit adds the tutorial for additive synthesis, using torchaudio's prototype DSP ops. [Review here](https://output.circle-artifacts.com/output/job/3dc83322-832a-4272-9c13-df752c97b660/artifacts/0/docs/tutorials/additive_synthesis_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2877 Reviewed By: carolineechen Differential Revision: D41585425 Pulled By: mthrok fbshipit-source-id: b81283b90e4779c8054fd030a1d8c3d39d676bbd
-
- 28 Nov, 2022 1 commit
-
-
moto authored
Summary: This commit adds a tutorial for oscillator_bank and adsr_envelope, which will be a basis for DDSP. - [Review here](https://output.circle-artifacts.com/output/job/cf1d3001-88e5-418b-8cf8-ae22b4445dba/artifacts/0/docs/tutorials/oscillator_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2862 Reviewed By: carolineechen Differential Revision: D41559503 Pulled By: mthrok fbshipit-source-id: 3f1689186db7d246de14f228fc2f91bf37db98cd
-
- 17 Nov, 2022 1 commit
-
-
vasiliy authored
Summary: This code was added by https://github.com/pytorch/audio/commit/4d0095a528412cfec2a549204fc01d9ebb15df7a Seems that the original code had a typo? Pull Request resolved: https://github.com/pytorch/audio/pull/2858

Test Plan:
```
// the import of `mustc` now succeeds, previously crashed
python examples/asr/emformer_rnnt/global_stats.py --model-type librispeech --dataset-path /home/vasiliy/local/librispeech/
```
Reviewed By: carolineechen Differential Revision: D41355663 Pulled By: nateanl fbshipit-source-id: 92507e529d41b984b9dd400ad24a55d130372b7d
-
- 16 Nov, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2847. In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR addresses the issue by changing the dtype of `mask_embedding` to match that of `x`, which enables mixed precision training. Pull Request resolved: https://github.com/pytorch/audio/pull/2854 Reviewed By: carolineechen Differential Revision: D41343486 Pulled By: nateanl fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705
-
Zhaoheng Ni authored
Summary:
- `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653, so the absolute paths became relative paths. This PR fixes the usage in the hubert fine-tuning recipe to get correct audio paths.
- Model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`.
- The input dimension of the CTC linear layer varies depending on the model architecture; update it in the lightning module.

cc simpleoier
Pull Request resolved: https://github.com/pytorch/audio/pull/2851 Reviewed By: carolineechen Differential Revision: D41327998 Pulled By: nateanl fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
-
- 17 Oct, 2022 1 commit
-
-
moto authored
Summary:
* Refactor benchmark script
* Rename `time` variable to avoid (potential) conflict with the time module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on result at the end
* Add link to an explanation of aliasing

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html
Pull Request resolved: https://github.com/pytorch/audio/pull/2773 Reviewed By: carolineechen Differential Revision: D40421337 Pulled By: mthrok fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a
-
- 14 Oct, 2022 2 commits
-
-
moto authored
Summary: In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure used is left unshown in the resulting tutorial with the use of the ``sphinx_gallery_defer_figures`` command. It turned out that this figure is shown in the next code block executed by Sphinx Gallery, so the figure is placed in a totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html <img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png"> This commit fixes it by closing the figure. Pull Request resolved: https://github.com/pytorch/audio/pull/2771 Reviewed By: nateanl Differential Revision: D40382076 Pulled By: mthrok fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a
-
nateanl authored
Summary: The separation is applied to chunks of audio to avoid OOM. The combination of consecutive chunks is described in a graph in the original PR. In the last audio chunk, there is no future chunk to combine with, hence the overlap on the right side doesn't need to be faded. Pull Request resolved: https://github.com/pytorch/audio/pull/2769 Reviewed By: carolineechen Differential Revision: D40358382 Pulled By: nateanl fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e
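The idea can be sketched with a plain-Python overlap-add over lists. This is an illustration under assumptions, not the tutorial's code: the helper names, the linear cross-fade shape, and the identity `model` default are all hypothetical. The point it shows is that the right-side fade is skipped for the last chunk (and the left-side fade for the first).

```python
def fade_weights(chunk_len, overlap, fade_in, fade_out):
    """Per-sample weights with optional linear ramps on either side."""
    weights = [1.0] * chunk_len
    for i in range(overlap):
        ramp = (i + 0.5) / overlap  # rises from near 0 to near 1
        if fade_in:
            weights[i] = ramp
        if fade_out:
            # Complementary ramp, so fade-out + next chunk's fade-in sum to 1.
            weights[chunk_len - overlap + i] = 1.0 - ramp
    return weights


def process_in_chunks(signal, chunk_len, overlap, model=lambda chunk: chunk):
    """Apply `model` chunk by chunk and cross-fade the overlapping regions."""
    if not 0 < overlap <= chunk_len // 2:
        raise ValueError("overlap must be positive and at most half of chunk_len")
    hop = chunk_len - overlap
    output = [0.0] * len(signal)
    start = 0
    while start < len(signal):
        end = min(start + chunk_len, len(signal))
        is_first, is_last = start == 0, end == len(signal)
        chunk = model(signal[start:end])
        # The last chunk has no successor, so its right side is not faded.
        weights = fade_weights(len(chunk), overlap, fade_in=not is_first, fade_out=not is_last)
        for i, value in enumerate(chunk):
            output[start + i] += weights[i] * value
        if is_last:
            break
        start += hop
    return output
```

With an identity model, the output reproduces the input, which is a quick way to check that the fades in each overlap region sum to one and that the tail of the last chunk is left at full weight.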
-
- 13 Oct, 2022 2 commits
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762 Reviewed By: mthrok Differential Revision: D40332603 Pulled By: carolineechen fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251
-
moto authored
Summary: Adding and updating author information. Pull Request resolved: https://github.com/pytorch/audio/pull/2764 Reviewed By: carolineechen Differential Revision: D40332427 Pulled By: mthrok fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a
-
- 12 Oct, 2022 2 commits
-
-
Nikita Shulga authored
Summary: `publishe`->`published` Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published` Pull Request resolved: https://github.com/pytorch/audio/pull/2761 Reviewed By: carolineechen Differential Revision: D40313042 Pulled By: malfet fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b
-
Zhaoheng Ni authored
Summary: Follow-up to PR https://github.com/pytorch/audio/issues/2716
- For preprocessing
  - The HuBERT feature takes a lot of memory, which may not fit on some machines. Enable using a subset of the features for training a k-means model.
- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.
- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.
- Update the WER results on LibriSpeech dev and test sets.

|            | WER% (Viterbi) | WER% (KenLM) |
|:----------:|---------------:|-------------:|
| dev-clean  | 10.9 | 4.2 |
| dev-other  | 17.5 | 9.4 |
| test-clean | 10.9 | 4.4 |
| test-other | 17.8 | 9.5 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2744 Reviewed By: carolineechen Differential Revision: D40282322 Pulled By: nateanl fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
-
- 07 Oct, 2022 1 commit
-
-
moto authored
Summary: Specifying multiple objects in the `:minigallery:` directive shows duplicated tutorials. This commit fixes it by listing tutorials based on the module used. https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html Before: <img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png"> After: <img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2736 Reviewed By: carolineechen Differential Revision: D40160247 Pulled By: carolineechen fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477
-
- 06 Oct, 2022 1 commit
-
-
moto authored
Summary: Add a tutorial for basic usage of torchaudio.io.StreamWriter. https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2698 Reviewed By: carolineechen Differential Revision: D40133007 Pulled By: carolineechen fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623
-
- 05 Oct, 2022 1 commit
-
-
moto authored
Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733 Reviewed By: hwangjeff Differential Revision: D40086902 Pulled By: hwangjeff fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8
-