- 29 Oct, 2022 1 commit
-
-
moto authored
-
- 20 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2780 Pull Request resolved: https://github.com/pytorch/audio/pull/2781 Reviewed By: carolineechen, mthrok Differential Revision: D40556794 Pulled By: nateanl fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e
-
- 19 Oct, 2022 4 commits
-
-
Caroline Chen authored
Summary: add ability to load only improvised or only scripted utterances. Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539
-
Caroline Chen authored
Summary: previous download link for v0.02 did not download the entire dataset, but only the training dataset, resulting in issues when trying to access the testing or validation data. Pull Request resolved: https://github.com/pytorch/audio/pull/2777 Reviewed By: nateanl Differential Revision: D40480605 Pulled By: carolineechen fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103
-
Zhaoheng Ni authored
Summary: The file structure of VoxCeleb1 is as follows:
```
root/
└── wav/
    └── speaker_id folders
```
Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders. This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure.
Pull Request resolved: https://github.com/pytorch/audio/pull/2776 Reviewed By: carolineechen Differential Revision: D40483707 Pulled By: nateanl fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775 Reviewed By: carolineechen Differential Revision: D40481144 Pulled By: nateanl fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e
-
- 18 Oct, 2022 1 commit
-
-
nateanl authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774 Reviewed By: carolineechen Differential Revision: D40445274 Pulled By: nateanl fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d
-
- 17 Oct, 2022 1 commit
-
-
moto authored
Summary:
* Refactor benchmark script
* Rename `time` variable to avoid a (potential) conflict with the `time` module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on the result to the end
* Add link to an explanation of aliasing

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html
Pull Request resolved: https://github.com/pytorch/audio/pull/2773 Reviewed By: carolineechen Differential Revision: D40421337 Pulled By: mthrok fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a
-
- 14 Oct, 2022 2 commits
-
-
moto authored
Summary: In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure is kept out of the rendered tutorial with the ``sphinx_gallery_defer_figures`` command. It turned out that this figure is shown in the next code block executed by Sphinx Gallery, so it ends up in a totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html
Screenshot: https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png
This commit fixes it by closing the figure.
Pull Request resolved: https://github.com/pytorch/audio/pull/2771 Reviewed By: nateanl Differential Revision: D40382076 Pulled By: mthrok fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a
-
nateanl authored
Summary: The separation is applied to chunks of audio to avoid OOM. The combination of consecutive chunks is described in the graph: [figure not preserved]. In the last audio chunk, there is no future chunk to combine with, hence the overlap on the right side doesn't need to be faded. Pull Request resolved: https://github.com/pytorch/audio/pull/2769 Reviewed By: carolineechen Differential Revision: D40358382 Pulled By: nateanl fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e
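The chunk combination described in this commit message can be illustrated with a minimal overlap-add sketch. Everything here is an assumption made for illustration: the function names, the linear cross-fade, and the identity `process` callback standing in for the separation model are hypothetical and are not taken from the torchaudio example. The point it shows is that the first chunk needs no fade on its left side and the last chunk needs no fade on its right side.

```python
def fade_weights(length, overlap, fade_left, fade_right):
    """Per-sample weights for one chunk, with optional linear fades at each end."""
    weights = [1.0] * length
    for i in range(overlap):
        ramp = (i + 1) / (overlap + 1)  # rises from just above 0 to just below 1
        if fade_left:
            weights[i] *= ramp
        if fade_right:
            weights[length - overlap + i] *= 1.0 - ramp
    return weights


def separate_in_chunks(signal, chunk, overlap, process):
    """Apply `process` chunk by chunk and cross-fade the overlapping regions."""
    # The sketch assumes at most two chunks overlap at any sample.
    assert 0 <= overlap and 2 * overlap <= chunk
    total = len(signal)
    hop = chunk - overlap
    starts = list(range(0, max(total - overlap, 1), hop))
    out = [0.0] * total
    for n, start in enumerate(starts):
        piece = process(signal[start:start + chunk])
        is_first = n == 0
        is_last = n == len(starts) - 1
        # The last chunk has no future chunk, so its right side is not faded.
        weights = fade_weights(len(piece), overlap, not is_first, not is_last)
        for i, (value, weight) in enumerate(zip(piece, weights)):
            out[start + i] += value * weight
    return out


# With an identity "model", the cross-faded output reproduces the input.
signal = [float(i) for i in range(11)]
restored = separate_in_chunks(signal, chunk=4, overlap=2, process=list)
```

Because the fade-out of one chunk and the fade-in of the next sum to one at every overlapping sample, an identity `process` returns the original signal up to floating-point error.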
-
- 13 Oct, 2022 5 commits
-
-
moto authored
Summary: * Document `__call__` instead of `__init__` * List CTCHypothesis first as it is used in combination with CTCDecoder * Fix indentation of score method docstring Pull Request resolved: https://github.com/pytorch/audio/pull/2766 Reviewed By: carolineechen Differential Revision: D40349388 Pulled By: mthrok fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c
-
Nikita Shulga authored
Summary: `publishe`->`published` Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published` Pull Request resolved: https://github.com/pytorch/audio/pull/2761 Reviewed By: carolineechen Differential Revision: D40313042 Pulled By: malfet fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762 Reviewed By: mthrok Differential Revision: D40332603 Pulled By: carolineechen fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251
-
Caroline Chen authored
Summary: GTZAN download link is no longer working, so the torchaudio download functionality for GTZAN does not work properly, per https://github.com/pytorch/audio/issues/2743. Add a note in the docs to reflect this discovery. Pull Request resolved: https://github.com/pytorch/audio/pull/2763 Reviewed By: nateanl, mthrok Differential Revision: D40315071 Pulled By: carolineechen fbshipit-source-id: 3250326c45d227546a9c62b33ba890199ad19242
-
moto authored
Summary: Adding and updating author information. Pull Request resolved: https://github.com/pytorch/audio/pull/2764 Reviewed By: carolineechen Differential Revision: D40332427 Pulled By: mthrok fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a
-
- 12 Oct, 2022 4 commits
-
-
Zhaoheng Ni authored
Summary: Follow-up to PR https://github.com/pytorch/audio/issues/2716
- For preprocessing
  - The HuBERT feature takes lots of memory, which may not fit on some machines. Enable using a subset of the features for training a k-means model.
- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.
- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.
  - Update the WER results on LibriSpeech dev and test sets.

|            | WER% (Viterbi) | WER% (KenLM) |
|:----------:|---------------:|-------------:|
| dev-clean  |           10.9 |          4.2 |
| dev-other  |           17.5 |          9.4 |
| test-clean |           10.9 |          4.4 |
| test-other |           17.8 |          9.5 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2744 Reviewed By: carolineechen Differential Revision: D40282322 Pulled By: nateanl fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
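The loss normalization mentioned for pre-training can be illustrated with plain arithmetic. This is a sketch only: the function names and the per-worker numbers are hypothetical, and it does not reproduce how the training recipe gathers counts across GPUs. It shows why dividing the summed loss by the total number of masked frames differs from averaging each worker's own mean.

```python
def globally_normalized_loss(per_worker):
    """per_worker: list of (summed_loss, num_masked_frames) pairs, one per GPU."""
    total_loss = sum(loss for loss, _ in per_worker)
    total_frames = sum(frames for _, frames in per_worker)
    return total_loss / total_frames


def mean_of_worker_means(per_worker):
    """Average of each worker's own per-frame loss, ignoring frame counts."""
    return sum(loss / frames for loss, frames in per_worker) / len(per_worker)


# Hypothetical numbers: one worker saw few masked frames, the other many.
workers = [(30.0, 10), (10.0, 90)]
global_loss = globally_normalized_loss(workers)
naive_loss = mean_of_worker_means(workers)
```

With uneven frame counts the two values diverge: the global normalization weights every masked frame equally, while the mean of means gives the worker with few frames a disproportionate influence.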
-
Caroline Chen authored
Summary: A couple of CircleCI unit tests are failing during the HuBERT XLarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI. cc atalman Pull Request resolved: https://github.com/pytorch/audio/pull/2758 Reviewed By: mthrok Differential Revision: D40290535 Pulled By: carolineechen fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57
-
Zhaoheng Ni authored
Summary: This PR improves the Wav2Vec2/HuBERT model regarding model pre-training.
- The model initialization of positional embedding and transformer module is essential to model pre-training. The accuracy of unmasked frames should be higher than that of masked frames, as it is an easier task, but without the initialization the accuracy of masked frames is higher than that of unmasked frames. The performance was compared after two epochs with 16 GPUs.
  - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11.
  - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04.
- After adding the model initialization, the gradient overflows easily (aka `nan` gradient). In the paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778) the authors propose a simple but effective method to mitigate the overflow issue: scale down the product of query and key and subtract its maximum value (subtracting a constant value won't change the output of softmax). This guarantees that the value won't overflow.
- In the original fairseq, the mask indices are generated by `numpy.random.choice`. Here replace `torch.multinomial` with `torch.randperm`. (cc carolineechen).

Other improvements within training scripts will be included in a separate PR.
Pull Request resolved: https://github.com/pytorch/audio/pull/2716 Reviewed By: xiaohui-zhang Differential Revision: D39832189 Pulled By: nateanl fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27
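The remark that subtracting a constant does not change the output of softmax can be checked with a small standalone sketch. The function names are hypothetical and the code works on plain Python lists; it is not the attention code from this PR and does not include the scaling step from the paper. It only demonstrates the shift invariance that makes the max-subtraction safe.

```python
import math


def naive_softmax(scores):
    """Direct softmax; math.exp raises OverflowError for large inputs."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shifted_softmax(scores):
    """Softmax after subtracting the maximum, so every exponent is <= 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

# Both versions agree on moderate scores; only the shifted one handles large ones.
probs_small = shifted_softmax(small)
probs_large = shifted_softmax(large)
```

Since `large` is just `small` shifted by a constant, both calls yield the same probabilities, whereas `naive_softmax(large)` fails because `math.exp(1000.0)` is out of range for a float.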
-
Andrey Talman authored
* Fix torchaudio build channel
* Fix channel
-
- 11 Oct, 2022 6 commits
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2751 Reviewed By: nateanl Differential Revision: D40267874 Pulled By: carolineechen fbshipit-source-id: 4e45a02c650ed65c05cde82289a400a3be877927
-
Andrey Talman authored
-
atalman authored
Summary: Fix windows python 3.8 loading path Pull Request resolved: https://github.com/pytorch/audio/pull/2747 Reviewed By: nateanl Differential Revision: D40264326 Pulled By: nateanl fbshipit-source-id: f4a24757de7b48c63a7481034eb11fc3ff174327
-
Caroline Chen authored
-
Caroline Chen authored
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738 Reviewed By: carolineechen Differential Revision: D40238099 Pulled By: nateanl fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e
-
- 10 Oct, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. This is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users decide which dataset they want to use.
- If the task is ``"enh_both"``, the target is the audio in ``mix_clean`` instead of the separate clean sources. The PR fixes it to use ``mix_clean`` as the target.

Pull Request resolved: https://github.com/pytorch/audio/pull/2659 Reviewed By: carolineechen Differential Revision: D40229227 Pulled By: nateanl fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235
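The difference between the "min" and "max" modes described above can be pictured with a toy sketch. This is an illustration only: `align_sources` is a hypothetical helper, not part of the `LibriMix` dataset class, and zero-padding in "max" mode is an assumption about how the longer variant is built rather than something stated in this commit message.

```python
def align_sources(sources, mode="min"):
    """Bring all sources to a common length.

    "min": truncate every source to the shortest one.
    "max": pad every source with zeros up to the longest one (assumed padding).
    """
    if mode not in ("min", "max"):
        raise ValueError(f"unsupported mode: {mode!r}")
    lengths = [len(s) for s in sources]
    if mode == "min":
        target = min(lengths)
        return [list(s[:target]) for s in sources]
    target = max(lengths)
    return [list(s) + [0.0] * (target - len(s)) for s in sources]


# Two hypothetical clean sources of different lengths.
speaker_1 = [0.1, 0.2, 0.3, 0.4, 0.5]
speaker_2 = [0.9, 0.8, 0.7]

shortest = align_sources([speaker_1, speaker_2], mode="min")
longest = align_sources([speaker_1, speaker_2], mode="max")
```

In "min" mode part of the longer utterance is cut off, which is acceptable for separation but loses words for recognition; "max" mode keeps every utterance whole.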
-
Zhaoheng Ni authored
Summary: The docstring of `wav2vec2` argument is wrong. Fix it in this PR. Pull Request resolved: https://github.com/pytorch/audio/pull/2746 Reviewed By: carolineechen Differential Revision: D40225995 Pulled By: nateanl fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da
-
- 09 Oct, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732 Reviewed By: nateanl Differential Revision: D40186996 Pulled By: nateanl fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4
-
- 08 Oct, 2022 1 commit
-
-
moto authored
Summary: * Add HW encoding to HW tutorial https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS Pull Request resolved: https://github.com/pytorch/audio/pull/2739 Reviewed By: hwangjeff Differential Revision: D40197086 Pulled By: hwangjeff fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db
-
- 07 Oct, 2022 3 commits
-
-
hwangjeff authored
Summary: Updates sox info docstring to account for mp3 frame count handling fix introduced in https://github.com/pytorch/audio/issues/2740. Pull Request resolved: https://github.com/pytorch/audio/pull/2742 Reviewed By: nateanl Differential Revision: D40189846 Pulled By: nateanl fbshipit-source-id: d6371418d7d4867dd0b97ee72ebf846d5c93dc30
-
hwangjeff authored
Summary: Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524. Pull Request resolved: https://github.com/pytorch/audio/pull/2740 Reviewed By: nateanl Differential Revision: D40168639 Pulled By: nateanl fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24
-
moto authored
Summary: Specifying multiple objects in the `:minigallery:` directive shows duplicated tutorials. This commit fixes it by listing tutorials based on the module used. https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html
Before (screenshot): https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png
After (screenshot): https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png
Pull Request resolved: https://github.com/pytorch/audio/pull/2736 Reviewed By: carolineechen Differential Revision: D40160247 Pulled By: carolineechen fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477
-
- 06 Oct, 2022 3 commits
-
-
moto authored
Summary: Add a tutorial for basic usage of torchaudio.io.StreamWriter. https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2698 Reviewed By: carolineechen Differential Revision: D40133007 Pulled By: carolineechen fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623
-
atalman authored
Summary: Torchaudio load library path fix for Windows and Python = 3.8. Fixes: https://github.com/pytorch/audio/issues/2726
Fixes the following issue:
```
>>> import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import ( # noqa: F401
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 128, in <module>
    _init_extension()
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 98, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torch\_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\atalman\miniconda3\envs\mywin38\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
>>>
```
Caused by DLLs not being found in the conda environment:
```
C:\Users\atalman\miniconda3\envs\mywin38\bin\
```
While this directory is set correctly in PATH, it is ignored with Python = 3.8. Please refer to: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python
Pull Request resolved: https://github.com/pytorch/audio/pull/2735 Reviewed By: carolineechen Differential Revision: D40112293 Pulled By: carolineechen fbshipit-source-id: c7fc9bb49fc3ec4a2855c6ea473f36808103ed1e
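For context, Python 3.8 on Windows stopped consulting PATH when resolving DLL dependencies of extension modules, and directories have to be registered with `os.add_dll_directory` instead. The sketch below shows that mechanism in isolation. It is not the change made in this PR: the helper names are hypothetical, and registering only the environment's `bin` directory is an assumption based on the path quoted in the message above.

```python
import os
import sys
from pathlib import Path


def env_bin_dir(env_prefix):
    """The `bin` directory under an environment prefix (assumed DLL location)."""
    return Path(env_prefix) / "bin"


def register_env_bin(env_prefix=sys.prefix):
    """Register the environment's bin directory for DLL lookup.

    Returns the list of registered directories. On platforms without
    os.add_dll_directory (anything but Windows) this is a no-op.
    """
    registered = []
    if not hasattr(os, "add_dll_directory"):
        return registered
    bin_dir = env_bin_dir(env_prefix).resolve()
    if bin_dir.is_dir():
        # add_dll_directory expects an absolute path to an existing directory.
        os.add_dll_directory(str(bin_dir))
        registered.append(bin_dir)
    return registered


registered_dirs = register_env_bin()
```

A call like this would have to run before the extension module is loaded, since the search path is consulted at load time.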
-
Ivan Zaitsev authored
Summary: The goal is to reduce the number of job failures due to timeouts, see https://app.circleci.com/pipelines/github/pytorch/audio/12882/workflows/f99da1a5-32e6-4bac-8ceb-fbf36d693e2d/jobs/936363?invite=true#step-105-105 for example. Pull Request resolved: https://github.com/pytorch/audio/pull/2734 Reviewed By: weiwangmeta, atalman Differential Revision: D40077578 fbshipit-source-id: 573f43a4d088a7086fa6925ac5ba1fdd1e8f39ec
-
- 05 Oct, 2022 1 commit
-
-
moto authored
Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733 Reviewed By: hwangjeff Differential Revision: D40086902 Pulled By: hwangjeff fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8
-
- 03 Oct, 2022 3 commits
-
-
moto authored
Summary: https://output.circle-artifacts.com/output/job/213c71c8-c9b5-4516-af92-a2f8dab2c9fd/artifacts/0/docs/tutorials/streamwriter_advanced.html Pull Request resolved: https://github.com/pytorch/audio/pull/2708 Reviewed By: carolineechen Differential Revision: D40013310 Pulled By: mthrok fbshipit-source-id: 7226b021ce2fe951b3bf0bd41e93a6bbcf696124
-
moto authored
Summary: Adopt `:autosummary:` to various modules
* torchaudio.compliance.kaldi
* torchaudio.sox_effects
* torchaudio.utils

Pull Request resolved: https://github.com/pytorch/audio/pull/2664 Reviewed By: nateanl Differential Revision: D39841873 Pulled By: mthrok fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac
-
Zhaoheng Ni authored
Summary: The MuST-C reference is added in https://github.com/pytorch/audio/pull/2689. This PR adds the citation to the RNNT pipeline documentation. Pull Request resolved: https://github.com/pytorch/audio/pull/2728 Reviewed By: carolineechen Differential Revision: D39990882 Pulled By: nateanl fbshipit-source-id: 011057952dd8aa30a4cb7c7af0ac75123e329d7e
-
- 01 Oct, 2022 1 commit
-
-
Sergii Dymchenko authored
Summary: The file looks hopelessly outdated. Pull Request resolved: https://github.com/pytorch/audio/pull/2730 Reviewed By: mthrok Differential Revision: D39993805 Pulled By: kit1980 fbshipit-source-id: f5ad97c83873061175455cc7b129ec71a9ec3d7d
-