- 16 Nov, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653, so absolute paths became relative paths. This PR fixes the usage in the HuBERT fine-tuning recipe to get the correct audio paths. - The model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`. - The input dimension of the CTC linear layer varies depending on the model architecture; this PR updates it in the lightning module. cc simpleoier Pull Request resolved: https://github.com/pytorch/audio/pull/2851 Reviewed By: carolineechen Differential Revision: D41327998 Pulled By: nateanl fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
-
- 15 Nov, 2022 3 commits
-
-
Grigory Sizov authored
Summary: Closes T136364380, follow-up to https://github.com/pytorch/audio/issues/2822 - Added "base", "base+", and "large" bundles for WavLM - Expanded `wav2vec2_pipeline_test.py` to include the new bundles - Added the new bundles to docs in `pipelines.rst` Pull Request resolved: https://github.com/pytorch/audio/pull/2833 Reviewed By: nateanl Differential Revision: D41194796 Pulled By: sgrigory fbshipit-source-id: bf8e96c05b6a81ac5c5a014c46adeeac12685328
-
Grigory Sizov authored
Summary: Closes T137506059 Replaces the functional multi-head attention in `WavLMSelfAttention` with the `torch.nn.MultiheadAttention` module. The reason is that the latter uses a native CPU/CUDA implementation ([BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/)) under certain conditions and can achieve significant speedup. It also simplifies the code in `WavLMSelfAttention`. Note: the definition of the `bias` parameter in `WavLMSelfAttention.forward` has changed slightly, because `torch.nn.MultiheadAttention` has no parameter controlling the presence of bias for the projections of `k`, `v`, and `q` independently. In WavLM we only use `bias=True`, so this has no effect on users of WavLM or on tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2842 Reviewed By: nateanl Differential Revision: D41186166 Pulled By: sgrigory fbshipit-source-id: e791c68106ad89f96c1abf046de699cb8ec7b595
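To illustrate the core computation that both the old functional WavLM attention and `torch.nn.MultiheadAttention` perform (per head, before the output projection), here is a dependency-free sketch of single-query scaled dot-product attention. The function name and the list-of-lists representation are illustrative only, not torchaudio's API.

```python
import math

def single_query_attention(q, keys, values):
    """Scaled dot-product attention for one query vector.

    A pure-Python sketch of the per-head computation inside
    multi-head attention: score each key against the query,
    softmax the scores, and average the values by those weights.
    """
    d = len(q)
    # similarity of the query with every key, scaled by sqrt(d)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # attention output: weighted average of the value vectors
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```

When all keys are identical, the weights are uniform and the output is simply the mean of the value vectors, which is a handy sanity check.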
-
moto authored
Summary: * Add the new official torchaudio logo to the documentation/README. * Add a page for downloading the logo. https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/index.html <img width="1068" alt="Screen Shot 2022-11-14 at 10 30 27 AM" src="https://user-images.githubusercontent.com/855818/201738349-9e248f15-dce2-4931-9066-aa898a53d6ad.png"> https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/logo.html <img width="617" alt="Screen Shot 2022-11-14 at 10 30 47 AM" src="https://user-images.githubusercontent.com/855818/201738420-ad0fda2f-f310-4802-851c-bbdf6c84c045.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2802 Reviewed By: carolineechen Differential Revision: D41295277 Pulled By: mthrok fbshipit-source-id: 6615d00799c9611f875e8485459d800e350b3486
-
- 14 Nov, 2022 2 commits
-
-
moto authored
Summary: Removing LTS mention and packages from README as it is discontinued. Pull Request resolved: https://github.com/pytorch/audio/pull/2844 Reviewed By: hwangjeff, xiaohui-zhang Differential Revision: D41200886 Pulled By: mthrok fbshipit-source-id: 0da0afe68df51826075ce945cf0cf1de901e1c8f
-
Caroline Chen authored
Summary: follow up to https://github.com/pytorch/audio/issues/2823 - move bark spectrogram to prototype - decrease autograd test tolerance (passing on circle ci) - add diagram for bark fbanks cc jdariasl Pull Request resolved: https://github.com/pytorch/audio/pull/2843 Reviewed By: nateanl Differential Revision: D41199522 Pulled By: carolineechen fbshipit-source-id: 8e6c2e20fb7b14f39477683b3c6ed8356359a213
-
- 13 Nov, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2845 Pull Request resolved: https://github.com/pytorch/audio/pull/2846 Reviewed By: carolineechen Differential Revision: D41251624 Pulled By: nateanl fbshipit-source-id: 1a363d2314d6a452f35c109b9730da64ada5a2fd
-
- 11 Nov, 2022 1 commit
-
-
DanilBaibak authored
Summary: Added missing build workflows for MacOS and Linux: - [x] Linux conda - [x] MacOS conda This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows for the MacOS and Linux conda builds with Nova. We will create a workflow (most likely in test-infra) that does a comparison between the binaries to ensure there is parity before we start uploading with Nova. Pull Request resolved: https://github.com/pytorch/audio/pull/2800 Reviewed By: osalpekar Differential Revision: D41181467 Pulled By: DanilBaibak fbshipit-source-id: a5c5d4dcfdd778b4045203f6016c20fb42daa01b
-
- 10 Nov, 2022 5 commits
-
-
moto authored
Summary: Currently `discard_before_pts=-1` is used to indicate that no AVFrame should be skipped. It was reported that some corrupted videos can have a constant negative pts value. Decoding such corrupted data is technically UB, but all AVFrames should still be decoded as long as `seek` is not used. This commit changes the decoder so that it processes every AVFrame when `discard_before_pts == -1`, regardless of the AVFrame::pts value. Pull Request resolved: https://github.com/pytorch/audio/pull/2841 Reviewed By: hwangjeff Differential Revision: D41174442 Pulled By: mthrok fbshipit-source-id: e9d2fab4b0e2bc47146eda8e1dd377a74c087590
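The fixed skip logic can be sketched as follows. This is a hypothetical pure-Python rendering of the decision (the real implementation is C++), with the function name invented for illustration:

```python
def should_process_frame(frame_pts: int, discard_before_pts: int) -> bool:
    """Decide whether a decoded AVFrame should be kept.

    discard_before_pts == -1 is the sentinel for "no seek requested":
    every frame is processed, even frames whose pts is negative,
    as seen in some corrupted videos.
    """
    if discard_before_pts == -1:
        return True
    # after a seek, drop frames that precede the seek target
    return frame_pts >= discard_before_pts
```

The key point of the fix is the early return: a negative pts no longer causes a frame to be dropped unless a seek target was actually set.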
-
Omkar Salpekar authored
Summary: Adding Nova Reusable Workflow for M1 Wheels Build. Once this has been running well for a while, we can replace the old `build-m1-binaries.yml` workflow. Pull Request resolved: https://github.com/pytorch/audio/pull/2839 Reviewed By: DanilBaibak Differential Revision: D41195316 Pulled By: osalpekar fbshipit-source-id: f3754043f384b1645e5fcfaebf465f6839f72461
-
Omkar Salpekar authored
Summary: Adding Nova Reusable Workflow for M1 Conda Build. Once this has been running well for a while, we can replace the old `build-m1-binaries.yml` workflow. Pull Request resolved: https://github.com/pytorch/audio/pull/2840 Reviewed By: DanilBaibak Differential Revision: D41195298 Pulled By: osalpekar fbshipit-source-id: 14591b96e998aa43fa57e8e5b0b09d0ce4f4092e
-
Julián D. Arias-Londoño authored
Summary: I have added the BarkScale transform, which can transform a regular Spectrogram into a BarkSpectrogram, similar to MelScale. ahmed-fau requested this feature in December 2021 (https://github.com/pytorch/audio/issues/2103). The new functionality includes three different well-known approximations of the Bark scale. Pull Request resolved: https://github.com/pytorch/audio/pull/2823 Reviewed By: nateanl Differential Revision: D41162100 Pulled By: carolineechen fbshipit-source-id: b2670c4972e49c9ef424da5d5982576f7a4df831
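As an illustration of the kind of mapping a Bark-scale filterbank is built from, here is Traunmüller's formula, one commonly used published approximation of the Bark scale (shown only as an example; the commit does not state which three approximations the transform implements):

```python
def hz_to_bark_traunmueller(freq_hz: float) -> float:
    """Convert Hertz to Bark using Traunmueller's approximation.

    One of several published approximations of the psychoacoustic
    Bark scale; a Bark spectrogram warps the linear frequency axis
    of a regular spectrogram through a mapping like this one.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53
```

Like the mel scale, the mapping is monotonic and compresses high frequencies: 1 kHz lands around 8.5 Bark.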
-
Caroline Chen authored
Summary: internal comparison tests: D40080919 follow up PR for pretrained models https://github.com/pytorch/audio/issues/2827 Pull Request resolved: https://github.com/pytorch/audio/pull/2826 Reviewed By: nateanl Differential Revision: D41160061 Pulled By: carolineechen fbshipit-source-id: f3c478b28c235af53d1d8e21b573c53684a63ac4
-
- 09 Nov, 2022 2 commits
-
-
Grigory Sizov authored
Summary: Closes T136364380 Added [WavLM Model](https://github.com/microsoft/UniSpeech/tree/main/WavLM): - Added `WavLMSelfAttention` class (from the [original implementation](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py)) and adjusted the existing Encoder and Transformer classes to be compatible with it - Added factory functions `wavlm_model`, `wavlm_base`, `wavlm_large` to `models/wav2vec2/model.py` - Added bundles for base and large models to pipelines. **TODO**: pre-trained model weights are not yet uploaded to `download.pytorch.org`; permissions are not granted yet. ## Tests - Expanded HuggingFace integration tests to cover WavLM. For these tests, added JSON configs for base and large models from HF ([base](https://huggingface.co/microsoft/wavlm-base/blob/main/config.json), [large](https://huggingface.co/microsoft/wavlm-large/blob/main/config.json)) into test assets - Expanded TorchScript and quantization tests to cover WavLM ## Comments There are a few workarounds I had to introduce: - Quantization tests for WavLM were breaking down at [`torch.cat`](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR466) ~~until I excluded the arguments of `torch.cat` from quantization [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR368-R369). I haven't found a better way to fix it, let me know if there is one~~ The reason for this seems to be that quantization replaces the `.bias` and `.weight` attributes of a `Linear` module with methods. Since we are using weights and biases directly, the code was breaking. The final solution suggested by nateanl was to define attention weights and biases directly in `WavLMSelfAttention`, skipping the `Linear` layers - ~~WavLM uses position embedding in the first layer of the encoder, but not in the subsequent ones. So the [UniSpeech](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py#L342) and [HF](https://github.com/huggingface/transformers/blob/b047472650cba259621549ac27b18fd2066ce18e/src/transformers/models/wavlm/modeling_wavlm.py#L441-L442) implementations only create this embedding module in the layers where it's used. However, we can't do this here because it breaks TorchScript. So as a solution I add a dummy `Identity` module to `WavLMSelfAttention` when the actual embedding is not needed: [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR361-R368).~~ Thanks nateanl for resolving this! - I had to add dummy `position_bias` and `key_padding_mask` arguments to `SelfAttention.forward` to make TorchScript tests pass. Since both `SelfAttention` and `WavLMSelfAttention` are called from `EncoderLayer`, they need to have compatible signatures. Having a variable number of arguments with `**kwargs` or checking the object class doesn't seem to work with TorchScript, so I instead made both types of attention accept `position_bias` and `key_padding_mask` arguments. Nit: do we still need to specify `__all__` if there are no wildcard imports in `__init__.py`, e.g. in `torchaudio/models/__init__.py`? Pull Request resolved: https://github.com/pytorch/audio/pull/2822 Reviewed By: nateanl Differential Revision: D41121855 Pulled By: sgrigory fbshipit-source-id: 9f4f787e5810010de4e74cb704063a26c66767d7
-
DanilBaibak authored
Summary: Added build wheels workflow for MacOS. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the MacOS Wheels with Nova. We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova. Pull Request resolved: https://github.com/pytorch/audio/pull/2782 Reviewed By: osalpekar Differential Revision: D41091271 Pulled By: DanilBaibak fbshipit-source-id: 906bcfecb26b5268a05163fa339909707f7de494
-
- 08 Nov, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add a `fused_log_softmax` argument (default/current behavior = `True`) to the RNN-T loss. If setting it to `False`, call `log_softmax` on the logits prior to passing them to the RNN-T loss function. The following should produce the same output:
```
rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
```
```
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
```
Testing: unit tests + same results on the Conformer RNN-T recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/2798 Reviewed By: xiaohui-zhang Differential Revision: D41083523 Pulled By: carolineechen fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61
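The equivalence hinges on what `log_softmax` computes. A numerically stable pure-Python sketch over a flat list (the torch call operates on tensors along a chosen dimension, but the math per slice is the same):

```python
import math

def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats.

    log_softmax(x) = x - logsumexp(x); subtracting the max before
    exponentiating avoids overflow, mirroring what
    torch.nn.functional.log_softmax does along one dimension.
    """
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]
```

The outputs are log-probabilities, so exponentiating them sums to one, which is exactly the normalization the fused path applies internally.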
-
hwangjeff authored
Summary: Adds `torch.nn.Module`-based implementations for convolution and FFT convolution. Pull Request resolved: https://github.com/pytorch/audio/pull/2811 Reviewed By: carolineechen Differential Revision: D40881937 Pulled By: hwangjeff fbshipit-source-id: bfe8969e6178ad4f58981efd4b2720ac006be8de
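FFT convolution relies on the convolution theorem: zero-pad both signals to the full output length, multiply their spectra pointwise, and transform back. A dependency-free sketch using a naive O(n^2) DFT (a real implementation would use an FFT, e.g. `torch.fft`; names here are illustrative, not torchaudio's API):

```python
import cmath

def dft(xs, invert=False):
    """Naive discrete Fourier transform; stands in for a real FFT."""
    n = len(xs)
    sign = 1 if invert else -1
    out = [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t, x in enumerate(xs))
        for k in range(n)
    ]
    return [v / n for v in out] if invert else out

def fft_convolve(x, h):
    """Linear convolution via the convolution theorem.

    Zero-padding to len(x) + len(h) - 1 makes circular convolution
    coincide with linear convolution.
    """
    n = len(x) + len(h) - 1
    xs = [complex(v) for v in x] + [0j] * (n - len(x))
    hs = [complex(v) for v in h] + [0j] * (n - len(h))
    spectrum = [a * b for a, b in zip(dft(xs), dft(hs))]
    return [v.real for v in dft(spectrum, invert=True)]
```

For long kernels this route is asymptotically cheaper than direct convolution, which is the motivation for offering both `Convolve` and `FFTConvolve` variants.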
-
- 04 Nov, 2022 1 commit
-
-
moto authored
Summary: StreamWriter assumed that frame rate is always expressed as 1/something, which is a reasonable assumption. This commit fixes it by properly computing time_base from frame rate. Address https://github.com/pytorch/audio/issues/2830 Pull Request resolved: https://github.com/pytorch/audio/pull/2831 Reviewed By: carolineechen Differential Revision: D41036084 Pulled By: mthrok fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609
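The fix amounts to taking the exact reciprocal of a rational frame rate. A sketch using `fractions.Fraction` standing in for FFmpeg's `AVRational` (the helper name is hypothetical):

```python
from fractions import Fraction

def time_base_from_frame_rate(rate: Fraction) -> Fraction:
    """time_base is the reciprocal of the frame rate.

    Valid for any rational rate, not just integer rates of the form
    N/1: the NTSC rate 30000/1001 inverts to 1001/30000, which the
    old "1/something" assumption could not represent.
    """
    return Fraction(rate.denominator, rate.numerator)
```

Keeping the computation in exact rational arithmetic avoids the rounding drift that a floating-point reciprocal would accumulate over many frames.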
-
- 03 Nov, 2022 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2825 Reviewed By: carolineechen Differential Revision: D40954522 Pulled By: mthrok fbshipit-source-id: 433fb856a74a340af4d49e5c65a6270f0b00c835
-
- 02 Nov, 2022 5 commits
-
-
Eli Uriegas authored
Summary: Pins the specific versions of otool and install_name_tool we actually prefer, since using the ones from conda can produce inconsistent results. Fixes https://github.com/pytorch/audio/issues/2806 Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/audio/pull/2828 Reviewed By: malfet, mthrok Differential Revision: D40960633 Pulled By: seemethere fbshipit-source-id: 5010c06578f1efc4fe314f9a3ff47f18e14ad156
-
moto authored
Summary: The PyTorch logo is included in the pytorch doc theme (and cannot be changed without custom CSS), so there is no need to have it here. Pull Request resolved: https://github.com/pytorch/audio/pull/2824 Reviewed By: carolineechen Differential Revision: D40954564 Pulled By: mthrok fbshipit-source-id: 5e9a91fddcc92c141baf1996f721c09c037fb003
-
Caroline Chen authored
Summary: Now that hybrid demucs is officially released as beta, remove its temporary prototype initialization support. Pull Request resolved: https://github.com/pytorch/audio/pull/2817 Reviewed By: mthrok Differential Revision: D40908696 Pulled By: carolineechen fbshipit-source-id: bc87a4b7aeb27db00e10bdce91cd71688cb08769
-
moto authored
Summary: <img width="756" alt="Screen Shot 2022-11-01 at 3 32 58 PM" src="https://user-images.githubusercontent.com/855818/199173348-f463ae71-438c-4dad-a481-b65522a8e52f.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2812 Reviewed By: carolineechen Differential Revision: D40919942 Pulled By: mthrok fbshipit-source-id: 18e5a709c262fb0b15ada0d303f1d0dee033beb1
-
hwangjeff authored
Summary: Partly addresses https://github.com/pytorch/audio/issues/2686 and https://github.com/pytorch/audio/issues/2356. Currently, when the buffer used for file-object decoding is insufficiently large, `torchaudio.load` returns a shorter waveform than expected. To deal with this, the user is expected to increase the buffer size via `torchaudio.utils.sox_utils.set_buffer_size`, but this does not influence the buffer used by the FFMpeg fallback. To fix this, this PR introduces changes that apply the buffer size set for the SoX backend to FFMpeg as well. As a follow-up, we should see whether it's possible to programmatically detect that the buffer is too small and flag it to the user. Pull Request resolved: https://github.com/pytorch/audio/pull/2810 Reviewed By: mthrok Differential Revision: D40906978 Pulled By: hwangjeff fbshipit-source-id: 256fe1da8b21610b05bea9a0e043f484f9ea2e76
-
- 01 Nov, 2022 1 commit
-
-
hwangjeff authored
Summary: Argument `mode` in `convolve` and `fftconvolve` is expected to be a string, but the docstrings incorrectly say bool. This PR fixes the docstrings accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2809 Reviewed By: nateanl Differential Revision: D40854464 Pulled By: hwangjeff fbshipit-source-id: 75b339ba34715723c93b91e7d48be2ed28bee115
-
- 31 Oct, 2022 1 commit
-
-
Joao Gomes authored
Summary: cc mthrok Implements precise seek and seek to any frame in torchaudio Pull Request resolved: https://github.com/pytorch/audio/pull/2737 Reviewed By: mthrok Differential Revision: D40546716 Pulled By: jdsgomes fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e
-
- 29 Oct, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2804 Reviewed By: nateanl Differential Revision: D40813412 Pulled By: carolineechen fbshipit-source-id: 8270bf17851b7424f51ecb8dbcbc2e1076efe333
-
- 28 Oct, 2022 2 commits
-
-
hwangjeff authored
Summary: Introduces argument 'mode' for convolution functions, following SciPy's convention. Pull Request resolved: https://github.com/pytorch/audio/pull/2801 Reviewed By: nateanl Differential Revision: D40805405 Pulled By: hwangjeff fbshipit-source-id: 8f0006ffe9e3945b4b17f44c4cfa1adb265c20ef
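SciPy's convention defines the three modes by how the full-length convolution is trimmed. A pure-Python sketch of those semantics (for illustration; this is not torchaudio's implementation):

```python
def convolve(x, h, mode="full"):
    """Direct 1-D convolution with SciPy-style mode handling.

    full  -> all len(x) + len(h) - 1 output samples
    same  -> the middle len(x) samples of the full output
    valid -> only samples where x and h fully overlap
    """
    n, m = len(x), len(h)
    full = [0.0] * (n + m - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            full[i + j] += xi * hj
    if mode == "full":
        return full
    if mode == "same":
        start = (m - 1) // 2
        return full[start:start + n]
    if mode == "valid":
        return full[m - 1:m - 1 + max(n - m + 1, 0)]
    raise ValueError(f"unknown mode: {mode!r}")
```

So for a length-4 signal and a length-3 kernel, `full` yields 6 samples, `same` yields 4, and `valid` yields 2.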
-
moto authored
Summary: This commit re-organizes the tutorials. 1. Put all the tutorials in the left bar and make the section **folded by default**. 2. Add pytorch/tutorials-like cards in index 3. Move feature classifications to a dedicated page. https://output.circle-artifacts.com/output/job/1f1a04a5-137e-428d-9da4-c46f59eeffa4/artifacts/0/docs/index.html <img width="1073" alt="Screen Shot 2022-10-28 at 7 34 29 AM" src="https://user-images.githubusercontent.com/855818/198410686-3ef40ad2-c9c9-443c-800e-6e51e1b6a491.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2767 Reviewed By: carolineechen Differential Revision: D40627547 Pulled By: mthrok fbshipit-source-id: 098b825f242e91919126014abdab27852304ae64
-
- 27 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds back docstring for `MelSpectrogram` initializer param `onesided`. Pull Request resolved: https://github.com/pytorch/audio/pull/2799 Reviewed By: mthrok Differential Revision: D40742691 Pulled By: hwangjeff fbshipit-source-id: 7e8088fefaafe7df57bb626b8b4e9ce5317bf3a7
-
- 26 Oct, 2022 2 commits
-
-
hwangjeff authored
Summary: Initializer parameter `onesided` isn't relevant to `MelSpectrogram` — it should always be `True`. In fact, the module already assumes `onesided == True` in the filterbank it generates and fails in its forward pass when `onesided == False`. Accordingly, this PR makes param `onesided` optional and adds a deprecation warning that's fired when the param is provided. Pull Request resolved: https://github.com/pytorch/audio/pull/2797 Reviewed By: carolineechen, xiaohui-zhang Differential Revision: D40731238 Pulled By: hwangjeff fbshipit-source-id: 6eea8eb9d4a85a805162e03ad91682a1946f92cd
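The deprecation pattern described above can be sketched as follows; the helper name is hypothetical and stands in for the check inside `MelSpectrogram.__init__`:

```python
import warnings

def resolve_onesided(onesided=None):
    """Sketch of the deprecation pattern for a no-op parameter.

    MelSpectrogram only works with onesided=True, so the parameter
    becomes optional and merely fires a warning when supplied; the
    effective value is always True.
    """
    if onesided is not None:
        warnings.warn(
            "Argument 'onesided' is deprecated and will be removed; "
            "MelSpectrogram always behaves as if onesided=True.",
            DeprecationWarning,
        )
    return True
```

Callers that never pass the argument see no warning, which keeps the change non-breaking until the parameter is actually removed.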
-
moto authored
Summary: StreamProcessor is constructed on top of an AVStream object, and client code attaches output streams to it. This commit refactors the constructor and the `add_stream` method signature so that `add_stream` is centered around the parameters required for filter construction. Pull Request resolved: https://github.com/pytorch/audio/pull/2791 Reviewed By: xiaohui-zhang Differential Revision: D40667979 Pulled By: mthrok fbshipit-source-id: 42220832f09a7895ede3cddf969d57feeb4ef7ec
-
- 25 Oct, 2022 1 commit
-
-
moto authored
Summary: Addresses https://github.com/pytorch/audio/issues/2790. Previously AVPacket objects had duration==0. The `av_interleaved_write_frame` function was inferring the duration of packets by comparing them against the next ones, but it could not infer the duration of the last packet, as there is no subsequent frame, and thus omitted it from the final data. This commit fixes it by explicitly setting packet duration = 1 (one frame), for video only. (An audio AVPacket contains multiple samples, so it is handled differently; tests were added to ensure correctness for audio.) Pull Request resolved: https://github.com/pytorch/audio/pull/2789 Reviewed By: xiaohui-zhang Differential Revision: D40627439 Pulled By: mthrok fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320
-
- 21 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: The motivation for generating `artifact.tar.gz` in the `build_docs` job is to easily use it for adding documentation in each stable release. But it is committed into the `gh-pages` branch, which makes the git repository very large (see https://github.com/pytorch/audio/issues/2783). This PR removes the tar file from the commit. Pull Request resolved: https://github.com/pytorch/audio/pull/2786 Reviewed By: carolineechen Differential Revision: D40591152 Pulled By: nateanl fbshipit-source-id: 47df60c2ec7bcdcc40e2b6078219b9397e6bfed1
-
- 20 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2780 Pull Request resolved: https://github.com/pytorch/audio/pull/2781 Reviewed By: carolineechen, mthrok Differential Revision: D40556794 Pulled By: nateanl fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e
-
- 19 Oct, 2022 5 commits
-
-
atalman authored
Summary: Bump version to 0.14 Pull Request resolved: https://github.com/pytorch/audio/pull/2779 Reviewed By: carolineechen Differential Revision: D40523034 Pulled By: atalman fbshipit-source-id: 325e6ffcac4763a7d83ba600c2c3d9eadae03c31
-
Caroline Chen authored
Summary: add ability to load only improvised or only scripted utterances. Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539
-
Omkar Salpekar authored
Summary: Creating this fresh PR since we're reverting the older commit that removed build configs from the CircleCI file. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the Linux Wheels with Nova, upload them to GH artifacts (NOT to the actual nightly channels), and ensure that they produce the same binaries as CircleCI. TO CLARIFY: this does not upload anything to nightly channels, so this PR has no effect on any existing jobs or distributed binaries. We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova. Pull Request resolved: https://github.com/pytorch/audio/pull/2719 Reviewed By: hwangjeff, weiwangmeta Differential Revision: D39866440 Pulled By: osalpekar fbshipit-source-id: 9ebf0402214fcd97cc519801276d85d336617410
-
Omkar Salpekar authored
Summary: Create a standalone GitHub Actions workflow for Docstring Sync. This job (https://app.circleci.com/pipelines/github/pytorch/audio/12625/workflows/96223ad2-0fcd-4dae-a045-d530aaf9b55c/jobs/907466) currently depends on linux wheels builds, which creates a dependency that makes the migration to Nova trickier. This PR creates a fresh standalone workflow for this job that is triggered per-PR and before nightly/release cuts. Pull Request resolved: https://github.com/pytorch/audio/pull/2720 Reviewed By: izaitsevfb, seemethere Differential Revision: D39863574 Pulled By: osalpekar fbshipit-source-id: 8599dc006693242278857a3dedeb4fddc1eed14b
-
Zhaoheng Ni authored
Summary: The file structure of VoxCeleb1 is as follows:
```
root/
└── wav/
    └── speaker_id folders
```
Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, in the file lists like https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt there is no such differentiation, so it is not necessary to put the extracted files into separate folders. This PR adds notes in the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure. Pull Request resolved: https://github.com/pytorch/audio/pull/2776 Reviewed By: carolineechen Differential Revision: D40483707 Pulled By: nateanl fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d
-