Commits · 6b8102408453c4f7cec0a806b24930e60477c132 · OpenDAS / Torchaudio

18 Dec, 2020 1 commit

[BC-Breaking] Remove download and subdir from CommonVoice (#1082) · 6b810240

moto authored Dec 18, 2020

* Removes code for download logic
* [BC-breaking] Changes the meaning of `root` argument to the exact directory of the dataset
* Deprecates the constructor arguments for download and subdirectory construction

6b810240
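
To illustrate the change in the meaning of `root`, here is a minimal sketch using only `pathlib`. Both helper functions and the directory names are hypothetical stand-ins chosen for illustration, not the library's code: the first mimics a loader that appends a subdirectory to `root`, the second treats `root` as the exact dataset directory.

```python
from pathlib import Path


def resolve_with_subdir(root, subdir):
    """Old-style behavior (hypothetical): the loader appends a subdirectory to root."""
    return Path(root) / subdir


def resolve_exact(root):
    """New-style behavior (hypothetical): root already is the dataset directory."""
    return Path(root)


# Assumed example layout; real directory names depend on the release you unpacked.
old_path = resolve_with_subdir("/data", "CommonVoice/cv-corpus/en")
new_path = resolve_exact("/data/CommonVoice/cv-corpus/en")

# Both calls point at the same directory; only the caller's responsibility changed.
print(old_path == new_path)  # True
```

Under this reading, callers that previously passed a parent directory would need to pass the full dataset directory instead.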

15 Dec, 2020 1 commit

Using Path and glob instead of walk_files (#1069) · d25a4ddf

Krishna Kalyan authored Dec 15, 2020



- yesno
- librispeech
- libritts
- speechcommands
Co-authored-by: krishnakalyan3 <skalyan@cloudera.com>
Co-authored-by: Vincent Quenneville-Belair <vincentqb@gmail.com>

d25a4ddf
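
As a rough illustration of listing dataset files with `Path` and glob patterns rather than a custom directory walker, the sketch below collects files by suffix in a deterministic order. `list_audio_files` is a hypothetical helper written for this example; it is not the code the commit introduced.

```python
import tempfile
from pathlib import Path


def list_audio_files(root, suffix=".wav"):
    """Return all files under root ending in suffix, in sorted order.

    Hypothetical helper for illustration; not the function used by the library.
    """
    # rglob searches recursively; sorting makes the order deterministic.
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


# Build a tiny throwaway tree to demonstrate.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "b").mkdir()
    (base / "a").mkdir()
    (base / "b" / "2.wav").touch()
    (base / "a" / "1.wav").touch()
    (base / "a" / "notes.txt").touch()
    found = [p.relative_to(base).as_posix() for p in list_audio_files(base)]

print(found)  # ['a/1.wav', 'b/2.wav']
```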

11 Dec, 2020 2 commits
- Cherry-pick 'Disallow download=True in CommonVoice (#1076)' (#1080) · 19fc580d
  moto authored Dec 11, 2020
  
  19fc580d
- Revert "no longer download CommonVoice directly (#1018)" (#1079) · 366cef83
  moto authored Dec 11, 2020
```
This reverts commit 09a6fca1.
```
  366cef83
08 Dec, 2020 1 commit

Fbsync (#1038) · a2085b85

moto authored Dec 08, 2020

* Import torchaudio #1034 70f429a4

Summary: Import torchaudio #1027 0cf4b8a9

Reviewed By: vincentqb, cpuhrsch

Differential Revision: D24958707

fbshipit-source-id: d06dd6b59197cc2c16bec5a9012cbf33a172b6b3

* Import torchaudio #1066 4406a6bb

Summary: Import up to #1066

Reviewed By: cpuhrsch

Differential Revision: D25373068

fbshipit-source-id: 890d36a25259b93428b3037c3123ff5a2cacfa04

a2085b85

03 Dec, 2020 1 commit
- no longer download CommonVoice directly (#1018) · 09a6fca1
  Vincent QB authored Dec 03, 2020
```
No longer allow downloading the dataset directly. Deprecate: download and url. Add: language.
```
  09a6fca1
18 Nov, 2020 2 commits
- Add pathlib support for LIBRITTS and LIBRISPEECH (#1046) · b5c16d33
  Bhargav Kathivarapu authored Nov 19, 2020
  
  b5c16d33
- Add pathlib support for TEDLIUM (#1045) · 37b4e136
  Bhargav Kathivarapu authored Nov 19, 2020
  
  37b4e136
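
Supporting `pathlib.Path` in a dataset constructor usually amounts to normalizing the argument once on entry. A minimal sketch, assuming the constructor works with a plain string internally; `normalize_root` is a hypothetical name, not a function from the library:

```python
import os
from pathlib import Path
from typing import Union


def normalize_root(root: Union[str, Path]) -> str:
    """Accept either str or Path and return a plain string path."""
    # os.fspath returns a str unchanged and calls __fspath__ on path-like objects.
    return os.fspath(root)


as_str = normalize_root("data/LibriSpeech")
as_path = normalize_root(Path("data") / "LibriSpeech")

# Both results are strings that refer to the same location.
print(type(as_str).__name__, type(as_path).__name__)  # str str
```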
17 Nov, 2020 1 commit
- Add pathlib support for SPEECHCOMMANDS (#1039) · 550b6a30
  Bhargav Kathivarapu authored Nov 18, 2020
  
  550b6a30
16 Nov, 2020 2 commits
- Add pathlib.Path support to `gtzan` (#1032) · edaeda4f
  Kshiteej K authored Nov 16, 2020
```
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>
```
  edaeda4f
- Pathlib support for VCTK and LJSPEECH (#1028) · 55175003
  Bhargav Kathivarapu authored Nov 16, 2020
  
  55175003
13 Nov, 2020 3 commits
- Add pathlib.Path support to `commonvoice` (#1027) · 0cf4b8a9
  Kshiteej K authored Nov 14, 2020
  
  0cf4b8a9
- YesNo Dataset Pathlib change (#1015) · b9ee0139
  Bhargav Kathivarapu authored Nov 13, 2020
```
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>
```
  b9ee0139
- Add pathlib.Path support to `cmuarctic` (#1025) · 5630fe35
  Kshiteej K authored Nov 13, 2020
  
  5630fe35
27 Oct, 2020 1 commit
- Add SpeechCommands train/valid/test split (#966) · b34bc7d3
  Vincent QB authored Oct 27, 2020
  
  b34bc7d3
13 Oct, 2020 1 commit
- Make VCTK_092 return regular type for the consistency (#949) · 2c07658b
  moto authored Oct 13, 2020
  
  2c07658b
09 Oct, 2020 1 commit
- fix tedlium load_audio (#934) · 4f0acc58
  Vincent QB authored Oct 09, 2020
```
and add test on other backend.
```
  4f0acc58
02 Oct, 2020 1 commit
- Update docstrings/documentations of all the datasets (#931) · e3d1d746
  moto authored Oct 02, 2020
  
  e3d1d746
15 Sep, 2020 1 commit
- Add tedlium dataset (#882) · 914a846d
  Jaime Ferrando Huertas authored Sep 15, 2020
  
  914a846d
20 Aug, 2020 1 commit

Update VCTK_092 interface and add tests (#875) · 2205cc9e

JianwuXu authored Aug 20, 2020

* Tweak docstring, audio_ext, load method signature and constructor of VCTK_092

* Add test for VCTK_092 dataset.

2205cc9e

19 Aug, 2020 1 commit

Add VCTK_092 dataset (#812) · 4bfebd85

Abhishek Dubey authored Aug 19, 2020



* Added version 0.92 of VCTK dataset
Signed-off-by: Abhishek Dubey <abhi.dubey011999@gmail.com>

4bfebd85

27 Jul, 2020 1 commit
- Add unit test for LJSpeech dataset (#826) · 6fcbff9c
  Lawrence Chen authored Jul 27, 2020
```
Co-authored-by: lawrencechen <lawrencechen@devvm3189.vll0.facebook.com>
```
  6fcbff9c
23 Jul, 2020 2 commits
- Make walk_files traverse in alphabetical and breadth-first order (#814) · 1def3fa9
  moto authored Jul 23, 2020
  
  1def3fa9
- Make GTZAN dataset sorted and use on-the-fly data in GTZAN test (#819) · 68f6a6a0
  moto authored Jul 23, 2020
  
  68f6a6a0
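
One way to obtain the alphabetical, breadth-first traversal described for `walk_files` (#814) is a queue of directories with sorted listings. This is a hypothetical sketch under that assumption (including the name `walk_files_bfs`); it does not reproduce the library's implementation.

```python
import os
import tempfile
from collections import deque


def walk_files_bfs(root, suffix):
    """Yield file names under root ending in suffix, breadth-first and alphabetically.

    Hypothetical sketch; it does not reproduce the library's walk_files.
    """
    queue = deque([root])
    while queue:
        current = queue.popleft()
        entries = sorted(os.listdir(current))
        # Files of the current directory come first...
        for name in entries:
            if os.path.isfile(os.path.join(current, name)) and name.endswith(suffix):
                yield name
        # ...then subdirectories are queued in alphabetical order.
        for name in entries:
            path = os.path.join(current, name)
            if os.path.isdir(path):
                queue.append(path)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "b", "deep"))
    os.makedirs(os.path.join(tmp, "a"))
    for rel in ["z.wav", "a/2.wav", "a/1.wav", "b/3.wav", "b/deep/0.wav"]:
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    order = list(walk_files_bfs(tmp, ".wav"))

print(order)  # ['z.wav', '1.wav', '2.wav', '3.wav', '0.wav']
```

A fixed order like this makes dataset indices reproducible across machines, since raw directory listings are not guaranteed to be sorted.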
20 Jul, 2020 1 commit

Add LibriTTS dataset (#790) · 4b8aad7a

jimchen90 authored Jul 20, 2020



* Add libritts

Add LibriTTS dataset draft

* Add libritts

Use two separate ids for utterance_id.

* Update output form

Use full_id as utterance_id.

* Update format

Add space and test black format

* Update test method

* Add audio and text test

Generate audio and test files on-the-fly in test 

* Update format

* Fix test error and remove assets libritts

The test error is fixed by sorting the files by the 4th element instead of the 2nd element in samples. Since the files are generated on-the-fly, the libritts files in assets are removed.

* Add seed in `get_whitenoise` function

* Change utterance to text

Change `_utterance` to `_text`.
Co-authored-by: Ji Chen <jimchen90@devfair0160.h2.fair>

4b8aad7a

17 Jul, 2020 1 commit

Changed GTZAN so that it only traverses filenames belonging to the dataset (#791) · 47eb1e6a

Emmanouil Theofanis Chourdakis authored Jul 17, 2020

* Addressed review issues in PR #668

* Changed GTZAN so that it only traverses filenames belonging to the dataset

Now, instead of walking the whole directory and subdirectories of the dataset, GTZAN only looks for files matching the `genre`/`genre`.`5 digit number`.wav format, where `genre` is an allowed GTZAN genre label.
This allows moving or removing files from the dataset (e.g. for fixing duplication or mislabeling issues).

47eb1e6a
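
The naming rule described in #791 can be sketched as a filter over relative paths. The regular expression, the helper name `belongs_to_dataset`, and the shortened genre list below are assumptions made for this example, not the code from the commit.

```python
import re

# Assumed subset of genre labels, for illustration only.
ALLOWED_GENRES = ["blues", "classical", "jazz"]

# Matches "<genre>/<genre>.<5 digits>.wav" where both genre parts are identical.
PATTERN = re.compile(r"^(?P<genre>[a-z]+)/(?P=genre)\.\d{5}\.wav$")


def belongs_to_dataset(relative_path):
    """Return True if relative_path follows the expected naming scheme."""
    match = PATTERN.match(relative_path)
    return bool(match) and match.group("genre") in ALLOWED_GENRES


candidates = [
    "blues/blues.00000.wav",   # valid
    "blues/jazz.00001.wav",    # genre mismatch between folder and file
    "jazz/jazz.123.wav",       # not five digits
    "polka/polka.00000.wav",   # not an allowed genre
    "jazz/notes.txt",          # unrelated file
]
kept = [p for p in candidates if belongs_to_dataset(p)]
print(kept)  # ['blues/blues.00000.wav']
```

Because only matching names are considered, stray or deliberately removed files do not affect which samples are loaded.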

10 Jun, 2020 1 commit

Add cmu_arctic dataset (#710) · 55b5c80c

jimchen90 authored Jun 10, 2020



* Add cmu_arctic dataset

* add dataset name

* update audio test file with whitenoise.wav file

* add test text file

* update text method and file name

* update comment

* change datasets order in doc

* add line length
Co-authored-by: Ji Chen <jimchen90@devfair0160.h2.fair>

55b5c80c

02 Jun, 2020 1 commit

Added the popular GTZAN dataset: (#668) · b0367251

Emmanouil Theofanis Chourdakis authored Jun 03, 2020



* Added the popular GTZAN dataset:

* Added the GTZAN class in torchaudio.datasets using the same format as the rest of the datasets.
* Added the appropriate test function in test_datasets.py.
* Added the GTZAN class in the datasets.rst documentation file.

* Addressed review issues in PR #668

* Added dummy noise .wav in `test/assets/`
* Removed transforms of input and output from the dataset
  `__init__` function, as well as the corresponding methods.
* Replaced redundant `filtered` and `subset` methods from
  class initialization and also changed the corresponding
  assertion message.

* Fixed E303: too many blank lines error

* Added GTZAN to __init__.__all__

* Fixed incorrectly not importing GTZAN

* removed duplicate warning

* lint
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

b0367251

21 Apr, 2020 2 commits

Fix inline typing for mypy (#544) · 9d40302d

Tomás Osório authored Apr 21, 2020

* fix inline typing for mypy

* fix flake8

* change check position

* fix for py3.5

* fix for py3.5

* change to inline typing

* add inline typing

9d40302d

Add datasets checksum (#499) · c53ceb84

Bhargav Kathivarapu authored Apr 21, 2020

* add checksums

* checksum function changes

* function Docstring change

* checksums moved to Dataset Modules

c53ceb84
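
For context on what a dataset checksum check can look like, here is a minimal sketch built on `hashlib`. The function names, the choice of SHA-256, and the chunk size are assumptions for illustration; the commit itself does not spell out these details here.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def validate_file(path, expected, algorithm="sha256"):
    """Return True if the file's digest equals the expected hex string."""
    return file_checksum(path, algorithm) == expected


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "archive.bin")
    with open(archive, "wb") as f:
        f.write(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    ok = validate_file(archive, expected)
    bad = validate_file(archive, "0" * 64)

print(ok, bad)  # True False
```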

07 Apr, 2020 1 commit

Inline typing utils dataset (#522) · 4e5ee9f1

Tomás Osório authored Apr 07, 2020



* add inline typing to utils Dataset

* add inline typing to common_utils

* add missing inline typing

* add typing to kwarg

* add missing inline typing

* update docstring

* undo indentation
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

4e5ee9f1

06 Apr, 2020 1 commit

Datasets inline typing (#511) · 3695a0ef

Tomás Osório authored Apr 06, 2020



* add CommonDataset Inline typing

* inline Typing librispeech

* add inline typing ljspeech

* add inline typing speechcommands

* add inline typing to vctk

* add inline typing yesno

* apply type to __getitem__
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

3695a0ef

03 Apr, 2020 1 commit

Fix common voice dataset (#498) · 9b288109

Tomás Osório authored Apr 03, 2020

* fix download

* fix reading tsv archive

* add new languages

* maintain same structure as other datasets

* update CommonVoice Tests

* fix

* change directory name

* remove extra line

9b288109

02 Apr, 2020 1 commit

[BC breaking] fix issue with VCTK dataset (#484) · eb5b5a02

Tomás Osório authored Apr 02, 2020



* fix issue with VCTK dataset

* update docstring

* filter out folder p315

* add except_folder as hidden variable

* maintain structure

* lint

* remove space
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

eb5b5a02

01 Apr, 2020 1 commit
- Replace six with python3 version (#486) · d069fb9f
  Bhargav Kathivarapu authored Apr 01, 2020
  
  d069fb9f
22 Feb, 2020 1 commit

Adding Speech Command Dataset (#437) · 4d58bc46

Tomás Osório authored Feb 22, 2020



* add speechcommand dataset and test

* prepend the full path to each result

* add missing param on docstring in walk_files

* add file to run tests on SpeechCommand Dataset

* reduce logic

* update test on SpeechCommands

* correct the indentation on docstring walk_files

* flake8 compliance

* change tuple type returned. move path split logic in load item.

* typo in name.

* redundant file path.

* filter background noise.
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

4d58bc46

20 Feb, 2020 1 commit
- LJ Speech dataset (#439) · 32bae85c
  Taras Sereda authored Feb 20, 2020
```
* LJ Speech dataset

* refactoring

as per @vincentqb's suggestions
```
  32bae85c
12 Feb, 2020 1 commit
- adding dev-other. (#433) · ffeee199
  Vincent QB authored Feb 12, 2020
  
  ffeee199
13 Jan, 2020 1 commit
- Upgrading to UserWarning so that the user gets the warning. (#402) · 45498f26
  Vincent QB authored Jan 13, 2020
  
  45498f26
27 Dec, 2019 1 commit

Fix several errors in tests run by Travis (#380) · 9801caf6

Karl Ostmo authored Dec 27, 2019

* Declare file encoding to support special characters

* fix missing utf_8_encoder error in Travis tests

* Py 2.7 backwards-compat iterator

* ensure integer argument to torch.nn.functional.pad

* cast math.ceil result as integer

9801caf6