# egs (Examples)

## How to use?
See: https://espnet.github.io/espnet/tutorial.html

## Overview of example information

| Directory name          | Corpus name                                                  | Task                                       | Language       | URL                                                          | Note                          |
| ----------------------- | ------------------------------------------------------------ | ------------------------------------------ | -------------- | ------------------------------------------------------------ | ----------------------------- |
| aesrc2020               | Accented English Speech Recognition Challenge 2020           | ASR                                        | EN             | https://arxiv.org/abs/2102.10233                                   |                               |
| aidatatang_200zh        | Aidatatang_200zh A free Chinese Mandarin speech corpus       | ASR                                        | ZH             | http://www.openslr.org/62/                                   |                               |
| aishell                 | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus       | ASR                                        | ZH             | http://www.aishelltech.com/kysjcp                            |                               |
| aishell2                | AISHELL-2 Open Source Mandarin Speech Corpus                 | ASR                                        | ZH             | http://www.aishelltech.com/aishell_2                         |                               |
| ami                     | The AMI Meeting Corpus                                       | ASR                                        | EN             | http://groups.inf.ed.ac.uk/ami/corpus/                       |                               |
| an4                     | CMU AN4 database                                             | ASR/TTS                                    | EN             | http://www.speech.cs.cmu.edu/databases/an4/                  |                               |
| arctic                  | CMU ARCTIC databases                                         | TTS, VC                                    | EN, EN -> EN   | http://www.festvox.org/cmu_arctic/                           |                               |
| aurora4                 | Aurora-4 database                                            | ASR                                        | EN             | http://aurora.hsnr.de/aurora-4.html                          |                               |
| babel                   | IARPA Babel corpus                                           | ASR                                        | ~20 Languages  | https://www.iarpa.gov/index.php/research-programs/babel      |                               |
| blizzard_2017           | Blizzard Challenge 2017                                      | TTS                                        | EN             | https://www.synsig.org/index.php/Blizzard_Challenge_2017     |                               |
| chime4                  | The 4th CHiME Speech Separation and Recognition Challenge    | ASR/Multichannel ASR                       | EN             | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/      |                               |
| chime5                  | The 5th CHiME Speech Separation and Recognition Challenge    | ASR                                        | EN             | http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME5/index.html                |                               |
| chime6                  | The 6th CHiME Speech Separation and Recognition Challenge    | ASR                                        | EN             | https://chimechallenge.github.io/chime6/                |                               |
| cmu_wilderness          | CMU Wilderness Multilingual Speech Dataset                   | Multilingual ASR                           | ~100 Languages | https://github.com/festvox/datasets-CMU_Wilderness           |                               |
| commonvoice             | The Mozilla Common Voice                                     | ASR                                        | 13 Languages   | https://voice.mozilla.org/datasets                           |                               |
| covost2                 | CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus | ASR/Machine Translation/Speech Translation                                        | 15+21 Language pairs   | https://github.com/facebookresearch/covost                           |                               |
| csj                     | Corpus of Spontaneous Japanese                               | ASR                                        | JP             | https://pj.ninjal.ac.jp/corpus_center/csj/en/                |                               |
| csmsc                   | Chinese Standard Mandarin Speech Corpus                      | TTS                                        | ZH             | https://www.data-baker.com/open_source.html                  |                               |
| dipco                   | Dinner Party Corpus                                          | ASR                                        | EN             | https://arxiv.org/abs/1909.13447                             |                               |
| dirha_wsj               | Distant-speech Interaction for Robust Home Applications      | Multi-Array ASR                            | EN             | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj|                               |
| fisher_callhome_spanish | Fisher and CALLHOME Spanish--English Speech Translation      | ASR/Machine Translation/Speech Translation | ES->EN         | https://catalog.ldc.upenn.edu/LDC2014T23                     |                               |
| fisher_swbd             | Fisher English Training Speech, Switchboard-1 Release 2      | ASR                                        | EN             | https://catalog.ldc.upenn.edu/LDC2004S13, https://catalog.ldc.upenn.edu/LDC2005S13, https://catalog.ldc.upenn.edu/LDC97S62 |                               |
| hkust                   | HKUST Mandarin Telephone Speech                              | ASR                                        | ZH             | https://catalog.ldc.upenn.edu/LDC2005S15, https://catalog.ldc.upenn.edu/LDC2005T32 |                               |
| how2                    | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/Machine Translation/Speech Translation | EN->PT     | https://github.com/srvk/how2-dataset                         |                               |
| hub4_spanish            | 1997 Spanish Broadcast News Speech (HUB4-NE)                 | ASR                                        | ES             | https://catalog.ldc.upenn.edu/LDC98S74, https://catalog.ldc.upenn.edu/LDC98T29 |                               |
| iwslt16                 | International Workshop on Spoken Language Translation 2016   | Machine Translation | EN->DE         | https://wit3.fbk.eu/mt.php?release=2016-01 |                               |
| iwslt18                 | International Workshop on Spoken Language Translation 2018   | ASR/Machine Translation/Speech Translation | EN->DE         | https://sites.google.com/site/iwsltevaluation2018/Lectures-task |                               |
| iwslt19                 | International Workshop on Spoken Language Translation 2019   | ASR/Speech Translation                     | EN->DE         | https://sites.google.com/view/iwslt-evaluation-2019/speech-translation |                               |
| iwslt21                 | International Workshop on Spoken Language Translation 2021   | ASR/Machine Translation/Speech Translation | EN->DE         | https://iwslt.org/2021/offline                               |                               |
| iwslt21_low_resource    | International Workshop on Spoken Language Translation 2021   | ASR/Speech Translation                     | SWA->EN & SWC->FR | https://iwslt.org/2021/low-resource                       |                               |
| jesc                    | Japanese-English Subtitle Corpus                             | Machine Translation                        | EN->JP         | https://nlp.stanford.edu/projects/jesc/                              |                         |
| jnas                    | ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS) | ASR/TTS                              | JP             | http://research.nii.ac.jp/src/JNAS.html                      |                               |
| jsalt18e2e              | Multilingual End-to-end ASR for Incomplete Data Benchmark    | Multilingual ASR                           | ~20 Languages  | https://www.clsp.jhu.edu/workshops/18-workshop/multilingual-end-end-asr-incomplete-data/ | babel+                        |
| jsut                    | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS                                  | JP             | https://sites.google.com/site/shinnosuketakamichi/publication/jsut |                               |
| jvs                     | JVS (Japanese versatile speech) corpus                         | TTS                                      | JP             | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus |                               |
| ksponspeech             | KsponSpeech (Korean spontaneous speech) corpus                         | ASR                                      | KR             | https://aihub.or.kr/aidata/105 |                               |
| li10                    | Language-Independent ASR task (10 languages)                 | Multilingual ASR                           | ~10 Languages  | https://www.merl.com/publications/docs/TR2017-182.pdf        | csj+hkust+voxforge(7lang)+wsj |
| li42                    | Corpora Combination with 42 languages                        | Multilingual ASR                           | ~42 Languages  |                                                              | aishell+aurora4+babel+chime4+commonvoice+csj+fisher_callhome_spanish+fisher_swbd+hkust+voxforge+wsj |
| libri_trans             | Translation Augmented LibriSpeech Corpus                     | ASR/Machine Translation/Speech Translation |                | https://persyval-platform.univ-grenoble-alpes.fr/DS91/detaildataset |                               |
| librispeech             | LibriSpeech ASR corpus                                       | ASR                                        | EN             | http://www.openslr.org/12                                    |                               |
| libritts                | LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech | TTS                                      | EN             | http://www.openslr.org/60/                                   |                               |
| ljspeech                | The LJ Speech Dataset                                        | TTS                                        | EN             | https://keithito.com/LJ-Speech-Dataset/                      |                               |
| lrs2                    | The Lip Reading Sentences 2 Dataset                          | ASR                                        | EN             | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html  |                               |
| lrs                     | The Lip Reading Sentences 2 and 3 Datasets                   | AVSR                                       | EN             | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html, https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html |                               |
| m_ailabs                | The M-AILABS Speech Dataset                                  | TTS                                        | ~5 languages   | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/    |                               |
| mucs_2021               | MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages   | ASR/Code Switching          | HI, MR, OR, TA, TE, GU, HI-EN, BN-EN | https://navana-tech.github.io/MUCS2021/data.html                    |                               |
| mtedx                   | Multilingual TEDx                                            | ASR/Machine Translation/Speech Translation | 13 Language pairs | http://www.openslr.org/100/                               |                               |
| must_c                  | Must-C Multilingual Speech Translation Corpus                | ASR/Machine Translation/Speech Translation | EN->{DE, ES, FR, IT, NL, PT, RO, RU} | https://ict.fbk.eu/must-c/             |                               |
| must_c_v2               | Must-C Multilingual Speech Translation Corpus | ASR/Machine Translation/Speech Translation                | EN->DE         | https://ict.fbk.eu/must-c/ https://iwslt.org/2021/offline    | More talks that result in 20k more audio/text segments. Improved cleaning strategies able to better discard low-quality triplets. TED talks of MuST-C v2 were downloaded from the YouTube TED channel. |
| puebla_nahuatl                  | The Puebla-Nahuatl Corpus                                | ASR                                    | Nahuatl        | http://www.openslr.org/89              |                                                     |
| reverb                  | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR                          | EN             | https://reverb2014.dereverberation.com/                      |                               |
| ru_open_stt             | Russian Open Speech To Text (STT/ASR) Dataset                | ASR                                        | RU             | https://github.com/snakers4/open_stt                         |                               |
| swbd                    | The Switchboard corpus                                       | ASR                                        | EN             | https://catalog.ldc.upenn.edu/LDC97S62                       |                               |
| tedlium2                | TED-LIUM corpus release 2                                    | ASR                                        | EN             | https://www.openslr.org/19/, http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf |                               |
| tedlium3                | TED-LIUM corpus release 3                                    | ASR                                        | EN             | http://www.openslr.org/51/, https://arxiv.org/pdf/1805.04699 |                               |
| timit                   | TIMIT Acoustic-Phonetic Continuous Speech Corpus             | ASR                                        | EN             | https://catalog.ldc.upenn.edu/LDC93S1                        |                               |
| timit_ssc               | Silent Speech Challenge                                      | ASR                                        | EN             | https://catalog.ldc.upenn.edu/LDC93S1/, https://ftp.espci.fr/pub/sigma/ | Features are extracted from ultrasound images and lip-motion video. Training and test set transcripts come from the TIMIT and WSJ corpora, respectively. |
| tweb                    | The World English Bible                                      | TTS                                        | EN             | https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset                      |                               |
| vais1000                | VAIS-1000                                                    | TTS                                        | VI             | https://ieee-dataport.org/documents/vais-1000-vietnamese-speech-synthesis-corpus  |             |
| vcc20                 | Voice Conversion Challenge 2020                              | VC                                         | EN->{EN, DE, FI, ZH} | http://www.vc-challenge.org/                               |                               |
| vivos                   | VIVOS (Vietnamese corpus for ASR)                            | ASR                                        | VI             | https://doi.org/10.5281/zenodo.7068130                            |                               |
| voxforge                | VoxForge                                                     | ASR                                        | 7 languages    | http://www.voxforge.org/                                     |                               |
| wsj                     | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete                | ASR                                        | EN             | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A |                               |
| wsj_mix                 | MERL WSJ0-mix multi-speaker dataset                          | Multispeaker ASR                           | EN             | http://www.merl.com/demos/deep-clustering                    |                               |
| yesno                   | The "yesno" corpus                                           | ASR                                        | HE             | http://www.openslr.org/1                                     |                               |
| Yoloxóchitl-Mixtec      | The Yoloxóchitl-Mixtec corpus                                | ASR                                        | Mixtec         | http://www.openslr.org/89                                    |                               |
| zeroth_korean           | Zeroth-Korean                                                | ASR                                        | KR             | http://www.openslr.org/40                                    |                               |