# egs (Examples)

## How to use

See the tutorial: https://espnet.github.io/espnet/tutorial.html

## Overview of example information

| Directory name | Corpus name | Task | Language | URL | Note |
| --- | --- | --- | --- | --- | --- |
| aesrc2020 | Accented English Speech Recognition Challenge 2020 | ASR | EN | https://arxiv.org/abs/2102.10233 | |
| aidatatang_200zh | Aidatatang_200zh: A free Chinese Mandarin speech corpus | ASR | ZH | http://www.openslr.org/62/ | |
| aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/kysjcp | |
| aishell2 | AISHELL-2 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/aishell_2 | |
| ami | The AMI Meeting Corpus | ASR | EN | http://groups.inf.ed.ac.uk/ami/corpus/ | |
| an4 | CMU AN4 database | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
| arctic | CMU ARCTIC databases | TTS/VC | EN, EN->EN | http://www.festvox.org/cmu_arctic/ | |
| aurora4 | Aurora-4 database | ASR | EN | http://aurora.hsnr.de/aurora-4.html | |
| babel | IARPA Babel corpus | ASR | ~20 Languages | https://www.iarpa.gov/index.php/research-programs/babel | |
| blizzard_2017 | Blizzard Challenge 2017 | TTS | EN | https://www.synsig.org/index.php/Blizzard_Challenge_2017 | |
| chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | |
| chime5 | The 5th CHiME Speech Separation and Recognition Challenge | ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME5/index.html | |
| chime6 | The 6th CHiME Speech Separation and Recognition Challenge | ASR | EN | https://chimechallenge.github.io/chime6/ | |
| cmu_wilderness | CMU Wilderness Multilingual Speech Dataset | Multilingual ASR | ~100 Languages | https://github.com/festvox/datasets-CMU_Wilderness | |
| commonvoice | The Mozilla Common Voice | ASR | 13 Languages | https://voice.mozilla.org/datasets | |
| covost2 | CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus | ASR/Machine Translation/Speech Translation | 15+21 Language pairs | https://github.com/facebookresearch/covost | |
| csj | Corpus of Spontaneous Japanese | ASR | JP | https://pj.ninjal.ac.jp/corpus_center/csj/en/ | |
| csmsc | Chinese Standard Mandarin Speech Corpus | TTS | ZH | https://www.data-baker.com/open_source.html | |
| dipco | Dinner Party Corpus | ASR | EN | https://arxiv.org/abs/1909.13447 | |
| dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multi-Array ASR | EN | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj | |
| fisher_callhome_spanish | Fisher and CALLHOME Spanish--English Speech Translation | ASR/Machine Translation/Speech Translation | ES->EN | https://catalog.ldc.upenn.edu/LDC2014T23 | |
| fisher_swbd | Fisher English Training Speech, Switchboard-1 Release 2 | ASR | EN | https://catalog.ldc.upenn.edu/LDC2004S13, https://catalog.ldc.upenn.edu/LDC2005S13, https://catalog.ldc.upenn.edu/LDC97S62 | |
| hkust | HKUST Mandarin Telephone Speech | ASR | ZH | https://catalog.ldc.upenn.edu/LDC2005S15, https://catalog.ldc.upenn.edu/LDC2005T32 | |
| how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/Machine Translation/Speech Translation | EN->PT | https://github.com/srvk/how2-dataset | |
| hub4_spanish | 1997 Spanish Broadcast News Speech (HUB4-NE) | ASR | ES | https://catalog.ldc.upenn.edu/LDC98S74, https://catalog.ldc.upenn.edu/LDC98T29 | |
| iwslt16 | International Workshop on Spoken Language Translation 2016 | Machine Translation | EN->DE | https://wit3.fbk.eu/mt.php?release=2016-01 | |
| iwslt18 | International Workshop on Spoken Language Translation 2018 | ASR/Machine Translation/Speech Translation | EN->DE | https://sites.google.com/site/iwsltevaluation2018/Lectures-task | |
| iwslt19 | International Workshop on Spoken Language Translation 2019 | ASR/Speech Translation | EN->DE | https://sites.google.com/view/iwslt-evaluation-2019/speech-translation | |
| iwslt21 | International Workshop on Spoken Language Translation 2021 | ASR/Machine Translation/Speech Translation | EN->DE | https://iwslt.org/2021/offline | |
| iwslt21_low_resource | International Workshop on Spoken Language Translation 2021 | ASR/Speech Translation | SWA->EN & SWC->FR | https://iwslt.org/2021/low-resource | |
| jesc | Japanese-English Subtitle Corpus | Machine Translation | EN->JP | https://nlp.stanford.edu/projects/jesc/ | |
| jnas | ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS) | ASR/TTS | JP | http://research.nii.ac.jp/src/JNAS.html | |
| jsalt18e2e | Multilingual End-to-end ASR for Incomplete Data Benchmark | Multilingual ASR | ~20 Languages | https://www.clsp.jhu.edu/workshops/18-workshop/multilingual-end-end-asr-incomplete-data/ | babel+ |
| jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JP | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | |
| jvs | JVS (Japanese versatile speech) corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus | |
| ksponspeech | KsponSpeech (Korean spontaneous speech) corpus | ASR | KR | https://aihub.or.kr/aidata/105 | |
| li10 | Language-Independent ASR task (10 languages) | Multilingual ASR | ~10 Languages | https://www.merl.com/publications/docs/TR2017-182.pdf | csj+hkust+voxforge(7lang)+wsj |
| li42 | Corpora Combination with 42 languages | Multilingual ASR | ~42 Languages | | aishell+aurora4+babel+chime4+commonvoice+csj+fisher_callhome_spanish+fisher_swbd+hkust+voxforge+wsj |
| libri_trans | Translation Augmented LibriSpeech Corpus | ASR/Machine Translation/Speech Translation | | https://persyval-platform.univ-grenoble-alpes.fr/DS91/detaildataset | |
| librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 | |
| libritts | LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech | TTS | EN | http://www.openslr.org/60/ | |
| ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ | |
| lrs2 | The Lip Reading Sentences 2 Dataset | ASR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html | |
| lrs | The Lip Reading Sentences 2 and 3 Datasets | AVSR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html, https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html | |
| m_ailabs | The M-AILABS Speech Dataset | TTS | ~5 languages | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ | |
| mucs_2021 | MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages | ASR/Code Switching | HI, MR, OR, TA, TE, GU, HI-EN, BN-EN | https://navana-tech.github.io/MUCS2021/data.html | |
| mtedx | Multilingual TEDx | ASR/Machine Translation/Speech Translation | 13 Language pairs | http://www.openslr.org/100/ | |
| must_c | MuST-C Multilingual Speech Translation Corpus | ASR/Machine Translation/Speech Translation | EN->{DE, ES, FR, IT, NL, PT, RO, RU} | https://ict.fbk.eu/must-c/ | |
| must_c_v2 | MuST-C Multilingual Speech Translation Corpus (v2) | ASR/Machine Translation/Speech Translation | EN->DE | https://ict.fbk.eu/must-c/, https://iwslt.org/2021/offline | Adds more talks, yielding 20k more audio/text segments. Uses improved cleaning strategies that better discard low-quality triplets. The TED talks of MuST-C v2 were downloaded from the YouTube TED channel. |
| puebla_nahuatl | The Puebla-Nahuatl Corpus | ASR | Nahuatl | http://www.openslr.org/89 | |
| reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | EN | https://reverb2014.dereverberation.com/ | |
| ru_open_stt | Russian Open Speech To Text (STT/ASR) Dataset | ASR | RU | https://github.com/snakers4/open_stt | |
| swbd | The Switchboard corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC97S62 | |
| tedlium2 | TED-LIUM corpus release 2 | ASR | EN | https://www.openslr.org/19/, http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf | |
| tedlium3 | TED-LIUM corpus release 3 | ASR | EN | http://www.openslr.org/51/, https://arxiv.org/pdf/1805.04699 | |
| timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1 | |
| timit_ssc | Silent Speech Challenge | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1/, https://ftp.espci.fr/pub/sigma/ | Features are extracted from ultrasound images and lip-motion video. Train and test set transcripts come from the TIMIT and WSJ corpora, respectively. |
| tweb | The World English Bible | TTS | EN | https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset | |
| vais1000 | VAIS-1000 | TTS | VI | https://ieee-dataport.org/documents/vais-1000-vietnamese-speech-synthesis-corpus | |
| vcc20 | Voice Conversion Challenge 2020 | VC | EN->{EN, DE, FI, ZH} | http://www.vc-challenge.org/ | |
| vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VI | https://doi.org/10.5281/zenodo.7068130 | |
| voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ | |
| wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A | |
| wsj_mix | MERL WSJ0-mix multi-speaker dataset | Multispeaker ASR | EN | http://www.merl.com/demos/deep-clustering | |
| yesno | The "yesno" corpus | ASR | HE | http://www.openslr.org/1 | |
| Yoloxóchitl-Mixtec | The Yoloxóchitl-Mixtec corpus | ASR | Mixtec | http://www.openslr.org/89 | |
| zeroth_korean | Zeroth-Korean | ASR | KR | http://www.openslr.org/40 | |