Commit 4cb895b6 authored by alexeib, committed by Facebook Github Bot

add pre-trained wav2vec model

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884

Differential Revision: D17774515

Pulled By: alexeib

fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f
parent 315c463d
@@ -99,6 +99,7 @@ as well as example training and evaluation commands.
- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available
- [wav2vec](examples/wav2vec/README.md): wav2vec large model is available
We also have more detailed READMEs to reproduce results from specific papers:
- [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
...
@@ -2,6 +2,27 @@
Example to train a wav2vec model as described in [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](https://arxiv.org/abs/1904.05862).
## Pre-trained models
Description | Parameters | Dataset | Model
---|---:|---|---
Wav2Vec large <br> ([Schneider et al., 2019](https://arxiv.org/abs/1904.05862)) | 32.5M | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_large.pt)
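
To try the model locally, the checkpoint linked in the table can be fetched with any HTTP client; a minimal Python sketch (the destination filename is arbitrary):

```python
# Download the released checkpoint listed in the table above.
import urllib.request

url = "https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_large.pt"
urllib.request.urlretrieve(url, "wav2vec_large.pt")
```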
#### Example usage:
```python
import torch
from fairseq.models.wav2vec import Wav2VecModel

# Load the released checkpoint and rebuild the model from its saved args.
cp = torch.load('/path/to/wav2vec.pt')
model = Wav2VecModel.build_model(cp['args'], task=None)
model.load_state_dict(cp['model'])
model.eval()

# Dummy 16kHz input of shape (batch, samples); replace with real audio.
wav_input_16khz = torch.randn(1,10000)
z = model.feature_extractor(wav_input_16khz)  # local feature vectors
c = model.feature_aggregator(z)  # context representations
```
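
The snippet above feeds random noise through the network only to illustrate the API. To featurize a real recording, the input should be 16kHz mono audio shaped (batch, samples). A hedged sketch continuing from the code above (`model` is the loaded Wav2VecModel; the third-party `soundfile` package and the filename `utterance.wav` are assumptions, and the audio is assumed to already be 16kHz mono):

```python
import soundfile as sf
import torch

# `model` is the Wav2VecModel loaded as in the snippet above.
# sf.read returns a float numpy array and the sample rate.
wav, sr = sf.read('utterance.wav')
assert sr == 16000, "wav2vec expects 16kHz input"

wav_input = torch.from_numpy(wav).float().unsqueeze(0)  # (batch=1, samples)

with torch.no_grad():
    z = model.feature_extractor(wav_input)   # local feature vectors
    c = model.feature_aggregator(z)          # context representations
```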
## Training a new model with the CLI tools
Given a directory containing wav files to be used for pretraining (we recommend splitting each file into separate files 10 to 30 seconds in length)
...
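
Splitting long recordings into such clips is not handled by the fairseq CLI itself; one simple approach is to slice each file into fixed-length segments before building the training manifest. A rough sketch (the segment length, paths, and naming scheme are illustrative assumptions, and the third-party `soundfile` package is used for I/O):

```python
import os
import soundfile as sf

def split_wav(path, out_dir, segment_seconds=20):
    """Slice one wav file into consecutive clips of roughly segment_seconds each."""
    wav, sr = sf.read(path)
    samples_per_segment = segment_seconds * sr
    base = os.path.splitext(os.path.basename(path))[0]
    os.makedirs(out_dir, exist_ok=True)
    for i, start in enumerate(range(0, len(wav), samples_per_segment)):
        chunk = wav[start:start + samples_per_segment]
        sf.write(os.path.join(out_dir, f"{base}_{i:04d}.wav"), chunk, sr)
```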