This is an example vocoder pipeline using the WaveRNN model trained with LJSpeech. The WaveRNN model is based on the implementation from [this repository](https://github.com/fatchord/WaveRNN). The original implementation was introduced in "Efficient Neural Audio Synthesis". WaveRNN and LJSpeech are available in torchaudio.

### Usage

A training run can be invoked as follows.

```
python main.py \
    --batch-size 256 \
    --learning-rate 1e-4 \
    --n-freq 80 \
    --loss 'crossentropy' \
    --n-bits 8
```

For inference, an example can be invoked as follows. Please refer to the [documentation](https://pytorch.org/audio/master/models.html#id10) for available checkpoints.

```
python inference.py \
    --checkpoint-name wavernn_10k_epochs_8bits_ljspeech \
    --output-wav-path ./output.wav
```

This generates a file named `output.wav` in the current working directory.

### Output

The information reported at each iteration and epoch (e.g. loss) is printed to standard output as one JSON object per line. Here is an example Python function to parse the output if it is redirected to a file.

```python
def read_json(filename):
    """
    Convert the standard output saved to filename into a pandas dataframe for analysis.
    """
    import json

    import pandas

    with open(filename, "r") as f:
        data = f.read()

    # The logged dictionaries use single quotes, which json.loads does not accept,
    # so replace them with double quotes before parsing.
    data = data.replace("'", '"')
    data = [json.loads(l) for l in data.splitlines()]
    return pandas.DataFrame(data)
```
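
As a minimal usage sketch, assuming the training output was redirected to a file named `train_log.txt` (a hypothetical name; the actual fields available depend on what `main.py` logs), the parsed dataframe can then be inspected as follows.

```python
# Hypothetical usage: "train_log.txt" is an assumed name for the file holding
# the redirected standard output of main.py.
df = read_json("train_log.txt")

print(df.columns)  # fields reported per iteration/epoch (e.g. loss)
print(df.head())   # first few logged records
```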