This is an example vocoder pipeline using the WaveRNN model trained with LJSpeech. The WaveRNN model is based on the implementation from [this repository](https://github.com/fatchord/WaveRNN), originally introduced in "Efficient Neural Audio Synthesis". Both WaveRNN and the LJSpeech dataset are available in torchaudio.
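
As a point of reference, here is a minimal sketch of pulling both from torchaudio. The constructor arguments shown are assumptions for an 8-bit, 80-band mel setup, not values taken from this example; adjust them to your configuration.

```python
# Minimal sketch: instantiate WaveRNN and LJSpeech from torchaudio.
# The hyperparameters below are assumptions, not the exact values used here.
import torchaudio
from torchaudio.models import WaveRNN

model = WaveRNN(
    upsample_scales=[5, 5, 11],  # product must equal hop_length
    n_classes=2 ** 8,            # 8-bit quantized waveform
    hop_length=275,
    n_freq=80,                   # number of mel bands, matching --n-freq below
)
dataset = torchaudio.datasets.LJSPEECH(root="./data", download=True)
```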

### Usage

A training run can be invoked as follows.
```
python main.py \
    --batch-size 256 \
    --learning-rate 1e-4 \
    --n-freq 80 \
    --loss 'crossentropy' \
    --n-bits 8
```
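
As a rough illustration of how `--n-bits` and `--loss 'crossentropy'` relate (this pairing is an assumption about the script's internals, not taken from it): the waveform is quantized into `2**n_bits` classes, and the model is trained as a classifier over them.

```python
# Sketch only: cross-entropy over 2**n_bits quantization classes.
import torch

n_bits = 8
n_classes = 2 ** n_bits  # 256 classes for an 8-bit waveform

criterion = torch.nn.CrossEntropyLoss()
logits = torch.randn(4, n_classes)           # placeholder model output (batch, classes)
targets = torch.randint(0, n_classes, (4,))  # placeholder quantized samples
loss = criterion(logits, targets)
print(loss.item())
```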

For inference, the pipeline can be invoked as follows.
Please refer to the [documentation](https://pytorch.org/audio/master/models.html#id10) for
the available checkpoints.
```
python inference.py \
    --checkpoint-name wavernn_10k_epochs_8bits_ljspeech \
    --output-wav-path ./output.wav
```

This example generates a file named `output.wav` in the current working directory.
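
To sanity-check the result, the generated file can be loaded back with torchaudio (the file name matches the `--output-wav-path` above):

```python
# Load the generated waveform for a quick check.
import torchaudio

waveform, sample_rate = torchaudio.load("output.wav")
print(waveform.shape, sample_rate)  # LJSpeech audio is sampled at 22050 Hz
```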

### Output

The information reported at each iteration and epoch (e.g. the loss) is printed to standard output as one JSON object per line. Here is an example Python function to parse the output if it is redirected to a file.
```python
def read_json(filename):
    """
    Convert the standard output saved to filename into a pandas dataframe for analysis.
    """
    import json

    import pandas

    with open(filename, "r") as f:
        data = f.read()

    # pandas doesn't read single quotes for json
    data = data.replace("'", '"')

    data = [json.loads(line) for line in data.splitlines()]
    return pandas.DataFrame(data)
```
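
For example, assuming the training output was redirected to a file named `train_log.txt` (a hypothetical name):

```python
df = read_json("train_log.txt")  # hypothetical log file name
print(df.head())
```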