# WeNet Python Binding

This is the Python binding of WeNet.

WeNet is a production-first, production-ready end-to-end speech recognition toolkit.

Highlights of the binding:

1. Multi-language support, including English and Chinese. Other languages are in development.
2. Non-streaming and streaming APIs.
3. N-best, contextual biasing, and timestamp support, which are important for production speech applications.
4. Alignment support. You can get phone-level alignments with this tool (still under development).

## Install

Python 3.6+ is required.

``` sh
pip3 install wenetruntime
```

## Usage

Note:

1. On macOS, wenetruntime bundles `libtorch.so`, so `torch` and `wenetruntime` cannot be imported at the same time.
2. On Windows and Linux, wenetruntime depends on torch. Please install and import the `torch` version that matches wenetruntime.
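
The two notes above can be folded into a small platform check. This is a minimal sketch, not part of wenetruntime: the helper names `should_import_torch` and `load_wenet` are hypothetical, and the import order simply mirrors the examples in this README.

```python
import sys


def should_import_torch(platform=sys.platform):
    """Return True on platforms where the notes above ask for an explicit torch import."""
    # sys.platform is "darwin" on macOS, where torch must not be imported
    # together with wenetruntime.
    return platform != "darwin"


def load_wenet():
    """Import wenetruntime, importing torch first only on Windows and Linux."""
    if should_import_torch():
        import torch  # noqa: F401  (must match the version wenetruntime expects)
    import wenetruntime
    return wenetruntime
```

Calling `wenet = load_wenet()` then takes the place of the two import lines in the examples below.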

### Non-streaming Usage

``` python
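import sys
import torch  # Windows/Linux only; omit on macOS (see the note above)
import wenetruntime as wenet

wav_file = sys.argv[1]
# No model_dir is given, so the official Chinese model is downloaded automatically
decoder = wenet.Decoder(lang='chs')
# Decode the whole file in one call
ans = decoder.decode_wav(wav_file)
print(ans)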
```

You can also specify the following parameters in `wenet.Decoder`:

* `lang` (str): The language to use, `chs` for Chinese and `en` for English.
* `model_dir` (str): The `Runtime Model` directory, which contains the following files.
   If not provided, the official model for the specified `lang` is downloaded automatically.

  * `final.zip`: runtime TorchScript ASR model.
  * `units.txt`: modeling units file.
  * `TLG.fst`: optional; when `TLG.fst` is given, decoding uses the LM.
  * `words.txt`: optional, word-level symbol table for decoding with `TLG.fst`.

  Please refer to https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.md for the details of `Runtime Model`.

* `nbest` (int): Output the top-n best results.
* `enable_timestamp` (bool): Whether to enable word-level timestamps.
* `context` (List[str]): A list of context biasing words.
* `context_score` (float): Context bonus score.
* `continuous_decoding` (bool): Whether to enable continuous (long) decoding.

For example:
``` python
# model_dir is the path to a Runtime Model directory, as described above
decoder = wenet.Decoder(model_dir,
                        lang='chs',
                        nbest=5,
                        enable_timestamp=True,
                        context=['不忘初心', '牢记使命'],
                        context_score=3.0)
```
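
Before passing a local `model_dir`, it can help to confirm that the directory holds the files listed above. The following is a minimal sketch based only on that file list. `check_model_dir` is a hypothetical helper, not part of wenetruntime, and it checks file names only, not their contents.

```python
from pathlib import Path

# File names taken from the `model_dir` description above.
REQUIRED_FILES = ("final.zip", "units.txt")
OPTIONAL_FILES = ("TLG.fst", "words.txt")


def check_model_dir(model_dir):
    """Return (missing_required, present_optional) for a Runtime Model directory."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    optional = [name for name in OPTIONAL_FILES if (root / name).is_file()]
    return missing, optional
```

An empty first list means the required files are present. If `TLG.fst` appears in the second list, decoding will use the LM, as described above.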

### Streaming Usage

``` python
import sys
import torch
import wave
import wenetruntime as wenet

test_wav = sys.argv[1]

with wave.open(test_wav, 'rb') as fin:
    assert fin.getnchannels() == 1
    wav = fin.readframes(fin.getnframes())

decoder = wenet.Decoder(lang='chs')
# Assume the wav is 16 kHz, 16-bit, and decode every 0.5 seconds.
# Each sample is 2 bytes, so the interval is measured in bytes.
interval = int(0.5 * 16000) * 2
for i in range(0, len(wav), interval):
    # `last` is True only for the final chunk
    last = i + interval >= len(wav)
    chunk_wav = wav[i: min(i + interval, len(wav))]
    ans = decoder.decode(chunk_wav, last)
    print(ans)
```
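
The streaming example assumes 16 kHz, 16-bit, mono audio but only asserts the channel count. The sketch below checks all three properties and factors the chunking into a generator. Both helpers, `read_pcm16_mono_16k` and `iter_chunks`, are hypothetical names for illustration. They use only the standard `wave` module and do not call wenetruntime.

```python
import wave


def read_pcm16_mono_16k(path):
    """Read raw PCM bytes, checking the format the streaming example assumes."""
    with wave.open(path, "rb") as fin:
        if fin.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if fin.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if fin.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sample rate")
        return fin.readframes(fin.getnframes())


def iter_chunks(pcm, seconds=0.5, sample_rate=16000, sample_width=2):
    """Yield (chunk, is_last) pairs of `seconds`-long slices of raw PCM bytes."""
    step = int(seconds * sample_rate) * sample_width  # chunk size in bytes
    for start in range(0, len(pcm), step):
        end = min(start + step, len(pcm))
        # is_last is True only for the chunk that reaches the end of the audio
        yield pcm[start:end], end == len(pcm)
```

With these helpers, the loop above becomes `for chunk, last in iter_chunks(read_pcm16_mono_16k(test_wav)): print(decoder.decode(chunk, last))`.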

You can use the same parameters introduced above to control the behavior of `wenet.Decoder`.


## Build on Your Local Machine

``` sh
git clone https://github.com/wenet-e2e/wenet.git
cd wenet/runtime/binding/python
python setup.py install
```