Tables.md 12.2 KB
Newer Older
“change”'s avatar
“change” committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
# FunASR-1.x.x Registration  New Model Tutorial

([简体中文](./Tables_zh.md)|English)

The original intention of the funasr-1.x.x version is to make model integration easier. The core feature is the registry and AutoModel:

*   The introduction of the registry enables the development of building blocks to access the model, compatible with a variety of tasks;
    
*   The newly designed AutoModel interface unifies modelscope, huggingface, and funasr inference and training interfaces, and supports free download of repositories;
    
*   Support model export, demo-level service deployment, and industrial-level multi-concurrent service deployment;
    
*   Unify academic and industrial model inference training scripts;
    

# Quick to get started

## AutoModel usage

### SenseVoiceSmall 模型

Input any length of voice, the output as the voice content corresponding to the text, the text has punctuation broken sentences, support Chinese, English, Japanese, Guangdong, Korean and 5 Chinese languages. \[Word-level timestamp and speaker identity\] will be supported later.

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text) #👏Senior staff, Priipal Doris Jackson, Wakefield faculty, and, of course, my fellow classmates.I am honored to have been chosen to speak before my classmates, as well as the students across America today.
```

## API documentation

#### Definition of AutoModel

```plaintext
Model = AutoModel(model=[str], device=[str], ncpu=[int], output_dir=[str], batch_size= [int], hub=[str], **quargs)
```

*   `model`(str): [Model Warehouse](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo)The model name in, or the model path in the local disk
    
*   `device`(str): `cuda:0`(Default gpu0), using GPU for inference, specified. If`cpu`Then the CPU is used for inference
    
*   `ncpu`(int): `4`(Default), set the number of threads used for CPU internal operation parallelism
    
*   `output_dir`(str): `None`(Default) If set, the output path of the output result
    
*   `batch_size`(int): `1`(Default), batch processing during decoding, number of samples
    
*   `hub`(str):`ms`(Default) to download the model from modelscope. If`hf`To download the model from huggingface.
    
*   `**kwargs`(dict): All in`config.yaml`Parameters, which can be specified directly here, for example, the maximum cut length in the vad model.`max_single_segment_time=6000`(Milliseconds).
    

#### AutoModel reasoning

```plaintext
Res = model.generate(input=[str], output_dir=[str])
```

*   *   wav file path, for example: asr\_example.wav
        
    *   pcm file path, for example: asr\_example.pcm, you need to specify the audio sampling rate fs (default is 16000)
        
    *   Audio byte stream, for example: microphone byte data
        
    *   wav.scp,kaldi-style wav list (`wav_id \t wav_path`), for example:
        

```plaintext
Asr_example1./audios/asr_example1.wav
Asr_example2./audios/asr_example2.wav

```

In this input

*   Audio sampling points, for example:`audio, rate = soundfile.read("asr_example_zh.wav")`Is numpy.ndarray. batch input is supported. The type is list:`[audio_sample1, audio_sample2, ..., audio_sampleN]`
    
*   fbank input, support group batch. shape is \[batch, frames, dim\], type is torch.Tensor, for example
    
*   `output_dir`: None (default), if set, the output path of the output result
    
*   `**kwargs`(dict): Model-related inference parameters, e.g,`beam_size=10`,`decoding_ctc_weight=0.1`.
    

Detailed documentation link:[https://github.com/modelscope/FunASR/blob/main/examples/README\_zh.md](https://github.com/modelscope/FunASR/blob/main/examples/README_zh.md)

# Registry Details

Take the SenseVoiceSmall model as an example, explain how to register a new model, model link:

**modelscope:**[https://www.modelscope.cn/models/iic/SenseVoiceSmall/files](https://www.modelscope.cn/models/iic/SenseVoiceSmall/files)

**huggingface:**[https://huggingface.co/FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)

## Model Resource Catalog

![image.png](https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/8oLl9y628rBNlapY/img/cab7f215-787f-4407-885a-14dc89ae9e02.png)

Configuration File: config.yaml

```yaml
encoder: SenseVoiceEncoderSmall
encoder_conf:
    output_size: 512
    attention_heads: 4
    linear_units: 2048
    num_blocks: 50
    tp_blocks: 20
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: pe
    pos_enc_class: SinusoidalPositionEncoder
    normalize_before: true
    kernel_size: 11
    sanm_shfit: 0
    selfattention_layer_type: sanm


model: SenseVoiceSmall
model_conf:
    length_normalized_loss: true
    sos: 1
    eos: 2
    ignore_id: -1

tokenizer: SentencepiecesTokenizer
tokenizer_conf:
  bpemodel: null
  unk_symbol: <unk>
  split_with_space: true

frontend: WavFrontend
frontend_conf:
    fs: 16000
    window: hamming
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    lfr_m: 7
    lfr_n: 6
    cmvn_file: null


dataset: SenseVoiceCTCDataset
dataset_conf:
  index_ds: IndexDSJsonl
  batch_sampler: EspnetStyleBatchSampler
  data_split_num: 32
  batch_type: token
  batch_size: 14000
  max_token_length: 2000
  min_token_length: 60
  max_source_length: 2000
  min_source_length: 60
  max_target_length: 200
  min_target_length: 0
  shuffle: true
  num_workers: 4
  sos: ${model_conf.sos}
  eos: ${model_conf.eos}
  IndexDSJsonl: IndexDSJsonl
  retry: 20

train_conf:
  accum_grad: 1
  grad_clip: 5
  max_epoch: 20
  keep_nbest_models: 10
  avg_nbest_model: 10
  log_interval: 100
  resume: true
  validate_interval: 10000
  save_checkpoint_interval: 10000

optim: adamw
optim_conf:
  lr: 0.00002
Scheduler: warmuplr
Scheduler_conf:
Warmup_steps: 25000

```

Model parameters: model.pt

Path resolution: configuration.json (not required)

```json
{
  "framework": "pytorch",
  "task" : "auto-speech-recognition",
  "model": {"type" : "funasr"},
  "pipeline": {"type":"funasr-pipeline"},
  "model_name_in_hub": {
    "ms":"", 
    "hf":""},
  "file_path_metas": {
    "init_param":"model.pt", 
    "config":"config.yaml",
    "tokenizer_conf": {"bpemodel": "chn_jpn_yue_eng_ko_spectok.bpe.model"},
    "frontend_conf":{"cmvn_file": "am.mvn"}}
}
```

The function of configuration.json is to add the model root directory to the item in file\_path\_metas, so that the path can be correctly parsed. For example, assume that the model root directory is:/home/zhifu.gzf/init\_model/SenseVoiceSmall,The relevant path in config.yaml in the directory is replaced with the correct path (ignoring irrelevant configuration):

```yaml
init_param: /home/zhifu.gz F/init_model/sensevoicemail Mall/model.pt

tokenizer_conf:
  bpemodel: /home/Zhifu.gzf/init_model/SenseVoiceSmall/chn_jpn_yue_eng_ko_spectok.bpe.model

frontend_conf:
    cmvn_file: /home/zhifu.Gzf/init_model/SenseVoiceSmall/am.mvn
```

## Registry

![image](https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/a/pDaAnLxn5IX2w9Y1/73da157edae94d78b68c8d30c8e085eb0521.png)

### View Registry

```plaintext
from funasr.register import tables

tables.print()
```

Support to view the specified type of Registry: 'tables.print("model")', currently funasr has registered model as shown in the figure above. The following categories are currently predefined:

```python
    model_classes = {}
    frontend_classes = {}
    specaug_classes = {}
    normalize_classes = {}
    encoder_classes = {}
    decoder_classes = {}
    joint_network_classes = {}
    predictor_classes = {}
    stride_conv_classes = {}
    tokenizer_classes = {}
    dataloader_classes = {}
    batch_sampler_classes = {}
    dataset_classes = {}
    index_ds_classes = {}
```

### Registration Model

```python
from funasr.register import tables

@tables.register("model_classes", "SenseVoiceSmall")
class SenseVoiceSmall(nn.Module):
  def __init__(*args, **kwargs):
    ...

  def forward(
      self,
      **kwargs,
  ):  

  def inference(
      self,
      data_in,
      data_lengths=None,
      key: list = None,
      tokenizer=None,
      frontend=None,
      **kwargs,
  ):
    ...

```

Add @ tables.register("model\_classes", "SenseVoiceSmall") before the name of the class to be registered. The class needs to implement the following methods:\_\_init \_\_, forward, and inference.

register Usage:

```python
@ tables.register("registration classification", "registration name")
```

Among them, "registration classification" can be a predefined classification (see the figure above). If it is a new classification defined by oneself, the new classification will be automatically written into the registry classification. "registration name" means the name you want to register and can be used directly in the future.

Full code:[https://github.com/modelscope/FunASR/blob/main/funasr/models/sense\_voice/model.py#L443](https://github.com/modelscope/FunASR/blob/main/funasr/models/sense_voice/model.py#L443)

After the registration is complete, specify the new registration model in config.yaml to define the model.

```python
model: SenseVoiceSmall
model_conf:
  ...
```

### Registration failed

If the registration model or method is not found, assert model\_class is not None, f'{kwargs\["model"\]} is not registered '. The principle of model registration is to import the model file,You can view the specific reason for the registration failure through import. For example, the preceding model file is funasr/models/sense\_voice/model.py:

```python
from funasr.models.sense_voice.model import *
```

## Principles of Registration

*   Model: models are independent of each other. Each Model needs to create a new Model directory under funasr/models/. Do not use class inheritance method!!! Do not import from other model directories, and put everything you need into your own model directory!!! Do not modify the existing model code!!!
    
*   dataset,frontend,tokenizer, if you can reuse the existing one, reuse it directly, if you cannot reuse it, please register a new one, modify it again, and do not modify the original one!!!
    

# Independent warehouse

It can exist as a stand-alone repository for code secrecy, or as a stand-alone open source. Based on the registration mechanism, you do not need to integrate it into funasr. You can also use funasr for inference, and you can also directly perform inference. finetune is also supported.

**Using AutoModel for inference**

```python
from funasr import AutoModel

# trust_remote_code:'True' means that the model code implementation is loaded from 'remote_code', 'remote_code' specifies the location of the 'model' specific code (for example,'model.py') in the current directory, supports absolute and relative paths, and network url.
model = AutoModel (
model="iic/SenseVoiceSmall ",
trust_remote_code=True
remote_code = "./model.py", 
)
```

**Direct inference**

```python
from model import SenseVoiceSmall

m, kwargs = SenseVoiceSmall.from_pretrained(model="iic/SenseVoiceSmall")
m.eval()

res = m.inference(
    data_in=f"{kwargs ['model_path']}/example/en.mp3",
    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    ban_emo_unk=False,
    **kwargs,
)

print(text)
```

Trim reference:[https://github.com/FunAudioLLM/SenseVoice/blob/main/finetune.sh](https://github.com/FunAudioLLM/SenseVoice/blob/main/finetune.sh)