chenpangpang / transformers

Commit e118db15 (Unverified), authored Oct 28, 2021 by Patrick von Platen
Committed by GitHub, Oct 28, 2021

Update README.md

parent 01b14669

Showing 1 changed file with 24 additions and 13 deletions

examples/pytorch/speech-recognition/README.md (+24 -13), viewed at e118db15
@@ -113,19 +113,30 @@ of **0.36**.
 ### Examples
-In the following a couple of demonstration fine-tuning runs are listed.
+The following tables present a couple of example runs on the most popular speech-recognition datasets.
-It has been verified that the script works for the following datasets:
+The presented performances are by no means optimal as no hyper-parameter tuning was done. Nevertheless,
+they can serve as a baseline to improve upon.
-- [Common Voice](https://huggingface.co/datasets/common_voice)
-- [Librispeech](https://huggingface.co/datasets/librispeech_asr)
 - [TIMIT](https://huggingface.co/datasets/timit_asr)
-| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs |
+| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs | Command to reproduce |
-|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------|
+|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------| -------------|
-| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60) | 0.042 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-librispeech-clean-100h-demo-dist) |
+| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 0.21 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned/blob/main/run.sh) |
-| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/hubert-large-ll60k](https://huggingface.co/facebook/hubert-large-ll60k) | 0.088 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/hubert-librispeech-clean-100h-demo-dist) |
+| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 0.21 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned/blob/main/run.sh) |
-| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.36 | 8 GPU V100 | 18min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo-dist) |
+| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [unispeech-large-1500h-cv](https://huggingface.co/microsoft/unispeech-large-1500h-cv) | 0.22 | 1 GPU TITAN RTX | 35min | [here](https://huggingface.co/patrickvonplaten/unispeech-large-1500h-cv-timit) | [run.sh](https://huggingface.co/patrickvonplaten/unispeech-large-1500h-cv-timit/blob/main/run.sh) |
-| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.35 | 1 GPU V100 | 1h20min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo) |
+| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [asapp/sew-mid-100k](https://huggingface.co/asapp/sew-mid-100k) | 0.30 | 1 GPU TITAN RTX | 28min | [here](https://huggingface.co/patrickvonplaten/sew-small-100k-timit) | [run.sh](https://huggingface.co/patrickvonplaten/sew-small-100k-timit/blob/main/run.sh) |
-| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 0.21 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned) |
-| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [unispeech-large-1500h-cv](https://huggingface.co/microsoft/unispeech-large-1500h-cv) | 0.22 | 1 GPU TITAN RTX | 35min | [here](https://huggingface.co/patrickvonplaten/unispeech-large-1500h-cv-timit) |
+- [Librispeech](https://huggingface.co/datasets/librispeech_asr)
-| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [unispeech-sat-base](https://huggingface.co/microsoft/unispeech-sat-base) | 0.41 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/unispeech-sat-base-timit-ft) |
+| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs | Command to reproduce |
+|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------| -------------|
+| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60) | 0.042 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-librispeech-clean-100h-demo-dist) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-librispeech-clean-100h-demo-dist/blob/main/run.sh) |
+| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/hubert-large-ll60k](https://huggingface.co/facebook/hubert-large-ll60k) | 0.088 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/hubert-librispeech-clean-100h-demo-dist) | [run.sh](https://huggingface.co/patrickvonplaten/hubert-librispeech-clean-100h-demo-dist/blob/main/run.sh) |
+| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [asapp/sew-mid-100k](https://huggingface.co/asapp/sew-mid-100k) | 0.167 | 8 GPU V100 | 54min | [here](https://huggingface.co/patrickvonplaten/sew-mid-100k-librispeech-clean-100h-ft) | [run.sh](https://huggingface.co/patrickvonplaten/sew-mid-100k-librispeech-clean-100h-ft/blob/main/run.sh) |
+- [Common Voice](https://huggingface.co/datasets/common_voice)
+| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs | Command to reproduce |
+|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------| -------------|
+| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.36 | 8 GPU V100 | 18min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo-dist) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo-dist/blob/main/run_dist.sh) |
+| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.35 | 1 GPU V100 | 1h20min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo/blob/main/run.sh) |
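
For readers comparing the "Word error rate on eval" column, the following is a minimal sketch of how a word error rate is conventionally defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the diff does not show how the example script computes its metric, so this is an illustration of the general definition rather than the script's implementation. Real evaluations typically aggregate errors over the whole evaluation set and may normalize text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: hypothetical helper, not taken from the README or
    the example script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of six reference words is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a value such as 0.21 corresponds to roughly one word-level error for every five reference words.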