chenpangpang / transformers
Commit e118db15, authored Oct 28, 2021 by Patrick von Platen, committed by GitHub Oct 28, 2021
Update README.md
parent 01b14669
Showing 1 changed file with 24 additions and 13 deletions
examples/pytorch/speech-recognition/README.md
@@ -113,19 +113,30 @@ of **0.36**.
### Examples
In the following a couple of demonstration fine-tuning runs are listed.
The following tables present a couple of example runs on the most popular speech-recognition datasets.
It has been verified that the script works for the following datasets:
The presented performances are by no means optimal as no hyper-parameter tuning was done. Nevertheless,
they can serve as a baseline to improve upon.
- [Common Voice](https://huggingface.co/datasets/common_voice)
- [Librispeech](https://huggingface.co/datasets/librispeech_asr)
- [TIMIT](https://huggingface.co/datasets/timit_asr)

| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs |
| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs | Command to reproduce |
|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------|
|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------| -------------|
| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60) | 0.042 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-librispeech-clean-100h-demo-dist) |
| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 0.21 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned/blob/main/run.sh) |
| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/hubert-large-ll60k](https://huggingface.co/facebook/hubert-large-ll60k) | 0.088 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/hubert-librispeech-clean-100h-demo-dist) |
| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 0.21 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned/blob/main/run.sh) |
| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.36 | 8 GPU V100 | 18min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo-dist) |
| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [unispeech-large-1500h-cv](https://huggingface.co/microsoft/unispeech-large-1500h-cv) | 0.22 | 1 GPU TITAN RTX | 35min | [here](https://huggingface.co/patrickvonplaten/unispeech-large-1500h-cv-timit) | [run.sh](https://huggingface.co/patrickvonplaten/unispeech-large-1500h-cv-timit/blob/main/run.sh) |
| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.35 | 1 GPU V100 | 1h20min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo) |
| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [asapp/sew-mid-100k](https://huggingface.co/asapp/sew-mid-100k) | 0.30 | 1 GPU TITAN RTX | 28min | [here](https://huggingface.co/patrickvonplaten/sew-small-100k-timit) | [run.sh](https://huggingface.co/patrickvonplaten/sew-small-100k-timit/blob/main/run.sh) |
| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 0.21 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-fine-tuned) |
| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [unispeech-large-1500h-cv](https://huggingface.co/microsoft/unispeech-large-1500h-cv) | 0.22 | 1 GPU TITAN RTX | 35min | [here](https://huggingface.co/patrickvonplaten/unispeech-large-1500h-cv-timit) |

- [Librispeech](https://huggingface.co/datasets/librispeech_asr)

| [TIMIT](https://huggingface.co/datasets/timit_asr) | - | [unispeech-sat-base](https://huggingface.co/microsoft/unispeech-sat-base) | 0.41 | 1 GPU TITAN RTX | 32min | [here](https://huggingface.co/patrickvonplaten/unispeech-sat-base-timit-ft) |
| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs | Command to reproduce |
|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------| -------------|
| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60) | 0.042 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-librispeech-clean-100h-demo-dist) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-librispeech-clean-100h-demo-dist/blob/main/run.sh) |
| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [facebook/hubert-large-ll60k](https://huggingface.co/facebook/hubert-large-ll60k) | 0.088 | 8 GPU V100 | 1h30min | [here](https://huggingface.co/patrickvonplaten/hubert-librispeech-clean-100h-demo-dist) | [run.sh](https://huggingface.co/patrickvonplaten/hubert-librispeech-clean-100h-demo-dist/blob/main/run.sh) |
| [Librispeech](https://huggingface.co/datasets/librispeech_asr) | `"clean"` - `"train.100"` | [asapp/sew-mid-100k](https://huggingface.co/asapp/sew-mid-100k) | 0.167 | 8 GPU V100 | 54min | [here](https://huggingface.co/patrickvonplaten/sew-mid-100k-librispeech-clean-100h-ft) | [run.sh](https://huggingface.co/patrickvonplaten/sew-mid-100k-librispeech-clean-100h-ft/blob/main/run.sh) |

- [Common Voice](https://huggingface.co/datasets/common_voice)

| Dataset | Dataset Config | Pretrained Model | Word error rate on eval | GPU setup | Training time | Fine-tuned Model & Logs | Command to reproduce |
|-------|------------------------------|-------------|---------------|---------------|----------------------|-------------| -------------|
| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.36 | 8 GPU V100 | 18min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo-dist) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo-dist/blob/main/run_dist.sh) |
| [Common Voice](https://huggingface.co/datasets/common_voice) | `"tr"` | [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) | 0.35 | 1 GPU V100 | 1h20min | [here](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo) | [run.sh](https://huggingface.co/patrickvonplaten/wav2vec2-common_voice-tr-demo/blob/main/run.sh) |
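
The "Word error rate on eval" column in the tables above is reported as a fraction (for example, 0.042 means roughly 4 word errors per 100 reference words). As a rough illustration of how such a number can be computed, here is a minimal sketch in plain Python. It assumes the standard definition of word error rate (word-level edit distance divided by the number of reference words). The function name `word_error_rate` is hypothetical, and this is not the implementation used by the example script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: splits on whitespace and does no text
    normalization (casing, punctuation), which real evaluations often apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 errors over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that this scores a single utterance. Over a whole evaluation set, the usual convention is to sum the errors and the reference word counts across all utterances before dividing, which is not the same as averaging per-utterance rates.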