 - [Automatic Speech Recognition with Sequence-to-Sequence](#sequence-to-sequence)
-	- [Single GPU example](#single-gpu)
-	- [Multi GPU example](#multi-gpu)
-	- [Examples](#examples)
-		- [Librispeech](#librispeech)
+	- [Single GPU example](#single-gpu-seq2seq)
+	- [Multi GPU example](#multi-gpu-seq2seq)
+	- [Examples](#examples-seq2seq)
+		- [Librispeech](#librispeech-seq2seq)
## Connectionist Temporal Classification
...
...
@@ -56,7 +56,7 @@ If the environment variable is not set, the training script might freeze, *i.e.*
---
-### Single GPU
+### Single GPU CTC
The following command shows how to fine-tune [XLSR-Wav2Vec2](https://huggingface.co/transformers/master/model_doc/xlsr_wav2vec2.html) on [Common Voice](https://huggingface.co/datasets/common_voice) using a single GPU in half-precision.
On a single V100 GPU, this script should run in *ca.* 1 hour 20 minutes and yield a CTC loss of **0.39** and word error rate
of **0.35**.
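The command itself is not included in this excerpt. As a minimal sketch only, assuming the standard `run_speech_recognition_ctc.py` example script and ordinary `Trainer` flags (the Common Voice language config `"tr"` and the hyperparameter values below are illustrative placeholders, not the README's exact settings), a single-GPU half-precision run might look like:

```shell
# Illustrative sketch, not the README's exact command: language config "tr"
# and the hyperparameter values are placeholder assumptions.
python run_speech_recognition_ctc.py \
	--dataset_name="common_voice" \
	--dataset_config_name="tr" \
	--model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
	--output_dir="./wav2vec2-common_voice-demo" \
	--num_train_epochs="15" \
	--per_device_train_batch_size="16" \
	--learning_rate="3e-4" \
	--warmup_steps="500" \
	--freeze_feature_encoder \
	--fp16 \
	--do_train \
	--do_eval
```

`--fp16` is what enables the half-precision training referred to above; `--freeze_feature_encoder` keeps the convolutional feature extractor frozen, which is the usual setup when fine-tuning XLSR-Wav2Vec2 with CTC.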
-### Multi GPU
+### Multi GPU CTC
The following command shows how to fine-tune [XLSR-Wav2Vec2](https://huggingface.co/transformers/master/model_doc/xlsr_wav2vec2.html) on [Common Voice](https://huggingface.co/datasets/common_voice) using 8 GPUs in half-precision.
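The multi-GPU command is likewise not part of this excerpt. A hedged sketch, assuming the same `run_speech_recognition_ctc.py` script launched through `torch.distributed.launch` with one process per GPU (hyperparameters again illustrative placeholders):

```shell
# Illustrative sketch: same script as the single-GPU case, launched across
# 8 GPUs; per-device batch size and other values are placeholder assumptions.
python -m torch.distributed.launch --nproc_per_node=8 run_speech_recognition_ctc.py \
	--dataset_name="common_voice" \
	--dataset_config_name="tr" \
	--model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
	--output_dir="./wav2vec2-common_voice-demo" \
	--num_train_epochs="15" \
	--per_device_train_batch_size="4" \
	--learning_rate="3e-4" \
	--warmup_steps="500" \
	--freeze_feature_encoder \
	--fp16 \
	--do_train \
	--do_eval
```

Note that `--per_device_train_batch_size` is per GPU, so the effective batch size scales with `--nproc_per_node`; the single-GPU value is typically divided accordingly.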
@@ -276,7 +276,7 @@ If the environment variable is not set, the training script might freeze, *i.e.*
---
-### Single GPU
+### Single GPU Seq2Seq
The following command shows how to fine-tune [XLSR-Wav2Vec2](https://huggingface.co/transformers/master/model_doc/xlsr_wav2vec2.html) on [Common Voice](https://huggingface.co/datasets/common_voice) using a single GPU in half-precision.
On a single V100 GPU, this script should run in *ca.* 5 hours and yield a
cross-entropy loss of **0.405** and word error rate of **0.0728**.
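The exact command is omitted from this excerpt. As a sketch under stated assumptions, using the `run_speech_recognition_seq2seq.py` example script with a locally assembled speech-encoder-decoder checkpoint (the checkpoint path, language config, and hyperparameters below are placeholders, not the README's exact values):

```shell
# Illustrative sketch: the model path is a placeholder for a
# SpeechEncoderDecoder checkpoint; hyperparameters are assumptions.
python run_speech_recognition_seq2seq.py \
	--dataset_name="common_voice" \
	--dataset_config_name="tr" \
	--model_name_or_path="./path-to-speech-encoder-decoder-checkpoint" \
	--output_dir="./seq2seq-common_voice-demo" \
	--num_train_epochs="15" \
	--per_device_train_batch_size="8" \
	--learning_rate="3e-4" \
	--warmup_steps="500" \
	--predict_with_generate \
	--fp16 \
	--do_train \
	--do_eval
```

Unlike the CTC runs, `--predict_with_generate` is needed here so that evaluation decodes autoregressively and the word error rate can be computed from generated transcriptions.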
-### Multi GPU
+### Multi GPU Seq2Seq
The following command shows how to fine-tune [XLSR-Wav2Vec2](https://huggingface.co/transformers/master/model_doc/xlsr_wav2vec2.html) on [Common Voice](https://huggingface.co/datasets/common_voice) using 8 GPUs in half-precision.
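The 8-GPU seq2seq command is also missing from this excerpt; a minimal sketch, assuming the same `run_speech_recognition_seq2seq.py` script under `torch.distributed.launch` (all values below are illustrative placeholders):

```shell
# Illustrative sketch: distributed launch of the seq2seq example script;
# checkpoint path and hyperparameters are placeholder assumptions.
python -m torch.distributed.launch --nproc_per_node=8 run_speech_recognition_seq2seq.py \
	--dataset_name="common_voice" \
	--dataset_config_name="tr" \
	--model_name_or_path="./path-to-speech-encoder-decoder-checkpoint" \
	--output_dir="./seq2seq-common_voice-demo" \
	--num_train_epochs="15" \
	--per_device_train_batch_size="2" \
	--learning_rate="3e-4" \
	--warmup_steps="500" \
	--predict_with_generate \
	--fp16 \
	--do_train \
	--do_eval
```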