Commit 78b6a2ec (unverified)

Add audio-classification benchmarking results (#14192)

Authored by Anton Lozhkov on Oct 28, 2021; committed by GitHub on Oct 28, 2021. Parent: 1dc96a76
Showing 1 changed file with 25 additions and 5 deletions:

examples/pytorch/audio-classification/README.md (+25 -5)
````diff
@@ -33,7 +33,7 @@ python run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --dataset_name superb \
     --dataset_config_name ks \
-    --output_dir wav2vec2-base-keyword-spotting \
+    --output_dir wav2vec2-base-ft-keyword-spotting \
     --overwrite_output_dir \
     --remove_unused_columns False \
     --do_train \
````
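The keyword-spotting command passes `--remove_unused_columns False`. As a rough picture of why: when this option is left at its default of `True`, the Trainer drops dataset columns whose names do not match an argument of the model's `forward` method. The stdlib-only sketch below mimics that filtering with `inspect.signature`. The `forward` function and the example record are hypothetical stand-ins, not the real model or dataset schema, and the actual Trainer also keeps a few label column names that this simplified version ignores.

```python
import inspect
from typing import Any, Callable, Dict


def forward(input_values, attention_mask=None, labels=None):
    """Hypothetical stand-in for a model's forward() signature (never called)."""
    raise NotImplementedError


def drop_unused_columns(example: Dict[str, Any], fn: Callable) -> Dict[str, Any]:
    """Keep only the keys that match a parameter name of `fn` (simplified)."""
    accepted = set(inspect.signature(fn).parameters)
    return {key: value for key, value in example.items() if key in accepted}


# Hypothetical raw record: the waveform still sits in an "audio" column
# and has not yet been converted into "input_values".
raw_example = {"audio": [0.0, 0.1, -0.2], "labels": 3}
print(drop_unused_columns(raw_example, forward))  # → {'labels': 3}
```

If preprocessing happens lazily, when each batch is loaded, the raw audio column would already be gone by the time it is needed; turning the option off presumably avoids exactly that.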
````diff
@@ -41,6 +41,7 @@ python run_audio_classification.py \
     --fp16 \
     --learning_rate 3e-5 \
     --max_length_seconds 1 \
+    --attention_mask False \
     --warmup_ratio 0.1 \
     --num_train_epochs 5 \
     --per_device_train_batch_size 32 \
````
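`--max_length_seconds 1` caps how much audio each training example contributes. One common way to enforce such a cap is to cut a random window of that length out of longer clips. The sketch below assumes 16 kHz mono audio (the rate wav2vec2-base was pretrained on) held in plain Python lists; `random_crop` is a hypothetical helper, not the script's own function.

```python
import random
from typing import List, Optional, Sequence

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def random_crop(
    wav: Sequence[float],
    max_length_seconds: float,
    sample_rate: int = SAMPLE_RATE,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a random contiguous window of at most `max_length_seconds`."""
    rng = rng if rng is not None else random.Random()
    max_samples = int(round(sample_rate * max_length_seconds))
    if len(wav) <= max_samples:
        # Clips that are already short enough are returned unchanged.
        return list(wav)
    # randint is inclusive on both ends, so every valid start offset can be drawn.
    offset = rng.randint(0, len(wav) - max_samples)
    return list(wav[offset : offset + max_samples])


clip = [0.0] * 24_000  # a hypothetical 1.5-second clip
print(len(random_crop(clip, max_length_seconds=1, rng=random.Random(0))))  # → 16000
```

Drawing a fresh offset every time a clip is loaded also acts as light data augmentation for clips longer than the cap.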
````diff
@@ -52,14 +53,15 @@ python run_audio_classification.py \
     --evaluation_strategy epoch \
     --save_strategy epoch \
     --load_best_model_at_end True \
+    --metric_for_best_model accuracy \
     --save_total_limit 3 \
     --seed 0 \
     --push_to_hub
 ```
 
-On a single V100 GPU (16GB), this script should run in ~10 minutes and yield accuracy of **98.4%**.
+On a single V100 GPU (16GB), this script should run in ~14 minutes and yield accuracy of **98.26%**.
 
-👀 See the results here: [anton-l/wav2vec2-base-keyword-spotting](https://huggingface.co/anton-l/wav2vec2-base-keyword-spotting)
+👀 See the results here: [anton-l/wav2vec2-base-ft-keyword-spotting](https://huggingface.co/anton-l/wav2vec2-base-ft-keyword-spotting)
 
 ## Multi-GPU
````
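The keyword-spotting command now adds `--metric_for_best_model accuracy` alongside `--load_best_model_at_end True`. The run is evaluated every epoch, and the checkpoint with the highest eval accuracy is the one loaded at the end. The stdlib-only sketch below is a simplified picture of those two pieces: accuracy as agreement between the argmax over logits and the labels, then picking the best epoch. The helper names and all numbers are made up for illustration. They are not the Trainer's internals or the results reported in the README.

```python
from typing import Sequence


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(pred == label) for pred, label in zip(preds, labels))
    return correct / len(labels)


def best_epoch(eval_accuracies: Sequence[float]) -> int:
    """1-based index of the epoch with the highest accuracy (earliest wins ties)."""
    return max(range(len(eval_accuracies)), key=eval_accuracies.__getitem__) + 1


# Hypothetical logits for three clips over two classes.
print(accuracy([[0.1, 2.0], [1.5, -0.3], [0.0, 0.4]], [1, 0, 0]))  # → 0.6666666666666666
# Hypothetical per-epoch eval accuracies.
print(best_epoch([0.95, 0.97, 0.96, 0.97, 0.94]))  # → 2
```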
````diff
@@ -69,7 +71,7 @@ The following command shows how to fine-tune [wav2vec2-base](https://huggingface
 python run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --dataset_name common_language \
-    --audio_column_name path \
+    --audio_column_name audio \
     --label_column_name language \
     --output_dir wav2vec2-base-lang-id \
     --overwrite_output_dir \
````
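The language-identification command switches `--audio_column_name` from `path` to `audio`. This presumably tracks the dataset exposing decoded audio under an `audio` column. With the Datasets library's `Audio` feature, such a column yields a dict with `path`, `array`, and `sampling_rate` keys, whereas a `path` column only holds a file name. The sketch below shows the difference using a hand-written, hypothetical record (not real Common Language data) and a hypothetical `get_waveform` helper.

```python
from typing import Any, Dict, List, Tuple


def get_waveform(
    example: Dict[str, Any], audio_column_name: str = "audio"
) -> Tuple[List[float], int]:
    """Return (samples, sampling_rate) from a decoded audio column."""
    value = example[audio_column_name]
    if not isinstance(value, dict):
        raise TypeError(
            f"column {audio_column_name!r} holds {type(value).__name__}, not decoded audio"
        )
    return list(value["array"]), value["sampling_rate"]


# Hypothetical record shaped like a decoded example.
example = {
    "path": "clip_0001.wav",
    "audio": {"path": "clip_0001.wav", "array": [0.0, 0.02, -0.01], "sampling_rate": 16_000},
    "language": 7,
}

samples, rate = get_waveform(example, "audio")
print(len(samples), rate)  # → 3 16000
```

Calling `get_waveform(example, "path")` raises `TypeError`, because that column holds only a string.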
````diff
@@ -91,6 +93,7 @@ python run_audio_classification.py \
     --evaluation_strategy epoch \
     --save_strategy epoch \
     --load_best_model_at_end True \
+    --metric_for_best_model accuracy \
     --save_total_limit 3 \
     --seed 0 \
     --push_to_hub
````
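The language-identification command also gains `--metric_for_best_model accuracy`, and it combines `--load_best_model_at_end True` with `--save_total_limit 3`. The sketch below shows one way these settings can interact when old checkpoints are pruned: the newest checkpoints are kept up to the limit, but the best one is protected from deletion. It is a simplified approximation for illustration, not the Trainer's exact rotation code, and the checkpoint names are hypothetical.

```python
from typing import List, Optional, Sequence


def checkpoints_to_delete(
    checkpoints: Sequence[str], limit: int, best: Optional[str] = None
) -> List[str]:
    """Return checkpoints to prune, given a list ordered oldest -> newest.

    At most `limit` checkpoints survive; the best one (if given) is treated
    as the newest so it is never pruned.
    """
    ordered = list(checkpoints)
    if best is not None and best in ordered:
        ordered.remove(best)
        ordered.append(best)
    n_delete = max(0, len(ordered) - limit)
    return ordered[:n_delete]


# Hypothetical checkpoint directory names, oldest first.
ckpts = [f"checkpoint-{step}" for step in (100, 200, 300, 400, 500)]
print(checkpoints_to_delete(ckpts, limit=3))
# → ['checkpoint-100', 'checkpoint-200']
print(checkpoints_to_delete(ckpts, limit=3, best="checkpoint-200"))
# → ['checkpoint-100', 'checkpoint-300']
```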
````diff
@@ -124,4 +127,21 @@ python run_audio_classification.py \
     --push_to_hub \
     --hub_model_id <username/model_id> \
     ...
-```
\ No newline at end of file
+```
+
+### Examples
+
+The following table shows a couple of demonstration fine-tuning runs.
+It has been verified that the script works for the following datasets:
+
+- [SUPERB Keyword Spotting](https://huggingface.co/datasets/superb#ks)
+- [Common Language](https://huggingface.co/datasets/common_language)
+
+| Dataset | Pretrained Model | # transformer layers | Accuracy on eval | GPU setup | Training time | Fine-tuned Model & Logs |
+|---------|------------------|----------------------|------------------|-----------|---------------|--------------------------|
+| Keyword Spotting | [ntu-spml/distilhubert](https://huggingface.co/ntu-spml/distilhubert) | 2 | 0.9706 | 1 V100 GPU | 11min | [here](https://huggingface.co/anton-l/distilhubert-ft-keyword-spotting) |
+| Keyword Spotting | [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 12 | 0.9826 | 1 V100 GPU | 14min | [here](https://huggingface.co/anton-l/wav2vec2-base-ft-keyword-spotting) |
+| Keyword Spotting | [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960) | 12 | 0.9819 | 1 V100 GPU | 14min | [here](https://huggingface.co/anton-l/hubert-base-ft-keyword-spotting) |
+| Keyword Spotting | [asapp/sew-mid-100k](https://huggingface.co/asapp/sew-mid-100k) | 24 | 0.9757 | 1 V100 GPU | 15min | [here](https://huggingface.co/anton-l/sew-mid-100k-ft-keyword-spotting) |
+| Common Language | [ntu-spml/distilhubert](https://huggingface.co/ntu-spml/distilhubert) | 2 | 0.2797 | 4 V100 GPUs | 38min | [here](https://huggingface.co/anton-l/distilhubert-ft-common-language) |
+| Common Language | [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) | 12 | 0.7945 | 4 V100 GPUs | 1h10m | [here](https://huggingface.co/anton-l/wav2vec2-base-lang-id) |
````
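The runs in the new Examples table use different GPU setups. To compare them on equal footing, the training times can be converted to minutes and, for the 4-GPU rows, to GPU-minutes. The sketch below copies its numbers from that table. `to_minutes` and `best_per_dataset` are illustrative helpers, not part of the example scripts. The parser only handles the two time formats the table uses (`11min`, `1h10m`).

```python
import re
from typing import Dict, Tuple

# (dataset, pretrained model, eval accuracy, number of V100 GPUs, training time),
# copied from the Examples table in this commit.
RESULTS = [
    ("Keyword Spotting", "ntu-spml/distilhubert", 0.9706, 1, "11min"),
    ("Keyword Spotting", "facebook/wav2vec2-base", 0.9826, 1, "14min"),
    ("Keyword Spotting", "facebook/hubert-base-ls960", 0.9819, 1, "14min"),
    ("Keyword Spotting", "asapp/sew-mid-100k", 0.9757, 1, "15min"),
    ("Common Language", "ntu-spml/distilhubert", 0.2797, 4, "38min"),
    ("Common Language", "facebook/wav2vec2-base", 0.7945, 4, "1h10m"),
]


def to_minutes(duration: str) -> int:
    """Parse durations like '11min' or '1h10m' into whole minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m(?:in)?)?", duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return 60 * hours + minutes


def best_per_dataset() -> Dict[str, Tuple[str, float]]:
    """Highest-accuracy pretrained model for each dataset."""
    best: Dict[str, Tuple[str, float]] = {}
    for dataset, model, acc, _gpus, _duration in RESULTS:
        if dataset not in best or acc > best[dataset][1]:
            best[dataset] = (model, acc)
    return best


for dataset, model, acc, gpus, duration in RESULTS:
    gpu_minutes = gpus * to_minutes(duration)
    print(f"{dataset:16} {model:28} acc={acc:.4f} gpu_minutes={gpu_minutes}")
print(best_per_dataset())
```

On these numbers, wav2vec2-base has the highest eval accuracy on both datasets. The 1h10m Common Language run corresponds to 280 GPU-minutes on 4 GPUs.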