chenpangpang / parler-tts
Commit 85b8cac7, authored Apr 05, 2024 by sanchit-gandhi
more renaming
Parent: cd26d4f4
Showing 4 changed files with 8 additions and 8 deletions (+8 -8):
README.md (+4 -4)
dataset_concatenation_scripts/run_dataset_concatenation.sh (+2 -2)
edacc/prepare_edacc.py (+1 -1)
edacc/run_edacc.sh (+1 -1)
README.md
@@ -27,9 +27,9 @@ In the proceeding example, we follow Stability's approach by taking audio embedd
 model, and training the linear classifier on a combination of three open-source datasets:
 1. The English Accented (`en_accented`) subset of [Voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli)
 2. The train split of [VCTK](https://huggingface.co/datasets/vctk)
-3. The dev split of [EdAcc](https://huggingface.co/datasets/sanchit-gandhi/edacc)
+3. The dev split of [EdAcc](https://huggingface.co/datasets/edinburghcstr/edacc)
-The model is subsequently evaluated on the test split of [EdAcc](https://huggingface.co/datasets/sanchit-gandhi/edacc)
+The model is subsequently evaluated on the test split of [EdAcc](https://huggingface.co/datasets/edinburghcstr/edacc)
 to give the final classification accuracy.
 ```bash
@@ -37,11 +37,11 @@ to give the final classification accuracy.
 python run_audio_classification.py \
     --model_name_or_path "facebook/mms-lid-126" \
-    --train_dataset_name "vctk+facebook/voxpopuli+sanchit-gandhi/edacc" \
+    --train_dataset_name "vctk+facebook/voxpopuli+edinburghcstr/edacc" \
     --train_dataset_config_name "main+en_accented+default" \
     --train_split_name "train+test+validation" \
     --train_label_column_name "accent+accent+accent" \
-    --eval_dataset_name "sanchit-gandhi/edacc" \
+    --eval_dataset_name "edinburghcstr/edacc" \
     --eval_dataset_config_name "default" \
     --eval_split_name "test" \
     --eval_label_column_name "accent" \
...
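The command in the README hunk passes several datasets through single flags by joining values with `+`, so the renamed `edinburghcstr/edacc` entry has to stay aligned with its config, split, and label column. How `run_audio_classification.py` actually parses these flags is not shown in this commit; the following is only a minimal sketch of one way such arguments could be split and paired. The `DatasetSpec` class and `parse_dataset_specs` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetSpec:
    """One dataset entry recovered from the '+'-joined arguments (hypothetical)."""
    name: str
    config: str
    split: str
    label_column: str


def parse_dataset_specs(names: str, configs: str, splits: str, label_columns: str) -> List[DatasetSpec]:
    """Split each '+'-joined argument and pair the pieces by position.

    Assumption: every flag carries the same number of '+'-separated values.
    """
    fields = [value.split("+") for value in (names, configs, splits, label_columns)]
    if len({len(field) for field in fields}) != 1:
        raise ValueError("all '+'-joined arguments must contain the same number of entries")
    return [DatasetSpec(*values) for values in zip(*fields)]


specs = parse_dataset_specs(
    "vctk+facebook/voxpopuli+edinburghcstr/edacc",
    "main+en_accented+default",
    "train+test+validation",
    "accent+accent+accent",
)
for spec in specs:
    print(spec)
```

Pairing by position is why the rename in this commit only touches the dataset name: the config, split, and label column for the third entry are unchanged.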
dataset_concatenation_scripts/run_dataset_concatenation.sh
 #!/usr/bin/env bash
 python run_dataset_concatenation.py \
-    --dataset_name "sanchit-gandhi/vctk+facebook/voxpopuli+sanchit-gandhi/edacc-normalized" \
+    --dataset_name "sanchit-gandhi/vctk+facebook/voxpopuli+edinburghcstr/edacc-normalized" \
     --dataset_config_name "default+en_accented+default" \
     --dataset_split_name "train+test+validation" \
     --label_column_name "accent+accent+accent" \
@@ -11,7 +11,7 @@ python run_dataset_concatenation.py \
     --output_dir "./concatenated-dataset"
 python run_dataset_concatenation.py \
-    --dataset_name "sanchit-gandhi/edacc-normalized" \
+    --dataset_name "edinburghcstr/edacc-normalized" \
     --dataset_config_name "default" \
     --dataset_split_name "test" \
     --label_column_name "accent" \
...
edacc/prepare_edacc.py
@@ -73,7 +73,7 @@ def main():
         "How would you describe your accent in English? (e.g. Italian, Glaswegian)"
     ]
-    accent_dataset = load_dataset("sanchit-gandhi/edacc_accents", split="train")
+    accent_dataset = load_dataset("edinburghcstr/edacc_accents", split="train")
     def format_dataset(batch):
         batch["speaker_id"] = (
...
edacc/run_edacc.sh
@@ -3,5 +3,5 @@
 python prepare_edacc.py \
     --dataset_dir "/fsx/sanchit/edacc/edacc_v1.0" \
     --output_dir "/fsx/sanchit/edacc_processed" \
-    --hub_dataset_id "sanchit-gandhi/edacc-normalized" \
+    --hub_dataset_id "edinburghcstr/edacc-normalized" \
     --push_to_hub
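Across the four files, the commit moves only the EdAcc repositories to the `edinburghcstr` namespace; `sanchit-gandhi/vctk` in the concatenation script is left as it is. A rename of this kind could be expressed as an exact-match mapping over the `+`-joined dataset IDs, as in the sketch below. This is illustrative only and is not part of the commit; `EDACC_RENAMES` and `rename_dataset_ids` are hypothetical names, and the mapping lists just the three IDs visible in the diff.

```python
from typing import Dict

# Old ID -> new ID, taken from the changed lines in this commit.
EDACC_RENAMES: Dict[str, str] = {
    "sanchit-gandhi/edacc": "edinburghcstr/edacc",
    "sanchit-gandhi/edacc-normalized": "edinburghcstr/edacc-normalized",
    "sanchit-gandhi/edacc_accents": "edinburghcstr/edacc_accents",
}


def rename_dataset_ids(dataset_arg: str, renames: Dict[str, str]) -> str:
    """Rewrite each '+'-separated dataset ID that has an exact entry in `renames`.

    IDs without an entry (for example "sanchit-gandhi/vctk") pass through unchanged,
    so a shared namespace prefix alone never triggers a rename.
    """
    return "+".join(renames.get(dataset_id, dataset_id) for dataset_id in dataset_arg.split("+"))


old_arg = "sanchit-gandhi/vctk+facebook/voxpopuli+sanchit-gandhi/edacc-normalized"
new_arg = rename_dataset_ids(old_arg, EDACC_RENAMES)
print(new_arg)
```

Matching whole IDs instead of replacing the `sanchit-gandhi/` prefix avoids renaming datasets that were not moved.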