Commit 85b8cac7 authored by sanchit-gandhi

more renaming

parent cd26d4f4
@@ -27,9 +27,9 @@ In the following example, we follow Stability's approach by taking audio embedd
 model, and training the linear classifier on a combination of three open-source datasets:
 1. The English Accented (`en_accented`) subset of [Voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli)
 2. The train split of [VCTK](https://huggingface.co/datasets/vctk)
-3. The dev split of [EdAcc](https://huggingface.co/datasets/sanchit-gandhi/edacc)
-The model is subsequently evaluated on the test split of [EdAcc](https://huggingface.co/datasets/sanchit-gandhi/edacc)
+3. The dev split of [EdAcc](https://huggingface.co/datasets/edinburghcstr/edacc)
+The model is subsequently evaluated on the test split of [EdAcc](https://huggingface.co/datasets/edinburghcstr/edacc)
 to give the final classification accuracy.
 ```bash
@@ -37,11 +37,11 @@ to give the final classification accuracy.
 python run_audio_classification.py \
     --model_name_or_path "facebook/mms-lid-126" \
-    --train_dataset_name "vctk+facebook/voxpopuli+sanchit-gandhi/edacc" \
+    --train_dataset_name "vctk+facebook/voxpopuli+edinburghcstr/edacc" \
     --train_dataset_config_name "main+en_accented+default" \
     --train_split_name "train+test+validation" \
     --train_label_column_name "accent+accent+accent" \
-    --eval_dataset_name "sanchit-gandhi/edacc" \
+    --eval_dataset_name "edinburghcstr/edacc" \
     --eval_dataset_config_name "default" \
     --eval_split_name "test" \
     --eval_label_column_name "accent" \
...
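For context, the recipe described in the README above (frozen audio embeddings plus a trained linear classifier, scored by classification accuracy on a held-out split) can be sketched in miniature. This is an illustration on synthetic data only; the helper names and embedding dimensions are made up here and are not code from this repository:

```python
import numpy as np

def fit_linear_probe(embeddings: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Fit a linear classifier on frozen embeddings via least-squares on one-hot targets."""
    one_hot = np.eye(num_classes)[labels]                  # (n, num_classes)
    # Append a bias column so the probe learns a per-class intercept.
    x = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    weights, *_ = np.linalg.lstsq(x, one_hot, rcond=None)  # (dim + 1, num_classes)
    return weights

def predict(weights: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    x = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    return (x @ weights).argmax(axis=1)

# Synthetic stand-in for audio embeddings: two well-separated "accent" clusters.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, (50, 8)), rng.normal(1.0, 0.1, (50, 8))])
lab = np.array([0] * 50 + [1] * 50)

w = fit_linear_probe(emb, lab, num_classes=2)
accuracy = (predict(w, emb) == lab).mean()
```

In the actual scripts the embeddings come from the frozen `facebook/mms-lid-126` model and the evaluation accuracy is computed on the EdAcc test split rather than on training data.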
 #!/usr/bin/env bash
 python run_dataset_concatenation.py \
-    --dataset_name "sanchit-gandhi/vctk+facebook/voxpopuli+sanchit-gandhi/edacc-normalized" \
+    --dataset_name "sanchit-gandhi/vctk+facebook/voxpopuli+edinburghcstr/edacc-normalized" \
     --dataset_config_name "default+en_accented+default" \
     --dataset_split_name "train+test+validation" \
     --label_column_name "accent+accent+accent" \
@@ -11,7 +11,7 @@ python run_dataset_concatenation.py \
     --output_dir "./concatenated-dataset"
 python run_dataset_concatenation.py \
-    --dataset_name "sanchit-gandhi/edacc-normalized" \
+    --dataset_name "edinburghcstr/edacc-normalized" \
     --dataset_config_name "default" \
     --dataset_split_name "test" \
     --label_column_name "accent" \
...
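Both scripts above pass `+`-separated dataset names, configs, splits and label columns that are zipped position-wise into one spec per dataset (e.g. the third name goes with the third config and third split). A hedged sketch of how such arguments could be parsed; the helper name is illustrative and not the repository's actual implementation:

```python
def parse_dataset_specs(names: str, configs: str, splits: str, label_columns: str) -> list:
    """Zip `+`-separated CLI arguments into one {name, config, split, label_column} dict per dataset."""
    fields = [arg.split("+") for arg in (names, configs, splits, label_columns)]
    lengths = {len(f) for f in fields}
    if len(lengths) != 1:
        # Every argument must list one entry per dataset, in the same order.
        raise ValueError(f"All `+`-separated arguments must have the same length, got {sorted(lengths)}")
    return [
        {"name": n, "config": c, "split": s, "label_column": l}
        for n, c, s, l in zip(*fields)
    ]

# Mirrors the training arguments from the run_audio_classification.py command above.
specs = parse_dataset_specs(
    "vctk+facebook/voxpopuli+edinburghcstr/edacc",
    "main+en_accented+default",
    "train+test+validation",
    "accent+accent+accent",
)
```

Keeping the four lists aligned by position is what lets a single command mix VCTK, Voxpopuli and EdAcc with per-dataset configs and splits.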
@@ -73,7 +73,7 @@ def main():
         "How would you describe your accent in English? (e.g. Italian, Glaswegian)"
     ]
-    accent_dataset = load_dataset("sanchit-gandhi/edacc_accents", split="train")
+    accent_dataset = load_dataset("edinburghcstr/edacc_accents", split="train")
     def format_dataset(batch):
         batch["speaker_id"] = (
...
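The `format_dataset` function above is truncated in this hunk; in the diff it normalises fields of the accents dataset before they are used as labels. A minimal, hypothetical batched formatter in the same spirit (the field names and normalisation rules here are assumptions, not the actual script):

```python
def format_dataset(batch: dict) -> dict:
    """Hypothetical batched formatter: normalise speaker ids and free-text accent answers.

    With `datasets.Dataset.map(..., batched=True)`, each column arrives as a
    list, so every field is transformed element-wise.
    """
    batch["speaker_id"] = [sid.strip().upper() for sid in batch["speaker_id"]]
    # Tidy free-text questionnaire answers (e.g. "  italian " -> "Italian").
    batch["accent"] = [a.strip().capitalize() for a in batch["accent"]]
    return batch

# Stand-in for one batch from the accents dataset.
example_batch = {
    "speaker_id": [" ea_0001 ", "ea_0002"],
    "accent": ["  italian ", "glaswegian"],
}
formatted = format_dataset(example_batch)
```

Consistent accent strings matter here because they become the class labels fed to `--train_label_column_name "accent"`.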
@@ -3,5 +3,5 @@
 python prepare_edacc.py \
     --dataset_dir "/fsx/sanchit/edacc/edacc_v1.0" \
     --output_dir "/fsx/sanchit/edacc_processed" \
-    --hub_dataset_id "sanchit-gandhi/edacc-normalized" \
+    --hub_dataset_id "edinburghcstr/edacc-normalized" \
     --push_to_hub