Please follow the data preparation steps for S2ST described [here](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_speech/docs/enhanced_direct_s2st_discrete_units.md).
## Pre-Training
```bash
cd speech2s/stpretrain_scripts
bash base_sc2c_enes.sh
```
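During pre-training, the model is optimized with a weighted combination of losses: `hubert_loss + text_mum_weight * MUM_loss + u2t_ed_weight * CE_loss + u2t_ctc_weight * CTC_loss`. The sketch below collects the relevant criterion config fields as a self-contained dataclass; the class name and the `pred_masked_weight` declaration are assumptions for illustration, while the remaining field names, defaults, and help strings follow the repo's criterion config.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Speech2SCriterionConfig:  # hypothetical name, for illustration only
    pred_masked_weight: float = field(
        default=1.0,  # assumed default, matching fairseq's HuBERT criterion
        metadata={"help": "weight for predictive loss for masked frames"},
    )
    pred_nomask_weight: float = field(
        default=0.0,
        metadata={"help": "weight for predictive loss for unmasked frames"},
    )
    loss_weights: Optional[List[float]] = field(
        default=None,
        metadata={"help": "weights for additional loss terms (not first one)"},
    )
    log_keys: List[str] = field(
        default_factory=lambda: [],
        metadata={"help": "output keys to log"},
    )
    u2t_ed_weight: float = field(
        default=0.1,
        metadata={"help": "weight for the text encoder-decoder (CE) loss"},
    )
    u2t_ctc_weight: float = field(
        default=0.0,
        metadata={"help": "weight for the text CTC loss"},
    )
    text_mum_weight: float = field(
        default=0.0,
        metadata={"help": "masked unit modeling weight from the text end"},
    )
```

With these defaults, only the HuBERT loss and the text encoder-decoder loss (weight 0.1) contribute to the pre-training objective.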
## Fine-Tuning
```bash
cd speech2s/stpretrain_scripts
bash finetune_enes.sh
```
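After fine-tuning, a quick way to sanity-check the resulting checkpoint is to load it back with fairseq's checkpoint utilities. This is a minimal sketch; the checkpoint path is a placeholder and depends on the save directory configured inside `finetune_enes.sh`.

```python
from fairseq import checkpoint_utils

# Hypothetical path: fine-tuning writes checkpoints under the save directory
# configured in the script.
ckpt_path = "checkpoints/finetune_enes/checkpoint_best.pt"

# Returns the loaded model ensemble, the saved training config, and the task.
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([ckpt_path])

model = models[0]
print(f"Loaded {type(model).__name__} with {sum(p.numel() for p in model.parameters()):,} parameters")
```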
## Inference
```bash
cd speech2s/stpretrain_scripts
bash inference_ed.sh
```
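Direct S2ST systems are commonly evaluated with ASR-BLEU: the generated target speech is transcribed by an ASR model and the transcripts are scored against the reference translations. Below is a minimal sketch of the final scoring step with the sacreBLEU Python API, assuming hypothetical file names with one sentence per line.

```python
import sacrebleu

# Hypothetical inputs: ASR transcriptions of the generated speech and the
# target-language reference translations.
with open("asr_transcripts.txt") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"ASR-BLEU: {bleu.score:.2f}")
```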
## Results on VoxPopuli and CoVoST
## License
This project is licensed under the license found in the LICENSE file in the root directory of this source tree.
Portions of the source code are based on [FAIRSEQ](https://github.com/pytorch/fairseq).
[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)
## Reference
If you find our work useful in your research, please cite the following paper:
```bibtex
@article{wei2022joint,
  title={Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation},
  author={Wei, Kun and Zhou, Long and Zhang, Ziqiang and Chen, Liping and Liu, Shujie and He, Lei and Li, Jinyu and Wei, Furu},
  journal={arXiv preprint arXiv:2210.17027},
  year={2022}
}
```