This is a reference implementation of Conv-TasNet.
> Luo, Yi, and Nima Mesgarani. "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27.8 (2019): 1256-1266.
This implementation is based on [arXiv:1809.07454v3](https://arxiv.org/abs/1809.07454v3) and [the reference implementation](https://github.com/naplab/Conv-TasNet) provided by the authors.
For usage, please check out the [source separation README](../README.md).
## (Default) Training Configurations
The default training and model configurations follow the best non-causal configuration reported in the paper. The causal configuration is not implemented.
- Sample rate: 8000 Hz
- Batch size: total 16 over distributed training workers
- Epochs: 100
- Initial learning rate: 1e-3
- Gradient clipping: maximum L2 norm of 5.0
- Optimizer: Adam
- Learning rate scheduling: Halved after 3 epochs of no improvement in validation accuracy.
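
As a rough illustration of how these defaults fit together, the sketch below collects them in a plain Python dataclass and mimics the learning rate rule. It is not the training script's actual code: `TrainConfig`, `per_worker_batch_size`, and `HalveOnPlateau` are hypothetical names, the even split of the batch across workers is an assumption, and so is the exact way epochs without improvement are counted (a framework scheduler may count them slightly differently). Adam and gradient clipping are left out because they depend on the training framework.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values copied from the list above; the field names are hypothetical.
    sample_rate: int = 8000
    total_batch_size: int = 16
    epochs: int = 100
    initial_lr: float = 1e-3
    clip_grad_norm: float = 5.0
    lr_factor: float = 0.5   # "halved"
    lr_patience: int = 3     # epochs without improvement


def per_worker_batch_size(cfg, world_size):
    """Split the total batch size evenly across workers (assumed even split)."""
    if cfg.total_batch_size % world_size != 0:
        raise ValueError("total batch size must be divisible by the number of workers")
    return cfg.total_batch_size // world_size


class HalveOnPlateau:
    """Halve the learning rate after `lr_patience` consecutive epochs without improvement."""

    def __init__(self, cfg):
        self.lr = cfg.initial_lr
        self.factor = cfg.lr_factor
        self.patience = cfg.lr_patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        # Assumes a higher validation metric is better.
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

With these assumptions, four workers would each process 4 samples per step, and calling `step()` once per epoch with the validation metric returns the learning rate to use for the next epoch.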