@@ -25,9 +25,16 @@ TangoFlux consists of FluxTransformer blocks which are Diffusion Transformer (Di


## Quickstart
TangoFlux is a text-to-audio (TTA) model that can generate up to 30 seconds of 44.1 kHz stereo audio in about 3 seconds.
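
To put those figures in perspective, here is a small back-of-the-envelope calculation. It only uses the numbers quoted above; the 3-second generation time is approximate and the hardware it was measured on is not stated here, so treat the speedup as a rough estimate.

```python
# Figures taken from the description above.
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz
CHANNELS = 2              # stereo
MAX_DURATION_S = 30       # longest supported clip
GENERATION_TIME_S = 3     # approximate; depends on hardware

# Number of audio samples in a maximum-length clip.
samples_per_channel = SAMPLE_RATE_HZ * MAX_DURATION_S
total_samples = samples_per_channel * CHANNELS

# How much faster than real time the generation is.
speedup_vs_realtime = MAX_DURATION_S / GENERATION_TIME_S

print(f"samples per channel: {samples_per_channel}")
print(f"total samples:       {total_samples}")
print(f"speedup vs realtime: {speedup_vs_realtime:.1f}x")
```

A full-length clip therefore holds roughly 1.3 million samples per channel, generated about ten times faster than real time.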
## Training TangoFlux
We use the `accelerate` package from Hugging Face for multi-GPU training. Run `accelerate config` from the terminal and set up your run configuration by answering the questions asked. We have provided a default accelerator config in the configs folder.
The tangoflux_config defines the training and model hyperparameters.
Our evaluation shows that inference with 50 steps yields the best results, which takes about 3 seconds. CFG scales of 3.5, 4, and 4.5 yield similar quality.
For faster inference, consider setting steps to 25, which yields similar audio quality.
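
For readers unfamiliar with the CFG scale, the sketch below shows the standard classifier-free guidance blend in plain Python. It is a generic illustration, not TangoFlux's actual code: `apply_cfg` is a hypothetical helper, the two-element vectors are toy values standing in for model predictions, and the exact formulation used inside TangoFlux may differ.

```python
from typing import List, Sequence


def apply_cfg(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    Hypothetical helper for illustration only.
    scale = 0 returns the unconditional prediction, scale = 1 the conditional
    one, and larger values push further in the direction of the text prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy predictions (assumed values, not real model outputs).
uncond_pred = [0.0, 1.0]
cond_pred = [1.0, 3.0]

# The three scales mentioned above.
for scale in (3.5, 4.0, 4.5):
    print(scale, apply_cfg(uncond_pred, cond_pred, scale))
```

In this formulation the scale only controls how far the prediction is extrapolated along the prompt direction.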