"""Flat and cosine annealing learning rate scheduler with linear learning rate warmup. The learning rate first
increases linearly over the warmup steps and is then held at a fixed value. After ``pct_start`` of the total steps,
it decays along a cosine annealing curve down to ``eta_min``.
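
As an illustration only, the per-step learning rate can be sketched with the hypothetical helper ``flat_cos_lr``
below. It is not part of this class. It assumes that ``pct_start`` is a fraction of ``total_steps``
(e.g. ``0.5``), that warmup ramps linearly from 0 and ends before decay starts, and that cosine decay spans the
remaining steps::

    import math

    def flat_cos_lr(step, base_lr, total_steps, warmup_steps, pct_start, eta_min=0.0):
        # Hypothetical helper: step at which the flat phase ends and cosine decay begins,
        # assuming pct_start is a fraction of total_steps
        decay_start = int(pct_start * total_steps)
        if step < warmup_steps:
            # Linear warmup: 0 at step 0, reaching base_lr at step == warmup_steps
            return base_lr * step / warmup_steps
        if step < decay_start:
            # Flat phase: hold the learning rate at base_lr
            return base_lr
        # Cosine annealing from base_lr down to eta_min, clamped so it stays at eta_min after total_steps
        progress = min(1.0, (step - decay_start) / max(1, total_steps - decay_start))
        return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * progress))

For example, with ``base_lr=1.0``, ``total_steps=100``, ``warmup_steps=10`` and ``pct_start=0.5``, the sketch
gives 0.5 at step 5, 1.0 at steps 10 through 50, and 0.5 at step 75.
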
:param optimizer: Wrapped optimizer
:type optimizer: torch.optim.Optimizer
:param total_steps: Number of total training steps
:type total_steps: int
:param warmup_steps: Number of warmup steps, defaults to 0
:type warmup_steps: int, optional
:param pct_start: Percent of steps before starting learning rate decay
:type pct_start: float
:param eta_min: Minimum learning rate, defaults to 0