If Amp uses master params distinct from the model params,
then the params ``step()``\ ed by the optimizer are the master params,
and it is the master gradients (rather than the model gradients) that must be clipped.
If Amp is not using master params distinct from the model params, then the optimizer
directly steps the model params, and the model gradients must be clipped.
In both cases, correct practice is to clip the gradients of the params that are about to
be stepped **by the optimizer** (which may be distinct from ``model.parameters()``).
Also, if Amp uses loss scaling, gradients must be clipped after they have been unscaled.
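Concretely, "the params that are about to be stepped" are the ones held in the
optimizer's ``param_groups``.  The following is only a hypothetical illustration of
that idea (in practice, use Amp's ``master_params`` helper shown below, which iterates
over exactly these params), assuming ``optimizer`` already exists::

    # Roughly what amp.master_params(optimizer) yields:
    params_owned_by_optimizer = [
        p for group in optimizer.param_groups for p in group["params"]
    ]
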
The following pattern accounts for all possibilities, and should be correct for
any ``opt_level``::

    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    # Gradients are unscaled during context manager exit.
    # Now it's safe to clip the gradients of the params owned by the optimizer:
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm)
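
For context, here is a minimal end-to-end sketch of how the clipping call fits into a
training step.  The model, optimizer, dataloader, loss function, ``max_norm`` value,
and the ``opt_level`` shown are placeholders assumed for illustration, not
prescriptions; the Amp-specific calls (``amp.initialize``, ``amp.scale_loss``,
``amp.master_params``) are the real Apex APIs used above::

    import torch
    from apex import amp

    # model, optimizer, dataloader, loss_fn, and max_norm are assumed to be
    # defined elsewhere; "O2" is just an example opt_level.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        # backward() on the scaled loss produces scaled gradients.
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        # Gradients have been unscaled on context manager exit, so it is now
        # safe to clip the params the optimizer is about to step.
        torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm)
        optimizer.step()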