Amp calls the params owned directly by the optimizer's ``param_groups`` the "master params."

These master params may be fully or partially distinct from ``model.parameters()``.
For example, with `opt_level="O2"`_, ``amp.initialize`` casts most model params to FP16,
creates an FP32 master param outside the model for each newly-FP16 model param,
and updates the optimizer's ``param_groups`` to point to these FP32 params.
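The effect is easy to inspect.  The following is a minimal sketch (the ``Linear`` model and
SGD optimizer are arbitrary placeholders; a CUDA device and an Apex install are assumed)::

    import torch
    from apex import amp

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    # The model's params have been cast to FP16...
    print(next(model.parameters()).dtype)                # torch.float16
    # ...while the optimizer's param_groups hold separate FP32 master params.
    print(optimizer.param_groups[0]["params"][0].dtype)  # torch.float32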
The master params owned by the optimizer's ``param_groups`` may also fully coincide with the
model params, which is typically true for ``opt_level``\ s ``O0``, ``O1``, and ``O3``.

If Amp uses master params distinct from the model params, then the params ``step()``\ ed by the
optimizer are the master params, and it is the master gradients (rather than the model gradients)
that must be clipped.  If Amp is not using master params distinct from the model params, then the
optimizer directly steps the model params, and the model grads must be clipped.

In all cases, correct practice is to clip the gradients of the params that are guaranteed to be
owned **by the optimizer's** ``param_groups``, instead of those retrieved via ``model.parameters()``.

Also, if Amp uses loss scaling, gradients must be clipped after they have been unscaled, which
occurs during exit from the ``amp.scale_loss`` context manager.
The following pattern accounts for all possibilities, and should be correct for any ``opt_level``.
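As a self-contained sketch (the ``Linear`` model, SGD optimizer, random input, and
``max_norm=1.0`` are arbitrary placeholders; a CUDA device and an Apex install are assumed,
and ``amp.master_params`` iterates over the params owned by the optimizer's ``param_groups``)::

    import torch
    from apex import amp

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    data = torch.randn(4, 10, device="cuda")
    loss = model(data).sum()

    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    # Gradients are unscaled during exit from the context manager, so it is
    # now safe to clip.  Clip the optimizer-owned (master) grads via
    # amp.master_params, not the grads of model.parameters():
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()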