// clang-format off
// Step 1: load existing mean and inv-variance, or do the final Welford reduction on mean and variance and compute inv-variance = 1/sqrt(epsilon + variance)
// Input: x, dy, scale, savedMean and savedInvVar (optional), reduce_size
// Output: dx, dscale, dbias
// Step 1: calculate mean and inv-variance using the Welford method (if savedMean/savedInvVar are not available), where inv-variance = 1/sqrt(epsilon + variance)
// Step 2: reduce dy and dy * (x - mean) * invVariance to get dbias and dscale, respectively
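// Illustrative host-side sketch of Steps 1 and 2 for a single channel. It is not part of
// the kernel and only mirrors the arithmetic described in the comments above.
// Assumptions made for this example only: the names BnBwdRef and reference_bn_bwd_reduce
// are hypothetical, accumulation is done in double, and the variance is the biased one
// (divided by reduce_size).
#include <cmath>
#include <cstddef>

struct BnBwdRef
{
    double mean;
    double invVariance;
    double dscale;
    double dbias;
};

inline BnBwdRef reference_bn_bwd_reduce(const float* x,
                                        const float* dy,
                                        std::size_t reduce_size,
                                        double epsilon)
{
    // Step 1: Welford update of the running mean and the sum of squared deviations (m2)
    double mean = 0.0;
    double m2   = 0.0;
    for(std::size_t i = 0; i < reduce_size; ++i)
    {
        const double delta = static_cast<double>(x[i]) - mean;
        mean += delta / static_cast<double>(i + 1);
        m2 += delta * (static_cast<double>(x[i]) - mean);
    }

    // assumed: biased variance, then inv-variance = 1/sqrt(epsilon + variance)
    const double variance =
        reduce_size > 0 ? m2 / static_cast<double>(reduce_size) : 0.0;
    const double invVariance = 1.0 / std::sqrt(epsilon + variance);

    // Step 2: reduce dy and dy * (x - mean) * invVariance to get dbias and dscale
    double dscale = 0.0;
    double dbias  = 0.0;
    for(std::size_t i = 0; i < reduce_size; ++i)
    {
        dbias += static_cast<double>(dy[i]);
        dscale += static_cast<double>(dy[i]) * (static_cast<double>(x[i]) - mean) * invVariance;
    }

    return BnBwdRef{mean, invVariance, dscale, dbias};
}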