Commit 85f58cf1 authored by Stanislav Pidhorskyi, committed by Facebook GitHub Bot

Added precise division function to fix rounding issues due to `-use_fast_math`

Summary:
This diff fixes a long-standing issue where skinning weights were sometimes negative.

The initial skinning weights are always non-negative, and the blending coefficients are supposed to lie in the range [0..1], so any blend of skinning weights should be non-negative as well.
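Assuming each blended weight is a two-term blend `w = c_a * w_a + c_b * w_b` of non-negative weights (the exact blend formula is not part of this diff), the argument, and the way it breaks once `c_a` overshoots 1 by a small ε, can be written out as:

```latex
\begin{aligned}
w &= c_a\,w_a + c_b\,w_b, \qquad c_b = 1 - c_a, \qquad w_a, w_b \ge 0 \\
0 \le c_a \le 1 &\;\Rightarrow\; c_b \ge 0 \;\Rightarrow\; w \ge 0 \\
c_a = 1 + \varepsilon \ (\varepsilon > 0) &\;\Rightarrow\; c_b = -\varepsilon \;\Rightarrow\; w = (1 + \varepsilon)\,w_a - \varepsilon\,w_b
\end{aligned}
```

With `w_a = 0` and `w_b > 0` this blend equals `-ε * w_b`, which is negative no matter how small ε is.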

Unfortunately, that was not the case in practice, despite various clamps.

The issue was hunted down to this part of the code:

  c_a = mass_a / mass_ab;
  c_b = 1.0f - c_a;

Even if `mass_a` matches `mass_ab` bit-perfectly, `c_a` is not always equal to `1.0`: sometimes it is `0.999999940395355224609375` and sometimes `1.000000119209289550781250`. The latter value makes `c_b` negative, which leads to negative skinning weights.

tsimk figured out that this behavior is caused by the nvcc flag `-use_fast_math`, which makes every division operator `x/y` compile to `__fdividef(x, y)`, which in turn does not always produce exactly 1.0 when dividing two bit-identical numbers. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/#intrinsic-functions .
D71423305

Reviewed By: phg1024

Differential Revision: D71436810

fbshipit-source-id: 64c4e6368d07368ee75997da088d3952ed0c36d0
parent 9fc8931f
......@@ -73,6 +73,34 @@ HD_FUNC double saturate(double a) {
  return fmin(fmax(a, 0.0), 1.0);
}
// Do IEEE-compliant division even if `-use_fast_math` or `-prec-div=false` is set.
// Useful when most of the code can be compiled with `-use_fast_math` but individual division
// operations need to be precise. In particular, when dividing a number by itself must be
// guaranteed to return exactly 1.0.
HD_FUNC float precise_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdiv_rn(a, b));
}
// See the function above. This is the overload for double. There is no fast division for
// doubles, but the division can be merged with additions into a mad operation. Using this
// function guarantees that it won't be merged.
HD_FUNC double precise_div(double a, double b) {
  return HOST_DEVICE_DISPATCH(a / b, __ddiv_rn(a, b));
}
// Using this function always results in fast division, whether or not `-use_fast_math` or
// `-prec-div=false` is set. Warning: when dividing two bit-identical numbers, this might produce
// a result slightly larger or smaller than 1.0, which can lead to unexpected results.
HD_FUNC float approx_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdividef(a, b));
}
// See the functions above. This variant is not actually useful, since there is no fast division
// for double, but it exists to enable writing templated code that works with both float and double.
HD_FUNC double approx_div(double a, double b) {
  return a / b;
}
// If NVCC, then use builtin abs/max/min/sqrt/rsqrt.
// All of them have overloads for ints, floats, and doubles, defined in
// `cuda/crt/math_functions.hpp`, so there is no need to use e.g. fabsf explicitly.
......