Commit 85f58cf1 authored by Stanislav Pidhorskyi, committed by Facebook GitHub Bot

Added precise division function to fix rounding issues due to `-use_fast_math`

Summary:
This diff fixes a long-standing issue with skinning weights sometimes being negative.

Since the initial values of the skinning weights are always non-negative and the blending coefficients are supposed to be in the range [0..1], any such blend of skinning weights should also be non-negative.

Unfortunately that was not the case in practice, despite various clamps.

The issue was hunted down to this part of the code:

  c_a = mass_a / mass_ab;
  c_b = 1.0f - c_a;

Even if `mass_a` matches `mass_ab` bit for bit, `c_a` might not be equal to `1.0`: sometimes it is `0.999999940395355224609375` and sometimes `1.000000119209289550781250`. The latter value makes `c_b` negative, which leads to negative skinning weights.
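The arithmetic behind the two values quoted above can be checked on the host. The following is a minimal sketch, not code from this diff: the helper names `complement`, `one_plus_ulp`, and `one_minus_ulp` are hypothetical and exist only for illustration. It shows that the two observed quotients are the immediate float neighbours of 1.0, and that the upper neighbour drives `1.0f - c_a` below zero.

```cpp
#include <cmath>

// Hypothetical helper mirroring `c_b = 1.0f - c_a` from the snippet above.
// `c_a` is passed in directly so a slightly-off quotient can be studied in isolation.
inline float complement(float c_a) {
  return 1.0f - c_a;
}

// Smallest float strictly greater than 1.0f: 1 + 2^-23 = 1.00000011920928955078125.
inline float one_plus_ulp() {
  return std::nextafter(1.0f, 2.0f);
}

// Largest float strictly less than 1.0f: 1 - 2^-24 = 0.999999940395355224609375.
inline float one_minus_ulp() {
  return std::nextafter(1.0f, 0.0f);
}
```

With `c_a = one_plus_ulp()`, `complement(c_a)` evaluates to exactly `-2^-23`, a small but strictly negative coefficient, which is enough to push a blended weight below zero.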

tsimk figured out that this behavior is due to the nvcc flag `-use_fast_math`, which makes every single-precision division `x/y` compile to `__fdividef(x, y)`, which in turn does not always produce exactly 1.0 when dividing bit-identical numbers. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/#intrinsic-functions .
D71423305
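For illustration, the call site could be rewritten along the following lines. This is a sketch under stated assumptions rather than the actual change: the `HD_FUNC` and `HOST_DEVICE_DISPATCH` definitions are stand-ins (the real ones are not part of this diff), and `BlendCoeffs` and `blend_coeffs` are hypothetical names wrapping the two lines quoted above.

```cpp
// Assumed stand-ins for the project's macros; the real definitions are not shown in this diff.
#if defined(__CUDACC__)
#define HD_FUNC __host__ __device__ inline
#else
#define HD_FUNC inline
#endif
#if defined(__CUDA_ARCH__)
#define HOST_DEVICE_DISPATCH(HOST_EXPR, DEVICE_EXPR) (DEVICE_EXPR)
#else
#define HOST_DEVICE_DISPATCH(HOST_EXPR, DEVICE_EXPR) (HOST_EXPR)
#endif

// Same body as the function added by this diff: round-to-nearest division on the device,
// plain division on the host.
HD_FUNC float precise_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdiv_rn(a, b));
}

// Hypothetical container for the two blending coefficients.
struct BlendCoeffs {
  float c_a;
  float c_b;
};

// Hypothetical wrapper around the problematic lines, with `/` replaced by precise_div.
HD_FUNC BlendCoeffs blend_coeffs(float mass_a, float mass_ab) {
  const float c_a = precise_div(mass_a, mass_ab);
  return {c_a, 1.0f - c_a};
}
```

Because correctly rounded division never returns more than 1.0 when `0 <= mass_a <= mass_ab` and `mass_ab > 0`, `c_b` stays non-negative under that precondition.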

Reviewed By: phg1024

Differential Revision: D71436810

fbshipit-source-id: 64c4e6368d07368ee75997da088d3952ed0c36d0
parent 9fc8931f
@@ -73,6 +73,34 @@ HD_FUNC double saturate(double a) {
  return fmin(fmax(a, 0.0), 1.0);
}
// Do IEEE-compliant division even if `-use_fast_math` or `-prec-div=false` is set.
// Useful when most of the code can be compiled with `-use_fast_math` but individual division
// operations need to be precise. In particular, when dividing a number by itself has to be
// guaranteed to return exactly 1.0.
HD_FUNC float precise_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdiv_rn(a, b));
}
// See the function above. This is the overload for double. There is no fast division for
// doubles, but it can be merged with additions into a mad operation. Using this function
// guarantees that it won't be merged.
HD_FUNC double precise_div(double a, double b) {
  return HOST_DEVICE_DISPATCH(a / b, __ddiv_rn(a, b));
}
// Using this function always results in fast division, whether or not `-use_fast_math` or
// `-prec-div=false` is set. Warning: this might produce a result slightly larger or smaller
// than 1.0 when dividing exactly the same (bit-wise) numbers, which can lead to unexpected results.
HD_FUNC float approx_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdividef(a, b));
}
// See the functions above. This variant is not actually useful, since there is no fast division
// for double, but it exists to enable writing templated code that works with both float and double.
HD_FUNC double approx_div(double a, double b) {
  return a / b;
}
// If NVCC then use builtin abs/max/min/sqrt/rsqrt.
// All of them have overloads for ints, floats, and doubles,defined in
// `cuda/crt/math_functions.hpp` thus no need for explicit usage of e.g. fabsf
......
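The comments in the diff mention templated code that works with both float and double. A minimal sketch of what that could look like follows. The macro definitions are assumed stand-ins (the real ones are not shown in this diff), and `blend` and `mean` are hypothetical helpers, not functions from this codebase. The blend formula `c_a * w_a + c_b * w_b` is likewise an assumption about how the coefficients are applied.

```cpp
// Assumed stand-ins for the project's macros; the real definitions are not shown in this diff.
#if defined(__CUDACC__)
#define HD_FUNC __host__ __device__ inline
#else
#define HD_FUNC inline
#endif
#if defined(__CUDA_ARCH__)
#define HOST_DEVICE_DISPATCH(HOST_EXPR, DEVICE_EXPR) (DEVICE_EXPR)
#else
#define HOST_DEVICE_DISPATCH(HOST_EXPR, DEVICE_EXPR) (HOST_EXPR)
#endif

// The four overloads, with the same bodies as in the diff.
HD_FUNC float precise_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdiv_rn(a, b));
}
HD_FUNC double precise_div(double a, double b) {
  return HOST_DEVICE_DISPATCH(a / b, __ddiv_rn(a, b));
}
HD_FUNC float approx_div(float a, float b) {
  return HOST_DEVICE_DISPATCH(a / b, __fdividef(a, b));
}
HD_FUNC double approx_div(double a, double b) {
  return a / b;
}

// Hypothetical blend of two weights. The coefficient must never exceed 1.0,
// so the precise overload is selected for whichever type T is.
template <typename T>
HD_FUNC T blend(T w_a, T w_b, T mass_a, T mass_ab) {
  const T c_a = precise_div(mass_a, mass_ab);
  const T c_b = T(1) - c_a;
  return c_a * w_a + c_b * w_b;
}

// Hypothetical non-critical division where a slightly inexact result is acceptable.
template <typename T>
HD_FUNC T mean(T sum, T count) {
  return approx_div(sum, count);
}
```

Overload resolution picks the float or double variant from `T`, so the same template body compiles for both types without any `if constexpr` or specialization.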