add some more comments

017d67e2 · Umang Yadav · 61775eab · 017d67e2
Commit 017d67e2 authored Nov 17, 2023 by Umang Yadav

Showing with 13 additions and 0 deletions

src/include/migraphx/float8_impl.hpp +13 -0

--- a/src/include/migraphx/float8_impl.hpp
+++ b/src/include/migraphx/float8_impl.hpp
@@ -192,6 +192,19 @@ constexpr uint8_t cast_to_f8(T f_x, bool stoch = false, uint32_t rng = 0)
    uint32_t drop_mask = (1u << (mfmt - Wm)) - 1;
    bool odd =
        mantissa & (1u << (mfmt - Wm)); // if the least significant bit that is not truncated is 1
+    /*
+    This part rounds by adding the mantissa bits that are about to be dropped.
+    e.g. if the dropped part is less than 0.5, the sum does not carry and it rounds down.
+    If the dropped part is more than 0.5, it rounds up by rolling a carry into the LSB of the
+    retained mantissa.
+    At the midpoint the bit pattern is `xy1:10000000` for the odd case and
+    `xy0:10000000` for the even case, where `:` delimits the retained part from the dropped part.
+    For the odd case:
+    this adds xy1:10000000 + 000:10000000, which rolls a carry over into the LSB of the retained
+    part, making it RNE (round to nearest even).
+    For the even case: this adds xy0:10000000 + 000:01111111, which does not carry, so it
+    rounds down and keeps the number even.
+    */
    mantissa += (stoch ? rng : (midpoint ? (odd ? mantissa : mantissa - 1) : mantissa)) & drop_mask;
    // Now we deal with overflow
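
The add-then-truncate trick described in the new comment can be tried in isolation. The sketch below is only an illustration and is not code from `float8_impl.hpp`: the name `round_rne`, the `drop_bits` parameter, and the way `midpoint` is computed are assumptions, and it ignores the stochastic-rounding path and the overflow handling that follows in the real function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of MIGraphX): drop the low `drop_bits` bits of
// `mantissa` (1 <= drop_bits <= 31) and round the result to nearest even.
constexpr uint32_t round_rne(uint32_t mantissa, uint32_t drop_bits)
{
    const uint32_t drop_mask = (1u << drop_bits) - 1;   // bits that get dropped
    const uint32_t half      = 1u << (drop_bits - 1);   // dropped part equal to 0.5
    // Assumption: midpoint means the dropped part is exactly one half.
    const bool midpoint = (mantissa & drop_mask) == half;
    // LSB of the retained part decides odd vs. even.
    const bool odd = (mantissa & (1u << drop_bits)) != 0;
    // Off the midpoint, or at an odd midpoint, add the dropped bits to themselves:
    // the sum carries into the retained LSB only when the dropped part is >= 0.5.
    // At an even midpoint, add (half - 1) instead so that no carry is produced.
    mantissa += (midpoint ? (odd ? mantissa : mantissa - 1) : mantissa) & drop_mask;
    return mantissa >> drop_bits; // truncate the dropped bits
}

// Usage with 8 dropped bits, written as retained:dropped in the comments.
static_assert(round_rne(0x580u, 8) == 6, "101:10000000, odd midpoint rounds up to even");
static_assert(round_rne(0x480u, 8) == 4, "100:10000000, even midpoint stays even");
static_assert(round_rne(0x481u, 8) == 5, "100:10000001, above 0.5 rounds up");
static_assert(round_rne(0x47Fu, 8) == 4, "100:01111111, below 0.5 rounds down");
```

Doing the rounding with one masked addition keeps the expression branch-free apart from selecting the addend, which is presumably why the same line can also serve stochastic rounding by swapping in `rng`.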