Add some comments

e9d053c2 · Chenggang Zhao · 1d3963d2 · e9d053c2
Commit e9d053c2 authored Sep 15, 2025 by Chenggang Zhao

Showing 1 changed file with 2 additions and 1 deletion

deep_ep/buffer.py +2 -1

--- a/deep_ep/buffer.py
+++ b/deep_ep/buffer.py
@@ -402,6 +402,7 @@ class Buffer:
            x: `[num_tokens, hidden]` with `torch.bfloat16`, the tokens to send for reducing to its original ranks.
            handle: a must-set communication handle, you can obtain this from the dispatch function.
            topk_weights: `[num_tokens, num_topk]` with `torch.float`, the tokens' top-k weights for reducing to its original ranks.
+            bias: 0, 1 or 2 `[num_tokens, hidden]` with `torch.bfloat16` final bias to the output.
            config: the performance tuning config.
            previous_event: the event to wait before actually executing the kernel.
            async_finish: the current stream will not wait for the communication kernels to be finished if set.
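
Reviewer note on the new `bias` line: the docstring says zero, one or two `[num_tokens, hidden]` tensors act as a final bias on the output. A minimal sketch of one plausible reading follows, assuming each bias is added element-wise after the reduction. The helper name `add_final_bias` is hypothetical and not part of DeepEP, and plain nested lists stand in for `torch.bfloat16` tensors so the snippet runs without PyTorch.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # stand-in for a [num_tokens, hidden] tensor


def add_final_bias(combined: Matrix, biases: Sequence[Matrix] = ()) -> Matrix:
    """Add zero, one or two bias matrices element-wise to a combined output.

    Assumption: "final bias to the output" means plain element-wise addition
    applied after the reduction; the real kernel may fuse or order this differently.
    """
    if len(biases) > 2:
        raise ValueError("expected 0, 1 or 2 bias tensors")
    out = [row[:] for row in combined]  # copy so the input is left untouched
    for bias in biases:
        same_rows = len(bias) == len(out)
        if not same_rows or any(len(b) != len(o) for b, o in zip(bias, out)):
            raise ValueError("each bias must have shape [num_tokens, hidden]")
        for out_row, bias_row in zip(out, bias):
            for j, value in enumerate(bias_row):
                out_row[j] += value
    return out
```

Passing an empty sequence returns an unchanged copy of the output, which corresponds to the "0" case in the docstring.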
@@ -552,7 +553,7 @@ class Buffer:
                monitoring.
            dispatch_wait_recv_cost_stats: a cumulative time spent waiting to receive each token tensor for statistics,
                which should have shape `[num_ranks, num_ranks]` and be typed as `torch.int64`.
-                This is useful for detecting and pre-cisely localizing slow anomalies.
+                This is useful for detecting and precisely localizing slow anomalies.
            use_fp8: whether to enable FP8 casting, with this, the received data will be a tuple of FP8 tensor and scaling factors.
            round_scale: whether round the scaling factors into power of 2.
            use_ue8m0: whether use UE8M0 as scaling factor format (available only with `round_scale=True`).
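
Reviewer note on `round_scale` and `use_ue8m0`: UE8M0 is an unsigned 8-bit exponent-only format, so it can only represent powers of two, which would explain why it is available only with `round_scale=True`. The sketch below illustrates that relationship under two assumptions that the docstring does not confirm: scales are rounded up to the next power of two, and the exponent is stored with a bias of 127 (as in the OCP Microscaling E8M0 scale type). All function names are hypothetical and not part of DeepEP.

```python
import math


def round_scale_to_power_of_two(scale: float) -> float:
    """Round a positive scale up to the nearest power of two (assumed direction)."""
    if not math.isfinite(scale) or scale <= 0.0:
        raise ValueError("scale must be a positive finite number")
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent, 0.5 <= mantissa < 1
    if mantissa == 0.5:
        return scale  # already an exact power of two
    return math.ldexp(1.0, exponent)


def encode_ue8m0(power_of_two_scale: float) -> int:
    """Encode a power-of-two scale as a biased 8-bit exponent (assumed bias: 127)."""
    mantissa, exponent = math.frexp(power_of_two_scale)
    if mantissa != 0.5:
        raise ValueError("UE8M0 has no mantissa bits; round the scale first")
    biased = (exponent - 1) + 127  # frexp's exponent is one above log2(scale)
    if not 0 <= biased <= 254:  # 255 is reserved for NaN in the E8M0 encoding
        raise ValueError("scale is outside the representable exponent range")
    return biased


def decode_ue8m0(byte: int) -> float:
    """Invert encode_ue8m0 for values in [0, 254]."""
    return math.ldexp(1.0, byte - 127)
```

A scale that has not been rounded first, such as 3.0, is rejected by `encode_ue8m0`, which mirrors the dependency between the two flags.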