Dayuxiaoshui authored
* [Feature] Support region as input of T.cumsum

  - Extend T.cumsum to accept BufferRegion and BufferLoad inputs in addition to Buffer
  - This enables operations on buffer slices/regions like:
    T.cumsum(InputG_fragment[i * chunk_size:(i + 1) * chunk_size], dim=0)
  - Update cumsum_fragment to handle region inputs properly
  - Add comprehensive tests for 1D and 2D region inputs including normal and reverse modes

  Fixes #879

* Fix formatting and add docstring for cumsum_fragment

  - Add comprehensive docstring for cumsum_fragment function
  - Format code according to ruff style guidelines

* Fix CodeRabbit review issues

  - Fix negative dimension bounds check (dim < -len(shape) instead of dim <= -len(shape))
  - Add src/dst shape compatibility validation for out-of-place cumsum
  - Update copy() type annotation to accept BufferRegion as dst parameter
  - Fix test in-place mutation issues by using out-of-place cumsum operations
  - Add non-divisible size test cases for tail region coverage

* Fix out-of-bounds access in region tests

  - Add bounds clamping using T.min() for chunk_end calculations
  - Prevents accessing beyond tensor bounds for non-divisible sizes
  - Matches reference implementation behavior
  - Fixes both 1D and 2D region test cases

* Fix region test: use simple slice expressions instead of T.min()

  - Remove T.min() which cannot be used directly in slice indices
  - Use chunk_start + chunk_size form instead
  - Rely on system's automatic bounds checking for non-divisible sizes
  - Update comments to reflect this approach

* Fix cumsum region: use region extents in lowering and update tests for shared memory

* Simplify fragment scope check using is_fragment()

---------

Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
869f021b
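
For readers of this commit, the per-chunk semantics that the region tests check can be sketched in plain Python. This is a hypothetical 1D reference, not code from the repository and not the TileLang API: the names normalize_dim and chunked_cumsum_ref are illustrative assumptions. It shows the corrected negative-dimension check (dim == -len(shape) is valid) and how a non-divisible input leaves a shorter tail chunk.

```python
from itertools import accumulate


def normalize_dim(dim, ndim):
    """Map a possibly negative dim into [0, ndim).

    Hypothetical helper: rejects dim < -ndim, so dim == -ndim stays valid,
    mirroring the bounds-check fix described in the commit message.
    """
    if dim < -ndim or dim >= ndim:
        raise ValueError(f"dim {dim} out of range for {ndim}-D input")
    return dim % ndim


def chunked_cumsum_ref(values, chunk_size, reverse=False):
    """Cumulative sum restarted at every chunk boundary (1D reference only)."""
    out = []
    for start in range(0, len(values), chunk_size):
        # Python slicing clamps at the end, so the tail chunk may be shorter.
        chunk = values[start:start + chunk_size]
        if reverse:
            # Reverse mode: accumulate from the end of the chunk backwards.
            out.extend(reversed(list(accumulate(reversed(chunk)))))
        else:
            out.extend(accumulate(chunk))
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7]  # length 7 is not divisible by chunk_size 3
    print(chunked_cumsum_ref(data, 3))
    print(chunked_cumsum_ref(data, 3, reverse=True))
```

A plain-Python reference like this is only a way to reason about expected outputs; the actual kernels operate on TileLang buffers and regions.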