[Feature] Support region as input of T.cumsum (#1426)
* [Feature] Support region as input of T.cumsum
- Extend T.cumsum to accept BufferRegion and BufferLoad inputs in addition to Buffer
- This enables operations on buffer slices/regions like:
T.cumsum(InputG_fragment[i * chunk_size:(i + 1) * chunk_size], dim=0)
- Update cumsum_fragment to handle region inputs properly
- Add comprehensive tests for 1D and 2D region inputs including normal and reverse modes
Fixes #879
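A host-side reference for the chunked use case above can be sketched in plain Python. This is only an illustration of the expected results, assuming each region is scanned independently of its neighbours; `chunked_cumsum` is a hypothetical helper name, not part of TileLang or of this PR's tests.

```python
from itertools import accumulate


def chunked_cumsum(values, chunk_size, reverse=False):
    """Cumulative sum computed independently inside each chunk of `values`.

    Hypothetical reference helper: mirrors calling a cumsum on each
    slice values[i * chunk_size:(i + 1) * chunk_size] in turn.
    """
    out = []
    for start in range(0, len(values), chunk_size):
        # Python list slicing clamps at the end, so a short tail chunk is fine.
        chunk = values[start:start + chunk_size]
        if reverse:
            # Reverse mode: accumulate from the end of the chunk to its start.
            chunk = list(accumulate(chunk[::-1]))[::-1]
        else:
            chunk = list(accumulate(chunk))
        out.extend(chunk)
    return out
```

For example, `chunked_cumsum([1, 2, 3, 4, 5], 2)` scans the chunks `[1, 2]`, `[3, 4]` and `[5]` separately.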
* Fix formatting and add docstring for cumsum_fragment
- Add comprehensive docstring for cumsum_fragment function
- Format code according to ruff style guidelines
* Fix CodeRabbit review issues
- Fix negative dimension bounds check (dim < -len(shape) instead of dim <= -len(shape))
- Add src/dst shape compatibility validation for out-of-place cumsum
- Update copy() type annotation to accept BufferRegion as dst parameter
- Fix test in-place mutation issues by using out-of-place cumsum operations
- Add non-divisible size test cases for tail region coverage
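The bounds-check fix above can be illustrated with a small standalone sketch. With the old `dim <= -len(shape)` test, the valid value `dim == -len(shape)` (the first axis) was rejected. `normalize_dim` is a hypothetical name used for illustration only and is not the function touched by this PR.

```python
def normalize_dim(dim, ndim):
    """Map a possibly negative axis index onto the range [0, ndim).

    Hypothetical helper. Valid inputs are -ndim <= dim < ndim, so the
    lower bound must be checked with `<`, not `<=`.
    """
    if dim < -ndim or dim >= ndim:
        raise ValueError(f"dim {dim} is out of range for a {ndim}-D shape")
    return dim + ndim if dim < 0 else dim
```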
* Fix out-of-bounds access in region tests
- Add bounds clamping using T.min() for chunk_end calculations
- Prevents accessing beyond tensor bounds for non-divisible sizes
- Matches reference implementation behavior
- Fixes both 1D and 2D region test cases
* Fix region test: use simple slice expressions instead of T.min()
- Remove T.min() which cannot be used directly in slice indices
- Use chunk_start + chunk_size form instead
- Rely on the system's automatic bounds checking for non-divisible sizes
- Update comments to reflect this approach
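To make the tail-region case concrete, the sketch below lists the chunk extents produced by the `chunk_start + chunk_size` form when the length is not divisible by the chunk size. It is plain Python for illustration; `chunk_bounds` is a hypothetical helper, and the `valid` count only shows how many elements of each chunk actually lie inside the tensor.

```python
def chunk_bounds(length, chunk_size):
    """Return (chunk_start, chunk_end, valid) for every chunk.

    chunk_end is left unclamped, as in the slice expression
    [chunk_start:chunk_start + chunk_size]; `valid` is the number of
    in-bounds elements the chunk covers.
    """
    bounds = []
    for chunk_start in range(0, length, chunk_size):
        chunk_end = chunk_start + chunk_size
        valid = min(chunk_end, length) - chunk_start
        bounds.append((chunk_start, chunk_end, valid))
    return bounds
```

With `length=10` and `chunk_size=4`, the last chunk nominally spans indices 8 to 12 but only two of its elements are in bounds, which is the case the non-divisible tests exercise.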
* Fix cumsum region: use region extents in lowering and update tests for shared memory
* Simplify fragment scope check using is_fragment()
---------
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>