"vscode:/vscode.git/clone" did not exist on "9d01d083a75e2b53e34e5b1136828a30df56a408"
  1. 20 Nov, 2025 2 commits
  2. 13 Nov, 2025 1 commit
  3. 12 Nov, 2025 1 commit
  4. 06 Nov, 2025 1 commit
  5. 05 Nov, 2025 1 commit
  6. 04 Nov, 2025 1 commit
  7. 03 Nov, 2025 2 commits
  8. 31 Oct, 2025 1 commit
    • Lei Wang's avatar
      [FFI] Rebase tvm to v0.22.0 to utilize tvm-ffi (#1108) · 10911e28
      Lei Wang authored
      
      
      * 3rdparty tvm bump
      
      * bump tvm into v0.22.0
      
      * lint fix
      
      * rebase tvm
      
      * Update submodule tvm to latest commit 3085bc4
      
      * Refactor: Update configuration retrieval in CopyNode and adjust test registration in tilelang
      
      * test fix
      
      * add requirement
      
      * atomic_fix
      
      * atomic_fix
      
      * phaseout py39
      
      * optimize
      
      * optimize
      
      * lint fix
      
      * do not clean cache
      
      * do not clean cache
      
      * [Minor] Minor update for Python versions and dependencies
      
      * [Lint] fix lint for py39
      
      * [Lint] fix lint for ROCm
      
      * [Build][CI] Sync CI changes from upstream/sdist
      
      * [Lint] fix lint for ROCm
      
      * [Build][CI] Update `repair-wheel-command`
      
      * [Minor] update abi3audit result format
      
      * [Lint] fix lint for ROCm
      
      * [BugFix] fix build
      
      * [Lint] fix lint for ROCm
      
      * [BugFix] set rpath for libtvm and libtvm_runtime
      
      * [Deps] pin apache-tvm-ffi version
      
      * [Build] set Python 3.9 Limited API for Cython target
      
      * [Build] set Python 3.9 Limited API for Cython target
      
      * [Deps] Restore Python 3.8 support
      
      * [Build] use `apache-tvm-ffi`'s `libtvm_ffi`
      
      * [BugFix] use `;` as delimiter for RPATH on macOS
      
      * [BugFix] use `--ignore-missing-dependencies` for `delocate-wheel`
      
      * [Build] support `sccache` if available
      
      * [Build] add CIBW import test
      
      * [Build][CI] enable ccache for CIBW on Linux
      
      * [BugFix] set rpath for libtvm and libtvm_runtime
      
      * Revert "[Build][CI] enable ccache for CIBW on Linux"
      
      This reverts commit cd9ab57bb5ddd2572c60bcbbebde81480a658fd3.
      
      * [CI] fix perfbench bot
      
      * [BugFix] use Python 3.9 to build wheel
      
      * [Minor] update perfbench bot envs
      
      * [BugFix] fix CIBW environment on Linux
      
      * [CI] skip import test on CentOS 7
      
      * [CI] use Python urllib to download file instead of Wget
      
      ---------
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@pku.edu.cn>
      10911e28
  9. 29 Oct, 2025 1 commit
  10. 27 Oct, 2025 3 commits
  11. 22 Oct, 2025 1 commit
  12. 20 Oct, 2025 1 commit
  13. 19 Oct, 2025 1 commit
  14. 18 Oct, 2025 1 commit
  15. 16 Oct, 2025 1 commit
  16. 15 Oct, 2025 1 commit
    • Xuehai Pan's avatar
      [CI][Refactor] Merge test CI workflow files into one (#973) · 8ce27782
      Xuehai Pan authored
      * refactor: merge test CI workflow files into one
      
      * chore: set `UV_INDEX_STRATEGY=unsafe-best-match`
      
      * feat: add AST test with Python 3.8
      
      * feat: implement manual caching mechanism for self-hosted runners
      
      * refactor: simplify cache logic for self-hosted runners
      
      * chore: clear uv cache on failure
      
      * chore: print format.sh output to logs
      
      * chore: improve uv caching
      
      * chore: disable parallel test
      
      * chore: use `PYTHONDEVMODE=1` in CI
      
      * feat: enable coredump generation
      
      * fix: fix perfbench condition
      
      * Revert "feat: enable coredump generation"
      
      This reverts commit c52da65cb572932e09905d08c43a39ec3cf47c54.
      
      * chore: move example CI down
      
      * Revert "chore: move example CI down"
      
      This reverts commit 9d8e65055e01d955c5268a9a6705d270c2de0d57.
      
      * chore: skip example `test_example_mha_sink_bwd_bhsd`
      
      * chore: skip example `test_example_gqa_sink_bwd_bhsd`
      
      * fix: fix example argument passing
      
      * fix: loosen test criteria
      
      * chore: rename `CMAKE_CONFIG...
      8ce27782
  17. 14 Oct, 2025 1 commit
  18. 13 Oct, 2025 2 commits
    • Cunxiao Ni's avatar
      [CI] Removes redundant environment variable (#1020) · eb37e459
      Cunxiao Ni authored
      * [CI] Removes redundant environment variable
      Removes the `UV_INDEX_URL`
      
      * triggle CI
      
      * triggle CI
      
      * triggle CI
      
      * triggle CI
      eb37e459
    • Yichen Yan's avatar
      [Build] Migrate to scikit-build-core (#939) · d89ba5b8
      Yichen Yan authored
      
      
      * cleanup
      
      * init
      
      * build first wheel that may not work
      
      * build cython ext
      
      * fix tvm build
      
      * use sabi
      
      * update rpath to support auditwheel
      
      * pass editible build
      
      * update ci
      
      * fix warnings
      
      * do not use ccache in self host runner
      
      * test local uv cache
      
      * test pip index
      
      * update lib search to respect new lib location
      
      * fix
      
      * update ci
      
      * enable cuda by default
      
      * update src map
      
      * fix
      
      * fix
      
      * fix
      
      * Generate version with backend and git information at build time
      
      * copy tvm_cython to wheels
      
      * fix tvm lib search
      
      * fmt
      
      * remove unused
      
      * auto detect ccache
      
      * add back backend-related files
      
      * remove jit cython adaptor to simplify code
      
      * fmt
      
      * fix ci
      
      * ci fix 2
      
      * ci fix 3
      
      * workaround metal
      
      * ci fix 4
      
      * fmt
      
      * fmt
      
      * Revert "ci fix 4"
      
      This reverts commit d1de8291c3e40927955f3ad3cf87a75c78813676.
      
      * tmp
      
      * fix metal
      
      * trivial cleanup
      
      * add detailed build-time version for cuda
      
      * add back mlc
      
      * Restore wheel info and other trivial updates
      
      * update
      
      * fix cuda
      
      * upd
      
      * fix metal ci
      
      * test for ga build
      
      * test for nvidia/cuda
      
      * test ubuntu 20
      
      * fix
      
      * fix
      
      * Do not use `uv build`
      
      * fix
      
      * fix
      
      * log toolchain version
      
      * merge wheel
      
      * update
      
      * debug
      
      * fix
      
      * update
      
      * skip rocm
      
      * update artifacts each
      
      * fix
      
      * fix
      
      * add mac
      
      * fix cache
      
      * fix cache
      
      * fix cache
      
      * reset and add comment
      
      * upd
      
      * fix git version
      
      * update deps
      
      * trivial update
      
      * use in-tree build dir and install to src to speedup editable build
      
      * Revert "use in-tree build dir and install to src to speedup editable build"
      
      This reverts commit 6ab87b05c5eed811210136b8dca4fc3677dd51f2.
      
      * add build-dir
      
      * update docs
      
      * remove old scrips
      
      * [1/n] cleanup scripts
      
      * [Lint]: [pre-commit.ci] auto fixes [...]
      
      * fix and update
      
      * wait for tvm fix
      
      * revert some tmp fix
      
      * fix
      
      * fix
      
      * spell
      
      * doc update
      
      * test cibuildwheel
      
      * fix and test macos on ci
      
      * Update .github/workflows/dist.yml
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ga event
      
      * cleanup
      
      * bump tvm to support api3
      
      * test final version
      
      * add cron
      
      * Update .github/workflows/dist.yml
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ccache for metal cibuildwheel
      
      * test newer macos
      
      * finish
      
      ---------
      Co-authored-by: default avatarpre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      d89ba5b8
  19. 11 Oct, 2025 1 commit
  20. 09 Oct, 2025 7 commits
    • Lei Wang's avatar
      [TileOp] Implement WGMMA for T.gemm_v2 (#813) · a13cde28
      Lei Wang authored
      * [Feature] Introduce WGMMA support and enhance GEMM layout handling
      
      - Added support for the WGMMA intrinsic in the TileLang framework, enabling efficient matrix multiplication on newer architectures.
      - Refactored GEMM layout functions to accept a boolean parameter for K dimension handling, improving flexibility in layout generation.
      - Updated layout inference logic to accommodate new WGMMA configurations and ensure compatibility with existing GEMM operations.
      - Enhanced Python bindings for layout functions, allowing for better integration and usability in user-defined operations.
      - Improved documentation for layout functions and GEMM operations to clarify usage and parameters.
      
      These changes enhance the performance and usability of GEMM operations, particularly for advanced architectures, while maintaining backward compatibility with existing implementations.
      
      * [Refactor] Clean up code formatting and enhance layout function readability
      
      - Improved code formatting across multiple files for better readability, including consistent indentation and line breaks.
      - Updated layout function signatures to enhance clarity, particularly in `gemm_layouts.cc`, `layout.cc`, and `layout.h`.
      - Refactored lambda functions in `builtin.cc` and `gemm_py.cc` for improved structure and maintainability.
      - Enhanced comments and documentation in layout-related files to clarify usage and parameters.
      
      These changes contribute to a cleaner codebase and improved maintainability of layout functions in the TileLang framework.
      
      * [Feature] Add descriptor initialization and offset manipulation for WGMMA
      
      - Introduced new TileLang builtins `initialize_descriptor` and `increase_descriptor_offset` to facilitate descriptor management for WGMMA operations.
      - Updated `builtin.cc` and `builtin.h` to define and document the new builtins, enhancing the framework's capabilities for descriptor handling.
      - Modified `codegen_cuda.cc` and `ptx.cc` to integrate the new builtins into the code generation process, ensuring proper assembly generation for WGMMA operations.
      - Enhanced the `GemmWGMMA` class to utilize the new descriptor functionalities, improving the efficiency of matrix multiplication operations.
      - Updated related tests and documentation to reflect the new features and ensure comprehensive coverage.
      
      These changes enhance the TileLang framework's support for advanced matrix operations on newer architectures, improving performance and usability.
      
      * [Refactor] Improve code formatting and readability in various files
      
      - Enhanced code formatting across multiple files for better readability, including consistent indentation and line breaks.
      - Updated function signatures and comments in `builtin.h`, `codegen_cuda.cc`, and `ptx.cc` to improve clarity.
      - Refactored descriptor initialization and offset manipulation functions in `builtin.py` and `wgmma_macro_generator.py` for improved structure.
      - Cleaned up unnecessary whitespace and improved alignment in `common.h` and `allocate.py`.
      
      These changes contribute to a cleaner and more maintainable codebase in the TileLang framework.
      
      * [Update] Update subproject commit and refactor layout function call
      
      - Updated the subproject commit for `cutlass` to indicate a dirty state.
      - Refactored the `UpdateAnalyzer` function in `layout.cc` to call `LayoutNode::getVarMap()` instead of `getVarMap()`, improving clarity and ensuring proper context for variable mapping.
      
      These changes enhance the maintainability and clarity of the layout handling in the TileLang framework.
      
      * support more data types
      
      * gemm_rs support
      
      * lint fix
      
      * wgmma wrapper
      
      * Remove debug logging for wgmma assembly code and refactor swizzle byte size calculations in wgmma macro generator. Enhanced handling of leading and stride byte offsets based on swizzle mode, improving clarity and performance in tensor core intrinsic emissions.
      
      * Refactor GEMM layout functions to replace 'kfactor' with 'k_inner' for improved clarity and consistency. Update includes necessary changes in error messages for Hopper and Sm100 layouts. Additionally, include a new header for CUTE utilities in common.h.
      
      * Comprehensively support WGMMA GEMM SS
      
      * remove debug print
      
      * lint fix
      
      * remove debug print
      
      * reduce bwd test shape
      
      * lint fix
      
      * clear cache for pytest
      
      * lint fix
      
      * Update sparse MLA examples to support SKV adjustment and correctness checks
      
      - Changed SKV parameter from 32768 to 8192 in sparse MLA backward and forward tests.
      - Added check_correctness parameter to test functions for validation of outputs.
      - Updated test cases to reflect new SKV values and correctness checks.
      
      * test fix
      
      * adjust test case
      
      * test fix
      
      * skip some test currently
      a13cde28
    • dependabot[bot]'s avatar
      [CI]: Bump actions/checkout from 2 to 5 (#953) · 10adb79f
      dependabot[bot] authored
      Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 5.
      - [Release notes](https://github.com/actions/checkout/releases)
      - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
      - [Commits](https://github.com/actions/checkout/compare/v2...v5
      
      )
      
      ---
      updated-dependencies:
      - dependency-name: actions/checkout
        dependency-version: '5'
        dependency-type: direct:production
        update-type: version-update:semver-major
      ...
      Signed-off-by: default avatardependabot[bot] <support@github.com>
      Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      Co-authored-by: default avatarLei Wang <34334180+LeiWang1999@users.noreply.github.com>
      10adb79f
    • dependabot[bot]'s avatar
      [CI]: Bump actions/github-script from 7 to 8 (#954) · 5d881a57
      dependabot[bot] authored
      Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 8.
      - [Release notes](https://github.com/actions/github-script/releases)
      - [Commits](https://github.com/actions/github-script/compare/v7...v8
      
      )
      
      ---
      updated-dependencies:
      - dependency-name: actions/github-script
        dependency-version: '8'
        dependency-type: direct:production
        update-type: version-update:semver-major
      ...
      Signed-off-by: default avatardependabot[bot] <support@github.com>
      Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      5d881a57
    • dependabot[bot]'s avatar
      [CI]: Bump astral-sh/setup-uv from 6 to 7 (#952) · b6f90d25
      dependabot[bot] authored
      Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 6 to 7.
      - [Release notes](https://github.com/astral-sh/setup-uv/releases)
      - [Commits](https://github.com/astral-sh/setup-uv/compare/v6...v7
      
      )
      
      ---
      updated-dependencies:
      - dependency-name: astral-sh/setup-uv
        dependency-version: '7'
        dependency-type: direct:production
        update-type: version-update:semver-major
      ...
      Signed-off-by: default avatardependabot[bot] <support@github.com>
      Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      b6f90d25
    • dependabot[bot]'s avatar
      [CI]: Bump actions/setup-python from 2 to 6 (#951) · d8fedc17
      dependabot[bot] authored
      Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 6.
      - [Release notes](https://github.com/actions/setup-python/releases)
      - [Commits](https://github.com/actions/setup-python/compare/v2...v6
      
      )
      
      ---
      updated-dependencies:
      - dependency-name: actions/setup-python
        dependency-version: '6'
        dependency-type: direct:production
        update-type: version-update:semver-major
      ...
      Signed-off-by: default avatardependabot[bot] <support@github.com>
      Co-authored-by: default avatardependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      d8fedc17
    • Xuehai Pan's avatar
    • Xuehai Pan's avatar
      [CI] enable dependabot for GHA workflows (#950) · f6d4bd3a
      Xuehai Pan authored
      * chore: add .editorconfig
      
      * feat: enable dependabot for GHA workflows
      f6d4bd3a
  21. 07 Oct, 2025 1 commit
  22. 01 Oct, 2025 1 commit
  23. 26 Sep, 2025 1 commit
  24. 06 Sep, 2025 1 commit
    • Cunxiao Ni's avatar
      [CI]Adds pytest timeout to CI (#792) · bcfc8343
      Cunxiao Ni authored
      * [CI]Adds pytest timeout to CI
      
      Adds a timeout to pytest runs in CI to prevent jobs from hanging indefinitely.
      This also adds `pytest-timeout` to the test requirements.
      
      * fix lint
      bcfc8343
  25. 03 Sep, 2025 1 commit
    • Cunxiao Ni's avatar
      [CI] Adds pytest-durations for test timing (#782) · 141e01fb
      Cunxiao Ni authored
      * [Ci] Adds pytest-durations for test timing
      
      Adds `pytest-durations` to the test requirements and configures pytest to display test durations.
      
      This helps in identifying slow-running tests and optimizing the test suite for faster feedback.
      
      * add amd ci durations
      
      * Removes flash_attn installation from CI
      141e01fb
  26. 02 Sep, 2025 2 commits
    • Lei Wang's avatar
      [Cache] Introduce detailed target information for the disk kernel cache (#780) · 7ffc5b44
      Lei Wang authored
      * Fix type hint for target_host parameter in compile function to allow None value
      
      * Refactor target handling in compile function to utilize determine_target for improved clarity and consistency
      
      * Update PrintConst function in codegen_cuda.cc to use hexfloat format for bfloat16 and float8/float4 types, while adding scientific notation comments for clarity. This change enhances the representation of floating-point constants in the generated code.
      
      * Refactor PrintType function in codegen_cuda.cc to remove unnecessary failure conditions for floating-point types with lane counts greater than 4. This change simplifies the logic and improves code clarity.
      
      * Enhance benchmark_matmul.py to conditionally print Reference TFlops only if ref_latency is not None. Update param.py to ensure target is converted to string for consistency. Refactor tuner.py to utilize determine_target for improved clarity in target handling.
      
      * Remove automatic commit and push step from AMD and NVIDIA CI workflows to streamline the process and avoid unnecessary commits.
      7ffc5b44
    • Lei Wang's avatar
      [Lint] Introduce clang-tidy into format.sh (#777) · cdc5d8d3
      Lei Wang authored
      * [Refactor] Update Clang-Tidy Checks and Improve Code Consistency
      
      - Enhanced .clang-tidy configuration by adding specific checks for better bug detection and performance optimization.
      - Refactored function signatures across multiple files to use `const` references for parameters, improving performance and code clarity.
      - Updated various methods to ensure consistent handling of parameters, particularly in `AddPredicate`, `Substitute`, and `PlanLoopPartition` functions.
      - Improved readability by replacing size checks with `empty()` method calls in several locations, ensuring clearer intent in the code.
      - General code cleanup and adherence to best practices for better maintainability.
      
      * [Refactor] Enhance Code Consistency and Clang-Tidy Configuration
      
      - Updated .clang-tidy configuration to include additional checks for improved code quality and performance.
      - Refactored function signatures across multiple files to use `const` references, enhancing performance and clarity.
      - Replaced size checks with `empty()` method calls in various locations for clearer intent.
      - Improved handling of parameters in several functions, ensuring consistent usage of `std::move` where applicable.
      - General code cleanup to adhere to best practices and improve maintainability.
      
      * [Refactor] Integrate Clang-Tidy Checks and Enhance Code Consistency
      
      - Added clang-tidy checks to the format script for improved code quality assurance.
      - Refactored function signatures across multiple files to consistently use `const` references, enhancing performance and clarity.
      - Updated the requirements-lint.txt file to include clang-tidy as a dependency.
      - General code cleanup to adhere to best practices and improve maintainability.
      
      * [CI] Update AMD CI Workflow to Include Build Directory Creation
      
      - Added steps to create a build directory and configure CMake with ROCm support during the format check process.
      - Ensured cleanup of the build directory after the format check to maintain a clean workspace.
      
      * [Refactor] Remove Unused Member Variables in AtomicAddNode and CopyNode
      
      - Removed the `args_` member variable from both `AtomicAddNode` and `CopyNode` classes to streamline the code and eliminate unnecessary data members.
      - This change enhances code clarity and maintainability by focusing on relevant attributes for each class.
      
      * [Refactor] Update Clang-Tidy Integration and Code Improvements
      
      - Modified the format script to include the `-fix` option in the clang-tidy command for automatic code fixes.
      - Refactored the `AtomicAddVectorizePlanner` class to improve variable handling and consistency, including changes to member variable types and function signatures.
      - Enhanced code clarity by removing unnecessary `std::move` calls and ensuring consistent usage of types across the class.
      - General code cleanup to adhere to best practices and improve maintainability.
      
      * [Refactor] Improve Parameter Handling and Consistency in AtomicAddVectorize
      
      - Updated function signatures in `AtomicAddVectorizePlanResult` and `AtomicAddVectorizeRewriter` to use `const` references and `std::move` for better performance and clarity.
      - Enhanced the `UpdateVectorSize` method to accept `const Array<PrimExpr>&` for improved efficiency.
      - General code cleanup to maintain consistency and adhere to best practices.
      
      * [CI] Add Git Submodule Initialization to CI Workflow
      
      - Included a step to initialize and update git submodules recursively in the CI workflow.
      - This change ensures that all necessary submodules are available during the format check process, improving build reliability.
      
      * [CI] Add Git Submodule Update Step to Format Check
      
      - Included a command to initialize and update git submodules recursively in the CI workflow during the format check process.
      - This enhancement ensures that all required submodules are available, contributing to improved build reliability.
      
      * [Refactor] Update Function Signatures in AtomicAddVectorize
      
      - Modified the `VectorizeAtomicAdd` function signature to use `const` references for `thread_var` and `thread_bounds`, enhancing performance and code clarity.
      - This change aligns with previous refactoring efforts to improve parameter handling and consistency across the codebase.
      cdc5d8d3
  27. 23 Aug, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Merge ThreadPartialSync and ThreadStorageSync (#741) · 6b125028
      Lei Wang authored
      * Remove `thread_partial_sync.cc` and refactor `thread_storage_sync.cc` to streamline synchronization handling. Introduce `thread_sync_types.h` for thread-bound key definitions and reserved named barriers. Update related logic in `ThreadSyncInserter` and `TileLangThreadSync` for improved clarity and efficiency.
      
      * Remove `sync_thread_partial` references and related documentation from the codebase. Update CUDA and HIP code generation files to eliminate calls to the removed function. Refactor `__sync_thread_partial` to `sync_thread_partial` in CUDA common header for consistency.
      
      * Remove unused import of `bulk_copy.h` in `codegen_hip.cc` to enhance code clarity and maintainability.
      
      * Add import of `bulk_copy.h` in `codegen_hip.cc` to support new functionality.
      
      * typo fix
      
      * Update data type in reduce_sum tests from float16 to float32 for consistency and clarity. Remove redundant dtype tests and streamline run functions. Enhance reshape kernel compilation with pass configurations to address shared memory layout issues.
      
      * lint fix
      
      * test fix
      
      * Enhance CI configuration by adding verbose output to pip install command for better visibility during installation.
      
      * use ninja instead of make
      
      * Add CMake configuration step for Ninja build system in setup.py
      
      * Update pyproject.toml to include additional build dependencies: build, torch, tox, auditwheel, patchelf, and ninja.
      
      * Enhance CI configuration by adding verbose output to pytest commands for improved test visibility.
      
      * Update pyproject.toml to add Cython as a build dependency. Enhance thread storage synchronization in thread_storage_sync.cc by introducing new thread variable handling and improving index disjointness checks.
      
      * Update data type in cumulative sum tests from float16 to float32 for consistency. Modify run_cumsum function to utilize the updated dtype and enhance result validation with assertions. Adjust test cases accordingly.
      
      * Refactor storage access handling by introducing buffer data mapping in TileLangStorageAccessVisitor. Enhance access entry structure to include pointer access flag. Update thread storage synchronization to accommodate new buffer data mappings. Adjust quickstart example to print kernel source for debugging purposes.
      
      * Refactor linear index conversion in TileLangStorageAccessVisitor to utilize the analyzer for simplification. Update buffer index calculations to ensure consistent simplification of range expressions.
      
      * bugfix
      
      * Refactor buffer index calculation in TileLangStorageAccessVisitor to simplify access handling. Removed unused buffer mapping logic, ensuring consistent buffer index generation with a default ramp.
      
      * Refactor TileLangStorageAccessVisitor to replace buffer indices with buffer ranges for improved pointer access handling. Update AccessEntry structure to include buffer_ranges and adjust thread storage synchronization logic to account for pointer access conflicts.
      
      * Refactor thread storage synchronization to replace 'shared.dyn' with 'shared' for consistency in memory allocation. Update related test cases to reflect this change and ensure proper functionality.
      6b125028
  28. 18 Aug, 2025 1 commit
    • alex_xiao's avatar
      [CI] Fix AMD CI (#729) · a86223f4
      alex_xiao authored
      
      
      * [Enhancement] Refactor buffer index handling for improved precision and clarity (#668)
      
      - Enhanced buffer index handling to address precision issues by removing redundant operations.
      - Streamlined the logic for determining buffer overlaps, ensuring more accurate conflict detection.
      - Updated related documentation to reflect changes in buffer management practices.
      
      * Remove obsolete test script for AMD example, streamlining the examples directory.
      
      * Remove unused dtype_size variable in AMD example script to streamline code.
      
      * Add input configuration file and update AMD example script for enhanced flexibility
      
      - Introduced a new input.txt file for configurable parameters.
      - Modified the example_amd_flash_attn_fwd.py script to allow for a wider range of configurations, including additional options for num_stages, enable_rasterization, and k_pack.
      - Streamlined the main function for better clarity and organization.
      - Added a new test script to facilitate running the example with specified parameters.
      
      * Remove input configuration file and obsolete test script; enhance AMD example with swizzle layout annotations
      
      - Deleted input.txt and test.sh files as they are no longer needed.
      - Updated example_amd_flash_attn_fwd.py to include swizzle layout annotations for shared memory, improving bank conflict avoidance.
      - Reintroduced swizzle usage in the kernel for better performance.
      
      * Refactor AMD example script for FlashAttention-2
      
      - Updated function names for clarity, changing `get_v2_configs` to `get_configs` and `fast_flashattn_v2` to `fast_flashattn`.
      - Streamlined the main function by renaming `main_v2` to `main` and adjusting the corresponding calls.
      - Removed outdated comments and improved code organization for better readability.
      
      * Refactor formatting in AMD FlashAttention example script
      
      - Improved code readability by adjusting line breaks and indentation in the `fast_flashattn` function.
      - Streamlined the `main` function parameter formatting for consistency.
      - Removed unnecessary blank lines to enhance overall code organization.
      
      * Update example_amd_flash_attn_fwd.py
      
      * Enhance AMD example script and update CI workflows
      
      - Improved the `example_amd_flash_attn_fwd.py` script for better clarity and organization.
      - Added new CI workflows for AMD and documentation publishing.
      - Updated various requirements files to include necessary dependencies.
      - Introduced new test cases and examples for better coverage and functionality.
      - Refactored existing code for improved readability and maintainability.
      
      * Remove redundant tool cache cleanup step in AMD CI workflow
      
      * Remove `torch` dependency from `requirements-rocm.txt` to streamline requirements.
      
      ---------
      Co-authored-by: default avatarxinxyxiao <xinyxiao@amd.com>
      Co-authored-by: default avatarLei Wang <34334180+LeiWang1999@users.noreply.github.com>
      a86223f4