"git@developer.sourcefind.cn:gaoqiong/migraphx.git" did not exist on "5c0e4b60a7518de44599c16347ce79c9895cb770"
  1. 13 Dec, 2025 1 commit
    • Lei Wang's avatar
      [CUDA] Add read-only parameter annotation for CUDA codegen (#1416) · 00dd7388
      Lei Wang authored
      * [Enhancement] Add read-only parameter annotation for CUDA codegen
      
      * Introduced the `AnnotateReadOnlyParams` transformation to annotate read-only handle parameters in PrimFuncs, enabling the generation of `const` qualifiers in CUDA codegen.
      * Updated `PrintFunctionSignature` and `AddFunction` methods to utilize the new attribute `tl.readonly_param_indices`, enhancing performance by allowing read-only cache loads.
      * Modified the optimization pipeline to include the new annotation step, improving the overall efficiency of the code generation process.
      
      * lint fix
      
      * [Dependency] Update apache-tvm-ffi version to >=0.1.3
      
      * Updated the version of apache-tvm-ffi in pyproject.toml, requirements.txt, and requirements-dev.txt to ensure compatibility with the latest features and fixes.
      * Made adjustments in CUDA and HIP template files to use `const` qualifiers for global pointer parameters, enhancing code safety and clarity.
      
      * lint fix
      
      * [Enhancement] Refactor ReadWriteMarker for improved parameter handling
      
      * Updated the ReadWriteMarker class to accept a set of parameter or data variables, enhancing its ability to track written variables.
      * Introduced a new method, ResolveDataVarFromPtrArg, to resolve underlying buffer data from pointer-like arguments, improving accuracy in identifying written variables.
      * Modified the MarkReadOnlyParams function to gather handle parameters and their corresponding buffer data variables, streamlining the process of determining read-only parameters.
      * Enhanced the logic for identifying written variables to account for aliased data variables, ensuring comprehensive tracking of modifications.
      
      * lint fix
      
      * Update tma_load function to use const qualifier for global memory pointer
      
      * Changed the parameter type of gmem_ptr in the tma_load function from void* to void const* to enhance type safety and clarity in memory operations.
      * This modification ensures that the function correctly handles read-only global memory pointers, aligning with best practices in CUDA programming.
      
      * Remove commented-out code and reorder transformations in OptimizeForTarget function for clarity
      
      * Refactor buffer marking logic in annotate_read_only_params.cc to improve accuracy in identifying written variables. Update OptimizeForTarget function to reorder transformations for better clarity.
      00dd7388
  2. 11 Dec, 2025 1 commit
    • Lei Wang's avatar
      [Dependency] Update apache-tvm-ffi version to >=0.1.2 (#1400) · 0eb33f28
      Lei Wang authored
      * [Dependency] Update apache-tvm-ffi version to >=0.1.2 in project files
      
      * [Dependency] Update subproject commit for TVM to latest version afc07935
      
      * [Enhancement] Add support for optional step parameter in loop constructs
      
      - Updated loop creation functions to accept an optional step parameter, enhancing flexibility in loop definitions.
      - Modified ForFrame implementations to utilize the new step parameter across various loop types including serial, parallel, and pipelined loops.
      - Adjusted related vectorization transformations to accommodate the step parameter, ensuring consistent behavior in loop vectorization processes.
      
      * lint fix
      0eb33f28
  3. 05 Nov, 2025 1 commit
    • Lei Wang's avatar
      [SM70] Refactor and minor fix for SM70 (#1195) · 4a9cb470
      Lei Wang authored
      * [Feature] Add support for SM70 tensor core MMA instructions
      
      - Introduced new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
      - Added `GemmMMASm70` class for handling GEMM operations specific to SM70 architecture.
      - Implemented layout functions for Volta swizzled layouts and updated existing GEMM layout inference logic.
      - Updated `requirements-dev.txt` to include `apache-tvm-ffi` dependency.
      - Added correctness evaluation script for testing GEMM operations on SM70.
      
      * [Refactor] Update formatting and installation commands in scripts
      
      - Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations.
      - Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
      - Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
      - Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
      - Ensured consistent formatting in layout initialization and swizzle functions.
      
      * typo fix
      4a9cb470
  4. 15 Oct, 2025 1 commit
    • Xuehai Pan's avatar
      [CI][Refactor] Merge test CI workflow files into one (#973) · 8ce27782
      Xuehai Pan authored
      * refactor: merge test CI workflow files into one
      
      * chore: set `UV_INDEX_STRATEGY=unsafe-best-match`
      
      * feat: add AST test with Python 3.8
      
      * feat: implement manual caching mechanism for self-hosted runners
      
      * refactor: simplify cache logic for self-hosted runners
      
      * chore: clear uv cache on failure
      
      * chore: print format.sh output to logs
      
      * chore: improve uv caching
      
      * chore: disable parallel test
      
      * chore: use `PYTHONDEVMODE=1` in CI
      
      * feat: enable coredump generation
      
      * fix: fix perfbench condition
      
      * Revert "feat: enable coredump generation"
      
      This reverts commit c52da65cb572932e09905d08c43a39ec3cf47c54.
      
      * chore: move example CI down
      
      * Revert "chore: move example CI down"
      
      This reverts commit 9d8e65055e01d955c5268a9a6705d270c2de0d57.
      
      * chore: skip example `test_example_mha_sink_bwd_bhsd`
      
      * chore: skip example `test_example_gqa_sink_bwd_bhsd`
      
      * fix: fix example argument passing
      
      * fix: loosen test criteria
      
      * chore: rename `CMAKE_CONFIG...
      8ce27782
  5. 14 Oct, 2025 1 commit
  6. 13 Oct, 2025 1 commit
    • Yichen Yan's avatar
      [Build] Migrate to scikit-build-core (#939) · d89ba5b8
      Yichen Yan authored
      
      
      * cleanup
      
      * init
      
      * build first wheel that may not work
      
      * build cython ext
      
      * fix tvm build
      
      * use sabi
      
      * update rpath to support auditwheel
      
      * pass editible build
      
      * update ci
      
      * fix warnings
      
      * do not use ccache in self host runner
      
      * test local uv cache
      
      * test pip index
      
      * update lib search to respect new lib location
      
      * fix
      
      * update ci
      
      * enable cuda by default
      
      * update src map
      
      * fix
      
      * fix
      
      * fix
      
      * Generate version with backend and git information at build time
      
      * copy tvm_cython to wheels
      
      * fix tvm lib search
      
      * fmt
      
      * remove unused
      
      * auto detect ccache
      
      * add back backend-related files
      
      * remove jit cython adaptor to simplify code
      
      * fmt
      
      * fix ci
      
      * ci fix 2
      
      * ci fix 3
      
      * workaround metal
      
      * ci fix 4
      
      * fmt
      
      * fmt
      
      * Revert "ci fix 4"
      
      This reverts commit d1de8291c3e40927955f3ad3cf87a75c78813676.
      
      * tmp
      
      * fix metal
      
      * trivial cleanup
      
      * add detailed build-time version for cuda
      
      * add back mlc
      
      * Restore wheel info and other trivial updates
      
      * update
      
      * fix cuda
      
      * upd
      
      * fix metal ci
      
      * test for ga build
      
      * test for nvidia/cuda
      
      * test ubuntu 20
      
      * fix
      
      * fix
      
      * Do not use `uv build`
      
      * fix
      
      * fix
      
      * log toolchain version
      
      * merge wheel
      
      * update
      
      * debug
      
      * fix
      
      * update
      
      * skip rocm
      
      * update artifacts each
      
      * fix
      
      * fix
      
      * add mac
      
      * fix cache
      
      * fix cache
      
      * fix cache
      
      * reset and add comment
      
      * upd
      
      * fix git version
      
      * update deps
      
      * trivial update
      
      * use in-tree build dir and install to src to speedup editable build
      
      * Revert "use in-tree build dir and install to src to speedup editable build"
      
      This reverts commit 6ab87b05c5eed811210136b8dca4fc3677dd51f2.
      
      * add build-dir
      
      * update docs
      
      * remove old scrips
      
      * [1/n] cleanup scripts
      
      * [Lint]: [pre-commit.ci] auto fixes [...]
      
      * fix and update
      
      * wait for tvm fix
      
      * revert some tmp fix
      
      * fix
      
      * fix
      
      * spell
      
      * doc update
      
      * test cibuildwheel
      
      * fix and test macos on ci
      
      * Update .github/workflows/dist.yml
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ga event
      
      * cleanup
      
      * bump tvm to support api3
      
      * test final version
      
      * add cron
      
      * Update .github/workflows/dist.yml
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ccache for metal cibuildwheel
      
      * test newer macos
      
      * finish
      
      ---------
      Co-authored-by: default avatarpre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      d89ba5b8
  7. 08 Aug, 2025 1 commit
    • Lei Wang's avatar
      [Layout] Introduce a new layout inference mechanism (#699) · 407117e1
      Lei Wang authored
      
      
      * Implement new free stage layout inference.
      
      * Fix bug
      
      * Make replication upcasting and unnormalizable iterators safe.
      
      * Better handling of updating with more replica
      
      * Remove unnecessary check.
      
      * Fix compilation.
      
      * Fix setup.py.
      
      * Simplify development mode.
      
      * Allow ParallelOp layout when there's already a compatible layout specified
      
      * lint fix
      
      * Add ProveFragmentContains function to validate thread access between small and large fragments
      
      This function checks if the threads accessing elements of a smaller fragment are a subset of those accessing a larger fragment, ensuring valid access during updates. The implementation includes deriving thread indices, computing logical indices, and verifying thread mappings.
      
      * Update dependencies in requirements files
      
      * Remove 'thefuzz' from requirements-dev.txt
      * Specify exact versions for 'torch' and add 'flash_attn' in requirements-test.txt
      
      * Update CI workflow to use SHA256 hash for requirements file
      
      * Update requirements and CI workflow for flash attention
      
      * Removed specific version for 'torch' in requirements-test.txt
      * Added installation of 'flash_attn==2.5.8' in CI workflow to ensure compatibility
      
      * Refactor flash attention import handling in examples
      
      * Removed availability checks for 'flash_attn' in multiple example scripts.
      * Simplified import statements for 'flash_attn' to ensure consistent usage across examples.
      
      ---------
      Co-authored-by: default avatarHuanqi Cao <caohuanqi@deepseek.com>
      407117e1
  8. 19 Apr, 2025 1 commit
  9. 22 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Wheel] Provide a bare docker scripts to help build wheels for manylinux (#105) · b4bd2a56
      Lei Wang authored
      * [Build] Improve build configuration and package distribution support
      
      - Add `build` to requirements-build.txt for package building
      - Update MANIFEST.in to include Cython wrapper source file
      - Enhance setup.py to improve Cython file copying logic
      - Update build scripts to support multi-Python version distribution
      
      * [Build] Improve Cython file handling in setup.py and MANIFEST.in
      
      - Remove Cython wrapper from MANIFEST.in
      - Enhance setup.py to create target directory if it doesn't exist when copying Cython files
      - Improve file copying logic for Cython source files during build process
      
      * [Build] Remove Cython file copying logic in setup.py
      
      - Comment out Cython file copying code in TileLangBuilPydCommand
      - Simplify setup.py build process by removing redundant Cython file handling
      
      * [Build] Enhance Docker distribution scripts for multi-Python version support
      
      - Refactor local and PyPI distribution Docker scripts
      - Replace hardcoded Python installation with Miniconda-based multi-version Python environment
      - Improve Docker image setup with dynamic Python version creation
      - Simplify build process by using Miniconda for Python environment management
      
      * [Build] Separate lint requirements into a dedicated file
      
      - Create new requirements-lint.txt for formatting and linting tools
      - Update format.sh to use requirements-lint.txt instead of requirements-dev.txt
      - Update requirements-dev.txt and requirements-test.txt to reference requirements-lint.txt
      - Improve dependency management by isolating lint-specific requirements
      
      * [Build] Restore Cython file copying logic in setup.py
      
      - Re-add Cython file copying mechanism in TileLangBuilPydCommand
      - Implement robust file search across multiple potential directories
      - Add warning for cases where Cython source files cannot be found
      - Improve build process reliability for Cython source files
      
      * [Build] Refactor Cython file copying logic in setup.py
      
      - Simplify Cython file copying mechanism in TileLangBuilPydCommand
      - Improve directory creation and file copying for Cython source files
      - Relocate potential directories list to a more logical position
      - Enhance robustness of file and directory handling during build process
      
      * [Build] Refine Cython file copying logic in setup.py
      
      - Improve file existence check when copying Cython source files
      - Use os.path.join to construct full path for more robust file checking
      - Enhance file copying mechanism in TileLangBuilPydCommand
      b4bd2a56
  10. 10 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Dev] Remove unnecessary python dependencies (#69) · 2411fa28
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      
      * Add legend to plot_layout for improved clarity of thread and local IDs
      
      * Remove unnecessary dependencies from requirements files for cleaner setup
      
      * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
      
      * Add build requirements and update installation scripts for improved setup
      2411fa28
  11. 25 Jan, 2025 1 commit
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang kernel FlashAttention (#54) · bedab1a0
      Yu Cheng authored
      * [Dev] Add FlashDecoding example
      
      * [CI][Test] Add test cases for tilelang kernel convolution
      
      * [CI][Test] Add test cases for tilelang kernel FlashAttention
      
      * Reduce the number of stages to ensure the shared memory allocation is valid
      
      * Temporarily remove the dim128 case
      
      * lint
      
      * update einops in requirements-dev.txt
      
      * update einops in requirements-test.txt
      
      * remove einops in requirements-dev.txt
      bedab1a0
  12. 11 Jan, 2025 1 commit
    • Lei Wang's avatar
      [Initialization] Migration of Codebase from Dev Branch into Main (#10) · 57ab687c
      Lei Wang authored
      
      
      * Add format.sh script for code formatting and linting
      
      * docs update
      
      * center align the title
      
      * lint fix
      
      * add ignore
      
      * Add .gitignore for 3rdparty directory
      
      * Add requirements-dev.txt, requirements-test.txt, and requirements.txt
      
      * 3rdparty
      
      * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h
      
      * Refactor CMakeLists.txt and include statements
      
      - Update CMakeLists.txt to use a newer version of CMake and add project name
      - Remove unnecessary include directories
      
      Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc
      
      - Update include paths to use relative paths instead of absolute paths
      
      * Update submodule for 3rdparty/tvm
      
      * update
      
      * load dll first
      
      * Refactor CMakeLists.txt and include statements
      
      * Refactor CMakeLists.txt and include statements
      
      * git keep update
      
      * Refactor CMakeLists.txt and include statements
      
      * Refactor CMakeLists.txt and include statements
      
      * refactor code structure
      
      * Update Readme
      
      * CMakeLists Customized
      
      * update readme
      
      * update README
      
      * update readme
      
      * update usage
      
      * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import
      
      * annotate lower transform global func with `transform` prefix
      
      * Migrate Simplify Pass from tilelang tvm branch
      
      * enhance system environment handling with __init__ and CMake
      
      * Initial commit
      
      * CODE_OF_CONDUCT.md committed
      
      * LICENSE committed
      
      * README.md committed
      
      * SECURITY.md committed
      
      * SUPPORT.md committed
      
      * CODE_OF_CONDUCT Commit
      
      * LICENSE Commit
      
      * SECURITY Commit
      
      * SUPPORT Commit
      
      * Modify Support
      
      * Update README.md
      
      * security ci update
      
      * remove examples
      
      * Update and implement clang-format
      
      * add composable kernel components
      
      * Migrate from latest update
      
      * submodule update
      
      * Test update
      
      * Update License
      
      * Spell check
      
      * lint fix
      
      * add clang-tidy to apply static analysis for c source
      
      * update tilelang examples
      
      * Update Install Docs
      
      * Refactor filetree
      
      * Enhance Install
      
      * conflict resloved
      
      * annotate_version
      
      * Initial Update
      
      * test fix
      
      * install
      
      * Implement setup.py
      
      * lint fix
      
      * Separate Init
      
      * Separate test
      
      * docker file commit
      
      * add logo
      
      * Update Readme and Examples
      
      * update readme
      
      * update logo
      
      * Implement AMD Installation
      
      * Add License
      
      * Update AMD MI300x Benchmark
      
      * update README
      
      * update mi300 benchmark scripts
      
      * update ignore
      
      * enhance build scirpt
      
      * update image
      
      * enhance setup.py to remove duplicated libraries
      
      * remove debug files
      
      * update readme
      
      * update image
      
      * update gemm examples
      
      * update flashattention README
      
      * readme update
      
      * add cmake into requirements
      
      * libinfo fix
      
      * auto update submodule
      
      * lint fix
      
      * Fix AMD Build and Test
      
      * Update check for transpose attribute for CDNA Arch
      
      * typo fix for amd
      
      * Implement Matmul Benchmark
      
      * Refactor Code
      
      * [TypoFix] Fix GEMM Example
      
      * [Docs] Init Linear Attention README
      
      * [TYPO] Typo fix
      
      * [Lint] Lint Fix
      
      * enhance example with intrinsics
      
      * [Enhancement] Improve Buffer Collection during IR Parser
      
      * [Dev] Introduce Current classmethod to get current frame
      
      * submodule update
      
      * fake test pass update
      
      * support thread_extent_api
      
      * code optimize
      
      * Add GEMM function implementation for matrix multiplication
      
      * Update logging format to reflect TileLang in logger messages
      
      * Refactor CMakeLists.txt for improved readability and set default build type to Release
      
      * Support Gemm SS Primitives Implementation
      
      * [README] Upload Tile Language Logo (#5)
      
      * update logo
      
      * Update README.md to enhance formatting and center the title
      
      ---------
      Co-authored-by: default avatarmicrosoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com>
      Co-authored-by: default avatarMicrosoft Open Source <microsoftopensource@users.noreply.github.com>
      Co-authored-by: default avatarYu Cheng <yu.cheng@pku.edu.cn>
      57ab687c