Commits · 7cd6b3cd7410c55a599bef810598f1c442255289 · OpenDAS / tilelang

19 Feb, 2025 2 commits

[Bugfix] Put `InjectPtxAsyncCopy` Pass behind `ThreadSync` Pass (#97) · 7cd6b3cd

Lei Wang authored Feb 20, 2025

* bump version into v0.1.0

* [Enhancement] Add custom develop command for editable installs and update .gitignore

* [Documentation] Update README to include system dependencies installation instructions

* [Build] Update setup.py to support library file copying for both release and develop modes

* [Build] Refactor library file copying logic in setup.py

* [Documentation] Remove unnecessary install section header in Installation.md

* [Build] Add tox configuration and local distribution script for multi-Python version support

* [Build] Improve git submodule update function with better error handling

* [Build] Update LLVM configuration path in ROCm installation script

* [Build] Add .tox/ to .gitignore for tox testing environment

* [Build] Add support for TVM prebuild path configuration in CMakeLists.txt

* [Cleanup] Remove unused TVM runtime error codes header

* [Cleanup] Fix TVM grid constant type reference in CUDA module

* [Cleanup] Remove unused customized_code function from IR module

* [Feature] Add TileLang thread synchronization and storage access analysis passes

* [Build] Reorder DLL search path directories for more flexible library loading

* [Refactor] Improve thread synchronization and library path handling

- Rename ThreadSync and TileLangThreadSync functions in C++ code
- Update Python docstring for ThreadSync with more detailed description
- Reorder library path detection in tilelang environment setup
- Minor comment and code cleanup in CUDA and warp specialization modules

* [Refactor] Improve thread synchronization code style and formatting

- Standardize pointer type spacing in storage_access.h and storage_access.cc
- Update whitespace and indentation in thread_storage_sync.cc
- Reorder include statements in thread_partial_sync.cc
- Minor code formatting improvements across thread synchronization files

* [Refactor] Fix global function registration for ThreadSync

- Correct global function registration to use ThreadSync instead of TileLangThreadSync
- Update TVM global registration to match recent refactoring efforts

* [Refactor] Simplify ThreadSync global function registration

- Remove unnecessary whitespace in global function registration
- Compact the TVM global registration line for ThreadSync

* [Feature] Add WebGPU code generation support in TileLang

- Implement WebGPU code generator (codegen_webgpu.cc and codegen_webgpu.h)
- Add WebGPU target support in lower.py and target.py
- Update CMakeLists.txt to include WebGPU codegen source files
- Introduce WebGPU-specific code generation for WGSL shader language

* [Refactor] Improve WebGPU code generation formatting and readability

- Enhance code formatting in codegen_webgpu.cc and codegen_webgpu.h
- Standardize pointer type spacing and indentation
- Improve line breaks and reduce line length for better readability
- Minor code style improvements in WebGPU code generation

* [Test] Add WebGPU matrix multiplication code generation test

- Implement test_webgpu_codegen.py for WebGPU matrix multiplication
- Add assert_gemm_codegen function to validate WebGPU code generation
- Include basic matrix multiplication kernel test case

* Update README with WebGPU codegen support announcement

* Support multi version pypi package build via tox

* Add support for CPU device backend with C code generation

- Introduce `is_cpu_device_backend` function to detect CPU backend with C code generation
- Modify `lower` function to handle special case of CPU device backend
- Update host and device call filtering for CPU backend
- Add conditional source code generation for C host target
- Extend JITKernel to support optional target_host parameter

* lint fix

* Enhance JIT kernel adapters with CTypes and Torch C++ backends

- Add CtypesKernelAdapter with dynamic library generation and kernel wrapping
- Implement TorchCPPKernelAdapter for CUDA kernel compilation
- Refactor BaseKernelAdapter to support more flexible initialization
- Improve error handling and argument processing in kernel adapters
- Update adapter initialization to support various execution backends

* Refactor and clean up code style in JIT CTypes adapter modules

- Apply consistent code formatting and whitespace in CTypes adapter files
- Remove unused imports and improve import organization
- Enhance readability of code in adapter, libgen, and wrapper modules
- Add missing whitespace and improve line breaks
- Minor linting and code style improvements across CTypes adapter files

* Add test for TileLang JIT GEMM with CTypes backend

- Implement comprehensive test for matrix multiplication using CTypes execution backend
- Create test functions for GEMM with float16 data type
- Add kernel source verification with custom callback
- Implement reference implementation using PyTorch for result validation
- Support various matrix multiplication configurations (transposition, block sizes)

* test fix

* Update TileLang JIT callback registration with override parameter

- Modify tilelang_callback_cuda_postproc to use @tvm.register_func(override=True)
- Ensure proper function registration with ability to replace existing implementations

* Reorder TileLang lowering passes for Hopper intrinsics and PTX async copy

- Adjust the order of LowerHopperIntrin and InjectPTXAsyncCopy passes
- Move these passes to ensure correct synchronization and device preparation
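
As a rough illustration of what "pass order" means here, the sketch below models a lowering pipeline as an ordered list of callables and checks that one pass runs before another. Only the pass names `ThreadSync` and `InjectPTXAsyncCopy` come from the commit message; `make_pass`, `run_pipeline`, and `runs_before` are hypothetical helpers, and the real passes transform IR rather than append names to a trace.

```python
def make_pass(name):
    """Build a toy pass that records its own name in the trace (hypothetical)."""

    def run(trace):
        return trace + [name]

    run.__name__ = name
    return run


def run_pipeline(passes, trace=None):
    """Apply each pass in list order and return the resulting trace."""
    trace = [] if trace is None else list(trace)
    for p in passes:
        trace = p(trace)
    return trace


def runs_before(passes, first, second):
    """Return True if the pass named `first` is scheduled before `second`."""
    names = [p.__name__ for p in passes]
    return names.index(first) < names.index(second)


# Ordering described by the commit title: InjectPTXAsyncCopy behind ThreadSync.
pipeline = [make_pass("ThreadSync"), make_pass("InjectPTXAsyncCopy")]
```

A check such as `runs_before(pipeline, "ThreadSync", "InjectPTXAsyncCopy")` could serve as a regression guard if the ordering is ever rearranged again.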

* Rebase main

* shared.dyn

* lint fix

* test fix

* Add environment variable handling for TileLang template and CUTLASS paths

- Introduce fallback logic for TL_TEMPLATE_PATH environment variable
- Add support for optional TL_CUTLASS_PATH configuration
- Include TODO comment for future environment variable renaming
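
The fallback behaviour described above might look roughly like the following. This is a minimal sketch under assumptions: the variable names `TL_TEMPLATE_PATH` and `TL_CUTLASS_PATH` come from the commit message, while `resolve_path`, `tilelang_paths`, and the default directory are hypothetical and do not reflect the project's actual code.

```python
import os
from typing import Mapping, Optional

# Hypothetical default used only for illustration.
DEFAULT_TEMPLATE_PATH = "/hypothetical/tilelang/src"


def resolve_path(env: Mapping[str, str], var: str,
                 fallback: Optional[str] = None) -> Optional[str]:
    """Return the value of `var` if set and non-empty, otherwise `fallback`."""
    value = env.get(var, "").strip()
    return value or fallback


def tilelang_paths(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the template path (with fallback) and the optional CUTLASS path."""
    env = os.environ if env is None else env
    return {
        "template": resolve_path(env, "TL_TEMPLATE_PATH", DEFAULT_TEMPLATE_PATH),
        "cutlass": resolve_path(env, "TL_CUTLASS_PATH"),  # optional: may be None
    }
```

Passing the environment in as a mapping keeps the lookup easy to test without mutating `os.environ`.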


[Wrap] Use a ctypes-based kernel wrapper instead of dlpack for runtime efficiency (#95) · 2ac51a03

Lei Wang authored Feb 20, 2025

* bump version into v0.1.0

* [Enhancement] Add custom develop command for editable installs and update .gitignore

* [Documentation] Update README to include system dependencies installation instructions

* [Build] Update setup.py to support library file copying for both release and develop modes

* [Build] Refactor library file copying logic in setup.py

* [Documentation] Remove unnecessary install section header in Installation.md

* [Build] Add tox configuration and local distribution script for multi-Python version support

* [Build] Improve git submodule update function with better error handling

* [Build] Update LLVM configuration path in ROCm installation script

* [Build] Add .tox/ to .gitignore for tox testing environment

* [Build] Add support for TVM prebuild path configuration in CMakeLists.txt

* [Cleanup] Remove unused TVM runtime error codes header

* [Cleanup] Fix TVM grid constant type reference in CUDA module

* [Cleanup] Remove unused customized_code function from IR module

* [Feature] Add TileLang thread synchronization and storage access analysis passes

* [Build] Reorder DLL search path directories for more flexible library loading

* [Refactor] Improve thread synchronization and library path handling

- Rename ThreadSync and TileLangThreadSync functions in C++ code
- Update Python docstring for ThreadSync with more detailed description
- Reorder library path detection in tilelang environment setup
- Minor comment and code cleanup in CUDA and warp specialization modules

* [Refactor] Improve thread synchronization code style and formatting

- Standardize pointer type spacing in storage_access.h and storage_access.cc
- Update whitespace and indentation in thread_storage_sync.cc
- Reorder include statements in thread_partial_sync.cc
- Minor code formatting improvements across thread synchronization files

* [Refactor] Fix global function registration for ThreadSync

- Correct global function registration to use ThreadSync instead of TileLangThreadSync
- Update TVM global registration to match recent refactoring efforts

* [Refactor] Simplify ThreadSync global function registration

- Remove unnecessary whitespace in global function registration
- Compact the TVM global registration line for ThreadSync

* [Feature] Add WebGPU code generation support in TileLang

- Implement WebGPU code generator (codegen_webgpu.cc and codegen_webgpu.h)
- Add WebGPU target support in lower.py and target.py
- Update CMakeLists.txt to include WebGPU codegen source files
- Introduce WebGPU-specific code generation for WGSL shader language

* [Refactor] Improve WebGPU code generation formatting and readability

- Enhance code formatting in codegen_webgpu.cc and codegen_webgpu.h
- Standardize pointer type spacing and indentation
- Improve line breaks and reduce line length for better readability
- Minor code style improvements in WebGPU code generation

* [Test] Add WebGPU matrix multiplication code generation test

- Implement test_webgpu_codegen.py for WebGPU matrix multiplication
- Add assert_gemm_codegen function to validate WebGPU code generation
- Include basic matrix multiplication kernel test case

* Update README with WebGPU codegen support announcement

* Support multi version pypi package build via tox

* Add support for CPU device backend with C code generation

- Introduce `is_cpu_device_backend` function to detect CPU backend with C code generation
- Modify `lower` function to handle special case of CPU device backend
- Update host and device call filtering for CPU backend
- Add conditional source code generation for C host target
- Extend JITKernel to support optional target_host parameter

* lint fix

* Enhance JIT kernel adapters with CTypes and Torch C++ backends

- Add CtypesKernelAdapter with dynamic library generation and kernel wrapping
- Implement TorchCPPKernelAdapter for CUDA kernel compilation
- Refactor BaseKernelAdapter to support more flexible initialization
- Improve error handling and argument processing in kernel adapters
- Update adapter initialization to support various execution backends

* Refactor and clean up code style in JIT CTypes adapter modules

- Apply consistent code formatting and whitespace in CTypes adapter files
- Remove unused imports and improve import organization
- Enhance readability of code in adapter, libgen, and wrapper modules
- Add missing whitespace and improve line breaks
- Minor linting and code style improvements across CTypes adapter files

* Add test for TileLang JIT GEMM with CTypes backend

- Implement comprehensive test for matrix multiplication using CTypes execution backend
- Create test functions for GEMM with float16 data type
- Add kernel source verification with custom callback
- Implement reference implementation using PyTorch for result validation
- Support various matrix multiplication configurations (transposition, block sizes)

* test fix

* Update TileLang JIT callback registration with override parameter

- Modify tilelang_callback_cuda_postproc to use @tvm.register_func(override=True)
- Ensure proper function registration with ability to replace existing implementations


06 Feb, 2025 1 commit

[Dev] Support FP8 Codegen for cuda backend (#64) · 61de5288

Lei Wang authored Feb 06, 2025

* [Enhancement] Add VectorizeLoop function and update imports for compatibility

* [CI][Test] Improve test cases for vectorization and fix typos in parser comments

* lint fix

* Fix incorrect module reference for VectorizeLoop transformation

* Refactor vectorize_loop transformation by removing unused extent mutation logic

* [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen

* Fix formatting in CUDA FP8 header file for consistency

* Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity

* Update submodule 'tvm' to latest commit for improved functionality

* Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.

* Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.

* Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency

* Add CUDA requirements to FP8 test cases and update references for clarity

* Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py

* Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py

* Add CUDA requirements and FP8 test cases for matmul and gemv simulations

* Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py

* Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py


26 Jan, 2025 1 commit

[Doc] Add debug relevant testing and documentations (#58) · 5e259239

Lei Wang authored Jan 26, 2025

* implement jit test case

* [Dev] implement auto tune test case for matrix multiplication

* Implement test for legalize memory access and vectorized loop

* lint fix

* introduce run_once

* Refactor callback function names for consistency and improve code readability

* enhance documentations

* lint fix

* lint fix

* lint fix

* lint fix

* fix formatting issues in rt_mod_hip.cc

* add random seed initialization for deterministic testing


25 Jan, 2025 1 commit

[Dev] Implement test case for tilelang transformations (#53) · 38ba083b

Lei Wang authored Jan 25, 2025

* implement jit test case

* [Dev] implement auto tune test case for matrix multiplication

* Implement test for legalize memory access and vectorized loop

* lint fix


20 Jan, 2025 1 commit

[Dev][jit] Introduce jit for kernel functions (#12) · 39fc5a6d

Lei Wang authored Jan 20, 2025

* instruction update

* replace link with TileLang/tile-lang

* [Dev][Adapter] Implement Torch DLPack Kernel Adapter and related utilities

* lint fix

* Implement JIT Compiler Components

* Documents update

* lint fix

* update logo

* install script fix


12 Jan, 2025 2 commits

[CI] Implement basic test cases and ci support (#16) · 6e051e01

Lei Wang authored Jan 13, 2025

* README.md fixed

* test fix

test fix · 8bf752ae

LeiWang1999 authored Jan 12, 2025

11 Jan, 2025 1 commit

[Initialization] Migration of Codebase from Dev Branch into Main (#10) · 57ab687c

Lei Wang authored Jan 11, 2025



* Add format.sh script for code formatting and linting

* docs update

* center align the title

* lint fix

* add ignore

* Add .gitignore for 3rdparty directory

* Add requirements-dev.txt, requirements-test.txt, and requirements.txt

* 3rdparty

* Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h

* Refactor CMakeLists.txt and include statements

- Update CMakeLists.txt to use a newer version of CMake and add project name
- Remove unnecessary include directories

Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc

- Update include paths to use relative paths instead of absolute paths

* Update submodule for 3rdparty/tvm

* update

* load dll first

* Refactor CMakeLists.txt and include statements

* Refactor CMakeLists.txt and include statements

* git keep update

* Refactor CMakeLists.txt and include statements

* Refactor CMakeLists.txt and include statements

* refactor code structure

* Update Readme

* CMakeLists Customized

* update readme

* update README

* update readme

* update usage

* with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import

* annotate lower transform global func with `transform` prefix

* Migrate Simplify Pass from tilelang tvm branch

* enhance system environment handling with __init__ and CMake

* Initial commit

* CODE_OF_CONDUCT.md committed

* LICENSE committed

* README.md committed

* SECURITY.md committed

* SUPPORT.md committed

* CODE_OF_CONDUCT Commit

* LICENSE Commit

* SECURITY Commit

* SUPPORT Commit

* Modify Support

* Update README.md

* security ci update

* remove examples

* Update and implement clang-format

* add composable kernel components

* Migrate from latest update

* submodule update

* Test update

* Update License

* Spell check

* lint fix

* add clang-tidy to apply static analysis for c source

* update tilelang examples

* Update Install Docs

* Refactor filetree

* Enhance Install

* conflict resolved

* annotate_version

* Initial Update

* test fix

* install

* Implement setup.py

* lint fix

* Separate Init

* Separate test

* docker file commit

* add logo

* Update Readme and Examples

* update readme

* update logo

* Implement AMD Installation

* Add License

* Update AMD MI300x Benchmark

* update README

* update mi300 benchmark scripts

* update ignore

* enhance build script

* update image

* enhance setup.py to remove duplicated libraries

* remove debug files

* update readme

* update image

* update gemm examples

* update flashattention README

* readme update

* add cmake into requirements

* libinfo fix

* auto update submodule

* lint fix

* Fix AMD Build and Test

* Update check for transpose attribute for CDNA Arch

* typo fix for amd

* Implement Matmul Benchmark

* Refactor Code

* [TypoFix] Fix GEMM Example

* [Docs] Init Linear Attention README

* [TYPO] Typo fix

* [Lint] Lint Fix

* enhance example with intrinsics

* [Enhancement] Improve Buffer Collection during IR Parser

* [Dev] Introduce Current classmethod to get current frame

* submodule update

* fake test pass update

* support thread_extent_api

* code optimize

* Add GEMM function implementation for matrix multiplication

* Update logging format to reflect TileLang in logger messages

* Refactor CMakeLists.txt for improved readability and set default build type to Release

* Support Gemm SS Primitives Implementation

* [README] Upload Tile Language Logo (#5)

* update logo

* Update README.md to enhance formatting and center the title

---------
Co-authored-by: microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com>
Co-authored-by: Microsoft Open Source <microsoftopensource@users.noreply.github.com>
Co-authored-by: Yu Cheng <yu.cheng@pku.edu.cn>
