"git@developer.sourcefind.cn:change/sglang.git" did not exist on "6f81a710f79626ef2cf56a84acaf1127c63d0d1c"
  1. 15 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Backend][WebGPU] Support WebGPU WGSL code generation (#86) · c8fc0cbb
      Lei Wang authored
      * bump version into v0.1.0
      
      * [Enhancement] Add custom develop command for editable installs and update .gitignore
      
      * [Documentation] Update README to include system dependencies installation instructions
      
      * [Build] Update setup.py to support library file copying for both release and develop modes
      
      * [Build] Refactor library file copying logic in setup.py
      
      * [Documentation] Remove unnecessary install section header in Installation.md
      
      * [Build] Add tox configuration and local distribution script for multi-Python version support
      
      * [Build] Improve git submodule update function with better error handling
      
      * [Build] Update LLVM configuration path in ROCm installation script
      
      * [Build] Add .tox/ to .gitignore for tox testing environment
      
      * [Build] Add support for TVM prebuild path configuration in CMakeLists.txt
      
      * [Cleanup] Remove unused TVM runtime error codes header
      
      * [Cleanup] Fix TVM grid constant type reference in CUDA module
      
      * [Cleanup] Remove unused customized_code function from IR module
      
      * [Feature] Add TileLang thread synchronization and storage access analysis passes
      
      * [Build] Reorder DLL search path directories for more flexible library loading
      
      * [Refactor] Improve thread synchronization and library path handling
      
      - Rename ThreadSync and TileLangThreadSync functions in C++ code
      - Update Python docstring for ThreadSync with more detailed description
      - Reorder library path detection in tilelang environment setup
      - Minor comment and code cleanup in CUDA and warp specialization modules
      
      * [Refactor] Improve thread synchronization code style and formatting
      
      - Standardize pointer type spacing in storage_access.h and storage_access.cc
      - Update whitespace and indentation in thread_storage_sync.cc
      - Reorder include statements in thread_partial_sync.cc
      - Minor code formatting improvements across thread synchronization files
      
      * [Refactor] Fix global function registration for ThreadSync
      
      - Correct global function registration to use ThreadSync instead of TileLangThreadSync
      - Update TVM global registration to match recent refactoring efforts
      
      * [Refactor] Simplify ThreadSync global function registration
      
      - Remove unnecessary whitespace in global function registration
      - Compact the TVM global registration line for ThreadSync
      
      * [Feature] Add WebGPU code generation support in TileLang
      
      - Implement WebGPU code generator (codegen_webgpu.cc and codegen_webgpu.h)
      - Add WebGPU target support in lower.py and target.py
      - Update CMakeLists.txt to include WebGPU codegen source files
      - Introduce WebGPU-specific code generation for WGSL shader language
      
      * [Refactor] Improve WebGPU code generation formatting and readability
      
      - Enhance code formatting in codegen_webgpu.cc and codegen_webgpu.h
      - Standardize pointer type spacing and indentation
      - Improve line breaks and reduce line length for better readability
      - Minor code style improvements in WebGPU code generation
      
      * [Test] Add WebGPU matrix multiplication code generation test
      
      - Implement test_webgpu_codegen.py for WebGPU matrix multiplication
      - Add assert_gemm_codegen function to validate WebGPU code generation
      - Include basic matrix multiplication kernel test case
      
      * Update README with WebGPU codegen support announcement
      c8fc0cbb
  2. 14 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Separate tilelang Pass Thread Sync (with Hopper support) from tvm (#85) · ec84188f
      Lei Wang authored
      * bump version into v0.1.0
      
      * [Enhancement] Add custom develop command for editable installs and update .gitignore
      
      * [Documentation] Update README to include system dependencies installation instructions
      
      * [Build] Update setup.py to support library file copying for both release and develop modes
      
      * [Build] Refactor library file copying logic in setup.py
      
      * [Documentation] Remove unnecessary install section header in Installation.md
      
      * [Build] Add tox configuration and local distribution script for multi-Python version support
      
      * [Build] Improve git submodule update function with better error handling
      
      * [Build] Update LLVM configuration path in ROCm installation script
      
      * [Build] Add .tox/ to .gitignore for tox testing environment
      
      * [Build] Add support for TVM prebuild path configuration in CMakeLists.txt
      
      * [Cleanup] Remove unused TVM runtime error codes header
      
      * [Cleanup] Fix TVM grid constant type reference in CUDA module
      
      * [Cleanup] Remove unused customized_code function from IR module
      
      * [Feature] Add TileLang thread synchronization and storage access analysis passes
      
      * [Build] Reorder DLL search path directories for more flexible library loading
      
      * [Refactor] Improve thread synchronization and library path handling
      
      - Rename ThreadSync and TileLangThreadSync functions in C++ code
      - Update Python docstring for ThreadSync with more detailed description
      - Reorder library path detection in tilelang environment setup
      - Minor comment and code cleanup in CUDA and warp specialization modules
      
      * [Refactor] Improve thread synchronization code style and formatting
      
      - Standardize pointer type spacing in storage_access.h and storage_access.cc
      - Update whitespace and indentation in thread_storage_sync.cc
      - Reorder include statements in thread_partial_sync.cc
      - Minor code formatting improvements across thread synchronization files
      
      * [Refactor] Fix global function registration for ThreadSync
      
      - Correct global function registration to use ThreadSync instead of TileLangThreadSync
      - Update TVM global registration to match recent refactoring efforts
      
      * [Refactor] Simplify ThreadSync global function registration
      
      - Remove unnecessary whitespace in global function registration
      - Compact the TVM global registration line for ThreadSync
      ec84188f
  3. 13 Feb, 2025 3 commits
    • Lei Wang's avatar
      [WHL] Support whl building for different python versions via tox (#83) · f55defac
      Lei Wang authored
      * bump version into v0.1.0
      
      * [Enhancement] Add custom develop command for editable installs and update .gitignore
      
      * [Documentation] Update README to include system dependencies installation instructions
      
      * [Build] Update setup.py to support library file copying for both release and develop modes
      
      * [Build] Refactor library file copying logic in setup.py
      
      * [Documentation] Remove unnecessary install section header in Installation.md
      
      * [Build] Add tox configuration and local distribution script for multi-Python version support
      
      * [Build] Improve git submodule update function with better error handling
      f55defac
    • Lei Wang's avatar
      [Bugfix] Bugfix of installing with develop mode (#81) · b43ab2dd
      Lei Wang authored
      * bump version into v0.1.0
      
      * [Enhancement] Add custom develop command for editable installs and update .gitignore
      
      * [Documentation] Update README to include system dependencies installation instructions
      
      * [Build] Update setup.py to support library file copying for both release and develop modes
      
      * [Build] Refactor library file copying logic in setup.py
      
      * [Documentation] Remove unnecessary install section header in Installation.md
      b43ab2dd
    • Wenhao Xie's avatar
      [Doc] Convert docs from rst format to Markdown format. (#82) · d44e291c
      Wenhao Xie authored
      * [CI] Clean up target repository before publishing documentation.
      
      * [Doc] Convert docs from rst format to Markdown format.
      d44e291c
  4. 12 Feb, 2025 2 commits
  5. 11 Feb, 2025 4 commits
    • Yu Cheng's avatar
      [Dev] Add mha backward example (#77) · a6fe61e2
      Yu Cheng authored
      * [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized
      
      * Relax the mismatch ratio restrictions in the flash_linear_attention and mha tests
      
      * [Dev] Add mha backward example
      a6fe61e2
    • Lei Wang's avatar
      [Carver] Remove legacy todo items in carver's readme (#74) · a26a4315
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      
      * Add legend to plot_layout for improved clarity of thread and local IDs
      
      * Remove unnecessary dependencies from requirements files for cleaner setup
      
      * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
      
      * Add build requirements and update installation scripts for improved setup
      
      * Introduce carver
      
      * Refactor imports and improve code formatting for consistency
      
      * Add unit tests for carver recommendation hints
      
      * lint fix
      
      * Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity
      
      * Refactor import statements and clean up whitespace in template files for improved readability
      
      * Add README.md for Carver framework with usage examples and architecture support
      
      * Refactor import statement in matmul_analysis.py for consistency
      
      * Refactor TileDict and TensorCorePolicy methods for improved clarity and functionality
      
      * Add tests for general matrix multiplication emit configurations
      
      * Refactor formatting in test_tilelang_carver_generate_hints.py for improved readability
      
      * Add FlashAttentionTemplate and related functionality for hint recommendations
      
      * Refactor whitespace in FlashAttentionTemplate and test_tilelang_carver_recommend_hints for improved readability
      
      * Update README.md to include FlashAttentionTemplate in the carver section
      a26a4315
    • Lei Wang's avatar
      [CostModel][Carver] Support Hint Recommend for Shared memory Kernel Fusion (#73) · 1ef644e7
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      
      * Add legend to plot_layout for improved clarity of thread and local IDs
      
      * Remove unnecessary dependencies from requirements files for cleaner setup
      
      * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
      
      * Add build requirements and update installation scripts for improved setup
      
      * Introduce carver
      
      * Refactor imports and improve code formatting for consistency
      
      * Add unit tests for carver recommendation hints
      
      * lint fix
      
      * Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity
      
      * Refactor import statements and clean up whitespace in template files for improved readability
      
      * Add README.md for Carver framework with usage examples and architecture support
      
      * Refactor import statement in matmul_analysis.py for consistency
      
      * Refactor TileDict and TensorCorePolicy methods for improved clarity and functionality
      
      * Add tests for general matrix multiplication emit configurations
      
      * Refactor formatting in test_tilelang_carver_generate_hints.py for improved readability
      
      * Add FlashAttentionTemplate and related functionality for hint recommendations
      
      * Refactor whitespace in FlashAttentionTemplate and test_tilelang_carver_recommend_hints for improved readability
      1ef644e7
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized (#72) · 465f0107
      Yu Cheng authored
      * [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized
      
      * Relax the mismatch ratio restrictions in the flash_linear_attention and mha tests
      465f0107
  6. 10 Feb, 2025 3 commits
    • Lei Wang's avatar
      [Bugfix] bug fix for bitblas dependency (#71) · be946d02
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      
      * Add legend to plot_layout for improved clarity of thread and local IDs
      
      * Remove unnecessary dependencies from requirements files for cleaner setup
      
      * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
      
      * Add build requirements and update installation scripts for improved setup
      
      * Introduce carver
      
      * Refactor imports and improve code formatting for consistency
      
      * Add unit tests for carver recommendation hints
      
      * lint fix
      
      * Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity
      
      * Refactor import statements and clean up whitespace in template files for improved readability
      
      * Add README.md for Carver framework with usage examples and architecture support
      
      * Refactor import statement in matmul_analysis.py for consistency
      be946d02
    • Lei Wang's avatar
      [Carver] Introduce a tile-structure based cost model for auto tuning (#70) · cd191889
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      
      * Add legend to plot_layout for improved clarity of thread and local IDs
      
      * Remove unnecessary dependencies from requirements files for cleaner setup
      
      * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
      
      * Add build requirements and update installation scripts for improved setup
      
      * Introduce carver
      
      * Refactor imports and improve code formatting for consistency
      
      * Add unit tests for carver recommendation hints
      
      * lint fix
      
      * Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity
      
      * Refactor import statements and clean up whitespace in template files for improved readability
      
      * Add README.md for Carver framework with usage examples and architecture support
      cd191889
    • Lei Wang's avatar
      [Dev] Remove unnecessary python dependencies (#69) · 2411fa28
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      
      * Add legend to plot_layout for improved clarity of thread and local IDs
      
      * Remove unnecessary dependencies from requirements files for cleaner setup
      
      * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
      
      * Add build requirements and update installation scripts for improved setup
      2411fa28
  7. 09 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Tools] Introduce `plot_layout` to visualize the fragment layout (#68) · f9b6a92e
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      
      * enhance acknowledgement
      
      * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
      
      * Update subproject commit for TVM dependency
      
      * Update subproject commit for TVM dependency
      
      * Add int4_t type and functions for packing char values in CUDA common header
      
      * Add plot_layout example and implement GetForwardVars method in layout classes
      
      * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
      
      * Fix formatting by removing unnecessary line break in layout.h
      
      * Refactor make_int4 function for improved readability by adjusting parameter formatting
      f9b6a92e
  8. 08 Feb, 2025 1 commit
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang transform InjectFenceProxy (#66) · 0677e542
      Yu Cheng authored
      * [Dev] Add FlashDecoding example
      
      * [CI][Test] Add test cases for tilelang kernel convolution
      
      * [CI][Test] Add test cases for tilelang kernel FlashAttention
      
      * Reduce the number of stages to ensure the shared memory allocation is valid
      
      * Temporarily remove the dim128 case
      
      * lint
      
      * update einops in requirements-dev.txt
      
      * update einops in requirements-test.txt
      
      * remove einops in requirements-dev.txt
      
      * [CI][Test] Add test cases for tilelang transform ClusterPlanning
      
      * [CI][Test] Add test cases for tilelang transform LowerHopperIntrin
      
      * [CI][Test] Add test cases for tilelang transform InjectFenceProxy
      0677e542
  9. 06 Feb, 2025 2 commits
    • Lei Wang's avatar
      [Dev] Add test case for bfloat16 and int4 gemm with mma (#65) · 0d19e268
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      0d19e268
    • Lei Wang's avatar
      [Dev] Support FP8 Codegen for cuda backend (#64) · 61de5288
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      61de5288
  10. 03 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Dev] Separate `LoopVectorize` Pass from upstream tvm (#62) · 7111239d
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      7111239d
  11. 02 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Doc] Add matmul kernel tutorial documentations with tile library (#60) · ea612446
      Lei Wang authored
      * implement jit test case
      
      * [Dev] implement auto tune test case for matrix multiplication
      
      * Implement test for legalize memory access and vectorized loop
      
      * lint fix
      
      * introduce run_once
      
      * Refactor callback function names for consistency and improve code readability
      
      * enhance documentations
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix formatting issues in rt_mod_hip.cc
      
      * add random seed initialization for deterministic testing
      
      * Add documentation images and comprehensive GEMM tutorial for TileLang
      
      * Update MATMUL documentation title to highlight Tile Library
      ea612446
  12. 27 Jan, 2025 1 commit
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang transform LowerHopperIntrin (#59) · 7d4156df
      Yu Cheng authored
      * [Dev] Add FlashDecoding example
      
      * [CI][Test] Add test cases for tilelang kernel convolution
      
      * [CI][Test] Add test cases for tilelang kernel FlashAttention
      
      * Reduce the number of stages to ensure the shared memory allocation is valid
      
      * Temporarily remove the dim128 case
      
      * lint
      
      * update einops in requirements-dev.txt
      
      * update einops in requirements-test.txt
      
      * remove einops in requirements-dev.txt
      
      * [CI][Test] Add test cases for tilelang transform ClusterPlanning
      
      * [CI][Test] Add test cases for tilelang transform LowerHopperIntrin
      7d4156df
  13. 26 Jan, 2025 3 commits
    • Lei Wang's avatar
      [Doc] Addd debug relevant testing and documentations (#58) · 5e259239
      Lei Wang authored
      * implement jit test case
      
      * [Dev] implement auto tune test case for matrix multiplication
      
      * Implement test for legalize memory access and vectorized loop
      
      * lint fix
      
      * introduce run_once
      
      * Refactor callback function names for consistency and improve code readability
      
      * enhance documentations
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix formatting issues in rt_mod_hip.cc
      
      * add random seed initialization for deterministic testing
      5e259239
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang transform ClusterPlanning (#57) · 0d8421f1
      Yu Cheng authored
      * [Dev] Add FlashDecoding example
      
      * [CI][Test] Add test cases for tilelang kernel convolution
      
      * [CI][Test] Add test cases for tilelang kernel FlashAttention
      
      * Reduce the number of stages to ensure the shared memory allocation is valid
      
      * Temporarily remove the dim128 case
      
      * lint
      
      * update einops in requirements-dev.txt
      
      * update einops in requirements-test.txt
      
      * remove einops in requirements-dev.txt
      
      * [CI][Test] Add test cases for tilelang transform ClusterPlanning
      0d8421f1
    • Wenhao Xie's avatar
  14. 25 Jan, 2025 8 commits
  15. 24 Jan, 2025 6 commits
  16. 23 Jan, 2025 2 commits
    • Lei Wang's avatar
      [Bugfix] Reorder Passes: Place Vectorize Loop Before StorageFlatten and... · 951c2300
      Lei Wang authored
      [Bugfix] Reorder Passes: Place Vectorize Loop Before StorageFlatten and FlattenBuffer to Prevent Redundant Allocations (#37)
      
      * installation script fix
      
      * readme typo fix
      
      * doc fix for dequantize gemm
      
      * [Doc] remove CODE_OF_CONDUCT.md and SECURITY.md; update references in CONTRIBUTING.md
      
      * [Doc] add unit tests for AnnotateDeviceRegions transform; remove SUPPORT.md
      
      * update license
      
      * [Enhancement] add tensor supply handling for unsigned integers; improve error message for execution backend assertion
      
      * [Refactor] improve code readability by reformatting function signatures and assertions
      
      * [Refactor] replace torch.manual_seed with tilelang.testing.set_random_seed for consistency in random seed handling
      
      * [Refactor] unify thread binding variable naming across kernel and example files
      
      * [Refactor] remove unused thread binding parameter from matrix multiplication functions
      
      * [Refactor] remove unused thread binding parameter from matrix multiplication functions
      
      * [Refactor] enable main testing function in tilelang kernel gemm test
      
      * bug fix
      
      * lint fix
      
      * [Refactor] reorder vectorize loop
      951c2300
    • Lei Wang's avatar
      [Refactor] Simplify interface via replacing argument thread binding of... · 362b3520
      Lei Wang authored
      [Refactor] Simplify interface via replacing argument thread binding of intrinsics with `KernelFrame.Current` (#34)
      
      * installation script fix
      
      * readme typo fix
      
      * doc fix for dequantize gemm
      
      * [Doc] remove CODE_OF_CONDUCT.md and SECURITY.md; update references in CONTRIBUTING.md
      
      * [Doc] add unit tests for AnnotateDeviceRegions transform; remove SUPPORT.md
      
      * update license
      
      * [Enhancement] add tensor supply handling for unsigned integers; improve error message for execution backend assertion
      
      * [Refactor] improve code readability by reformatting function signatures and assertions
      
      * [Refactor] replace torch.manual_seed with tilelang.testing.set_random_seed for consistency in random seed handling
      
      * [Refactor] unify thread binding variable naming across kernel and example files
      
      * [Refactor] remove unused thread binding parameter from matrix multiplication functions
      
      * [Refactor] remove unused thread binding parameter from matrix multiplication functions
      
      * [Refactor] enable main testing function in tilelang kernel gemm test
      
      * bug fix
      362b3520