    [Dev] Remove unnecessary python dependencies (#69) · 2411fa28
    Lei Wang authored
    * [Enhancement] Add VectorizeLoop function and update imports for compatibility
    
    * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
    
    * Lint fix
    
    * Fix incorrect module reference for VectorizeLoop transformation
    
    * Refactor vectorize_loop transformation by removing unused extent mutation logic
    
    * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
    
    * Fix formatting in CUDA FP8 header file for consistency
    
    * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
    
    * Update submodule 'tvm' to latest commit for improved functionality
    
    * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
    
    * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
    
    * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
    
    * Add CUDA requirements to FP8 test cases and update references for clarity
    
    * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
    
    * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
    
    * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
    
    * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
    
    * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
    
    * Add BF16 support to matrix multiplication and introduce corresponding test cases
    
    * Add a blank line for improved readability in BF16 GEMM test
    
    * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
    
    * Enhance acknowledgement
    
    * Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
    
    * Update subproject commit for TVM dependency
    
    * Update subproject commit for TVM dependency
    
    * Add int4_t type and functions for packing char values in CUDA common header
    
    * Add plot_layout example and implement GetForwardVars method in layout classes
    
    * Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
    
    * Fix formatting by removing unnecessary line break in layout.h
    
    * Refactor make_int4 function for improved readability by adjusting parameter formatting
    
    * Add legend to plot_layout for improved clarity of thread and local IDs
    
    * Remove unnecessary dependencies from requirements files for cleaner setup
    
    * Remove flash_mha.py and add .gitkeep to deepseek_mla directory
    
    * Add build requirements and update installation scripts for improved setup
install_cpu.sh 3.5 KB