    [Docker] Update Dockerfiles to specify exact version of libstdcxx-ng (#203) · 05d72dfc
    Lei Wang authored
    * Optimize CMake build process with dynamic job count calculation
    
    - Modify build_csrc function to use 90% of available CPU cores
    - Ensure at least one job is used during compilation
    - Improve build performance by dynamically adjusting parallel job count
    
    * Optimize build_csrc function with multiprocessing module
    
    - Replace os.cpu_count() with multiprocessing.cpu_count()
    - Maintain existing 90% CPU utilization logic
    - Improve CPU core count calculation for build process
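
A minimal sketch of the job-count calculation described in the two items above, assuming the 90% figure is rounded down and clamped to at least one job. The helper name `compute_build_jobs` is hypothetical and is not the actual body of `build_csrc`.

```python
import multiprocessing


def compute_build_jobs(cpu_count=None):
    """Return a parallel job count: about 90% of the cores, never less than 1.

    Hypothetical helper for illustration only; not the real build_csrc code.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    # Integer arithmetic (9/10, rounded down) avoids floating-point rounding.
    return max(1, (cpu_count * 9) // 10)


if __name__ == "__main__":
    jobs = compute_build_jobs()
    # The value would typically be handed to the build tool as its -j argument.
    print(f"building with {jobs} parallel jobs")
```

On a single-core machine the 90% figure rounds down to zero, which is why the clamp to one job is needed.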
    
    * Add dynamic shape support with out_idx in Cython JIT kernel compilation
    
    - Implement `run_cython_dynamic_shape_with_out_idx` function in test_tilelang_jit_gemm_cython.py
    - Update Cython wrapper to handle dynamic symbolic shapes during tensor allocation
    - Add support for resolving dynamic shape dimensions using input tensor references
    - Enhance flexibility of JIT kernel compilation with symbolic shape handling
    
    * Enhance error reporting for dynamic symbolic shape resolution in Cython JIT kernel
    
    - Add detailed error message when a dynamic symbolic dimension is not found in dynamic_symbolic_map
    - Improve debugging by providing context about missing symbolic dimensions
    - Maintain existing dynamic shape resolution logic
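
The shape resolution and error reporting described in the two items above can be sketched as follows. This is an illustration under stated assumptions, not the Cython wrapper itself: it assumes `dynamic_symbolic_map` maps a symbol name to an `(input index, dimension index)` pair, and it uses plain tuples in place of tensors. The function name `resolve_output_shape` is hypothetical.

```python
def resolve_output_shape(shape_spec, input_shapes, dynamic_symbolic_map):
    """Resolve an output shape whose entries are ints or symbolic names.

    Assumption: dynamic_symbolic_map[name] == (input_index, dim_index), i.e. a
    symbol is resolved by reading one dimension of one input tensor.
    """
    resolved = []
    for dim in shape_spec:
        if isinstance(dim, int):
            # Static dimension: use it as-is.
            resolved.append(dim)
            continue
        if dim not in dynamic_symbolic_map:
            # Report which symbol is missing and which ones are known.
            raise ValueError(
                f"Dynamic symbolic dimension {dim!r} not found in "
                f"dynamic_symbolic_map; known symbols: {sorted(dynamic_symbolic_map)}"
            )
        input_idx, dim_idx = dynamic_symbolic_map[dim]
        resolved.append(input_shapes[input_idx][dim_idx])
    return tuple(resolved)


if __name__ == "__main__":
    # A is (m, 64) and B is (64, 128); the output is (m, 128) with m taken from A.
    shapes = [(256, 64), (64, 128)]
    print(resolve_output_shape(("m", 128), shapes, {"m": (0, 0)}))
```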
    
    * Fix Copy operation handling for scalar and multi-dimensional tensors
    
    - Add special handling for scalar tensor copy operations
    - Enhance error reporting in MakeIndices method with more detailed diagnostic information
    - Improve SIMT loop generation to support zero-dimensional tensors
    - Add explicit check and handling for scalar tensor scenarios
    
    * Refactor Copy operation code formatting and improve readability
    
    - Improve code formatting in MakeIndices and MakeSIMTLoop methods
    - Add line breaks to enhance readability of complex ICHECK statements
    - Simplify code structure in scalar tensor handling
    - Remove unnecessary whitespace and improve code alignment
    
    * Simplify GEMM example with direct kernel compilation
    
    - Update copyright header to Tile-AI Corporation
    - Remove Profiler import and usage
    - Replace tilelang.lower() with tilelang.compile()
    - Simplify kernel execution workflow
    - Update kernel source retrieval method
    
    * Enhance block sparse attention implementation
    
    - Update `blocksparse_flashattn` to use 2 stages for improved performance.
    - Change `block_mask_dtype` from `int8` to `bool` for better memory efficiency.
    - Modify condition checks in the kernel to utilize boolean values.
    - Introduce a new example for top-k sparse attention and a benchmark for native sparse attention.
    - Add support for asynchronous copy in PTX and improve pipeline planning with condition handling.
    
    * Refactor and clean up code formatting across multiple files
    
    - Add whitespace for improved readability in `example_blocksparse_gemm.py`, `example_tilelang_nsa_fwd.py`, and `benchmark_nsa_fwd.py`.
    - Improve code structure and alignment in `inject_ptx_async_copy.cc` and `pipeline_planning.cc`.
    - Update comments and documentation for clarity in `__init__.py` and `phase.py`.
    - Ensure consistent formatting and style across the codebase.
    
    * Add kernel source printing in example_tilelang_nsa_fwd.py and implement IfThenElse node replacement in inject_pipeline.cc
    
    - Add a print statement to output the kernel source in `example_tilelang_nsa_fwd.py` for debugging purposes.
    - Introduce a new function `replace_if_then_else` in `inject_pipeline.cc` that transforms IfThenElse nodes while preserving their attributes, improving the handling of conditional statements in the pipeline.
    
    * Refactor condition handling in inject_pipeline.cc
    
    - Change the data structure for mapping conditions to statements from a Map to an Array for improved performance and simplicity.
    - Update condition comparison logic to use StructuralEqual for better accuracy.
    - Enhance logging to provide detailed insights into condition changes and statement processing.
    - Adjust final statement construction to utilize the new data structure, ensuring correct handling of conditions and statements.
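
A Python model of the idea in the item above: statements are grouped under their guarding condition using an ordered list of pairs that is searched with a structural comparison. The real pass is C++ in `inject_pipeline.cc` and its types are not reproduced here; `Expr`, `structural_equal`, and `group_by_condition` are hypothetical stand-ins.

```python
class Expr:
    """Tiny expression node; equality and hashing default to object identity."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args


def structural_equal(a, b):
    """Compare two expressions by shape and content rather than by identity."""
    if isinstance(a, Expr) and isinstance(b, Expr):
        return (
            a.op == b.op
            and len(a.args) == len(b.args)
            and all(structural_equal(x, y) for x, y in zip(a.args, b.args))
        )
    if isinstance(a, Expr) or isinstance(b, Expr):
        return False
    return a == b


def group_by_condition(guarded_stmts):
    """Group (condition, statement) pairs, keeping first-seen condition order."""
    groups = []  # list of [condition, [statements]]
    for cond, stmt in guarded_stmts:
        for entry in groups:
            if structural_equal(entry[0], cond):
                entry[1].append(stmt)
                break
        else:
            # No structurally equal condition seen yet: start a new group.
            groups.append([cond, [stmt]])
    return [(cond, stmts) for cond, stmts in groups]


if __name__ == "__main__":
    c1 = Expr("lt", "i", 4)
    c2 = Expr("lt", "i", 4)  # built separately, structurally identical to c1
    c3 = Expr("ge", "i", 4)
    for cond, stmts in group_by_condition([(c1, "s0"), (c2, "s1"), (c3, "s2")]):
        print(cond.op, stmts)
```

In this model, `c1` and `c2` are distinct objects, so an identity-keyed mapping would place them in separate groups; the linear search with a structural comparison merges them.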
    
    * Improve logging and formatting in inject_pipeline.cc
    
    - Enhance logging statements for better clarity on condition changes and statement processing.
    - Adjust formatting for improved readability, including line breaks and consistent spacing.
    - Ensure accurate condition comparison and handling in the pipeline logic.
    
    * Refactor logging and clean up inject_pipeline.cc
    
    - Remove excessive logging statements to streamline the code and improve performance.
    - Simplify condition handling by eliminating unnecessary log outputs related to condition changes and statement processing.
    - Maintain the core functionality while enhancing code readability and maintainability.
    
    * Update Dockerfiles to specify exact version of libstdcxx-ng
    
    - Change installation command in multiple Dockerfiles to use `libstdcxx-ng=12` instead of `libstdcxx-ng-12` for consistency and to avoid potential issues with package resolution.
    - Ensure all Dockerfiles from cu118 to cu126 reflect this change for uniformity across builds.
Dockerfile.cu120 1.02 KB