    [Language] Support `T.annotate_l2_hit_ratio` via `cudaStreamSetAttribute` (#539) · a65f481e
    Lei Wang authored
    * Refactor OptimizeForTarget function by removing redundant buffer allocation step and cleaning up code
    
    * Removed the PlanAndUpdateBufferAllocationLocation step from the OptimizeForTarget function.
    * Cleaned up unnecessary whitespace in the function.
    
    * Refactor AllocateNode handling in vectorize_loop.cc
    
    * Simplified the VisitStmt_ method for AllocateNode by removing the complex extent mutation logic.
    * Streamlined the allocation process to directly call the base class method, enhancing code clarity and maintainability.
    * Improved overall readability by eliminating unnecessary comments and code related to extent handling.
    
    * Remove `tl_kernel.c` file, eliminating the backward kernel implementation and associated error handling functions. This cleanup enhances code maintainability by removing unused components related to the backward kernel processing.
    
    * Add buffer allocation planning step in OptimizeForTarget function
    
    * Introduced the PlanAndUpdateBufferAllocationLocation step to the OptimizeForTarget function, enhancing the optimization process.
    * This addition improves the overall efficiency of buffer allocation during the target optimization phase, ensuring better resource management.
    
    * Update submodule TVM to latest commit db50d4e, ensuring alignment with upstream changes.
    
    * Add L2 persistent annotation support and related functionality
    
    * Introduced a new file `lower_l2_persistent_annotation.cc` to handle the lowering of L2 persistent annotations.
    * Added functions to annotate L2 hit ratios for buffers; annotated buffers are expected to be global buffers.
    * Updated the `LowerAndLegalize` function to include the new L2 persistent map lowering step.
    * Enhanced CUDA driver with a function to retrieve the maximum size of the persisting L2 cache.
    * Modified the `TLCUDASourceWrapper` class to integrate L2 persistent map handling during kernel launches.
    
    Together, these changes let a kernel request an L2 hit ratio for selected buffers via `T.annotate_l2_hit_ratio`; the request is applied at kernel launch through `cudaStreamSetAttribute`.
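
The launch-time step described above can be sketched on the host side as follows. This is a minimal illustration and not the code in this commit: `L2WindowPlan` and `PlanL2Window` are hypothetical names, and clamping the window to a caller-supplied byte limit is an assumption about how an oversized buffer would be handled. In a real launch path, the resulting values would be copied into `cudaStreamAttrValue::accessPolicyWindow` (`num_bytes`, `hitRatio`) and passed to `cudaStreamSetAttribute` with `cudaStreamAttributeAccessPolicyWindow`.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical plain-data holder for the two values that would be copied into
// cudaAccessPolicyWindow::num_bytes and cudaAccessPolicyWindow::hitRatio.
struct L2WindowPlan {
  std::size_t num_bytes;  // size of the access policy window in bytes
  float hit_ratio;        // fraction of accesses in the window that should persist
};

// Hypothetical helper: derive window parameters for one annotated buffer.
// `max_window_bytes` stands in for a device limit queried by the caller
// (assumption: the window is clamped rather than rejected when too large).
L2WindowPlan PlanL2Window(std::size_t buffer_bytes, float requested_hit_ratio,
                          std::size_t max_window_bytes) {
  if (requested_hit_ratio < 0.0f || requested_hit_ratio > 1.0f) {
    throw std::invalid_argument("hit ratio must lie in [0, 1]");
  }
  L2WindowPlan plan;
  plan.num_bytes = std::min(buffer_bytes, max_window_bytes);
  plan.hit_ratio = requested_hit_ratio;
  return plan;
}
```

For example, `PlanL2Window(8192, 0.5f, 4096)` yields a 4096-byte window with a hit ratio of 0.5, while a ratio outside [0, 1] is rejected before any CUDA call would be made.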
    
    * lint fix
lower_l2_persistent_annotation.cc 3.46 KB