  1. 29 May, 2025 1 commit
      [Language] Support `T.annotate_l2_hit_ratio` via `cudaStreamSetAttribute` (#539) · a65f481e
      Lei Wang authored
      * Refactor OptimizeForTarget function by removing redundant buffer allocation step and cleaning up code
      
        * Removed the PlanAndUpdateBufferAllocationLocation step from the OptimizeForTarget function to streamline the optimization process.
        * Cleaned up unnecessary whitespace in the function for improved readability.
        * Enhanced the overall clarity and maintainability of the code.
      
      * Refactor AllocateNode handling in vectorize_loop.cc
      
        * Simplified the VisitStmt_ method for AllocateNode by removing the complex extent mutation logic.
        * Streamlined the allocation process to directly call the base class method, enhancing code clarity and maintainability.
        * Improved overall readability by eliminating unnecessary comments and code related to extent handling.
      
      * Remove the `tl_kernel.c` file, eliminating the backward kernel implementation and its associated error-handling functions. This cleanup removes unused components related to backward kernel processing.
      
      * Add buffer allocation planning step in OptimizeForTarget function
      
        * Introduced the PlanAndUpdateBufferAllocationLocation step to the OptimizeForTarget function, enhancing the optimization process.
        * This addition improves the overall efficiency of buffer allocation during the target optimization phase, ensuring better resource management.
      
      * Update submodule TVM to latest commit db50d4e, ensuring alignment with upstream changes.
      
      * Add L2 persistent annotation support and related functionality
      
        * Introduced a new file `lower_l2_persistent_annotation.cc` to handle the lowering of L2 persistent annotations.
        * Added functions to annotate L2 hit ratios for buffers, ensuring compatibility with global buffer requirements.
        * Updated the `LowerAndLegalize` function to include the new L2 persistent map lowering step.
        * Enhanced CUDA driver with a function to retrieve the maximum size of the persisting L2 cache.
        * Modified the `TLCUDASourceWrapper` class to integrate L2 persistent map handling during kernel launches.
      
      These changes improve the framework's ability to manage L2 cache optimizations, enhancing performance for CUDA applications.
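
To make the relationship between a buffer's hit ratio and the maximum persisting L2 size concrete, here is a minimal sketch in plain Python. It is not the code added by this commit. `AccessPolicyWindow` and `plan_l2_window` are hypothetical names that only mirror the `num_bytes` and `hitRatio` fields of CUDA's `cudaAccessPolicyWindow`, and the clamping rule (keep `hit_ratio * num_bytes` within the persisting carve-out) is an assumption based on general CUDA guidance, not on what the lowering pass or `TLCUDASourceWrapper` actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicyWindow:
    """Hypothetical stand-in for two fields of CUDA's cudaAccessPolicyWindow."""
    num_bytes: int    # size of the window covering the global buffer
    hit_ratio: float  # fraction of accesses in the window treated as persisting


def plan_l2_window(buffer_bytes: int,
                   requested_hit_ratio: float,
                   max_persisting_l2_bytes: int) -> AccessPolicyWindow:
    """Pick a hit ratio that does not oversubscribe the persisting L2 region.

    Assumption: the persisting footprint is roughly hit_ratio * num_bytes,
    so the ratio is capped at max_persisting_l2_bytes / buffer_bytes.
    """
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    if not 0.0 <= requested_hit_ratio <= 1.0:
        raise ValueError("hit ratio must lie in [0, 1]")
    if max_persisting_l2_bytes <= 0:
        # No persisting L2 available: fall back to a non-persisting window.
        return AccessPolicyWindow(buffer_bytes, 0.0)
    cap = max_persisting_l2_bytes / buffer_bytes
    return AccessPolicyWindow(buffer_bytes, min(requested_hit_ratio, cap))


# A 32 KiB buffer with a 16 KiB persisting region gets its ratio capped.
window = plan_l2_window(32 * 1024, 0.8, 16 * 1024)
print(window)
```

In a real launch path, values like these would be copied into a `cudaStreamAttrValue` and applied with `cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr)`. The maximum size would come from the device attribute `cudaDevAttrMaxPersistingL2CacheSize`.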
      
      * lint fix