    [Cache] Introduce detailed target information for the disk kernel cache (#780) · 7ffc5b44
    Lei Wang authored
    * Fix the type hint of the target_host parameter in the compile function so that it accepts None.
    
    * Refactor target handling in the compile function to use determine_target, for clarity and consistency.
    
    * Update PrintConst in codegen_cuda.cc to emit bfloat16 and float8/float4 constants in hexfloat format, with a scientific-notation comment next to each one for readability. Hexfloat literals represent these constants exactly in the generated code.
    
    * Refactor PrintType in codegen_cuda.cc to drop unnecessary failure checks for floating-point types with more than 4 lanes, which simplifies the logic.
    
    * Update benchmark_matmul.py to print Reference TFlops only when ref_latency is not None. Update param.py to convert target to a string for consistency. Refactor tuner.py to use determine_target for target handling.
    
    * Remove the automatic commit-and-push step from the AMD and NVIDIA CI workflows to avoid unnecessary commits.
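
The target-handling changes in compile (a target_host that may be None, plus a single determine_target call) can be pictured with the minimal sketch below. It is not the project's code: determine_target here is a hypothetical stand-in that simply assumes "auto" means CUDA, and compile_kernel is a hypothetical name chosen to avoid shadowing Python's builtin compile.

```python
from typing import Optional


def determine_target(target: Optional[str] = "auto") -> str:
    """Hypothetical resolver: map "auto" (or None) to a concrete target string.

    The real determine_target may inspect the available hardware and return a
    richer object; this sketch just assumes a CUDA device.
    """
    if target is None or target == "auto":
        return "cuda"  # assumption: pretend a CUDA device was detected
    return target


def compile_kernel(func_name: str, target: str = "auto",
                   target_host: Optional[str] = None) -> dict:
    """Hypothetical compile entry point; target_host may legitimately be None."""
    resolved = determine_target(target)  # the single place where targets are resolved
    return {"func": func_name, "target": resolved, "target_host": target_host}


print(compile_kernel("matmul"))
print(compile_kernel("matmul", target="hip"))
```

Routing every caller (compile, the tuner) through one resolver keeps "auto" from being interpreted differently in different places.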
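
Hexfloat output matters for narrow types like bfloat16 because a short decimal literal usually does not round-trip exactly, while a hexadecimal floating literal (valid in C99/C++17, and so in CUDA C++) always does. The Python sketch below illustrates the idea for bfloat16 only. It rounds to bfloat16 with round-to-nearest-even and prints the hexfloat with a scientific-notation comment; print_const is a hypothetical name echoing the commit and is not the C++ PrintConst itself. It handles finite inputs only, not NaN or Inf.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Round x to float32, then to bfloat16 with round-to-nearest-even.

    Returns the 16-bit pattern. Finite float32-range inputs only; NaN/Inf are
    not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1        # lowest bit that survives truncation
    bits += 0x7FFF + lsb          # round-to-nearest-even bias
    return (bits >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def print_const(x: float) -> str:
    """Exact hexfloat literal followed by a human-readable scientific comment."""
    value = bfloat16_bits_to_float(float32_to_bfloat16_bits(x))
    return f"{value.hex()} /* {value:e} */"


print(print_const(0.1))
print(print_const(1.0))
```

The comment exists only for human readers. The hexfloat part is what a compiler parses, and it reproduces the rounded bfloat16 value bit for bit.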
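
The benchmark_matmul.py change amounts to guarding the reference line with a None check. The sketch below shows that pattern under stated assumptions: gemm_tflops and report are hypothetical helpers, the "Best TFlops" label is illustrative, and latencies are assumed to be in milliseconds with a GEMM counted as 2*M*N*K floating-point operations.

```python
from typing import List, Optional


def gemm_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """A GEMM performs 2*M*N*K floating-point operations; latency is in ms."""
    return 2 * m * n * k / (latency_ms * 1e-3) / 1e12


def report(m: int, n: int, k: int, latency_ms: float,
           ref_latency_ms: Optional[float]) -> List[str]:
    lines = [f"Best TFlops: {gemm_tflops(m, n, k, latency_ms):.3f}"]
    # Only report the reference when a reference latency was actually measured.
    if ref_latency_ms is not None:
        lines.append(f"Reference TFlops: {gemm_tflops(m, n, k, ref_latency_ms):.3f}")
    return lines


for line in report(4096, 4096, 4096, latency_ms=1.0, ref_latency_ms=None):
    print(line)
```

Without the check, formatting a missing reference would raise a TypeError (None cannot be divided), so skipping the line is the safer behavior.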
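
Including detailed target information in the disk kernel cache, and converting target to a string, both serve the same goal: kernels compiled for different targets must not share a cache entry. The sketch below shows one way to build such a key, assuming the cache is keyed by a hash of the kernel source plus the full target string. kernel_cache_key is hypothetical, and the "-arch=sm_80"/"-arch=sm_90" strings are illustrative examples only.

```python
import hashlib
import json


def kernel_cache_key(source: str, target: object) -> str:
    """Hypothetical disk-cache key: hash the kernel source together with the
    full target description, so kernels built for different targets never collide."""
    payload = {
        "source": source,
        "target": str(target),  # normalize target objects to a stable string
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


k80 = kernel_cache_key("kernel_src", "cuda -arch=sm_80")
k90 = kernel_cache_key("kernel_src", "cuda -arch=sm_90")
print(k80 != k90)
```

A key built only from a coarse name such as "cuda" would make the two entries above collide, so a kernel compiled for one GPU architecture could be loaded on another.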