  1. 02 Sep, 2025 1 commit
      [Math] Dispatch `T.rsqrt(x)` into cuda intrin instead of `1 / T.sqrt(x)` (#781) · b66f9aae
      Lei Wang authored
      * Fix type hint for target_host parameter in compile function to allow None value
      
      * Refactor target handling in compile function to utilize determine_target for improved clarity and consistency
      
      * Update the PrintConst function in codegen_cuda.cc to emit bfloat16 and float8/float4 constants in hexfloat format, with a scientific-notation comment next to each constant so the generated code stays readable.
      
      * Refactor the PrintType function in codegen_cuda.cc to remove unnecessary failure conditions for floating-point types with lane counts greater than 4, which simplifies the logic.
      
      * Enhance benchmark_matmul.py to conditionally print Reference TFlops only if ref_latency is not None. Update param.py to ensure target is converted to string for consistency. Refactor tuner.py to utilize determine_target for improved clarity in target handling.
      
      * Remove automatic commit and push step from AMD and NVIDIA CI workflows to streamline the process and avoid unnecessary commits.
      
      * Add intrin_rule source files to CMakeLists.txt and implement hrsqrt function for half_t in common.h
      
      * lint fix
      
      * Remove the cmake dependency from pyproject.toml, as it may lead to different cmake paths in different stages
      
      * lint fix
      
      * Add cmake dependency to pyproject.toml and improve build logging in setup.py
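
The hexfloat-plus-comment idea from the PrintConst item can be illustrated with a small Python sketch. This is not the code in codegen_cuda.cc (which is C++ and whose exact output format is not shown here); `print_const` is a hypothetical helper name, and it works on Python doubles rather than bfloat16 or float8/float4 values. It only shows why the pairing is useful: the hexfloat literal round-trips exactly, while the comment gives a human-readable approximation.

```python
def print_const(value: float) -> str:
    """Hypothetical helper: format a constant as 'hexfloat /*scientific*/'."""
    literal = value.hex()        # exact bit-for-bit representation of the double
    comment = f"{value:.6e}"     # rounded scientific notation, for readers only
    return f"{literal} /*{comment}*/"


def parse_const(text: str) -> float:
    """Recover the value from the hexfloat part, ignoring the comment."""
    return float.fromhex(text.split(" ")[0])


if __name__ == "__main__":
    for v in (1.5, 0.1, 1e-8):
        rendered = print_const(v)
        # The hexfloat part restores the original value exactly.
        assert parse_const(rendered) == v
        print(rendered)
```

A decimal literal with few digits could lose precision for narrow types, whereas a hexfloat literal encodes the mantissa and exponent directly, which is presumably the motivation for the change.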