    [FastMath] Disable default TVM fastmath intrinsic dispatch and add explicit... · 95c373f5
    Lei Wang authored
    [FastMath] Disable default TVM fastmath intrinsic dispatch and add explicit fastmath op to invoke (#875)
    
    * Add fast math operations for CUDA: exp, exp10, log, log2, log10, tan, cos, and sin (#865)
    
    * Refactor fast math operation definitions in the CUDA code for consistency and readability: consolidate multi-line definitions into single lines and tidy the formatting of the related test files.
    
    * Remove unnecessary pass configurations for warp specialization and TMA lowering from the CUDA fast math operation tests. This simplifies the test setup while keeping the focus on fast math functionality.
    
    * Update fastmath tests to reflect that tl.* intrinsics do not generate fastmath versions, and disable the cache in main execution.
    
    * Fix formatting in fastmath test comments for clarity on tl.* intrinsics behavior.
    
    * Add precision comparison tool for CUDA operations
    
    This commit introduces a new Python script and CUDA source file for a precision comparison tool that evaluates the accuracy of various CUDA operations (including division, reciprocal, exponential, logarithmic, and trigonometric functions) across different implementations: CUDA Precise, CUDA Fast, Triton, Triton LibDevice, and TileLang. The tool generates test data, executes the operations, and summarizes the error statistics for each implementation against a double precision reference. Additionally, a README file is added to document the results of the comparisons for various operations.
    
    * Add precision comparison tool for CUDA operations
    
    This commit introduces a new precision comparison tool implemented in Python and CUDA, designed to evaluate the accuracy of various mathematical operations (division, reciprocal, exponential, logarithmic, trigonometric, square root, etc.) across different frameworks including CUDA Precise/Fast, Triton, Triton LibDevice, PyTorch, and TileLang. The tool includes functionality for generating test data, executing operations, and summarizing error statistics for each implementation. Additionally, it provides a comprehensive README with error metrics for each operation tested.
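
The error summary described above (run each implementation, compare against a double precision reference, report error statistics) can be sketched roughly as follows. This is a minimal standard-library illustration, not the tool added by this commit: `to_f32`, `error_stats`, and `exp_f32` are hypothetical names, and `exp_f32` only emulates a single-precision implementation by rounding a double precision result to float32.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def error_stats(impl, reference, inputs):
    """Summarize the error of `impl` against a double precision `reference`."""
    max_abs = 0.0
    max_rel = 0.0
    total_abs = 0.0
    for x in inputs:
        ref = reference(x)
        abs_err = abs(impl(x) - ref)
        total_abs += abs_err
        max_abs = max(max_abs, abs_err)
        if ref != 0.0:
            # Relative error is undefined where the reference is exactly zero.
            max_rel = max(max_rel, abs_err / abs(ref))
    return {
        "max_abs": max_abs,
        "mean_abs": total_abs / len(inputs),
        "max_rel": max_rel,
    }


def exp_f32(x: float) -> float:
    # Hypothetical stand-in for an implementation under test:
    # a double precision exp whose result is rounded to float32.
    return to_f32(math.exp(x))


# Inputs are rounded to float32 first, so both sides see identical values.
inputs = [to_f32(-4.0 + 8.0 * i / 999) for i in range(1000)]
stats = error_stats(exp_f32, math.exp, inputs)
print(stats)
```

In a real comparison, `exp_f32` would be replaced by results read back from each GPU implementation, and one such summary would be produced per operation and per implementation.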
__init__.py 6.74 KB