    Fp8 kernel with "in-kernel" transpose of V in producer (#1100) · 5018ac6a
    jayhshah authored
    * base version
    
    * restructure pipelines, add special fp8 epilogue
    
    * add variants
    
    * add fp8 causal and modify dynamic tile scheduler
    
    * better causal schedule
    
    * maintain two schedules for non causal and causal
    
    * removing macros
    
    * fix regression
    
    * clean up unneeded methods and variants
    
    * fix mistake with NumProducerThreads
    
    * use seqlen traits
    
    * add fp8 .cu files and benchmark script
    
    * fix merge issue
    
    * fix merge issue
    
    * fix merge issue
    
    * remove duplicate code
    
    * fix regression with varseqlen
    
    * move varseqlen init in constexpr
    
    * fix test script
    
    * more constexpr on varseqlen and add max offset
    
    * add back test cases
flash_api.cpp 32.1 KB