"llm/llama.cpp/models/ggml-vocab-refact.gguf.out" did not exist on "b0135f4b9b176eab9155b660d04c9ca2a1ec2341"
  • jayhshah's avatar
    Fp8 kernel with "in-kernel" transpose of V in producer (#1100) · 5018ac6a
    jayhshah authored
    * base version
    
    * restructure pipelines, add special fp8 epilogue
    
    * add variants
    
    * add fp8 causal and modify dynamic tile scheduler
    
    * better causal schedule
    
    * maintain two schedules for non causal and causal
    
    * removing macros
    
    * fix regression
    
    * clean up unneeded methods and variants
    
    * fix mistake with NumProducerThreads
    
    * use seqlen traits
    
    * add fp8 .cu files and benchmark script
    
    * fix merge issue
    
    * fix merge issue
    
    * fix merge issue
    
    * remove duplicate code
    
    * fix regression with varseqlen
    
    * move varseqlen init in constexpr
    
    * fix test script
    
    * more constexpr on varseqlen and add max offset
    
    * add back test cases
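Not part of the commit itself: below is a minimal, hypothetical sketch of the kind of accuracy check the fp8 causal work above implies, written in plain PyTorch rather than against the repository's CUDA kernels. It emulates e4m3 quantization of Q/K/V with a per-tensor scale (the quantize_dequantize_e4m3 helper and all shapes are made up for illustration) and compares causal attention on the quantized inputs with a bf16 reference via torch.nn.functional.scaled_dot_product_attention. It assumes a recent PyTorch with torch.float8_e4m3fn support and a CUDA device.

```python
# Hypothetical sketch (not from this PR): round-trip Q/K/V through float8 e4m3
# and measure the error against a bf16 reference attention. PyTorch's SDPA is
# used as a stand-in reference; it is not the repository's fp8 kernel.
import torch
import torch.nn.functional as F


def quantize_dequantize_e4m3(x: torch.Tensor) -> torch.Tensor:
    # Per-tensor scale so values fit the e4m3 dynamic range (max ~448),
    # then cast to float8_e4m3fn and back to the original dtype.
    scale = x.abs().amax().clamp(min=1e-12) / 448.0
    return (x / scale).to(torch.float8_e4m3fn).to(x.dtype) * scale


def main() -> None:
    torch.manual_seed(0)
    # Illustrative shapes only (batch, heads, seqlen, headdim).
    batch, heads, seqlen, headdim = 2, 8, 512, 128
    q, k, v = (
        torch.randn(batch, heads, seqlen, headdim, device="cuda", dtype=torch.bfloat16)
        for _ in range(3)
    )

    # bf16 reference, causal (matching the causal variants mentioned above).
    ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Same op on fp8-emulated inputs.
    out = F.scaled_dot_product_attention(
        quantize_dequantize_e4m3(q),
        quantize_dequantize_e4m3(k),
        quantize_dequantize_e4m3(v),
        is_causal=True,
    )

    err = (out - ref).abs().max().item()
    print(f"max abs error vs bf16 reference: {err:.4e}")


if __name__ == "__main__":
    main()
```

This only illustrates the precision trade-off the fp8 variants target; it does not benchmark the actual kernels (that is what the benchmark script added in this PR is for).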