1. 01 Aug, 2024 2 commits
  2. 30 Jul, 2024 1 commit
    • Fp8 kernel with "in-kernel" transpose of V in producer (#1100) · 5018ac6a
      jayhshah authored
      * base version
      
      * restructure pipelines, add special fp8 epilogue
      
      * add variants
      
      * add fp8 causal and modify dynamic tile scheduler
      
      * better causal schedule
      
      * maintain two schedules for non causal and causal
      
      * removing macros
      
      * fix regression
      
      * clean up unneeded methods and variants
      
      * fix mistake with NumProducerThreads
      
      * use seqlen traits
      
      * add fp8 .cu files and benchmark script
      
      * fix merge issue
      
      * fix merge issue
      
      * fix merge issue
      
      * remove duplicate code
      
      * fix regression with varseqlen
      
      * move varseqlen init in constexpr
      
      * fix test script
      
      * more constexpr on varseqlen and add max offset
      
      * add back test cases
  3. 29 Jul, 2024 3 commits
  4. 27 Jul, 2024 1 commit
  5. 25 Jul, 2024 3 commits
  6. 24 Jul, 2024 1 commit
  7. 23 Jul, 2024 11 commits
  8. 22 Jul, 2024 4 commits
    • Cameron Shinn · cb516f85
    • backwards for softcapping (#1033) · 5f1ae4a3
      Phil Wang authored
      * check in the two ways of approaching backwards for softcapping, both functional
      
      * prepare the softcap switch for backwards
      
      * temporary
      
      * cleanup to the way Tri prefers
      
      * calculate dtanh when copying from scores -> dtanh Tensor
      
      * no ternary operators allowed for constexpr, so just use some hack found online
      
      * fix maybe_dtanh, restore some files
      
      * restore another file
      
      * move calculate_dtanh to utils and colocate with apply_softcap
      
      * cleanup
      
      * maybe last cleanup
      
      * save for another pr
      
      * remove a stray line
      
      * fix spacing
      
      * fix an issue, and make test_flash_attn.py ready to test softcapping backwards
    • remove lambda (#1056) · ef3e358a
      youkaichao authored
    • catch typo (#1058) · 4df62e14
      Jorge António authored
  9. 15 Jul, 2024 1 commit
  10. 13 Jul, 2024 1 commit
  11. 11 Jul, 2024 8 commits
  12. 10 Jul, 2024 4 commits