1. 16 Jul, 2025 4 commits
  2. 14 Jul, 2025 1 commit
    • Tkurth/cleanup (#90) · ab44ba59
      Thorsten Kurth authored
      * removing duplicate code from distributed convolution
      
      * replacing from_numpy with as_tensor
      
      * make preprocess_psi_tensor GPU-ready
  3. 08 Jul, 2025 1 commit
    • Tkurth/remove sparse coo tensor (#89) · bd92cdf7
      Thorsten Kurth authored
      * refactoring disco backend code
      
      * removed get_psi as a member function and moved it into _disco_convolution
      
      * setting seeds in tests more consistently
      
      * parametrized test classes to ensure that tests are always run on both CPU and GPU (if available)
      
      * cleaning up
  4. 07 Jul, 2025 1 commit
  5. 04 Jul, 2025 1 commit
  6. 03 Jul, 2025 3 commits
  7. 02 Jul, 2025 4 commits
  8. 18 Jun, 2025 1 commit
  9. 17 Jun, 2025 2 commits
  10. 16 Jun, 2025 2 commits
  11. 13 Jun, 2025 2 commits
  12. 11 Jun, 2025 3 commits
  13. 06 Jun, 2025 1 commit
  14. 04 Jun, 2025 1 commit
  15. 02 Jun, 2025 1 commit
    • Optimized CUDA kernels for improved backward gradient computation · 5f051c97
      Max Rietmann authored
      Introduce new CUDA kernels, `s2_attention_bwd_dkvq_kernel_mbT` and
      `s2_attention_kernel_mbT`, for more efficient computation of backward gradients
      and forward attention, respectively. These changes optimize memory access
      patterns and employ coalesced operations by leveraging tensor transpositions.
      
      Forward kernel written by Mauro Bisson
      Backward kernel written by Andrea Paris (aparis@ethz.ch) and Max Rietmann
      
      The parallelization strategy computes one output per warp, with the threads
      of each warp computing the dot product in parallel. Because the inputs are
      transposed so that the channel dimension is last, the dot product's memory
      access pattern is perfectly coalesced, leading to excellent performance in
      both the forward and backward kernels (a minimal sketch follows this entry).
      Co-authored-by: Mauro Bisson <maurob@nvidia.com>
      Co-authored-by: Max Rietmann <mrietmann@nvidia.com>
      Co-authored-by: Andrea Paris <aparis@ethz.ch>
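
      The one-output-per-warp pattern described above can be illustrated with a
      small standalone sketch. This is a hypothetical toy, not the actual
      s2_attention_kernel_mbT: the kernel name warp_dot_kernel, the flat
      channel-last layout, and the launch configuration are invented for
      illustration. Each warp owns one output element, its 32 lanes stride
      across the channel dimension so that consecutive lanes read consecutive
      addresses (coalesced loads), and a warp-shuffle butterfly reduction
      combines the partial sums.

      // Hypothetical sketch, not the library's kernel: one warp computes one
      // dot product over a channel-last layout.
      #include <cstddef>
      #include <cuda_runtime.h>

      __global__ void warp_dot_kernel(const float* __restrict__ a,
                                      const float* __restrict__ b,
                                      float* __restrict__ out,
                                      int num_outputs, int channels) {
          const int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
          const int lane    = threadIdx.x % 32;
          if (warp_id >= num_outputs) return;

          // Channel-last layout: output warp_id owns a contiguous slice of
          // length `channels` in each input.
          const float* av = a + (size_t)warp_id * channels;
          const float* bv = b + (size_t)warp_id * channels;

          // Lanes read channels lane, lane+32, lane+64, ...: consecutive
          // lanes touch consecutive addresses, so every load is coalesced.
          float partial = 0.0f;
          for (int c = lane; c < channels; c += 32)
              partial += av[c] * bv[c];

          // Butterfly reduction entirely in registers via warp shuffles.
          for (int offset = 16; offset > 0; offset /= 2)
              partial += __shfl_down_sync(0xffffffff, partial, offset);

          if (lane == 0) out[warp_id] = partial;
      }

      With 256-thread blocks (8 warps each) this could be launched as
      warp_dot_kernel<<<(num_outputs + 7) / 8, 256>>>(a, b, out, num_outputs, channels).
      Had the inputs stayed channel-first, consecutive lanes would read
      addresses num_outputs elements apart and no load in the loop would
      coalesce; avoiding that pattern is what the transpositions buy.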
  16. 26 May, 2025 1 commit
  17. 24 May, 2025 5 commits
  18. 08 May, 2025 1 commit
  19. 29 Apr, 2025 2 commits
  20. 26 Feb, 2025 1 commit
  21. 21 Feb, 2025 1 commit
  22. 21 Jan, 2025 1 commit