- 21 Jul, 2025 2 commits
-
-
Andrea Paris authored
-
apaaris authored
-
- 16 Jul, 2025 1 commit
-
-
Thorsten Kurth authored
-
- 08 Jul, 2025 1 commit
-
-
Thorsten Kurth authored
* refactoring disco backend code
* removed get_psi as member function and instead put it in _disco_convolution
* setting seeds in tests more consistently
* parametrized test classes to ensure that tests are always run on both CPU and GPU (if available)
* cleaning up
-
- 02 Jul, 2025 1 commit
-
- 13 Jun, 2025 3 commits
-
-
Thorsten Kurth authored
-
Thorsten Kurth authored
-
Thorsten Kurth authored
-
- 11 Jun, 2025 2 commits
-
-
Max Rietmann authored
-
Max Rietmann authored
Also match the memory layout of the gradient output to that of the input
-
- 06 Jun, 2025 1 commit
-
-
Max Rietmann authored
Detect the memory layout of the (B,C,H,W) input: the stride for C should be 1; if it is not, fix it. This ensures that the backward kernel is fast.
-
- 04 Jun, 2025 1 commit
-
-
Max Rietmann authored
Putting qy in shared memory is a little faster. Changing the internal memory layout means we can leave the code in its standard shape and only change the layout outside the kernel.
-
- 24 May, 2025 2 commits
-
-
Boris Bonev authored
Fixing a bug in the quadrature weights for full attention. Adding better unit tests for attention. Cleanup in the CUDA code.
-
Boris Bonev authored
-