  1. 17 Jun, 2025 4 commits
  2. 13 Jun, 2025 7 commits
  3. 11 Jun, 2025 3 commits
  4. 06 Jun, 2025 1 commit
  5. 04 Jun, 2025 1 commit
  6. 02 Jun, 2025 1 commit
    • Optimized CUDA kernels for improved backward gradient computation · 5f051c97
      Max Rietmann authored
      
      
      Introduce new CUDA kernels, `s2_attention_bwd_dkvq_kernel_mbT` and
      `s2_attention_kernel_mbT`, for more efficient computation of backward
      gradients and forward attention, respectively. These changes optimize memory
      access patterns and use coalesced memory operations by transposing the input
      tensors to a channel-last layout.
      
      Forward kernel written by Mauro Bisson.
      Backward kernel written by Andrea Paris (aparis@ethz.ch) and Max Rietmann.
      
      The parallelization strategy computes one output per warp, with the threads
      of the warp computing the dot product in parallel. Because the inputs are
      transposed to have the channel dimension last, the dot-product memory access
      pattern is perfectly coalesced, leading to excellent performance in both the
      forward and backward kernels. A minimal sketch of this access pattern follows
      this entry.
      Co-authored-by: Mauro Bisson <maurob@nvidia.com>
      Co-authored-by: Max Rietmann <mrietmann@nvidia.com>
      Co-authored-by: Andrea Paris <aparis@ethz.ch>
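      The strategy described above can be illustrated with a minimal, self-contained
      sketch. This is not the actual `s2_attention_kernel_mbT` /
      `s2_attention_bwd_dkvq_kernel_mbT` code: the kernel name, the flat
      `[num_rows, channels]` shapes, and the plain per-row dot product below are
      illustrative assumptions. It only shows the access pattern the commit message
      describes: one output per warp, lanes striding over a channel-last dimension
      (coalesced loads), and a warp-shuffle reduction.

      ```cuda
      // Illustrative sketch, not the library's kernel: each warp computes one
      // output value out[row] = dot(a[row, :], b[row, :]) over a channel-last
      // (row-major [num_rows, channels]) layout, so the 32 lanes of a warp read
      // consecutive addresses on every iteration (coalesced loads).
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void warp_dot_channels_last(const float* __restrict__ a,
                                             const float* __restrict__ b,
                                             float* __restrict__ out,
                                             int num_rows, int channels)
      {
          const int lane = threadIdx.x % 32;                              // lane id within the warp
          const int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;  // global warp id = output row
          if (warp >= num_rows) return;

          const float* arow = a + (size_t)warp * channels;
          const float* brow = b + (size_t)warp * channels;

          // Each lane accumulates a strided slice of the dot product.
          float partial = 0.0f;
          for (int c = lane; c < channels; c += 32) {
              partial += arow[c] * brow[c];
          }

          // Warp-shuffle tree reduction combines the 32 partial sums.
          for (int offset = 16; offset > 0; offset >>= 1) {
              partial += __shfl_down_sync(0xffffffff, partial, offset);
          }

          if (lane == 0) {
              out[warp] = partial;  // one output per warp
          }
      }

      int main()
      {
          const int num_rows = 1024, channels = 256;
          const int threads = 128;                                   // 4 warps per block
          const int blocks = (num_rows * 32 + threads - 1) / threads;

          float *a, *b, *out;
          cudaMallocManaged(&a, (size_t)num_rows * channels * sizeof(float));
          cudaMallocManaged(&b, (size_t)num_rows * channels * sizeof(float));
          cudaMallocManaged(&out, num_rows * sizeof(float));
          for (int i = 0; i < num_rows * channels; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

          warp_dot_channels_last<<<blocks, threads>>>(a, b, out, num_rows, channels);
          cudaDeviceSynchronize();
          printf("out[0] = %.1f (expected %.1f)\n", out[0], 2.0f * channels);

          cudaFree(a); cudaFree(b); cudaFree(out);
          return 0;
      }
      ```

      Launched with a block size that is a multiple of 32, consecutive lanes touch
      consecutive channel elements on every load, which is what makes the
      channel-last transposition pay off.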
  7. 26 May, 2025 1 commit
  8. 24 May, 2025 7 commits
  9. 08 May, 2025 1 commit
  10. 29 Apr, 2025 2 commits
  11. 26 Feb, 2025 1 commit
  12. 21 Feb, 2025 1 commit
  13. 21 Jan, 2025 1 commit
  14. 17 Jan, 2025 1 commit
    • Update README.md · 9eea871c
      Mike McCann authored
      Without moving `signal` onto `device`, calling `sht` raises `RuntimeError: Expected all tensors to be on the same device`.
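      A minimal usage sketch of the fix (assuming the `torch_harmonics` README
      example this commit touches; the `RealSHT` constructor arguments and grid
      sizes below are illustrative, not taken from this commit):

      ```python
      import torch
      import torch_harmonics as th

      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

      nlat, nlon = 512, 1024
      sht = th.RealSHT(nlat, nlon, grid="equiangular").to(device)

      # Creating `signal` directly on `device` (or moving it there with .to(device))
      # keeps it on the same device as `sht` and avoids the
      # "Expected all tensors to be on the same device" RuntimeError.
      signal = torch.randn(1, nlat, nlon, device=device)
      coeffs = sht(signal)
      ```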
  15. 14 Jan, 2025 8 commits