torch_harmonics/csrc/attention/attention_utils.cu · 9a463332e7d3a0b22dd66fa7a136164add2e67b9 · OpenDAS / torch-harmonics

Optimized BWD kernel with the same changes for FWD from commit: · 9a463332

Mauro Bisson authored Jul 10, 2025

* Replaced PyTorch's slow permutation.
* Split kernel into general and specialized versions (for num_channel <= 8192).
* Enabled float4-based vectorized memory access when possible.
* Added runtime dispatch logic for kernel specialization.
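The runtime dispatch described in these bullets can be sketched on the host side roughly as follows. This is a minimal illustration under stated assumptions, not the code in attention_utils.cu. The enum `KernelVariant`, the helpers `can_use_float4` and `select_variant`, and the rule that float4 access requires a 16-byte-aligned base pointer plus a channel count divisible by 4 are all hypothetical. Only the 8192-channel threshold comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical kernel variants; the real kernel names in attention_utils.cu may differ.
enum class KernelVariant {
    GenericScalar,
    GenericFloat4,
    SpecializedScalar,
    SpecializedFloat4
};

// Threshold taken from the commit message (specialized path for num_channel <= 8192).
constexpr int kMaxSpecializedChannels = 8192;

// Assumed rule: float4 loads/stores need 16-byte-aligned addresses. If the base
// pointer is aligned and the channel count is a multiple of 4, every row of
// num_channels floats also starts on a 16-byte boundary.
inline bool can_use_float4(const void* ptr, int num_channels) {
    return reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0 && num_channels % 4 == 0;
}

// Picks a variant at runtime from the channel count and the data layout.
inline KernelVariant select_variant(const float* data, int num_channels) {
    const bool vectorized = can_use_float4(data, num_channels);
    if (num_channels <= kMaxSpecializedChannels) {
        return vectorized ? KernelVariant::SpecializedFloat4
                          : KernelVariant::SpecializedScalar;
    }
    return vectorized ? KernelVariant::GenericFloat4
                      : KernelVariant::GenericScalar;
}
```

With a 16-byte-aligned buffer `buf`, `select_variant(buf, 256)` picks the specialized float4 path, while `select_variant(buf + 1, 256)` falls back to the specialized scalar path because `buf + 1` is only 4-byte aligned. Checking alignment at runtime rather than assuming it keeps the vectorized path safe for tensors that are views or slices with an offset.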

Aligned attention_fwd_cuda.cu with attention_bwd_cuda.cu in terms of naming conventions and kernel parameters.

Extracted shared host/device functions and declarations into a separate module:
* attention_utils.cuh
* attention_utils.cu