"src/libtorchaudio/sox/utils.cpp" did not exist on "463a8b2c83653ce01698f542e2ff07f4947dce7e"
Optimized the BWD kernel with the same changes made to the FWD kernel in commit 8cb399ee:

* Replaced PyTorch's slow permutation.
* Split the kernel into general and specialized versions (the specialized version handles num_channel <= 8192).
* Enabled float4-based vectorized memory access, when possible.
* Added runtime dispatch logic for kernel specialization.

Aligned attention_fwd_cuda.cu with attention_bwd_cuda.cu in terms of naming conventions and kernel parameters.

Extracted shared host/device functions and declarations into a separate module:

* attention_utils.cuh
* attention_utils.cu
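The runtime dispatch described above could look roughly like the following host-side sketch. It is not the code from this commit. Every name (`KernelVariant`, `can_use_float4`, `select_variant`, `kMaxSpecializedChannels`) is hypothetical, and only the 8192 threshold is taken from the message. It also assumes that "when possible" means a 16-byte-aligned base pointer and a channel count divisible by 4, which is the usual requirement for float4 loads on contiguous rows.

```cpp
#include <cstdint>

// Hypothetical variant tags; the real kernels and their names are not shown here.
enum class KernelVariant {
    SpecializedVec4,    // num_channel <= 8192, float4 loads
    SpecializedScalar,  // num_channel <= 8192, scalar loads
    GeneralVec4,        // any num_channel, float4 loads
    GeneralScalar       // any num_channel, scalar loads
};

// Threshold taken from the commit message; the constant name is an assumption.
constexpr std::int64_t kMaxSpecializedChannels = 8192;

// Assumption: float4 access is safe when the base address is 16-byte aligned
// and each contiguous row holds a multiple of 4 floats, so every row start
// stays 16-byte aligned as well.
inline bool can_use_float4(const void* ptr, std::int64_t num_channel) {
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    return (addr % 16 == 0) && (num_channel % 4 == 0);
}

// Picks a kernel variant at runtime from the tensor's pointer and channel count.
inline KernelVariant select_variant(const void* data, std::int64_t num_channel) {
    const bool vectorized = can_use_float4(data, num_channel);
    if (num_channel <= kMaxSpecializedChannels) {
        return vectorized ? KernelVariant::SpecializedVec4
                          : KernelVariant::SpecializedScalar;
    }
    return vectorized ? KernelVariant::GeneralVec4
                      : KernelVariant::GeneralScalar;
}
```

A host launcher would then switch on the returned variant and launch the matching kernel instantiation. Keeping the alignment check on the host keeps branches out of the device code.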