Fixes deform_conv issue with large input/output (#4351)

* WIP on fixing index overflow issue
* Fixed backward pass for large num_kernels
* Fixed clang formatting
* Fixed GET_BLOCKS int/int64_t types issue

Co-authored-by: vfdev-5 <vfdev-5@gmail.com>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
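
A rough illustration of the overflow being fixed: num_kernels is a product of tensor dimensions, so moderately large shapes can push it past INT_MAX (2,147,483,647). Past that point an int loop index can no longer address every work item. The sketch below is plain host C++ with hypothetical helpers (num_work_items, overflows_int32, blocks_for) and made-up shapes. Exactly which dimensions feed num_kernels in each deform_conv2d kernel is an assumption here. blocks_for is not torchvision's GET_BLOCKS; it only shows one way to keep launch sizing in int64_t until the final narrowing. Separately, std::min deduces a single template type, so it will not accept an int and an int64_t argument together without an explicit cast.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical work-item count: a product of tensor dimensions, computed in
// 64-bit so the multiplication itself cannot overflow. Which dimensions
// actually enter num_kernels differs per kernel; this product is an assumption.
int64_t num_work_items(int64_t channels, int64_t kernel_h, int64_t kernel_w,
                       int64_t out_h, int64_t out_w, int64_t parallel_imgs) {
  return channels * kernel_h * kernel_w * out_h * out_w * parallel_imgs;
}

// True when a 32-bit signed loop index could not reach every work item.
bool overflows_int32(int64_t n) {
  return n > static_cast<int64_t>(std::numeric_limits<int32_t>::max());
}

// Hypothetical launch-size helper (not torchvision's GET_BLOCKS): the ceil
// division and the clamp both stay in int64_t, and the result is narrowed to
// unsigned int only after clamping to the caller-supplied grid-size cap.
unsigned int blocks_for(int64_t n, unsigned int threads, int64_t max_blocks) {
  assert(threads > 0 && max_blocks > 0);
  const int64_t t = static_cast<int64_t>(threads);
  const int64_t wanted = (n + t - 1) / t;  // ceil(n / t) in 64-bit
  // Both arguments are int64_t, so std::min deduces one type.
  return static_cast<unsigned int>(std::min(wanted, max_blocks));
}
```

With 512 channels, a 3x3 kernel, a 128x128 output and 32 images, num_work_items returns 2,415,919,104, which is above INT_MAX; with 256 channels the same product is 1,207,959,552 and still fits in an int.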

Commit 6ce278bb (unverified) authored Sep 06, 2021 by vfdev; committed by GitHub Sep 06, 2021.
2 changed files with 325 additions and 208 deletions:

torchvision/csrc/ops/cuda/cuda_helpers.h +4 -2

torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu +321 -206

--- a/torchvision/csrc/ops/cuda/cuda_helpers.h
+++ b/torchvision/csrc/ops/cuda/cuda_helpers.h
@@ -3,10 +3,12 @@
 namespace vision {
 namespace ops {
-#define CUDA_1D_KERNEL_LOOP(i, n)                                \
+#define CUDA_1D_KERNEL_LOOP_T(i, n, index_t)                         \
-  for (int i = (blockIdx.x * blockDim.x) + threadIdx.x; i < (n); \
+  for (index_t i = (blockIdx.x * blockDim.x) + threadIdx.x; i < (n); \
       i += (blockDim.x * gridDim.x))
+#define CUDA_1D_KERNEL_LOOP(i, n) CUDA_1D_KERNEL_LOOP_T(i, n, int)
 template <typename integer>
 constexpr __host__ __device__ inline integer ceil_div(integer n, integer m) {
  return (n + m - 1) / m;
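
The grid-stride pattern behind the new CUDA_1D_KERNEL_LOOP_T(i, n, index_t) can be checked on the host. The sketch below is a sequential emulation, not CUDA code: visit_counts and each_visited_once are hypothetical helpers, and blockIdx.x, blockDim.x and gridDim.x are replaced by plain loop variables. Instantiating it with int64_t corresponds to what the index_t parameter enables; the unchanged CUDA_1D_KERNEL_LOOP(i, n) still expands with int. ceil_div is copied from the header without its __host__ __device__ qualifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side copy of ceil_div from cuda_helpers.h (minus __host__ __device__).
template <typename integer>
constexpr integer ceil_div(integer n, integer m) {
  return (n + m - 1) / m;
}

// Sequential emulation of CUDA_1D_KERNEL_LOOP_T(i, n, index_t): every
// simulated (block, thread) pair starts at its global id and advances by the
// total number of threads in the grid. hits[i] counts how often index i ran.
template <typename index_t>
std::vector<int> visit_counts(index_t n, unsigned int grid_dim,
                              unsigned int block_dim) {
  assert(n >= 0);
  std::vector<int> hits(static_cast<std::size_t>(n), 0);
  for (unsigned int block = 0; block < grid_dim; ++block) {
    for (unsigned int thread = 0; thread < block_dim; ++thread) {
      // Same shape as the macro: start offset and stride are computed in
      // unsigned int, then combined with an index_t counter and bound.
      for (index_t i = (block * block_dim) + thread; i < n;
           i += (block_dim * grid_dim)) {
        ++hits[static_cast<std::size_t>(i)];
      }
    }
  }
  return hits;
}

// True if the grid-stride loop touched every index exactly once.
bool each_visited_once(const std::vector<int>& hits) {
  return std::all_of(hits.begin(), hits.end(), [](int h) { return h == 1; });
}
```

For example, visit_counts<int64_t>(1000, 4, 32) runs 128 simulated threads over 1000 items and each index is visited exactly once. As in the macro shown in the diff, only the loop counter and the bound n take the index_t type; the per-thread start offset and the stride are still unsigned int products.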

--- a/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
+++ b/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu