Unverified commit 362a90f8, authored by Jiazhen Wang, committed by GitHub

[Feature] Add several MLU ops (#1563)



* [Feature] Add roiaware pool3d ops from mmdet3d (#1382)

* add ops (roiaware pool3d) in mmdet3d

* refactor code

* fix typo
Co-authored-by: zhouzaida <zhouzaida@163.com>

* [Feature] Add iou3d op from mmdet3d (#1356)

* add ops (iou3d) in mmdet3d

* add unit test

* refactor code

* refactor code

* refactor code

* refactor code

* refactor code
Co-authored-by: zhouzaida <zhouzaida@163.com>

* [Fix] Update test data for test_iou3d (#1427)

* Update test data for test_iou3d

* delete blank lines
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* [Feature] Add group points ops from mmdet3d (#1415); see the usage sketch after this list

* add op (group points) and its related ops (ball query and knn) in mmdet3d

* refactor code

* fix typo

* refactor code

* fix typo

* refactor code

* make input contiguous
Co-authored-by: zhouzaida <zhouzaida@163.com>
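A minimal usage sketch for the ported grouping ops (shapes follow the mmdet3d convention; the tensor sizes here are illustrative, and a CUDA-capable mmcv build is assumed):

    import torch
    from mmcv.ops import ball_query, grouping_operation, knn

    B, N, npoint, nsample = 2, 1024, 128, 16
    xyz = torch.rand(B, N, 3).cuda()               # all points, (B, N, 3)
    center_xyz = xyz[:, :npoint, :].contiguous()   # query centers, (B, npoint, 3)
    features = torch.rand(B, 32, N).cuda()         # per-point features, (B, C, N)

    # neighbor indices within a radius ring, shape (B, npoint, nsample)
    idx = ball_query(0.0, 0.2, nsample, xyz, center_xyz)
    # gather the matching feature vectors, shape (B, C, npoint, nsample)
    grouped = grouping_operation(features, idx)
    # k-nearest-neighbor indices as an alternative to the radius search
    knn_idx = knn(nsample, xyz, center_xyz)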

* add mmdet3d op (#1425)
Co-authored-by: zhouzaida <zhouzaida@163.com>

* [Feature] Loading objects from different backends and dumping objects to different backends (#1330); see the usage sketch after this list

* [Feature] Choose storage backend by the prefix of filepath

* refactor FileClient and add unittest

* support loading from different backends

* polish docstring

* fix unittest

* rename attribute str_like_obj to is_str_like_obj

* add infer_client method

* add check_exist method

* rename var client to file_client

* polish docstring

* add join_paths method

* remove join_paths and add _format_path

* enhance unittest

* refactor unittest

* singleton pattern

* fix test_clientio.py

* deprecate CephBackend

* enhance docstring

* refactor unittest for petrel

* refactor unittest for disk backend

* update io.md

* add concat_paths method

* improve docstring

* improve docstring

* add isdir and copyfile for file backend

* delete copyfile and add get_local_path

* remove isdir method of petrel

* fix typo

* add comment and polish docstring

* polish docstring

* rename _path_mapping to _map_path

* polish docstring and fix typo

* refactor get_local_path

* add list_dir_or_file for FileClient

* add list_dir_or_file for PetrelBackend

* fix windows ci

* Add return docstring

* polish docstring

* fix typo

* fix typo

* deprecate the conversion from Path to str

* add docs for loading checkpoints with FileClient

* refactor map_path

* add _ensure_methods to ensure methods have been implemented

* fix list_dir_or_file

* rename _ensure_method_implemented to has_method
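
As a usage sketch (the file names here are placeholders): after this change the storage backend is chosen from the path prefix, so the same call works for local disk and, with petrel_client installed, for paths such as 's3://...':

    import mmcv
    from mmcv.fileio import FileClient

    # load/dump infer the file format from the extension and the backend
    # from the path prefix (plain paths map to the disk backend)
    cfg = mmcv.load('config.json')
    mmcv.dump(cfg, 'config.pkl')

    # the lower-level byte/text API via an inferred client
    file_client = FileClient.infer_client(uri='config.json')
    text = file_client.get_text('config.json')
    raw = file_client.get('config.json')
    assert file_client.exists('config.json')
    print(list(file_client.list_dir_or_file('.')))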

* Add CI for pytorch 1.10 (#1431)

* [Feature] Upload checkpoints and logs to ceph (#1375); see the usage sketch after this list

* [Feature] Choose storage backend by the prefix of filepath

* refactor FileClient and add unittest

* support loading from different backends

* polish docstring

* fix unittest

* rename attribute str_like_obj to is_str_like_obj

* [Docs] Upload checkpoint to petrel oss

* add infer_client method

* Support uploading checkpoint to petrel oss

* add check_exist method

* refactor CheckpointHook

* support uploading logs to ceph

* rename var client to file_client

* polish docstring

* enhance load_from_ceph

* refactor load_from_ceph

* refactor TextLoggerHook

* change the meaning of out_dir argument

* fix test_checkpoint_hook.py

* add join_paths method

* remove join_paths and add _format_path

* enhance unittest

* refactor unittest

* add a unittest for EvalHook when file backend is petrel

* singleton pattern

* fix test_clientio.py

* deprecate CephBackend

* add warning in load_from_ceph

* fix type of out_suffix

* enhance docstring

* refactor unittest for petrel

* refactor unittest for disk backend

* update io.md

* add concat_paths method

* fix CI

* mock check_exist

* improve docstring

* improve docstring

* improve docstring

* improve docstring

* add isdir and copyfile for file backend

* delete copyfile and add get_local_path

* remove isdir method of petrel

* fix typo

* rename check_exists to exists

* refactor code and polish docstring

* fix windows ci

* add comment and polish docstring

* polish docstring

* polish docstring

* rename _path_mapping to _map_path

* polish docstring and fix typo

* refactor get_local_path

* add list_dir_or_file for FileClient

* add list_dir_or_file for PetrelBackend

* fix windows ci

* Add return docstring

* polish docstring

* fix typo

* fix typo

* fix typo

* fix error when mocking PetrelBackend

* deprecate the conversion from Path to str

* add docs for loading checkpoints with FileClient

* rename keep_log to keep_local

* refactor map_path

* add _ensure_methods to ensure methods have been implemented

* fix list_dir_or_file

* rename _ensure_method_implemented to has_method

* refactor

* polish information

* format information
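
A sketch of the resulting workflow (the bucket path and file names are hypothetical; the Petrel backend additionally requires petrel_client):

    from mmcv.fileio import FileClient
    from mmcv.runner import CheckpointHook

    # out_dir now accepts a prefixed remote path; checkpoints are uploaded
    # through the inferred backend instead of always landing on local disk
    ckpt_hook = CheckpointHook(interval=1, out_dir='s3://bucket/checkpoints')

    # reading back: get_local_path yields a temporary local copy for APIs
    # that only accept real filenames
    remote = 's3://bucket/checkpoints/latest.pth'
    file_client = FileClient.infer_client(uri=remote)
    with file_client.get_local_path(remote) as local_path:
        print('checkpoint staged at', local_path)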

* bump version to v1.3.16 (#1430)

* [Fix]: Update test data of test_tin_shift (#1426)

* Update test data of test_tin_shift

* Delete tmp.engine

* add pytest.raises AssertionError test

* raise ValueError, update test log

* add more comment

* Apply suggestions from code review
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* fix the wrong function reference bug in BaseTransformerLayer when batch_first is True (#1418)

* [Docs] Add mmcv itself in the docs list (#1441)

* Add mmcv itself in the docs list

* modify link of docs

* [Improve] improve checkpoint loading log (#1446)

* [Feature] Support SigmoidFocalLoss with Cambricon MLU backend (#1346); see the usage sketch after this list

* [Feature] Support SigmoidFocalLoss with Cambricon MLU backend

* refactor MMCV_WITH_MLU macro define

* refactor NFU_ALIGN_SIZE, PAD_DOWN and split_pipeline_num

* delete extra fool-proofing in cpp

* [Feature] Support SigmoidFocalLossBackward with Cambricon MLU backend

* fix macro definition in SigmoidFocalLoss

* refactor mlu files into clang-format

* refactor sigmoid focal loss test

* refactor Sigmoid Focal Loss file structure.

* fix python lint error

* fix import torch_mlu error type

* fix lint

* refactor clang format style to google
Co-authored-by: zhouzaida <zhouzaida@163.com>
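
A usage sketch (assumes an MLU-enabled mmcv build on a Cambricon device; on a GPU machine replace 'mlu' with 'cuda'):

    import torch
    from mmcv.ops import sigmoid_focal_loss

    pred = torch.randn(4, 10).to('mlu')
    pred.requires_grad_()
    target = torch.randint(0, 10, (4, )).to('mlu')
    # positional args mirror the unit tests: gamma, alpha, weight, reduction
    loss = sigmoid_focal_loss(pred, target, 2.0, 0.25, None, 'mean')
    loss.backward()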

* [Feature] Support RoiAlign With Cambricon MLU Backend (#1429)

* [Feature] Support NMS with cambricon MLU backend (#1467)

* [Feature] Support BBoxOverlaps with cambricon MLU backend (#1507)

* [Refactor] Format C++ code

* [Refactor] include common_mlu_helper in pytorch_mlu_helper and refactor build condition

* [Improve] Improve the performance of roialign, nms and focalloss with MLU backend (#1572)

* [Improve] Improve the performance of roialign with MLU backend

* replace CHECK_MLU with CHECK_MLU_INPUT

* [Improve] Improve the perf of nms and focallosssigmoid with MLU backend

* [Improve] Improve the performance of roialign with MLU backend (#1741)

* [Feature] Support tin_shift with Cambricon MLU backend (#1696); see the usage sketch after this list

* [Feature] Support tin_shift with cambricon MLU backend

* [fix] Add the assertion of batch_size in tin_shift.py

* [fix] fix the param check of tin_shift in cambricon code

* [fix] Fix lint failure.

* [fix] Fix source file lint failure.

* Update mmcv/ops/tin_shift.py

[Refactor] Modify the code in mmcv/ops/tin_shift.py.
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: budefei <budefei@cambricon.com>
Co-authored-by: budefei <budefei@cambricom.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
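
The new parameter check can be exercised directly; this sketch reuses the shapes from the updated unit test (diff below) and assumes a CUDA or MLU device:

    import pytest
    import torch
    from mmcv.ops import tin_shift

    # mismatched input/shift shapes now raise ValueError up front
    # instead of failing inside the kernel
    x = torch.rand(2, 3, 4, 2).cuda()
    shift = torch.rand(2, 5).cuda()
    with pytest.raises(ValueError):
        tin_shift(x, shift)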

* resolve conflicts and fix lint

* fix mmcv.utils.__init__

* fix mmcv.utils.__init__

* Fix lints and change FLAG

* fix setup and refine

* remove a redundant line

* remove an unnecessary 'f'

* fix compilation error
Co-authored-by: dingchang <hudingchang.vendor@sensetime.com>
Co-authored-by: zhouzaida <zhouzaida@163.com>
Co-authored-by: q.yao <yaoqian@sensetime.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: pc <luopeichao@sensetime.com>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
Co-authored-by: q.yao <streetyao@live.com>
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
Co-authored-by: Yuxin Liu <liuyuxin@cambricon.com>
Co-authored-by: zihanchang11 <92860914+zihanchang11@users.noreply.github.com>
Co-authored-by: shlrao <shenglong.rao@gmail.com>
Co-authored-by: zhouchenyang <zcy19950525@gmail.com>
Co-authored-by: Mrxiaofei <36697723+Mrxiaofei@users.noreply.github.com>
Co-authored-by: budefei <budefei@cambricon.com>
Co-authored-by: budefei <budefei@cambricom.com>
parent 95273020
@@ -5,7 +5,8 @@ import pytest
 import torch
 import torch.nn as nn

-from mmcv.device.mlu import IS_MLU, MLUDataParallel, MLUDistributedDataParallel
+from mmcv.device.mlu import (IS_MLU_AVAILABLE, MLUDataParallel,
+                             MLUDistributedDataParallel)
 from mmcv.device.mlu._functions import Scatter, scatter
 from mmcv.parallel import is_module_wrapper
@@ -31,7 +32,7 @@ def test_is_module_wrapper():
     model = Model()
     assert not is_module_wrapper(model)

-    if IS_MLU:
+    if IS_MLU_AVAILABLE:
         mludp = MLUDataParallel(model)
         assert is_module_wrapper(mludp)
@@ -51,7 +52,7 @@ def test_scatter():
     assert torch.allclose(input, output)

     # if the device is MLU, copy the input from CPU to MLU
-    if IS_MLU:
+    if IS_MLU_AVAILABLE:
         input = torch.zeros([1, 3, 3, 3])
         output = scatter(input=input, devices=[0])
         assert torch.allclose(input.to('mlu'), output)
@@ -82,7 +83,7 @@ def test_Scatter():
     assert torch.allclose(input, output)

     # if the device is MLU, copy the input from CPU to MLU
-    if IS_MLU:
+    if IS_MLU_AVAILABLE:
         target_mlus = [0]
         input = torch.zeros([1, 3, 3, 3])
         outputs = Scatter.forward(target_mlus, input)
...
@@ -3,41 +3,60 @@ import numpy as np
 import pytest
 import torch

+from mmcv.device.mlu import IS_MLU_AVAILABLE
+from mmcv.utils import IS_CUDA_AVAILABLE

-@pytest.mark.skipif(
-    not torch.cuda.is_available(), reason='requires CUDA support')
 class TestBBox(object):
-    def _test_bbox_overlaps(self, dtype=torch.float):
+
+    def _test_bbox_overlaps(self, device, dtype=torch.float):
         from mmcv.ops import bbox_overlaps
         b1 = torch.tensor([[1.0, 1.0, 3.0, 4.0], [2.0, 2.0, 3.0, 4.0],
-                           [7.0, 7.0, 8.0, 8.0]]).cuda().type(dtype)
+                           [7.0, 7.0, 8.0, 8.0]]).to(device).type(dtype)
         b2 = torch.tensor([[0.0, 2.0, 2.0, 5.0], [2.0, 1.0, 3.0,
-                                                  3.0]]).cuda().type(dtype)
+                                                  3.0]]).to(device).type(dtype)
         should_output = np.array([[0.33333334, 0.5], [0.2, 0.5], [0.0, 0.0]])
         out = bbox_overlaps(b1, b2, offset=1)
         assert np.allclose(out.cpu().numpy(), should_output, 1e-2)

         b1 = torch.tensor([[1.0, 1.0, 3.0, 4.0], [2.0, 2.0, 3.0,
-                                                  4.0]]).cuda().type(dtype)
+                                                  4.0]]).to(device).type(dtype)
         b2 = torch.tensor([[0.0, 2.0, 2.0, 5.0], [2.0, 1.0, 3.0,
-                                                  3.0]]).cuda().type(dtype)
+                                                  3.0]]).to(device).type(dtype)
         should_output = np.array([0.33333334, 0.5])
         out = bbox_overlaps(b1, b2, aligned=True, offset=1)
         assert np.allclose(out.cpu().numpy(), should_output, 1e-2)

-        b1 = torch.tensor([[0.0, 0.0, 3.0, 3.0]]).cuda().type(dtype)
-        b1 = torch.tensor([[0.0, 0.0, 3.0, 3.0]]).cuda().type(dtype)
+        b1 = torch.tensor([[0.0, 0.0, 3.0, 3.0]]).to(device).type(dtype)
         b2 = torch.tensor([[4.0, 0.0, 5.0, 3.0], [3.0, 0.0, 4.0, 3.0],
                            [2.0, 0.0, 3.0, 3.0], [1.0, 0.0, 2.0,
-                                                  3.0]]).cuda().type(dtype)
+                                                  3.0]]).to(device).type(dtype)
         should_output = np.array([0, 0.2, 0.5, 0.5])
         out = bbox_overlaps(b1, b2, offset=1)
         assert np.allclose(out.cpu().numpy(), should_output, 1e-2)

-    def test_bbox_overlaps_float(self):
-        self._test_bbox_overlaps(torch.float)
+    @pytest.mark.parametrize('device', [
+        pytest.param(
+            'cuda',
+            marks=pytest.mark.skipif(
+                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+        pytest.param(
+            'mlu',
+            marks=pytest.mark.skipif(
+                not IS_MLU_AVAILABLE, reason='requires MLU support'))
+    ])
+    def test_bbox_overlaps_float(self, device):
+        self._test_bbox_overlaps(device, dtype=torch.float)

-    def test_bbox_overlaps_half(self):
-        self._test_bbox_overlaps(torch.half)
+    @pytest.mark.parametrize('device', [
+        pytest.param(
+            'cuda',
+            marks=pytest.mark.skipif(
+                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+        pytest.param(
+            'mlu',
+            marks=pytest.mark.skipif(
+                not IS_MLU_AVAILABLE, reason='requires MLU support'))
+    ])
+    def test_bbox_overlaps_half(self, device):
+        self._test_bbox_overlaps(device, dtype=torch.half)
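
The hunks above and below all repeat one pattern: the target device becomes a pytest parameter, and each backend is skipped when its hardware or build is absent. Factored out as a standalone sketch (the DEVICES helper and test are ours, not part of the diff):

    import pytest
    import torch

    from mmcv.device.mlu import IS_MLU_AVAILABLE
    from mmcv.utils import IS_CUDA_AVAILABLE

    # each backend is skipped automatically when unavailable
    DEVICES = [
        pytest.param(
            'cuda',
            marks=pytest.mark.skipif(
                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
        pytest.param(
            'mlu',
            marks=pytest.mark.skipif(
                not IS_MLU_AVAILABLE, reason='requires MLU support')),
    ]

    @pytest.mark.parametrize('device', DEVICES)
    def test_roundtrip(device):
        x = torch.ones(2, 2)
        assert torch.allclose(x.to(device).cpu(), x)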
 # Copyright (c) OpenMMLab. All rights reserved.
 import numpy as np
+import pytest
 import torch

+from mmcv.device.mlu import IS_MLU_AVAILABLE
+from mmcv.utils import IS_CUDA_AVAILABLE

 _USING_PARROTS = True
 try:
     from parrots.autograd import gradcheck
@@ -57,9 +61,7 @@ class Testfocalloss(object):
         assert np.allclose(loss.data.cpu().numpy(), output[0], 1e-2)
         assert np.allclose(x.grad.data.cpu(), np_x_grad, 1e-2)

-    def _test_sigmoid(self, dtype=torch.float):
-        if not torch.cuda.is_available():
-            return
+    def _test_sigmoid(self, device, dtype=torch.float):
         from mmcv.ops import sigmoid_focal_loss
         alpha = 0.25
         gamma = 2.0
@@ -68,9 +70,9 @@ class Testfocalloss(object):
         np_y = np.array(case[1])
         np_x_grad = np.array(output[1])

-        x = torch.from_numpy(np_x).cuda().type(dtype)
+        x = torch.from_numpy(np_x).to(device).type(dtype)
         x.requires_grad_()
-        y = torch.from_numpy(np_y).cuda().long()
+        y = torch.from_numpy(np_y).to(device).long()

         loss = sigmoid_focal_loss(x, y, gamma, alpha, None, 'mean')
         loss.backward()
@@ -128,11 +130,31 @@ class Testfocalloss(object):
     def test_softmax_half(self):
         self._test_softmax(dtype=torch.half)

-    def test_sigmoid_float(self):
-        self._test_sigmoid(dtype=torch.float)
+    @pytest.mark.parametrize('device', [
+        pytest.param(
+            'cuda',
+            marks=pytest.mark.skipif(
+                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+        pytest.param(
+            'mlu',
+            marks=pytest.mark.skipif(
+                not IS_MLU_AVAILABLE, reason='requires MLU support'))
+    ])
+    def test_sigmoid_float(self, device):
+        self._test_sigmoid(device=device, dtype=torch.float)

-    def test_sigmoid_half(self):
-        self._test_sigmoid(dtype=torch.half)
+    @pytest.mark.parametrize('device', [
+        pytest.param(
+            'cuda',
+            marks=pytest.mark.skipif(
+                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+        pytest.param(
+            'mlu',
+            marks=pytest.mark.skipif(
+                not IS_MLU_AVAILABLE, reason='requires MLU support'))
+    ])
+    def test_sigmoid_half(self, device):
+        self._test_sigmoid(device, dtype=torch.half)

     def test_grad_softmax_float(self):
         self._test_grad_softmax(dtype=torch.float)
...
@@ -3,12 +3,23 @@ import numpy as np
 import pytest
 import torch

+from mmcv.device.mlu import IS_MLU_AVAILABLE
+from mmcv.utils import IS_CUDA_AVAILABLE

 class Testnms(object):

-    def test_nms_allclose(self):
-        if not torch.cuda.is_available():
-            return
+    @pytest.mark.parametrize('device', [
+        pytest.param(
+            'cuda',
+            marks=pytest.mark.skipif(
+                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+        pytest.param(
+            'mlu',
+            marks=pytest.mark.skipif(
+                not IS_MLU_AVAILABLE, reason='requires MLU support'))
+    ])
+    def test_nms_allclose(self, device):
         from mmcv.ops import nms
         np_boxes = np.array([[6.0, 3.0, 8.0, 7.0], [3.0, 6.0, 9.0, 11.0],
                              [3.0, 7.0, 10.0, 12.0], [1.0, 4.0, 13.0, 7.0]],
@@ -24,7 +35,7 @@ class Testnms(object):
         assert np.allclose(dets, np_dets)  # test cpu
         assert np.allclose(inds, np_inds)  # test cpu
         dets, inds = nms(
-            boxes.cuda(), scores.cuda(), iou_threshold=0.3, offset=0)
+            boxes.to(device), scores.to(device), iou_threshold=0.3, offset=0)
         assert np.allclose(dets.cpu().numpy(), np_dets)  # test gpu
         assert np.allclose(inds.cpu().numpy(), np_inds)  # test gpu
...
@@ -3,6 +3,9 @@ import numpy as np
 import pytest
 import torch

+from mmcv.device.mlu import IS_MLU_AVAILABLE
+from mmcv.utils import IS_CUDA_AVAILABLE

 _USING_PARROTS = True
 try:
     from parrots.autograd import gradcheck
@@ -11,6 +14,7 @@ except ImportError:
     _USING_PARROTS = False

 # yapf:disable
+
 inputs = [([[[[1., 2.], [3., 4.]]]],
            [[0., 0., 0., 1., 1.]]),
           ([[[[1., 2.], [3., 4.]],
@@ -39,8 +43,6 @@ sampling_ratio = 2

 def _test_roialign_gradcheck(device, dtype):
-    if not torch.cuda.is_available() and device == 'cuda':
-        pytest.skip('test requires GPU')
     try:
         from mmcv.ops import RoIAlign
     except ModuleNotFoundError:
@@ -65,8 +67,6 @@ def _test_roialign_gradcheck(device, dtype):

 def _test_roialign_allclose(device, dtype):
-    if not torch.cuda.is_available() and device == 'cuda':
-        pytest.skip('test requires GPU')
     try:
         from mmcv.ops import roi_align
     except ModuleNotFoundError:
@@ -75,7 +75,6 @@ def _test_roialign_allclose(device, dtype):
     pool_w = 2
     spatial_scale = 1.0
     sampling_ratio = 2
-
     for case, output in zip(inputs, outputs):
         np_input = np.array(case[0])
         np_rois = np.array(case[1])
@@ -95,8 +94,26 @@ def _test_roialign_allclose(device, dtype):
         x.grad.data.type(torch.float).cpu().numpy(), np_grad, atol=1e-3)

-@pytest.mark.parametrize('device', ['cuda', 'cpu'])
-@pytest.mark.parametrize('dtype', [torch.float, torch.double, torch.half])
+@pytest.mark.parametrize('device', [
+    'cpu',
+    pytest.param(
+        'cuda',
+        marks=pytest.mark.skipif(
+            not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+    pytest.param(
+        'mlu',
+        marks=pytest.mark.skipif(
+            not IS_MLU_AVAILABLE, reason='requires MLU support'))
+])
+@pytest.mark.parametrize('dtype', [
+    torch.float,
+    pytest.param(
+        torch.double,
+        marks=pytest.mark.skipif(
+            IS_MLU_AVAILABLE,
+            reason='MLU does not support for 64-bit floating point')),
+    torch.half
+])
 def test_roialign(device, dtype):
     # check double only
     if dtype is torch.double:
...
@@ -5,6 +5,9 @@ import numpy as np
 import pytest
 import torch

+from mmcv.device.mlu import IS_MLU_AVAILABLE
+from mmcv.utils import IS_CUDA_AVAILABLE

 _USING_PARROTS = True
 try:
     from parrots.autograd import gradcheck
@@ -131,7 +134,7 @@ grads = [
 ]

-def _test_tinshift_gradcheck(dtype):
+def _test_tinshift_gradcheck(device, dtype):
     try:
         from mmcv.ops import tin_shift
     except ModuleNotFoundError:
@@ -145,15 +148,15 @@ def _test_tinshift_gradcheck(dtype):
         np_shift = np.array(shift)

         x = torch.tensor(
-            np_input, dtype=dtype, device='cuda', requires_grad=True)
-        shift = torch.tensor(np_shift, device='cuda').int()
+            np_input, dtype=dtype, device=device, requires_grad=True)
+        shift = torch.tensor(np_shift, device=device).int()

         if torch.__version__ == 'parrots':
             gradcheck(tin_shift, (x, shift))
         else:
             gradcheck(tin_shift, (x, shift), atol=1, rtol=0.1)

-def _test_tinshift_allclose(dtype):
+def _test_tinshift_allclose(device, dtype):
     try:
         from mmcv.ops import tin_shift
     except ModuleNotFoundError:
@@ -166,8 +169,8 @@ def _test_tinshift_allclose(dtype):
         np_grad = np.array(grad)

         x = torch.tensor(
-            np_input, dtype=dtype, device='cuda', requires_grad=True)
-        shift = torch.tensor(np_shift, device='cuda').int()
+            np_input, dtype=dtype, device=device, requires_grad=True)
+        shift = torch.tensor(np_shift, device=device).int()

         output = tin_shift(x, shift)
         output.backward(torch.ones_like(output))
@@ -177,28 +180,48 @@ def _test_tinshift_allclose(dtype):
         x.grad.data.type(torch.float).cpu().numpy(), np_grad, 1e-3)

-def _test_tinshift_assert(dtype):
+def _test_tinshift_assert(device, dtype):
     try:
         from mmcv.ops import tin_shift
     except ModuleNotFoundError:
         pytest.skip('TINShift op is not successfully compiled')

-    inputs = [torch.rand(2, 3, 4, 2), torch.rand(2, 3, 4, 2)]
+    inputs = [
+        torch.rand(2, 3, 4, 2),
+        torch.rand(2, 3, 4, 2),
+        torch.rand(1, 3, 4, 2)
+    ]
     shifts = [torch.rand(2, 3), torch.rand(2, 5)]
     for x, shift in zip(inputs, shifts):
-        x = x.cuda()
-        shift = shift.cuda()
+        x = x.to(device).type(dtype)
+        shift = shift.to(device).type(dtype)

         # A ValueError should be raised if ops get inputs with wrong shapes.
         with pytest.raises(ValueError):
             tin_shift(x, shift)

-@pytest.mark.skipif(
-    not torch.cuda.is_available(), reason='requires CUDA support')
-@pytest.mark.parametrize('dtype', [torch.float, torch.double, torch.half])
-def test_tinshift(dtype):
-    _test_tinshift_allclose(dtype=dtype)
-    _test_tinshift_gradcheck(dtype=dtype)
-    _test_tinshift_assert(dtype=dtype)
+@pytest.mark.parametrize('device', [
+    pytest.param(
+        'cuda',
+        marks=pytest.mark.skipif(
+            not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
+    pytest.param(
+        'mlu',
+        marks=pytest.mark.skipif(
+            not IS_MLU_AVAILABLE, reason='requires MLU support'))
+])
+@pytest.mark.parametrize('dtype', [
+    torch.float,
+    pytest.param(
+        torch.double,
+        marks=pytest.mark.skipif(
+            IS_MLU_AVAILABLE,
+            reason='MLU does not support for 64-bit floating point')),
+    torch.half
+])
+def test_tinshift(device, dtype):
+    _test_tinshift_allclose(device=device, dtype=dtype)
+    _test_tinshift_gradcheck(device=device, dtype=dtype)
+    _test_tinshift_assert(device=device, dtype=dtype)