Unverified Commit 148a0856 authored by Jingwei Zhang, committed by GitHub

[Feature] Support downloading datasets from OpenDataLab using mim (#2593)

* init commit

* add dataset unzip scripts

* polish docs

* polish docs
parent f40d8d28
include mmdet3d/.mim/model-index.yml
include mmdet3d/.mim/dataset-index.yml
include requirements/*.txt
recursive-include mmdet3d/.mim/ops *.cpp *.cu *.h *.cc
recursive-include mmdet3d/.mim/configs *.py *.yml
......
kitti:
  # The name of the dataset on OpenDataLab, referring to
  # https://opendatalab.com/KITTI_Object/cli. You can also download it
  # independently by running `odl get ${dataset}`
  dataset: KITTI_Object
  download_root: data
  data_root: data/kitti
  # Script for unzipping the dataset
  script: tools/dataset_converters/kitti_unzip.sh
nuscenes:
  # The name of the dataset on OpenDataLab, referring to
  # https://opendatalab.com/nuScenes/cli. You can also download it
  # independently by running `odl get ${dataset}`
  dataset: nuScenes
  download_root: data
  data_root: data/nuscenes
  # Script for unzipping the dataset
  script: tools/dataset_converters/nuscenes_unzip.sh
semantickitti:
  # The name of the dataset on OpenDataLab, referring to
  # https://opendatalab.com/SemanticKITTI/cli. You can also download it
  # independently by running `odl get ${dataset}`
  dataset: SemanticKITTI
  download_root: data
  data_root: data/semantickitti
  # Script for unzipping the dataset
  script: tools/dataset_converters/semantickitti_unzip.sh
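For reference, the `mim download` flow can be approximated by hand with the OpenDataLab CLI. The sketch below is an assumption pieced together from the fields above, not a verified recipe; in particular it presumes `odl get` drops the archives under `data/KITTI_Object/raw/`, which is the layout the unzip scripts named in the `script` fields expect.

```bash
# Sketch (assumption): manual counterpart of `mim download mmdet3d --dataset kitti`
pip install -U opendatalab   # provides the `odl` command
odl login                    # requires an OpenDataLab account
cd data                      # download_root from the entry above
odl get KITTI_Object         # the `dataset` field; archives assumed to land in KITTI_Object/raw/
cd -
# mim then runs the `script` entry to unpack everything into data_root
# (see the unzip scripts further down in this diff)
```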
...@@ -86,7 +86,20 @@ mmdetection3d
### KITTI
1. Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Alternatively, you can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The commands are as follows:
```bash
# install the OpenDataLab CLI tool (provides the `odl` command)
pip install -U opendatalab
# log in to OpenDataLab. Note that you need to register an account on [OpenDataLab](https://opendatalab.com/) first
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset kitti
```
2. Prepare KITTI data splits by running:
```bash
mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
...@@ -98,7 +111,7 @@ wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/sec
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
```
3. Generate info files by running:
```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
...@@ -160,7 +173,20 @@ Note that:
### NuScenes
1. Download nuScenes V1.0 full dataset data [HERE](https://www.nuscenes.org/download). Alternatively, you can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The download and unzip commands are as follows:
```bash
# install the OpenDataLab CLI tool (provides the `odl` command)
pip install -U opendatalab
# log in to OpenDataLab. Note that you need to register an account on [OpenDataLab](https://opendatalab.com/) first
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset nuscenes
```
2. Prepare nuScenes data by running:
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
...@@ -187,9 +213,20 @@ Note that we follow the original folder names for clear organization. Please ren
### SemanticKITTI
1. Download the SemanticKITTI dataset [HERE](http://semantic-kitti.org/dataset.html#download) and unzip all zip files. Alternatively, you can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The download and unzip commands are as follows:
```bash
# install the OpenDataLab CLI tool (provides the `odl` command)
pip install -U opendatalab
# log in to OpenDataLab. Note that you need to register an account on [OpenDataLab](https://opendatalab.com/) first
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset semantickitti
```
2. Generate info files by running:
```bash
python ./tools/create_data.py semantickitti --root-path ./data/semantickitti --out-dir ./data/semantickitti --extra-tag semantickitti
......
...@@ -127,7 +127,7 @@ train_pipeline = [
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'transformation_3d_flow', 'pcd_rotation',
            'pcd_scale_factor', 'pcd_trans', 'img_aug_matrix',
            'lidar_aug_matrix', 'num_pts_feats'
        ])
]
...@@ -168,7 +168,7 @@ test_pipeline = [
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'num_pts_feats'
        ])
]
......
...@@ -158,7 +158,9 @@ def add_mim_extention():
    else:
        return
    filenames = [
        'tools', 'configs', 'demo', 'model-index.yml', 'dataset-index.yml'
    ]
    repo_path = osp.dirname(__file__)
    mim_path = osp.join(repo_path, 'mmdet3d', '.mim')
    os.makedirs(mim_path, exist_ok=True)
......
#!/usr/bin/env bash
DOWNLOAD_DIR=$1  # The directory where the downloaded dataset is stored
DATA_ROOT=$2  # The root directory of the converted dataset
for zip_file in "$DOWNLOAD_DIR"/KITTI_Object/raw/*.zip; do
    echo "Unzipping $zip_file to $DATA_ROOT ..."
    unzip -oq "$zip_file" -d "$DATA_ROOT"
    echo "[Done] Unzipped $zip_file to $DATA_ROOT"
    # delete the original archive
    rm -f "$zip_file"
done
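Per the `kitti` entry in `dataset-index.yml` above, mim invokes this script with `download_root` and `data_root` as its two positional arguments; it can also be run by hand:

```bash
bash tools/dataset_converters/kitti_unzip.sh data data/kitti
```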
#!/usr/bin/env bash
DOWNLOAD_DIR=$1  # The directory where the downloaded dataset is stored
DATA_ROOT=$2  # The root directory of the converted dataset
# .tgz archives are gzipped tarballs, so extract them with tar (unzip cannot read them)
mkdir -p "$DATA_ROOT"
for split in "$DOWNLOAD_DIR"/nuScenes/raw/*; do
    for tgz_file in "$split"/*; do
        if [[ $tgz_file == *.tgz ]]; then
            echo "Extracting $tgz_file to $DATA_ROOT ..."
            tar -xzf "$tgz_file" -C "$DATA_ROOT/"
            echo "[Done] Extracted $tgz_file to $DATA_ROOT"
        fi
        # delete the original files
        rm -f "$tgz_file"
    done
done
#!/usr/bin/env bash
DOWNLOAD_DIR=$1  # The directory where the downloaded dataset is stored
DATA_ROOT=$2  # The root directory of the converted dataset
for zip_file in "$DOWNLOAD_DIR"/SemanticKITTI/raw/*.zip; do
    echo "Unzipping $zip_file to $DATA_ROOT ..."
    unzip -oq "$zip_file" -d "$DATA_ROOT"
    echo "[Done] Unzipped $zip_file to $DATA_ROOT"
    # delete the original archive
    rm -f "$zip_file"
done
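The nuScenes and SemanticKITTI scripts take the same two positional arguments, so the corresponding manual invocations (using the paths from `dataset-index.yml`) are:

```bash
bash tools/dataset_converters/nuscenes_unzip.sh data data/nuscenes
bash tools/dataset_converters/semantickitti_unzip.sh data data/semantickitti
```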