Commit 41b18fd8 authored by Zhe Chen

Use pre-commit to reformat code
parent ff20ea39
[flake8]
ignore = E501, E502, F403, C901, W504, W605, E251, E122, E126, E127, E722, W503, E128, E741, E731, E701, E712
select = E1, E3, E502, E7, E9, W1, W5, W6
max-line-length = 180
exclude=*.egg/*,build,dist,detection/configs/*
[isort]
line_length = 180
multi_line_output = 0
extra_standard_library = setuptools
known_third_party = PIL,asynctest,cityscapesscripts,cv2,gather_models,matplotlib,mmcv,numpy,onnx,onnxruntime,pycocotools,pytest,pytorch_sphinx_theme,requests,scipy,seaborn,six,terminaltables,torch,ts,yaml
no_lines_before = STDLIB,LOCALFOLDER
default_section = THIRDPARTY
[yapf]
BASED_ON_STYLE = pep8
BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = true
SPLIT_BEFORE_EXPRESSION_AFTER_OPENING_PAREN = true
[codespell]
skip = *.ipynb
quiet-level = 3
ignore-words-list = patten,nd,ty,mot,hist,formating,winn,gool,datas,wan,confids,TOOD,tood
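To apply the same settings outside of pre-commit, here is a minimal sketch; it assumes these sections live in a `setup.cfg` at the repo root, where each tool picks up its own section automatically:

```python
import subprocess

# Run the linters/formatters configured above from the repo root.
# check=False because flake8 exits non-zero whenever it reports findings.
subprocess.run(['flake8', '.'], check=False)
subprocess.run(['isort', '.'], check=False)
subprocess.run(['yapf', '--in-place', '--recursive', '.'], check=False)
```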
exclude: ^internvl_chat_llava/
repos:
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.5
    hooks:
      - id: isort
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace
      - id: check-yaml
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
      - id: double-quote-string-fixer
      - id: check-merge-conflict
      - id: fix-encoding-pragma
        args: ["--remove"]
      - id: mixed-line-ending
        args: ["--fix=lf"]
  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.9
    hooks:
      - id: mdformat
        args: ["--number"]
        additional_dependencies:
          - mdformat-openmmlab
          - mdformat_frontmatter
          - linkify-it-py
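To reproduce these hooks locally, a minimal sketch assuming `pre-commit` is installed via pip; the first call registers the git hook, the second runs every configured hook once over the whole repository:

```python
import subprocess

# 'pre-commit install' wires the hooks into .git/hooks; 'pre-commit run
# --all-files' applies flake8, isort, the misc fixers, and mdformat repo-wide.
subprocess.run(['pre-commit', 'install'], check=True)
subprocess.run(['pre-commit', 'run', '--all-files'], check=True)
```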
......@@ -25,43 +25,49 @@ We currently receive a bunch of issues, our team will check and solve them one b
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-v2-adapting-modern-image-backbones/3d-object-detection-on-nuscenes-camera-only)](https://paperswithcode.com/sota/3d-object-detection-on-nuscenes-camera-only?p=bevformer-v2-adapting-modern-image-backbones)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/internimage-exploring-large-scale-vision/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=internimage-exploring-large-scale-vision)
The official implementation of
[InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions](https://arxiv.org/abs/2211.05778).
\[[Paper](https://arxiv.org/abs/2211.05778)\] \[[Blog in Chinese](https://zhuanlan.zhihu.com/p/610772005)\]
## Highlights
- :thumbsup: **The strongest open-source universal vision backbone, with up to 3 billion parameters**
- 🏆 **Achieved `90.1% Top-1` accuracy on ImageNet, the most accurate among open-source models**
- 🏆 **Achieved `65.5 mAP` on the COCO object detection benchmark, the only model exceeding `65.0 mAP`**
## Related Projects
### Foundation Models
- [Uni-Perceiver](https://github.com/fundamentalvision/Uni-Perceiver): A unified pre-training architecture for generic perception, supporting zero-shot and few-shot tasks
- [Uni-Perceiver v2](https://arxiv.org/abs/2211.09808): A generalist model for large-scale vision and vision-language tasks
- [M3I-Pretraining](https://github.com/OpenGVLab/M3I-Pretraining): One-stage pre-training paradigm via maximizing multi-modal mutual information
- [InternVL](https://github.com/OpenGVLab/InternVL): The largest open-source vision/vision-language foundation model (14B) to date
### Autonomous Driving
- [BEVFormer](https://github.com/fundamentalvision/BEVFormer): A cutting-edge baseline for camera-based 3D detection
- [BEVFormer v2](https://arxiv.org/abs/2211.10439): Adapting modern image backbones to Bird's-Eye-View recognition via perspective supervision
## Application in Challenges
- [2022 Waymo 3D Camera-Only Detection Challenge](https://waymo.com/open/challenges/2022/3d-camera-only-detection/): BEVFormer++ **Ranks 1st** based on InternImage
- [nuScenes 3D detection task](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Camera): BEVFormer v2 achieves SOTA performance of 64.8 NDS on nuScenes Camera Only
- [CVPR 2023 Workshop End-to-End Autonomous Driving](https://opendrivelab.com/e2ead/cvpr23): InternImage supports the baseline of the [3D Occupancy Prediction Challenge](https://opendrivelab.com/AD23Challenge.html#Track3) and [OpenLane Topology Challenge](https://opendrivelab.com/AD23Challenge.html#Track1)
## News
- `Jan 22, 2024`: 🚀 Support [DCNv4](https://github.com/OpenGVLab/DCNv4) in InternImage!
- `Mar 14, 2023`: 🚀 "INTERN-2.5" is released!
- `Feb 28, 2023`: 🚀 InternImage is accepted to CVPR 2023!
- `Nov 18, 2022`: 🚀 With InternImage-XL as its backbone, [BEVFormer v2](https://arxiv.org/abs/2211.10439) achieves state-of-the-art performance of `63.4 NDS` on nuScenes camera-only 3D detection.
- `Nov 10, 2022`: 🚀 InternImage-H achieves a new record `65.4 mAP` on COCO detection test-dev and `62.9 mIoU` on
ADE20K, outperforming previous models by a large margin.
## History
- [ ] Models/APIs for other downstream tasks
- [ ] Support [CVPR 2023 Workshop on End-to-End Autonomous Driving](https://opendrivelab.com/e2ead/cvpr23), see [here](https://github.com/OpenGVLab/InternImage/tree/master/autonomous_driving)
- [ ] Support Segment Anything
......@@ -77,6 +83,7 @@ ADE20K, outperforming previous models by a large margin.
- [x] InternImage-T/S/B/L/XL semantic segmentation model
## Introduction
"INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of large-scale vision foundation model "InternImage", pre-training method "M3I-Pretraining", generic decoder "Uni-Perceiver" series, and generic encoder for autonomous driving perception "BEVFormer" series.
<div align=left>
......@@ -93,10 +100,10 @@ ADE20K, outperforming previous models by a large margin.
"INTERN-2.5" also demonstrated world's best performance on 16 other important visual benchmark datasets, covering a wide range of tasks such as classification, detection, and segmentation, making it the top-performing model across multiple domains.
**Performance**
- Classification
<table border="1" width="90%">
<tr align="center">
<th colspan="1"> Image Classification</th><th colspan="2"> Scene Classification </th><th colspan="1">Long-Tail Classification</th>
......@@ -124,6 +131,7 @@ ADE20K, outperforming previous models by a large margin.
</table>
- Segmentation
<table border="1" width="90%">
<tr align="center">
<th colspan="3">Semantic Segmentation</th><th colspan="1">Street Segmentation</th><th colspan="1">RGBD Segmentation</th>
......@@ -141,10 +149,8 @@ ADE20K, outperforming previous models by a large margin.
**Image-Text Retrieval**: "INTERN-2.5" can quickly locate and retrieve the most semantically relevant images based on textual content requirements. This capability can be applied to both videos and image collections and can be further combined with object detection boxes to enable a variety of applications, helping users quickly and easily find the required image resources. For example, it can return the relevant images specified by the text in the album.
**Image-To-Text**: "INTERN-2.5" has a strong understanding capability in various aspects of visual-to-text tasks such as image captioning, visual question answering, visual reasoning, and optical character recognition. For example, in the context of autonomous driving, it can enhance the scene perception and understanding capabilities, assist the vehicle in judging traffic signal status, road signs, and other information, and provide effective perception information support for vehicle decision-making and planning.
**Performance**
<table border="1" width="90%">
......@@ -173,6 +179,7 @@ ADE20K, outperforming previous models by a large margin.
| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
| InternImage-G | - | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |
</div>
</details>
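For a quick integrity check of a download, a minimal sketch assuming PyTorch is installed; the URL is the InternImage-XL entry from the table above, and the nesting of weights under a `'model'` key is an assumption, not a guarantee:

```python
import torch

# Checkpoint URL copied from the table above (InternImage-XL, 384x384).
url = ('https://huggingface.co/OpenGVLab/InternImage/resolve/main/'
       'internimage_xl_22k_192to384.pth')
ckpt = torch.hub.load_state_dict_from_url(url, map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # assumption: weights may sit under 'model'
print(f'{sum(v.numel() for v in state_dict.values()) / 1e6:.0f}M parameters')
```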
......@@ -182,8 +189,8 @@ ADE20K, outperforming previous models by a large margin.
<br>
<div>
| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
| :------------: | :----------: | :--------: | :---: | :----: | :---: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](classification/configs/without_lr_decay/internimage_t_1k_224.yaml) |
| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](classification/configs/without_lr_decay/internimage_s_1k_224.yaml) |
| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](classification/configs/without_lr_decay/internimage_b_1k_224.yaml) |
......@@ -191,6 +198,7 @@ ADE20K, outperforming previous models by a large margin.
| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](classification/configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](classification/configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
| InternImage-G | - | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](classification/configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |
</div>
</details>
......@@ -200,18 +208,18 @@ ADE20K, outperforming previous models by a large margin.
<br>
<div>
| backbone | method | schd | box mAP | mask mAP | #param | FLOPs | download |
| :------------: | :--------: | :--: | :-----: | :------: | :----: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | Mask R-CNN | 1x | 47.2 | 42.5 | 49M | 270G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py) |
| InternImage-T | Mask R-CNN | 3x | 49.1 | 43.7 | 49M | 270G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_t_fpn_3x_coco.py) |
| InternImage-S | Mask R-CNN | 1x | 47.8 | 43.3 | 69M | 340G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_s_fpn_1x_coco.py) |
| InternImage-S | Mask R-CNN | 3x | 49.7 | 44.5 | 69M | 340G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_s_fpn_3x_coco.py) |
| InternImage-B | Mask R-CNN | 1x | 48.8 | 44.0 | 115M | 501G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_b_fpn_1x_coco.py) |
| InternImage-B | Mask R-CNN | 3x | 50.3 | 44.8 | 115M | 501G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_b_fpn_3x_coco.py) |
| InternImage-L | Cascade | 1x | 54.9 | 47.7 | 277M | 1399G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_l_fpn_1x_coco.py) |
| InternImage-L | Cascade | 3x | 56.1 | 48.5 | 277M | 1399G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_l_fpn_3x_coco.py) |
| InternImage-XL | Cascade | 1x | 55.3 | 48.1 | 387M | 1782G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_xl_fpn_1x_coco.py) |
| InternImage-XL | Cascade | 3x | 56.2 | 48.8 | 387M | 1782G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_xl_fpn_3x_coco.py) |
| backbone | method | box mAP (val/test) | #param | FLOPs | download |
| :-----------: | :--------: | :----------------: | :----: | :---: | :------: |
......@@ -222,21 +230,20 @@ ADE20K, outperforming previous models by a large margin.
</details>
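Since the detection models build on MMDetection, inference can follow the standard mmdet 2.x API. A hedged sketch, assuming the repo's `detection/` directory is the working directory and its custom modules (which register the InternImage backbone) are importable:

```python
from mmdet.apis import inference_detector, init_detector

# Paths from the InternImage-T + Mask R-CNN (1x) row above; the checkpoint is
# assumed to be downloaded from the ckpt link next to the config.
config = 'configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py'
checkpoint = 'mask_rcnn_internimage_t_fpn_1x_coco.pth'
model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, 'demo.jpg')  # hypothetical test image
```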
<details>
<summary> ADE20K Semantic Segmentation </summary>
<br>
<div>
| backbone | method | resolution | mIoU (ss/ms) | #param | FLOPs | download |
| :------------: | :---------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | UperNet | 512x512 | 47.9 / 48.1 | 59M | 944G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_t_512_160k_ade20k.py) |
| InternImage-S | UperNet | 512x512 | 50.1 / 50.9 | 80M | 1017G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_s_512_160k_ade20k.py) |
| InternImage-B | UperNet | 512x512 | 50.8 / 51.3 | 128M | 1185G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_b_512_160k_ade20k.py) |
| InternImage-L | UperNet | 640x640 | 53.9 / 54.1 | 256M | 2526G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_l_640_160k_ade20k.py) |
| InternImage-XL | UperNet | 640x640 | 55.0 / 55.3 | 368M | 3142G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py) |
| InternImage-H | UperNet | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_h_896_160k_ade20k.py) |
| InternImage-H | Mask2Former | 896x896 | 62.5 / 62.9 | 1.31B | 4635G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [cfg](segmentation/configs/ade20k/mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) |
</div>
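The segmentation models likewise follow the MMSegmentation 0.x API; a hedged sketch under the same assumption that the repo's custom modules are importable so the InternImage backbone is registered:

```python
from mmseg.apis import inference_segmentor, init_segmentor

# Paths from the InternImage-T + UperNet row above; the checkpoint is assumed
# to be downloaded from the ckpt link next to the config.
config = 'segmentation/configs/ade20k/upernet_internimage_t_512_160k_ade20k.py'
checkpoint = 'upernet_internimage_t_512_160k_ade20k.pth'
model = init_segmentor(config, checkpoint, device='cuda:0')
seg_map = inference_segmentor(model, 'demo.jpg')  # hypothetical test image
```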
......@@ -262,6 +269,7 @@ ADE20K, outperforming previous models by a large margin.
| InternImage-XL | 384x384 | 335M | 163G | 47 |
Before using `mmdeploy` to convert our PyTorch models to TensorRT, please make sure you have the DCNv3 custom operator built correctly. You can build it with the following command:
```shell
export MMDEPLOY_DIR=/the/root/path/of/MMDeploy
......@@ -278,14 +286,13 @@ make -j$(nproc) && make install
cd ${MMDEPLOY_DIR}
pip install -e .
```
For more details on building custom ops, please refer to [this document](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/01-how-to-build/linux-x86_64.md).
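After the ops are built, the conversion itself typically goes through mmdeploy's `tools/deploy.py`. A sketch only: the deploy config name below is an assumption, so check the mmdeploy model zoo for the file matching your task and TensorRT setup:

```python
import subprocess

subprocess.run([
    'python', 'tools/deploy.py',
    # assumed TensorRT instance-seg deploy config; verify against mmdeploy
    'configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py',
    '/path/to/InternImage/detection/configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py',
    'mask_rcnn_internimage_t_fpn_1x_coco.pth',  # checkpoint from the detection zoo
    'demo/demo.jpg',  # any sample image used to trace the model
    '--work-dir', 'work_dir/trt',
    '--device', 'cuda:0',
], check=True, cwd='/the/root/path/of/MMDeploy')  # same root as MMDEPLOY_DIR above
```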
</div>
</details>
## Citations
If this work is helpful for your research, please consider citing the following BibTeX entry.
......
......@@ -27,31 +27,37 @@
This repository is the official implementation of [InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions](https://arxiv.org/abs/2211.05778).

\[[Paper](https://arxiv.org/abs/2211.05778)\] \[[Blog in Chinese](https://zhuanlan.zhihu.com/p/610772005)\]
## Highlights

- :thumbsup: **The strongest open-source universal vision backbone, with up to 3 billion parameters**
- 🏆 **`90.1% Top-1` accuracy on the ImageNet classification benchmark, the most accurate among open-source models**
- 🏆 **`65.5 mAP` on the COCO object detection benchmark, the only model above `65 mAP`**
## Related Projects

### Multimodal Foundation Models

- [Uni-Perceiver](https://github.com/fundamentalvision/Uni-Perceiver): A unified pre-training framework for generic perception that handles zero-shot and few-shot tasks directly
- [Uni-Perceiver v2](https://arxiv.org/abs/2211.09808): A generalist model for image and image-text tasks
- [M3I-Pretraining](https://github.com/OpenGVLab/M3I-Pretraining): A one-stage pre-training paradigm based on maximizing the mutual information between inputs and targets

### Autonomous Driving

- [BEVFormer](https://github.com/fundamentalvision/BEVFormer): A new-generation, camera-only surround-view perception framework based on BEV
- [BEVFormer v2](https://arxiv.org/abs/2211.10439): A two-stage detector that fuses BEV perception with perspective-view detection
## Application in Challenges

- [2022 Waymo 3D Camera-Only Detection Challenge](https://waymo.com/open/challenges/2022/3d-camera-only-detection/): BEVFormer++, built on INTERN-2.5, won the track
- [nuScenes 3D detection task](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Camera): BEVFormer v2 achieves SOTA performance (64.8 NDS) on nuScenes camera-only detection
- [CVPR 2023 Workshop End-to-End Autonomous Driving](https://opendrivelab.com/e2ead/cvpr23): InternImage serves as the baseline for the [3D Occupancy Prediction Challenge](https://opendrivelab.com/AD23Challenge.html#Track3) and the [OpenLane Topology Challenge](https://opendrivelab.com/AD23Challenge.html#Track1)
## News

- `Mar 14, 2023`: 🚀 "INTERN-2.5" is released!
- `Feb 28, 2023`: 🚀 InternImage is accepted to CVPR 2023!
- `Nov 18, 2022`: 🚀 Built on an InternImage-XL backbone, [BEVFormer v2](https://arxiv.org/abs/2211.10439) achieves the best camera-only 3D detection performance of `63.4 NDS` on nuScenes!
......@@ -59,6 +65,7 @@
- `Nov 10, 2022`: 🚀 InternImage-H achieves SOTA performance of `62.9 mIoU` on ADE20K semantic segmentation!
## Features

- [ ] Models/APIs for various downstream tasks
- [ ] Support for the [CVPR 2023 Workshop on End-to-End Autonomous Driving](https://opendrivelab.com/e2ead/cvpr23), see [here](https://github.com/OpenGVLab/InternImage/tree/master/autonomous_driving)
- [ ] Support for Segment Anything
......@@ -73,26 +80,27 @@
- [x] InternImage-T/S/B/L/XL detection and instance segmentation models
- [x] InternImage-T/S/B/L/XL semantic segmentation models
## Introduction

"INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of the large-scale vision foundation model "InternImage", the pre-training method "M3I-Pretraining", the "Uni-Perceiver" series of generic decoders, and the "BEVFormer" series of generic encoders for autonomous driving perception.
<div align=left>
<img src='./docs/figs/intern_pipeline.png' width=900>
</div>
## Applications of "INTERN-2.5"

### 1. Performance on Image Tasks

- On the image classification benchmark ImageNet, "INTERN-2.5" reached 90.1% Top-1 accuracy using only publicly available data. Apart from two undisclosed models from Google and Microsoft trained with extra private data, it is the only model above 90.0% accuracy, and it is also the largest and most accurate open-source ImageNet model to date;
- On the object detection benchmark COCO, "INTERN-2.5" reached 65.5 mAP, the only model in the world exceeding 65 mAP;
- It also achieved the world's best performance on 16 other important visual benchmarks covering classification, detection, and segmentation tasks.
<div align="left">
<br>
**Classification**
<table border="1" width="90%">
<tr align="center">
<th colspan="1"> 图像分类</th><th colspan="2"> 场景分类 </th><th colspan="1">长尾分类</th>
......@@ -106,8 +114,8 @@
</table>
<br>
**Detection**
<table border="1" width="90%">
<tr align="center">
<th colspan="4"> 常规物体检测</th><th colspan="2">长尾物体检测 </th><th colspan="2">自动驾驶物体检测</th><th colspan="1">密集物体检测</th>
......@@ -122,6 +130,7 @@
<br>
**Segmentation**
<table border="1" width="90%">
<tr align="center">
<th colspan="3">语义分割</th><th colspan="1">街景分割</th><th colspan="1">RGBD分割</th>
......@@ -143,17 +152,15 @@
"INTERN-2.5" can quickly locate and retrieve the most semantically relevant images based on textual content requirements. This capability can be applied to both videos and image collections, and can be further combined with object detection boxes to enable a variety of applications, helping users quickly and easily find the required image resources; for example, it can return the relevant images specified by a text query within an album.
- Image-to-Text
"INTERN-2.5" has strong understanding capabilities across image-to-text tasks such as image captioning, visual question answering, visual reasoning, and optical character recognition. For example, in autonomous driving scenarios it can enhance scene perception and understanding, assist the vehicle in judging traffic signal status, road signs, and other information, and provide effective perception support for vehicle decision-making and planning.
<div align="left">
<br>
**Image-Text Tasks**
<table border="1" width="90%">
<tr align="center">
<th colspan="1">图像描述</th><th colspan="2">微调图文检索</th><th colspan="1">零样本图文检索</th>
......@@ -169,7 +176,6 @@
</div>
## Pretrained Models
<details>
......@@ -183,6 +189,7 @@
| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
| InternImage-G | - | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |
</div>
</details>
......@@ -192,8 +199,8 @@
<br>
<div>
| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
| :------------: | :----------: | :--------: | :---: | :----: | :---: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](classification/configs/without_lr_decay/internimage_t_1k_224.yaml) |
| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](classification/configs/without_lr_decay/internimage_s_1k_224.yaml) |
| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](classification/configs/without_lr_decay/internimage_b_1k_224.yaml) |
......@@ -201,6 +208,7 @@
| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](classification/configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](classification/configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
| InternImage-G | - | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](classification/configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |
</div>
</details>
......@@ -210,18 +218,18 @@
<br>
<div>
| backbone | method | schd | box mAP | mask mAP | #param | FLOPs | download |
| :------------: | :--------: | :--: | :-----: | :------: | :----: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | Mask R-CNN | 1x | 47.2 | 42.5 | 49M | 270G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py) |
| InternImage-T | Mask R-CNN | 3x | 49.1 | 43.7 | 49M | 270G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_t_fpn_3x_coco.py) |
| InternImage-S | Mask R-CNN | 1x | 47.8 | 43.3 | 69M | 340G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_s_fpn_1x_coco.py) |
| InternImage-S | Mask R-CNN | 3x | 49.7 | 44.5 | 69M | 340G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_s_fpn_3x_coco.py) |
| InternImage-B | Mask R-CNN | 1x | 48.8 | 44.0 | 115M | 501G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_b_fpn_1x_coco.py) |
| InternImage-B | Mask R-CNN | 3x | 50.3 | 44.8 | 115M | 501G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/mask_rcnn_internimage_b_fpn_3x_coco.py) |
| InternImage-L | Cascade | 1x | 54.9 | 47.7 | 277M | 1399G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_l_fpn_1x_coco.py) |
| InternImage-L | Cascade | 3x | 56.1 | 48.5 | 277M | 1399G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_l_fpn_3x_coco.py) |
| InternImage-XL | Cascade | 1x | 55.3 | 48.1 | 387M | 1782G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_1x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_xl_fpn_1x_coco.py) |
| InternImage-XL | Cascade | 3x | 56.2 | 48.8 | 387M | 1782G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_3x_coco.pth) \| [cfg](detection/configs/coco/cascade_internimage_xl_fpn_3x_coco.py) |
| backbone | method | box mAP (val/test) | #param | FLOPs | download |
| :-----------: | :--------: | :----------------: | :----: | :---: | :------: |
......@@ -232,21 +240,20 @@
</details>
<details>
<summary> ADE20K Semantic Segmentation </summary>
<br>
<div>
| backbone | method | resolution | mIoU (ss/ms) | #param | FLOPs | download |
| :------------: | :---------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | UperNet | 512x512 | 47.9 / 48.1 | 59M | 944G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_t_512_160k_ade20k.py) |
| InternImage-S | UperNet | 512x512 | 50.1 / 50.9 | 80M | 1017G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_s_512_160k_ade20k.py) |
| InternImage-B | UperNet | 512x512 | 50.8 / 51.3 | 128M | 1185G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_b_512_160k_ade20k.py) |
| InternImage-L | UperNet | 640x640 | 53.9 / 54.1 | 256M | 2526G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_l_640_160k_ade20k.py) |
| InternImage-XL | UperNet | 640x640 | 55.0 / 55.3 | 368M | 3142G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py) |
| InternImage-H | UperNet | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [cfg](segmentation/configs/ade20k/upernet_internimage_h_896_160k_ade20k.py) |
| InternImage-H | Mask2Former | 896x896 | 62.5 / 62.9 | 1.31B | 4635G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [cfg](segmentation/configs/ade20k/mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) |
</div>
......@@ -357,5 +364,4 @@ pip install -e .
<div align=left>
[//]: # (<img src='./docs/figs/log.png' width=600>)
</div>
<div id="top" align="center">
# InternImage-based Baseline for Online HD Map Construction Challenge For Autonomous Driving
</div>
If you need detailed information about the challenge, please refer to https://github.com/Tsinghua-MARS-Lab/Online-HD-Map-Construction-CVPR2023/tree/master
### 1. Requirements
```bash
python>=3.8
torch==1.11 # recommend
......@@ -18,8 +20,8 @@ numpy==1.23.5
mmdet3d==1.0.0rc6 # recommend
```
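A quick way to confirm the environment matches these pins, a minimal sketch assuming the packages above are installed:

```python
import mmdet3d
import torch

# The baseline recommends torch 1.11 and mmdet3d 1.0.0rc6 (see the list above).
print('torch', torch.__version__, '| cuda available:', torch.cuda.is_available())
print('mmdet3d', mmdet3d.__version__)
```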
### 2. Install DCNv3 for InternImage
```bash
cd projects/ops_dcnv3
bash make.sh # requires torch>=1.10
......@@ -33,21 +35,17 @@ bash tools/dist_train.sh src/configs/vectormapnet_intern.py ${NUM_GPUS}
Note: InternImage provides abundant pre-trained model weights that can be used!
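The weights plug in through the `pretrained` variable that the configs feed into the backbone via `init_cfg=dict(type='Pretrained', checkpoint=pretrained)`. A sketch of pointing it at a released checkpoint; the specific file is an illustrative assumption and must match the configured architecture (the config below uses InternImage-S-like settings: channels=80, depths=[4, 4, 21, 4]):

```python
# Redefine `pretrained` in src/configs/vectormapnet_intern.py to swap weights;
# the checkpoint choice is an assumption matching channels=80 / depths=[4, 4, 21, 4].
pretrained = ('https://huggingface.co/OpenGVLab/InternImage/resolve/main/'
              'internimage_s_1k_224.pth')
```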
### 4. Performance compared to baseline
| model name | weight | $\\mathrm{mAP}$ | $\\mathrm{AP}\_{pc}$ | $\\mathrm{AP}\_{div}$ | $\\mathrm{AP}\_{bound}$ |
| ------------------- | :---------------------------------------------------------------------------------------------------------------: | :-------------: | :------------------: | :-------------------: | :---------------------: |
| vectormapnet_intern | [Checkpoint](https://github.com/OpenGVLab/InternImage/releases/download/track_model/vectormapnet_internimage.pth) | 49.35 | 45.05 | 56.78 | 46.22 |
| vectormapnet_base | [Google Drive](https://drive.google.com/file/d/16D1CMinwA8PG1sd9PV9_WtHzcBohvO-D/view) | 42.79 | 37.22 | 50.47 | 40.68 |
## Citation
The evaluation metrics of this challenge follow [HDMapNet](https://arxiv.org/abs/2107.06307). We
provide [VectorMapNet](https://arxiv.org/abs/2206.08920) as the baseline. Please cite:
```
@article{li2021hdmapnet,
......@@ -69,8 +67,8 @@ Our dataset is built on top of the [Argoverse 2](https://www.argoverse.org/av2.h
}
```
## License
Before participating in our challenge, you should register on the website and agree to the terms of use of
the [Argoverse 2](https://www.argoverse.org/av2.html) dataset. All code in this project is released
under [GNU General Public License v3.0](./LICENSE).
from .models import *
from .datasets import *
......@@ -125,8 +125,7 @@ data = dict(
classes=class_names,
test_mode=True,
ignore_index=len(class_names),
scene_idxs=data_root + f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
test=dict(
type=dataset_type,
data_root=data_root,
......
......@@ -25,7 +25,7 @@ model = dict(
in_channels=256,
num_points=256,
gt_per_seed=1,
conv_channels=(128,),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
with_res_feat=False,
......@@ -43,8 +43,8 @@ model = dict(
pred_layer_cfg=dict(
in_channels=1536,
shared_conv_channels=(512, 128),
cls_conv_channels=(128,),
reg_conv_channels=(128,),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
bias=True),
......
......@@ -31,16 +31,16 @@ model = dict(
dir_offset=0.7854, # pi/4
strides=[8, 16, 32, 64, 128],
group_reg_dims=(2, 1, 3, 1, 2), # offset, depth, size, rot, velo
cls_branch=(256,),
reg_branch=(
(256,), # offset
(256,), # depth
(256,), # size
(256,), # rot
() # velo
),
dir_branch=(256,),
attr_branch=(256,),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
......
......@@ -11,12 +11,12 @@ meta = {
'output_format': 'vector',
# NOTE: please modify the information below
'method': 'VectorMapNet', # name of your method
'authors': ['Yicheng Liu', 'Tianyuan Yuan', 'Yue Wang',
'Yilun Wang', 'Hang Zhao'], # author names
'e-mail': 'yuantianyuan01@gmail.com', # your e-mail address
'institution / company': 'MarsLab, Tsinghua University', # your organization
'country / region': 'xxx', # (IMPORTANT) your country/region in iso3166 standard
}
# model type
......@@ -30,7 +30,7 @@ plugin_dir = 'src/'
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
img_size = (int(128 * 2), int((16 / 9 * 128) * 2))
# category configs
cat2id = {
......@@ -41,14 +41,14 @@ cat2id = {
num_class = max(list(cat2id.values())) + 1
# bev configs
roi_size = (60, 30) # bev range, 60m in x-axis, 30m in y-axis
canvas_size = (200, 100) # bev feature size
# vectorize params
coords_dim = 2 # polylines coordinates dimension, 2 or 3
sample_dist = -1 # sampling params, vectormapnet uses simplify
sample_num = -1 # sampling params, vectormapnet uses simplify
simplify = True # sampling params, vectormapnet uses simplify
# model configs
head_dim = 256
......@@ -85,21 +85,21 @@ model = dict(
upsample=dict(
zoom_size=(1, 2, 4, 8),
in_channels=128,
out_channels=128, ),
xbound=[-roi_size[0] / 2, roi_size[0] / 2, roi_size[0] / canvas_size[0]],
ybound=[-roi_size[1] / 2, roi_size[1] / 2, roi_size[1] / canvas_size[1]],
heights=[-1.1, 0, 0.5, 1.1],
out_channels=128,
pretrained=None,
num_cam=7,
),
),
head_cfg=dict(
type='DGHead',
augmentation=True,
augmentation_kwargs=dict(
p=0.3, scale=0.01,
bbox_type='xyxy',
),
det_net_cfg=dict(
type='MapElementDetector',
num_query=120,
......@@ -135,30 +135,30 @@ model = dict(
num_heads=8,
attn_drop=0.1,
proj_drop=0.1,
dropout_layer=dict(type='Dropout', drop_prob=0.1), ),
dict(
type='MultiScaleDeformableAttention',
embed_dims=head_dim,
num_heads=8,
num_levels=1,
),
],
ffn_cfgs=dict(
type='FFN',
embed_dims=head_dim,
feedforward_channels=head_dim * 2,
num_fcs=2,
ffn_drop=0.1,
act_cfg=dict(type='ReLU', inplace=True),
),
feedforward_channels=head_dim * 2,
ffn_dropout=0.1,
operation_order=('norm', 'self_attn', 'norm', 'cross_attn',
'norm', 'ffn',)))
),
positional_encoding=dict(
type='SinePositionalEncoding',
num_feats=head_dim // 2,
normalize=True,
offset=-0.5),
loss_cls=dict(
......@@ -176,30 +176,30 @@ model = dict(
cost=dict(
type='MapQueriesCost',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBoxCostC', weight=0.1),  # continuous
iou_cost=dict(type='IoUCostC', weight=1, box_format='xyxy'),  # continuous
),
),
),
),
gen_net_cfg=dict(
type='PolylineGenerator',
in_channels=128,
encoder_config=None,
decoder_config={
'layer_config': {
'd_model': 256,
'nhead': 8,
'dim_feedforward': 512,
'dropout': 0.2,
'norm_first': True,
're_zero': True,
},
'num_layers': 6,
},
class_conditional=True,
num_classes=num_class,
canvas_size=canvas_size, # xy
max_seq_length=500,
decoder_cross_attention=False,
use_discrete_vertex_embeddings=True,
......@@ -207,7 +207,7 @@ model = dict(
max_num_vertices=80,
top_p_gen_model=0.9,
sync_cls_avg_factor=True,
),
with_auxiliary_head=False,
model_name='VectorMapNet'
)
......@@ -226,11 +226,11 @@ train_pipeline = [
canvas_size=canvas_size, # xy
coord_dim=2,
num_class=num_class,
threshold=4 / 200,
),
dict(type='LoadMultiViewImagesFromFiles'),
dict(type='ResizeMultiViewImages',
size=(int(128 * 2), int((16 / 9 * 128) * 2)), # H, W
change_intrinsics=True,
),
dict(type='Normalize3D', **img_norm_cfg),
......@@ -243,7 +243,7 @@ train_pipeline = [
test_pipeline = [
dict(type='LoadMultiViewImagesFromFiles'),
dict(type='ResizeMultiViewImages',
size=img_size, # H, W
change_intrinsics=True,
),
dict(type='Normalize3D', **img_norm_cfg),
......@@ -296,9 +296,9 @@ optimizer = dict(
type='AdamW',
lr=1e-3,
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1),
}),
weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=3.5, norm_type=2))
......@@ -315,7 +315,7 @@ total_epochs = 130
# kwargs for dataset evaluation
eval_kwargs = dict()
evaluation = dict(
interval=5,
**eval_kwargs)
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
......
......@@ -11,12 +11,12 @@ meta = {
'output_format': 'vector',
# NOTE: please modify the information below
'method': 'VectorMapNet', # name of your method
'authors': ['Yicheng Liu', 'Tianyuan Yuan', 'Yue Wang',
'Yilun Wang', 'Hang Zhao'], # author names
'e-mail': 'yuantianyuan01@gmail.com', # your e-mail address
'institution / company': 'MarsLab, Tsinghua University', # your organization
'country / region': 'xxx', # (IMPORTANT) your country/region in iso3166 standard
}
# model type
......@@ -28,11 +28,11 @@ plugin_dir = 'src/'
# img configs
# img_norm_cfg = dict(
# mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
img_size = (int(128 * 2), int((16 / 9 * 128) * 2))
# category configs
cat2id = {
......@@ -43,14 +43,14 @@ cat2id = {
num_class = max(list(cat2id.values())) + 1
# bev configs
roi_size = (60, 30) # bev range, 60m in x-axis, 30m in y-axis
canvas_size = (200, 100) # bev feature size
# vectorize params
coords_dim = 2 # polylines coordinates dimension, 2 or 3
sample_dist = -1 # sampling params, vectormapnet uses simplify
sample_num = -1 # sampling params, vectormapnet uses simplify
simplify = True # sampling params, vectormapnet uses simplify
# model configs
head_dim = 256
......@@ -62,20 +62,20 @@ model = dict(
backbone_cfg=dict(
type='IPMEncoder',
img_backbone=dict(
_delete_=True,
type='InternImage',
core_op='DCNv3',
channels=80,
depths=[4, 4, 21, 4],
groups=[5, 10, 20, 40],
mlp_ratio=4.,
drop_path_rate=0.3,
norm_layer='LN',
layer_scale=1.0,
offset_scale=1.0,
post_norm=True,
with_cp=False,
init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
img_neck=dict(
type='FPN',
in_channels=[80, 160, 320, 640],
......@@ -89,21 +89,21 @@ model = dict(
upsample=dict(
zoom_size=(1, 2, 4, 8),
in_channels=128,
out_channels=128, ),
xbound=[-roi_size[0] / 2, roi_size[0] / 2, roi_size[0] / canvas_size[0]],
ybound=[-roi_size[1] / 2, roi_size[1] / 2, roi_size[1] / canvas_size[1]],
heights=[-1.1, 0, 0.5, 1.1],
out_channels=128,
pretrained=None,
num_cam=7,
),
),
head_cfg=dict(
type='DGHead',
augmentation=True,
augmentation_kwargs=dict(
p=0.3, scale=0.01,
bbox_type='xyxy',
),
det_net_cfg=dict(
type='MapElementDetector',
num_query=120,
......@@ -139,30 +139,30 @@ model = dict(
num_heads=8,
attn_drop=0.1,
proj_drop=0.1,
dropout_layer=dict(type='Dropout', drop_prob=0.1), ),
dict(
type='MultiScaleDeformableAttention',
embed_dims=head_dim,
num_heads=8,
num_levels=1,
),
],
ffn_cfgs=dict(
type='FFN',
embed_dims=head_dim,
feedforward_channels=head_dim * 2,
num_fcs=2,
ffn_drop=0.1,
act_cfg=dict(type='ReLU', inplace=True),
),
feedforward_channels=head_dim * 2,
ffn_dropout=0.1,
operation_order=('norm', 'self_attn', 'norm', 'cross_attn',
'norm', 'ffn',)))
),
positional_encoding=dict(
type='SinePositionalEncoding',
num_feats=head_dim // 2,
normalize=True,
offset=-0.5),
loss_cls=dict(
......@@ -180,30 +180,30 @@ model = dict(
cost=dict(
type='MapQueriesCost',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBoxCostC', weight=0.1),  # continuous
iou_cost=dict(type='IoUCostC', weight=1, box_format='xyxy'),  # continuous
),
),
),
),
gen_net_cfg=dict(
type='PolylineGenerator',
in_channels=128,
encoder_config=None,
decoder_config={
'layer_config': {
'd_model': 256,
'nhead': 8,
'dim_feedforward': 512,
'dropout': 0.2,
'norm_first': True,
're_zero': True,
},
'num_layers': 6,
},
class_conditional=True,
num_classes=num_class,
canvas_size=canvas_size, # xy
max_seq_length=500,
decoder_cross_attention=False,
use_discrete_vertex_embeddings=True,
......@@ -211,7 +211,7 @@ model = dict(
max_num_vertices=80,
top_p_gen_model=0.9,
sync_cls_avg_factor=True,
),
with_auxiliary_head=False,
model_name='VectorMapNet'
)
......@@ -230,11 +230,11 @@ train_pipeline = [
canvas_size=canvas_size, # xy
coord_dim=2,
num_class=num_class,
threshold=4 / 200,
),
dict(type='LoadMultiViewImagesFromFiles'),
dict(type='ResizeMultiViewImages',
size=(int(128 * 2), int((16 / 9 * 128) * 2)), # H, W
change_intrinsics=True,
),
dict(type='Normalize3D', **img_norm_cfg),
......@@ -247,7 +247,7 @@ train_pipeline = [
test_pipeline = [
dict(type='LoadMultiViewImagesFromFiles'),
dict(type='ResizeMultiViewImages',
size=img_size, # H, W
change_intrinsics=True,
),
dict(type='Normalize3D', **img_norm_cfg),
......@@ -300,9 +300,9 @@ optimizer = dict(
type='AdamW',
lr=1e-3,
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1),
}),
weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=3.5, norm_type=2))
......
from .pipelines import *
from .argo_dataset import AV2Dataset
from .base_dataset import BaseMapDataset
import os
from time import time

import mmcv
import numpy as np
from mmdet.datasets import DATASETS
from shapely.geometry import LineString

from .base_dataset import BaseMapDataset
@DATASETS.register_module()
class AV2Dataset(BaseMapDataset):
"""Argoverse2 map dataset class.
......@@ -22,9 +25,9 @@ class AV2Dataset(BaseMapDataset):
test_mode (bool): whether in test mode
"""
def __init__(self, **kwargs):
super().__init__(**kwargs)
def load_annotations(self, ann_file):
"""Load annotations from ann_file.
......@@ -34,20 +37,20 @@ class AV2Dataset(BaseMapDataset):
Returns:
list[dict]: List of annotations.
"""
start_time = time()
ann = mmcv.load(ann_file)
samples = []
for seg_id, sequence in ann.items():
samples.extend(sequence)
samples = samples[::self.interval]
print(f'collected {len(samples)} samples in {(time() - start_time):.2f}s')
self.samples = samples
def get_sample(self, idx):
"""Get data sample. For each sample, map extractor will be applied to extract
map elements.
"""Get data sample. For each sample, map extractor will be applied to extract
map elements.
Args:
idx (int): data index
......@@ -57,7 +60,7 @@ class AV2Dataset(BaseMapDataset):
"""
sample = self.samples[idx]
if not self.test_mode:
ann = sample['annotation']
......@@ -66,7 +69,7 @@ class AV2Dataset(BaseMapDataset):
for k, v in ann.items():
if k in self.cat2id.keys():
map_label2geom[self.cat2id[k]] = [LineString(np.array(l)[:, :3]) for l in v]
ego2img_rts = []
cams = sample['sensor']
for c in cams.values():
......@@ -87,10 +90,10 @@ class AV2Dataset(BaseMapDataset):
# extrinsics are 4x4 transform matrices, NOTE: **ego2cam**
'cam_extrinsics': [c['extrinsic'] for c in cams.values()],
'ego2img': ego2img_rts,
'ego2global_translation': pose['ego2global_translation'],
'ego2global_translation': pose['ego2global_translation'],
'ego2global_rotation': pose['ego2global_rotation'],
}
if not self.test_mode:
input_dict.update({'map_geoms': map_label2geom}) # {0: List[ped_crossing(LineString)], 1: ...}})
input_dict.update({'map_geoms': map_label2geom}) # {0: List[ped_crossing(LineString)], 1: ...}})
return input_dict
\ No newline at end of file
return input_dict
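# A hedged sketch (assumed convention, not code from this repo) of the
# ego-to-image projection each entry of 'ego2img' is expected to carry:
# lift the 3x3 'intrinsic' matrix to 4x4 and compose it with the 4x4
# ego-to-camera 'extrinsic' noted above.
import numpy as np

def ego2img_matrix(intrinsic, ego2cam):
    K = np.eye(4)
    K[:3, :3] = intrinsic
    return K @ ego2cam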
import numpy as np
import os
import os.path as osp
import mmcv
from .evaluation.vector_eval import VectorEvaluate
import warnings
import mmcv
import numpy as np
from mmdet3d.datasets.pipelines import Compose
from mmdet.datasets import DATASETS
from torch.utils.data import Dataset
import warnings
warnings.filterwarnings("ignore")
from .evaluation.vector_eval import VectorEvaluate
warnings.filterwarnings('ignore')
@DATASETS.register_module()
class BaseMapDataset(Dataset):
......@@ -26,7 +28,8 @@ class BaseMapDataset(Dataset):
work_dir (str): path to work dir
test_mode (bool): whether in test mode
"""
def __init__(self,
def __init__(self,
ann_file,
root_path,
cat2id,
......@@ -36,12 +39,12 @@ class BaseMapDataset(Dataset):
interval=1,
work_dir=None,
test_mode=False,
):
):
super().__init__()
self.ann_file = ann_file
self.meta = meta
self.root_path = root_path
self.classes = list(cat2id.keys())
self.num_classes = len(self.classes)
self.cat2id = cat2id
......@@ -60,12 +63,12 @@ class BaseMapDataset(Dataset):
self.pipeline = Compose(pipeline)
else:
self.pipeline = None
# dummy flags to fit with mmdet dataset
self.flag = np.zeros(len(self), dtype=np.uint8)
self.roi_size = roi_size
self.work_dir = work_dir
self.test_mode = test_mode
......@@ -77,7 +80,7 @@ class BaseMapDataset(Dataset):
def format_results(self, results, denormalize=True, prefix=None):
'''Format prediction result to submission format.
Args:
results (list[Tensor]): List of prediction results.
denormalize (bool): whether to denormalize prediction from (0, 1) \
......@@ -99,18 +102,18 @@ class BaseMapDataset(Dataset):
For each case, the result should be formatted as Dict{'vectors': [], 'scores': [], 'labels': []}
'vectors': List of all vectors predicted in this sample; each vector is
an array([[x1, y1], [x2, y2], ...]).
'scores': List of scores (float),
'scores': List of scores (float),
containing the scores of all instances in this sample.
'labels': List of labels (int),
'labels': List of labels (int),
containing the labels of all instances in this sample.
'''
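# Illustrative sketch of the `submissions` dict populated below; the token
# key and values are hypothetical:
# submissions['results']['<sample_token>'] = {
#     'vectors': [array([[x1, y1], [x2, y2], ...]), ...],
#     'scores': [0.9, ...],
#     'labels': [0, ...],
# }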
if pred is None: # empty prediction
if pred is None: # empty prediction
continue
single_case = {'vectors': [], 'scores': [], 'labels': []}
token = pred['token']
roi_size = np.array(self.roi_size)
origin = -np.array([self.roi_size[0]/2, self.roi_size[1]/2])
origin = -np.array([self.roi_size[0] / 2, self.roi_size[1] / 2])
for i in range(len(pred['scores'])):
score = pred['scores'][i]
......@@ -120,7 +123,7 @@ class BaseMapDataset(Dataset):
# A line should have >=2 points
if len(vector) < 2:
continue
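# Map normalized (0, 1) coordinates back to ROI (ego-frame) coordinates:
# scale by the ROI extent plus a small eps margin, then shift by the
# origin computed above (the ROI is centered on the ego vehicle).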
if denormalize:
eps = 2
vector = vector * (roi_size + eps) + origin
......@@ -128,9 +131,9 @@ class BaseMapDataset(Dataset):
single_case['vectors'].append(vector)
single_case['scores'].append(score)
single_case['labels'].append(label)
submissions['results'][token] = single_case
out_path = osp.join(prefix, 'submission_vector.json')
print(f'\nsaving submissions results to {out_path}')
os.makedirs(os.path.dirname(out_path), exist_ok=True)
......@@ -152,7 +155,7 @@ class BaseMapDataset(Dataset):
self.evaluator = VectorEvaluate(self.ann_file)
print('len of the results', len(results))
result_path = self.format_results(results, denormalize=True, prefix=self.work_dir)
result_dict = self.evaluator.evaluate(result_path, logger=logger)
......@@ -165,7 +168,7 @@ class BaseMapDataset(Dataset):
int: Length of data infos.
"""
return len(self.samples)
def _rand_another(self, idx):
"""Randomly get another item.
......@@ -183,4 +186,3 @@ class BaseMapDataset(Dataset):
input_dict = self.get_sample(idx)
data = self.pipeline(input_dict)
return data
import numpy as np
from .distance import chamfer_distance, frechet_distance
from typing import List, Tuple, Union
import numpy as np
from numpy.typing import NDArray
from .distance import chamfer_distance, frechet_distance
def average_precision(recalls, precisions, mode='area'):
"""Calculate average precision.
"""Calculate average precision.
Args:
recalls (ndarray): shape (num_dets, )
......@@ -31,11 +34,11 @@ def average_precision(recalls, precisions, mode='area'):
mpre = np.hstack((zeros, precisions, zeros))
for i in range(mpre.shape[1] - 1, 0, -1):
mpre[:, i - 1] = np.maximum(mpre[:, i - 1], mpre[:, i])
ind = np.where(mrec[0, 1:] != mrec[0, :-1])[0]
ap = np.sum(
(mrec[0, ind + 1] - mrec[0, ind]) * mpre[0, ind + 1])
elif mode == '11points':
for thr in np.arange(0, 1 + 1e-3, 0.1):
precs = precisions[0, recalls[0, :] >= thr]
......@@ -45,14 +48,15 @@ def average_precision(recalls, precisions, mode='area'):
else:
raise ValueError(
'Unrecognized mode, only "area" and "11points" are supported')
return ap
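# Worked example with hypothetical values, assuming the (num_scales, num_dets)
# layout used by the indexing above (a single scale here):
import numpy as np
recalls = np.array([[0.25, 0.50, 0.75]])
precisions = np.array([[1.0, 1.0, 0.5]])
ap = average_precision(recalls, precisions, mode='area')
# area under the interpolated PR curve: 0.25*1.0 + 0.25*1.0 + 0.25*0.5 = 0.625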
def instance_match(pred_lines: List[NDArray],
scores: NDArray,
gt_lines: List[NDArray],
thresholds: Union[Tuple, List],
metric: str='chamfer') -> List:
def instance_match(pred_lines: List[NDArray],
scores: NDArray,
gt_lines: List[NDArray],
thresholds: Union[Tuple, List],
metric: str = 'chamfer') -> List:
"""Compute whether detected lines are true positive or false positive.
Args:
......@@ -71,7 +75,7 @@ def instance_match(pred_lines: List[NDArray],
elif metric == 'frechet':
distance_fn = frechet_distance
else:
raise ValueError(f'unknown distance function {metric}')
......@@ -89,7 +93,7 @@ def instance_match(pred_lines: List[NDArray],
for thr in thresholds:
tp_fp_list.append((tp.copy(), fp.copy()))
return tp_fp_list
if num_preds == 0:
for thr in thresholds:
tp_fp_list.append((tp.copy(), fp.copy()))
......@@ -126,7 +130,7 @@ def instance_match(pred_lines: List[NDArray],
fp[i] = 1
else:
fp[i] = 1
tp_fp_list.append((tp, fp))
return tp_fp_list
\ No newline at end of file
return tp_fp_list
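# Hypothetical example: one predicted polyline matched against one
# ground-truth polyline with the chamfer metric (pre-interpolated inputs
# assumed, as the docstring requires). The chamfer distance here is 0.3,
# below every threshold, so the prediction is a TP at 0.5, 1.0 and 1.5.
import numpy as np
pred_lines = [np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])]
gt_lines = [np.array([[0.0, 0.3], [1.0, 0.3], [2.0, 0.3]])]
scores = np.array([0.9])
tp_fp_list = instance_match(pred_lines, scores, gt_lines,
                            thresholds=(0.5, 1.0, 1.5), metric='chamfer')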
from scipy.spatial import distance
from numpy.typing import NDArray
from scipy.spatial import distance
def chamfer_distance(line1: NDArray, line2: NDArray) -> float:
''' Calculate chamfer distance between two lines. Make sure the
''' Calculate chamfer distance between two lines. Make sure the
lines are interpolated.
Args:
line1 (array): coordinates of line1
line2 (array): coordinates of line2
Returns:
distance (float): chamfer distance
'''
dist_matrix = distance.cdist(line1, line2, 'euclidean')
dist12 = dist_matrix.min(-1).sum() / len(line1)
dist21 = dist_matrix.min(-2).sum() / len(line2)
return (dist12 + dist21) / 2
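# Quick sanity check with hypothetical coordinates: two parallel horizontal
# segments offset by 1.0 vertically have a chamfer distance of exactly 1.0.
import numpy as np
line_a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
line_b = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
assert abs(chamfer_distance(line_a, line_b) - 1.0) < 1e-6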
def frechet_distance(line1: NDArray, line2: NDArray) -> float:
''' Calculate frechet distance between two lines. Make sure the
''' Calculate frechet distance between two lines. Make sure the
lines are interpolated.
Args:
line1 (array): coordinates of line1
line2 (array): coordinates of line2
Returns:
distance (float): frechet distance
'''
raise NotImplementedError
raise NotImplementedError
from functools import partial
import numpy as np
from logging import Logger
from multiprocessing import Pool
from mmdet3d.datasets import build_dataset, build_dataloader
from time import time
from typing import Dict, List, Optional
import mmcv
from .AP import instance_match, average_precision
import numpy as np
import prettytable
from time import time
from functools import cached_property
from shapely.geometry import LineString
from numpy.typing import NDArray
from typing import Dict, List, Optional
from logging import Logger
from mmcv import Config
from copy import deepcopy
from shapely.geometry import LineString
INTERP_NUM = 100 # number of points to interpolate during evaluation
SAMPLE_DIST = 0.3 # fixed sample distance
THRESHOLDS = [0.5, 1.0, 1.5] # AP thresholds
N_WORKERS = 16 # num workers to parallel
from .AP import average_precision, instance_match
INTERP_NUM = 100 # number of points to interpolate during evaluation
SAMPLE_DIST = 0.3 # fixed sample distance
THRESHOLDS = [0.5, 1.0, 1.5] # AP thresholds
N_WORKERS = 16 # num workers to parallel
CAT2ID = {
'ped_crossing': 0,
......@@ -25,6 +23,7 @@ CAT2ID = {
'boundary': 2,
}
class VectorEvaluate(object):
"""Evaluator for vectorized map.
......@@ -33,7 +32,7 @@ class VectorEvaluate(object):
n_workers (int): num workers to parallel
"""
def __init__(self, ann_file, n_workers: int=N_WORKERS) -> None:
def __init__(self, ann_file, n_workers: int = N_WORKERS) -> None:
ann = mmcv.load(ann_file)
gts = {}
for seg_id, seq in ann.items():
......@@ -42,69 +41,69 @@ class VectorEvaluate(object):
for cat, vectors in frame['annotation'].items():
# only evaluate in 2D
ann[cat] = [np.array(v)[:, :2] for v in vectors]
gts[frame['timestamp']] = ann
self.gts = gts
self.n_workers = n_workers
self.cat2id = CAT2ID
self.id2cat = {v: k for k, v in self.cat2id.items()}
def interp_fixed_num(self,
vector: NDArray,
def interp_fixed_num(self,
vector: NDArray,
num_pts: int) -> NDArray:
''' Interpolate a polyline.
Args:
vector (array): line coordinates, shape (M, 2)
num_pts (int): number of points to sample
num_pts (int): number of points to sample
Returns:
sampled_points (array): interpolated coordinates
'''
line = LineString(vector)
distances = np.linspace(0, line.length, num_pts)
sampled_points = np.array([list(line.interpolate(distance).coords)
for distance in distances]).squeeze()
sampled_points = np.array([list(line.interpolate(distance).coords)
for distance in distances]).squeeze()
return sampled_points
def interp_fixed_dist(self,
def interp_fixed_dist(self,
vector: NDArray,
sample_dist: float) -> NDArray:
''' Interpolate a line at fixed interval.
Args:
vector (LineString): vector
sample_dist (float): sample interval
Returns:
points (array): interpolated points, shape (N, 2)
'''
line = LineString(vector)
distances = list(np.arange(sample_dist, line.length, sample_dist))
# make sure to sample at least two points when sample_dist > line.length
distances = [0,] + distances + [line.length,]
distances = [0, ] + distances + [line.length, ]
sampled_points = np.array([list(line.interpolate(distance).coords)
for distance in distances]).squeeze()
for distance in distances]).squeeze()
return sampled_points
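# Standalone illustration of the resampling above (shapely only, on a
# hypothetical 10 m segment): `interp_fixed_num` spaces `num_pts` points
# evenly along the line, while `interp_fixed_dist` places a point every
# `sample_dist` and always keeps both endpoints.
import numpy as np
from shapely.geometry import LineString

line = LineString([[0.0, 0.0], [10.0, 0.0]])
pts = np.array([list(line.interpolate(d).coords)
                for d in np.linspace(0, line.length, 5)]).squeeze()
# pts -> [[0, 0], [2.5, 0], [5, 0], [7.5, 0], [10, 0]]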
def _evaluate_single(self,
pred_vectors: List,
scores: List,
groundtruth: List,
thresholds: List,
metric: str='chamfer') -> Dict[int, NDArray]:
def _evaluate_single(self,
pred_vectors: List,
scores: List,
groundtruth: List,
thresholds: List,
metric: str = 'chamfer') -> Dict[int, NDArray]:
''' Do single-frame matching for one class.
Args:
pred_vectors (List): List[vector (ndarray)]; vectors may have different lengths,
pred_vectors (List): List[vector (ndarray)]; vectors may have different lengths,
scores (List): List[score(float)]
groundtruth (List): List of vectors
thresholds (List): List of thresholds
Returns:
tp_fp_score_by_thr (Dict): matching results at different thresholds
e.g. {0.5: (M, 3), 1.0: (M, 3), 1.5: (M, 3)}, columns are (tp, fp, score)
......@@ -125,36 +124,36 @@ class VectorEvaluate(object):
# vector_interp = self.interp_fixed_num(vector, INTERP_NUM)
vector_interp = self.interp_fixed_dist(vector, SAMPLE_DIST)
gt_lines.append(vector_interp)
scores = np.array(scores)
tp_fp_list = instance_match(pred_lines, scores, gt_lines, thresholds, metric) # [(tp, fp)] per threshold
tp_fp_list = instance_match(pred_lines, scores, gt_lines, thresholds, metric)  # [(tp, fp)] per threshold
tp_fp_score_by_thr = {}
for i, thr in enumerate(thresholds):
tp, fp = tp_fp_list[i]
tp_fp_score = np.hstack([tp[:, None], fp[:, None], scores[:, None]])
tp_fp_score_by_thr[thr] = tp_fp_score
return tp_fp_score_by_thr # {0.5: (M, 3), 1.0: (M, 3), 1.5: (M, 3)}
def evaluate(self,
result_path: str,
metric: str='chamfer',
logger: Optional[Logger]=None) -> Dict[str, float]:
return tp_fp_score_by_thr  # {0.5: (M, 3), 1.0: (M, 3), 1.5: (M, 3)}
def evaluate(self,
result_path: str,
metric: str = 'chamfer',
logger: Optional[Logger] = None) -> Dict[str, float]:
''' Do evaluation for a submission file and print evaluation results to `logger` if specified.
The submission will be aligned by tokens before evaluation. Multiple workers are used to speed this up.
Args:
result_path (str): path to submission file
metric (str): distance metric. Default: 'chamfer'
logger (Logger): logger to print evaluation result, Default: None
Returns:
new_result_dict (Dict): evaluation results. AP by categories.
'''
results = mmcv.load(result_path)
results = results['results']
# re-group samples and gt by label
samples_by_cls = {label: [] for label in self.id2cat.keys()}
num_gts = {label: 0 for label in self.id2cat.keys()}
......@@ -166,7 +165,7 @@ class VectorEvaluate(object):
pred = results[token]
else:
pred = {'vectors': [], 'scores': [], 'labels': []}
# for every sample
vectors_by_cls = {label: [] for label in self.id2cat.keys()}
scores_by_cls = {label: [] for label in self.id2cat.keys()}
......@@ -192,11 +191,11 @@ class VectorEvaluate(object):
start = time()
if self.n_workers > 0:
pool = Pool(self.n_workers)
sum_mAP = 0
pbar = mmcv.ProgressBar(len(self.id2cat))
for label in self.id2cat.keys():
samples = samples_by_cls[label] # List[(pred_lines, scores, gts)]
samples = samples_by_cls[label] # List[(pred_lines, scores, gts)]
result_dict[self.id2cat[label]] = {
'num_gts': num_gts[label],
'num_preds': num_preds[label]
......@@ -210,14 +209,14 @@ class VectorEvaluate(object):
tpfp_score_list = []
for sample in samples:
tpfp_score_list.append(fn(*sample))
for thr in THRESHOLDS:
tp_fp_score = [i[thr] for i in tpfp_score_list]
tp_fp_score = np.vstack(tp_fp_score) # (num_dets, 3)
tp_fp_score = np.vstack(tp_fp_score) # (num_dets, 3)
sort_inds = np.argsort(-tp_fp_score[:, -1])
tp = tp_fp_score[sort_inds, 0] # (num_dets,)
fp = tp_fp_score[sort_inds, 1] # (num_dets,)
tp = tp_fp_score[sort_inds, 0] # (num_dets,)
fp = tp_fp_score[sort_inds, 1] # (num_dets,)
tp = np.cumsum(tp, axis=0)
fp = np.cumsum(fp, axis=0)
eps = np.finfo(np.float32).eps
......@@ -229,38 +228,38 @@ class VectorEvaluate(object):
result_dict[self.id2cat[label]].update({f'AP@{thr}': AP})
pbar.update()
AP = sum_AP / len(THRESHOLDS)
sum_mAP += AP
result_dict[self.id2cat[label]].update({'AP': AP})
if self.n_workers > 0:
pool.close()
mAP = sum_mAP / len(self.id2cat.keys())
result_dict.update({'mAP': mAP})
print(f"finished in {time() - start:.2f}s")
print(f'finished in {time() - start:.2f}s')
# print results
table = prettytable.PrettyTable(['category', 'num_preds', 'num_gts'] +
[f'AP@{thr}' for thr in THRESHOLDS] + ['AP'])
table = prettytable.PrettyTable(['category', 'num_preds', 'num_gts'] +
[f'AP@{thr}' for thr in THRESHOLDS] + ['AP'])
for label in self.id2cat.keys():
table.add_row([
self.id2cat[label],
self.id2cat[label],
result_dict[self.id2cat[label]]['num_preds'],
result_dict[self.id2cat[label]]['num_gts'],
*[round(result_dict[self.id2cat[label]][f'AP@{thr}'], 4) for thr in THRESHOLDS],
round(result_dict[self.id2cat[label]]['AP'], 4),
])
from mmcv.utils import print_log
print_log('\n'+str(table), logger=logger)
print_log('\n' + str(table), logger=logger)
print_log(f'mAP = {mAP:.4f}\n', logger=logger)
new_result_dict = {}
for name in self.cat2id:
new_result_dict[name] = result_dict[name]['AP']
return new_result_dict
\ No newline at end of file
return new_result_dict
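# Typical usage, with hypothetical paths: build the evaluator from the
# ground-truth annotation file, then score a formatted submission JSON.
# evaluator = VectorEvaluate('data/av2_map_ann_val.json')
# result = evaluator.evaluate('work_dirs/submission_vector.json')
# `result` is a Dict[str, float] with one AP entry per category.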
from .loading import LoadMultiViewImagesFromFiles
from .formating import FormatBundleMap
from .transform import ResizeMultiViewImages, PadMultiViewImages, Normalize3D
from .vectorize import VectorizeMap
from .loading import LoadMultiViewImagesFromFiles
from .poly_bbox import PolygonizeLocalMapBbox
from .transform import Normalize3D, PadMultiViewImages, ResizeMultiViewImages
from .vectorize import VectorizeMap
# for argoverse
__all__ = [
'LoadMultiViewImagesFromFiles',
'FormatBundleMap', 'Normalize3D', 'ResizeMultiViewImages', 'PadMultiViewImages',
'VectorizeMap', 'PolygonizeLocalMapBbox'
]
\ No newline at end of file
]
import numpy as np
from mmcv.parallel import DataContainer as DC
from mmdet3d.core.points import BasePoints
from mmdet.datasets.builder import PIPELINES
from mmdet.datasets.pipelines import to_tensor
@PIPELINES.register_module()
class FormatBundleMap(object):
"""Format data for map tasks and then collect data for model input.
......@@ -17,10 +17,10 @@ class FormatBundleMap(object):
- img_metas: (1) to DataContainer (cpu_only=True)
"""
def __init__(self, process_img=True,
keys=['img', 'semantic_mask', 'vectors'],
meta_keys=['intrinsics', 'extrinsics']):
def __init__(self, process_img=True,
keys=['img', 'semantic_mask', 'vectors'],
meta_keys=['intrinsics', 'extrinsics']):
self.process_img = process_img
self.keys = keys
self.meta_keys = meta_keys
......@@ -54,7 +54,7 @@ class FormatBundleMap(object):
else:
img = np.ascontiguousarray(results['img'].transpose(2, 0, 1))
results['img'] = DC(to_tensor(img), stack=True)
if 'semantic_mask' in results:
results['semantic_mask'] = DC(to_tensor(results['semantic_mask']), stack=True)
......@@ -62,7 +62,7 @@ class FormatBundleMap(object):
# vectors may have different sizes
vectors = results['vectors']
results['vectors'] = DC(vectors, stack=False, cpu_only=True)
if 'polys' in results:
results['polys'] = DC(results['polys'], stack=False, cpu_only=True)
......
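# After FormatBundleMap (default keys assumed), a formatted sample roughly
# looks like:
#   results['img']       -> DataContainer(stacked image tensor, stack=True)
#   results['vectors']   -> DataContainer(list of vectors, cpu_only=True)
#   results['img_metas'] -> DataContainer(meta dict, cpu_only=True)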