Commit e9cee049 authored by luopl
Initial commit

## Prompt Tuning for YOLO-World
### NOTE:
This folder contains many experimental config files, which will be removed later.
### Experimental Results
| Model | Config | AP | AP50 | AP75 | APS | APM | APL |
| :---- | :----: | :--: | :--: | :---: | :-: | :-: | :-: |
| YOLO-World-v2-L | Zero-shot | 45.7 | 61.6 | 49.8 | 29.9 | 50.0 | 60.8 |
| [YOLO-World-v2-L](./../configs/prompt_tuning_coco/yolo_world_v2_l_vlpan_bn_2e-4_80e_8gpus_mask-refine_prompt_tuning_coco.py) | Prompt tuning | 47.9 | 64.3 | 52.5 | 31.9 | 52.6 | 61.3 |
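The gain from prompt tuning can be read off the table by subtracting the zero-shot row from the prompt-tuning row. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the values are copied from the table above.

```python
# Metric values copied from the results table above (YOLO-World-v2-L).
zero_shot = {"AP": 45.7, "AP50": 61.6, "AP75": 49.8, "APS": 29.9, "APM": 50.0, "APL": 60.8}
prompt_tuned = {"AP": 47.9, "AP50": 64.3, "AP75": 52.5, "APS": 31.9, "APM": 52.6, "APL": 61.3}

# Absolute gain per metric, rounded to one decimal to match the table precision.
gains = {name: round(prompt_tuned[name] - zero_shot[name], 1) for name in zero_shot}

for name, gain in gains.items():
    print(f"{name}: +{gain}")
```

By this calculation the overall AP improves by 2.2 points, with the smallest gain (0.5) on large objects (APL).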
## Fine-tuning YOLO-World for Instance Segmentation
### Models
We fine-tune YOLO-World on LVIS (`LVIS-Base`) with mask annotations for open-vocabulary (zero-shot) instance segmentation.
We provide two strategies for fine-tuning YOLO-World towards open-vocabulary instance segmentation:
* fine-tuning `all modules`: leads to better LVIS segmentation accuracy but can degrade zero-shot performance.
* fine-tuning the `segmentation head` only: maintains zero-shot performance but yields lower LVIS segmentation accuracy.
| Model | Fine-tuning Data | Fine-tuning Modules | AP<sup>mask</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | Weights |
| :---- | :--------------- | :----------------: | :--------------: | :------------: | :------------: | :------------: | :-----: |
| [YOLO-World-Seg-M](./yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 25.9 | 13.4 | 24.9 | 32.6 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis-ca465825.pth) |
| [YOLO-World-v2-Seg-M](./yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 25.9 | 13.4 | 24.9 | 32.6 | [HF Checkpoints 🤗]() |
| [YOLO-World-Seg-L](./yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 28.7 | 15.0 | 28.3 | 35.2| [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis-8c58c916.pth) |
| [YOLO-World-v2-Seg-L](./yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 28.7 | 15.0 | 28.3 | 35.2| [HF Checkpoints 🤗]() |
| [YOLO-World-Seg-M](./yolo_seg_world_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 16.7 | 12.6 | 14.6 | 20.8 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-7bca59a7.pth) |
| [YOLO-World-v2-Seg-M](./yolo_world_v2_seg_m_vlpan_bn_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 17.8 | 13.9 | 15.5 | 22.0 | [HF Checkpoints 🤗]() |
| [YOLO-World-Seg-L](./yolo_seg_world_l_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 19.1 | 14.2 | 17.2 | 23.5 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-5a642d30.pth) |
| [YOLO-World-v2-Seg-L](./yolo_world_v2_seg_l_vlpan_bn_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 19.8 | 17.2 | 17.5 | 23.6 | [HF Checkpoints 🤗]() |
**NOTE:**
1. Mask AP is evaluated on LVIS `val 1.0`.
2. All models are fine-tuned for 80 epochs on `LVIS-Base` (866 categories, `common + frequent`).
3. YOLO-World-Seg models with only the `seg head` fine-tuned retain the original zero-shot detection capability while also segmenting objects.
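The two strategies differ only in which parameters receive gradient updates. The sketch below illustrates that selection logic in plain Python. The parameter names and the `mask_head.` prefix are hypothetical placeholders, not names taken from the YOLO-World code base, and the actual configs may implement freezing differently.

```python
def trainable_flags(param_names, trainable_prefixes=None):
    """Map each parameter name to True (trained) or False (frozen).

    trainable_prefixes=None means "fine-tune all modules"; otherwise only
    parameters whose name starts with one of the prefixes are trained.
    """
    if trainable_prefixes is None:
        return {name: True for name in param_names}
    prefixes = tuple(trainable_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


# Hypothetical parameter names, for illustration only.
names = [
    "backbone.conv1.weight",
    "neck.fuse.weight",
    "bbox_head.cls.weight",
    "mask_head.proto.weight",
]

all_modules = trainable_flags(names)
seg_head_only = trainable_flags(names, trainable_prefixes=["mask_head."])
print(seg_head_only)
```

In a PyTorch model, flags like these would typically be applied by setting `requires_grad` on each parameter, which leaves the frozen detection weights (and hence the zero-shot detector) unchanged.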
[["person"], ["bicycle"], ["car"], ["motorcycle"], ["airplane"], ["bus"], ["train"], ["truck"], ["boat"], ["traffic light"], ["fire hydrant"], ["stop sign"], ["parking meter"], ["bench"], ["bird"], ["cat"], ["dog"], ["horse"], ["sheep"], ["cow"], ["elephant"], ["bear"], ["zebra"], ["giraffe"], ["backpack"], ["umbrella"], ["handbag"], ["tie"], ["suitcase"], ["frisbee"], ["skis"], ["snowboard"], ["sports ball"], ["kite"], ["baseball bat"], ["baseball glove"], ["skateboard"], ["surfboard"], ["tennis racket"], ["bottle"], ["wine glass"], ["cup"], ["fork"], ["knife"], ["spoon"], ["bowl"], ["banana"], ["apple"], ["sandwich"], ["orange"], ["broccoli"], ["carrot"], ["hot dog"], ["pizza"], ["donut"], ["cake"], ["chair"], ["couch"], ["potted plant"], ["bed"], ["dining table"], ["toilet"], ["tv"], ["laptop"], ["mouse"], ["remote"], ["keyboard"], ["cell phone"], ["microwave"], ["oven"], ["toaster"], ["sink"], ["refrigerator"], ["book"], ["clock"], ["vase"], ["scissors"], ["teddy bear"], ["hair drier"], ["toothbrush"]]
[["person"], ["sneakers"], ["chair"], ["hat"], ["lamp"], ["bottle"], ["cabinet", "shelf"], ["cup"], ["car"], ["glasses"], ["picture", "frame"], ["desk"], ["handbag"], ["street lights"], ["book"], ["plate"], ["helmet"], ["leather shoes"], ["pillow"], ["glove"], ["potted plant"], ["bracelet"], ["flower"], ["tv"], ["storage box"], ["vase"], ["bench"], ["wine glass"], ["boots"], ["bowl"], ["dining table"], ["umbrella"], ["boat"], ["flag"], ["speaker"], ["trash bin", "can"], ["stool"], ["backpack"], ["couch"], ["belt"], ["carpet"], ["basket"], ["towel", "napkin"], ["slippers"], ["barrel", "bucket"], ["coffee table"], ["suv"], ["toy"], ["tie"], ["bed"], ["traffic light"], ["pen", "pencil"], ["microphone"], ["sandals"], ["canned"], ["necklace"], ["mirror"], ["faucet"], ["bicycle"], ["bread"], ["high heels"], ["ring"], ["van"], ["watch"], ["sink"], ["horse"], ["fish"], ["apple"], ["camera"], ["candle"], ["teddy bear"], ["cake"], ["motorcycle"], ["wild bird"], ["laptop"], ["knife"], ["traffic sign"], ["cell phone"], ["paddle"], ["truck"], ["cow"], ["power outlet"], ["clock"], ["drum"], ["fork"], ["bus"], ["hanger"], ["nightstand"], ["pot", "pan"], ["sheep"], ["guitar"], ["traffic cone"], ["tea pot"], ["keyboard"], ["tripod"], ["hockey"], ["fan"], ["dog"], ["spoon"], ["blackboard", "whiteboard"], ["balloon"], ["air conditioner"], ["cymbal"], ["mouse"], ["telephone"], ["pickup truck"], ["orange"], ["banana"], ["airplane"], ["luggage"], ["skis"], ["soccer"], ["trolley"], ["oven"], ["remote"], ["baseball glove"], ["paper towel"], ["refrigerator"], ["train"], ["tomato"], ["machinery vehicle"], ["tent"], ["shampoo", "shower gel"], ["head phone"], ["lantern"], ["donut"], ["cleaning products"], ["sailboat"], ["tangerine"], ["pizza"], ["kite"], ["computer box"], ["elephant"], ["toiletries"], ["gas stove"], ["broccoli"], ["toilet"], ["stroller"], ["shovel"], ["baseball bat"], ["microwave"], ["skateboard"], ["surfboard"], ["surveillance camera"], ["gun"], ["life saver"], ["cat"], 
["lemon"], ["liquid soap"], ["zebra"], ["duck"], ["sports car"], ["giraffe"], ["pumpkin"], ["piano"], ["stop sign"], ["radiator"], ["converter"], ["tissue"], ["carrot"], ["washing machine"], ["vent"], ["cookies"], ["cutting", "chopping board"], ["tennis racket"], ["candy"], ["skating and skiing shoes"], ["scissors"], ["folder"], ["baseball"], ["strawberry"], ["bow tie"], ["pigeon"], ["pepper"], ["coffee machine"], ["bathtub"], ["snowboard"], ["suitcase"], ["grapes"], ["ladder"], ["pear"], ["american football"], ["basketball"], ["potato"], ["paint brush"], ["printer"], ["billiards"], ["fire hydrant"], ["goose"], ["projector"], ["sausage"], ["fire extinguisher"], ["extension cord"], ["facial mask"], ["tennis ball"], ["chopsticks"], ["electronic stove and gas stove"], ["pie"], ["frisbee"], ["kettle"], ["hamburger"], ["golf club"], ["cucumber"], ["clutch"], ["blender"], ["tong"], ["slide"], ["hot dog"], ["toothbrush"], ["facial cleanser"], ["mango"], ["deer"], ["egg"], ["violin"], ["marker"], ["ship"], ["chicken"], ["onion"], ["ice cream"], ["tape"], ["wheelchair"], ["plum"], ["bar soap"], ["scale"], ["watermelon"], ["cabbage"], ["router", "modem"], ["golf ball"], ["pine apple"], ["crane"], ["fire truck"], ["peach"], ["cello"], ["notepaper"], ["tricycle"], ["toaster"], ["helicopter"], ["green beans"], ["brush"], ["carriage"], ["cigar"], ["earphone"], ["penguin"], ["hurdle"], ["swing"], ["radio"], ["cd"], ["parking meter"], ["swan"], ["garlic"], ["french fries"], ["horn"], ["avocado"], ["saxophone"], ["trumpet"], ["sandwich"], ["cue"], ["kiwi fruit"], ["bear"], ["fishing rod"], ["cherry"], ["tablet"], ["green vegetables"], ["nuts"], ["corn"], ["key"], ["screwdriver"], ["globe"], ["broom"], ["pliers"], ["volleyball"], ["hammer"], ["eggplant"], ["trophy"], ["dates"], ["board eraser"], ["rice"], ["tape measure", "ruler"], ["dumbbell"], ["hamimelon"], ["stapler"], ["camel"], ["lettuce"], ["goldfish"], ["meat balls"], ["medal"], ["toothpaste"], ["antelope"], ["shrimp"], 
["rickshaw"], ["trombone"], ["pomegranate"], ["coconut"], ["jellyfish"], ["mushroom"], ["calculator"], ["treadmill"], ["butterfly"], ["egg tart"], ["cheese"], ["pig"], ["pomelo"], ["race car"], ["rice cooker"], ["tuba"], ["crosswalk sign"], ["papaya"], ["hair drier"], ["green onion"], ["chips"], ["dolphin"], ["sushi"], ["urinal"], ["donkey"], ["electric drill"], ["spring rolls"], ["tortoise", "turtle"], ["parrot"], ["flute"], ["measuring cup"], ["shark"], ["steak"], ["poker card"], ["binoculars"], ["llama"], ["radish"], ["noodles"], ["yak"], ["mop"], ["crab"], ["microscope"], ["barbell"], ["bread", "bun"], ["baozi"], ["lion"], ["red cabbage"], ["polar bear"], ["lighter"], ["seal"], ["mangosteen"], ["comb"], ["eraser"], ["pitaya"], ["scallop"], ["pencil case"], ["saw"], ["table tennis paddle"], ["okra"], ["starfish"], ["eagle"], ["monkey"], ["durian"], ["game board"], ["rabbit"], ["french horn"], ["ambulance"], ["asparagus"], ["hoverboard"], ["pasta"], ["target"], ["hotair balloon"], ["chainsaw"], ["lobster"], ["iron"], ["flashlight"]]
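Both class-text lists above share one layout: a JSON array with one entry per category, where each entry is itself a list of one or more text prompts (for example `["cabinet", "shelf"]`). A minimal sketch of reading this layout follows. It uses a few entries copied from the list above and inlined as a string instead of a real file path, so the indices it prints do not match the full list.

```python
import json

# A few entries copied from the list above; a real file would be read with open().
excerpt = '[["person"], ["sneakers"], ["cabinet", "shelf"], ["trash bin", "can"]]'

class_texts = json.loads(excerpt)

# One entry per category; treating the list index as the class index is an assumption.
num_classes = len(class_texts)

# Categories that carry more than one text prompt.
multi_prompt = {i: texts for i, texts in enumerate(class_texts) if len(texts) > 1}

print(num_classes)
print(multi_prompt)
```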