# Keypoint matching

Keypoint matching matches points of interest that belong to the same object appearing in two different images. Most modern keypoint matchers take a pair of images as input and output the following:

- **Keypoint coordinates (x, y):** a one-to-one mapping of pixel coordinates between the first and the second image, given as two lists. The keypoint at a given index in the first list is matched to the keypoint at the same index in the second list.
- **Matching scores:** confidence scores assigned to the keypoint matches.

In this tutorial, you will extract keypoint matches with the [`EfficientLoFTR`] model trained with the [MatchAnything framework](https://huggingface.co/zju-community/matchanything_eloftr), and then refine the matches. The model has only 16M parameters and can be run on a CPU. You will use the [`AutoModelForKeypointMatching`] class.

```python
from transformers import AutoImageProcessor, AutoModelForKeypointMatching
import torch

processor = AutoImageProcessor.from_pretrained("zju-community/matchanything_eloftr")
model = AutoModelForKeypointMatching.from_pretrained("zju-community/matchanything_eloftr")
```
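Optionally, you can run the model on a GPU for faster inference. This snippet is an add-on to the tutorial, not part of it, and everything below also works on CPU.

```python
# Optional: use a GPU if one is available. At ~16M parameters, the model is
# small enough that CPU inference also works.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```

If you move the model to a GPU, remember to also move the processed inputs to the same device later (for example, `inputs = inputs.to(device)`) before calling the model.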
Load two images that have the same object of interest. The second photo is taken a second later; its colors are edited, and it is further cropped and rotated.

![Bee](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg)
![Bee edited](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee_edited.jpg)
```python
from transformers.image_utils import load_image

image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg")
image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee_edited.jpg")

images = [image1, image2]
```

We can pass the images to the processor and run inference.

```python
inputs = processor(images, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
```

We can then post-process the outputs. The `threshold` parameter filters out noisy matches, i.e. matches whose confidence scores fall below the threshold.

```python
image_sizes = [[(image.height, image.width) for image in images]]

outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
print(outputs)
```

Here are the outputs.

```text
[{'keypoints0': tensor([[4514,  550],
          [4813,  683],
          [1972, 1547],
          ...
          [3916, 3408]], dtype=torch.int32),
  'keypoints1': tensor([[2280,  463],
          [2378,  613],
          [2231,  887],
          ...
          [1521, 2560]], dtype=torch.int32),
  'matching_scores': tensor([0.2189, 0.2073, 0.2414, ...])}]
```

We have trimmed the output above, but there are 401 matches!

```python
len(outputs[0]["keypoints0"])
# 401
```

We can visualize the matches using the processor's [`~EfficientLoFTRImageProcessor.visualize_keypoint_matching`] method.

```python
plot_images = processor.visualize_keypoint_matching(images, outputs)
plot_images
```

![Matched Image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/matched_bees.png)

Optionally, you can use the [`Pipeline`] API and set the task to `keypoint-matching`.

```python
from transformers import pipeline

image_1 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image_2 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee_edited.jpg"

pipe = pipeline("keypoint-matching", model="zju-community/matchanything_eloftr")
pipe([image_1, image_2])
```

The output looks like the following.

```text
[{'keypoint_image_0': {'x': 2444, 'y': 2869},
  'keypoint_image_1': {'x': 837, 'y': 1500},
  'score': 0.9756593704223633},
 {'keypoint_image_0': {'x': 1248, 'y': 2819},
  'keypoint_image_1': {'x': 862, 'y': 866},
  'score': 0.9735618829727173},
 {'keypoint_image_0': {'x': 1547, 'y': 3317},
  'keypoint_image_1': {'x': 1436, 'y': 1500},
  ...}
]
```
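If you want to refine the matches further, you can work directly with the tensors returned by `post_process_keypoint_matching`. The sketch below is not part of the original tutorial; it continues from the `outputs` variable produced earlier and uses an arbitrary 0.5 cutoff purely for illustration.

```python
# A minimal sketch (assumed, not from the original tutorial): apply a stricter
# confidence cutoff on top of the post-processed matches.
match = outputs[0]  # results for the first (and only) image pair
keep = match["matching_scores"] > 0.5  # boolean mask; 0.5 is an example cutoff
filtered_keypoints0 = match["keypoints0"][keep]
filtered_keypoints1 = match["keypoints1"][keep]
print(f"Kept {keep.sum().item()} of {len(match['matching_scores'])} matches")
```

The filtered keypoint tensors keep the same index alignment, so each row in `filtered_keypoints0` still corresponds to the row at the same index in `filtered_keypoints1`.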