{"id": 0, "image": ["./examples/Q1_1.png"], "conversations": [{"from": "human", "value": "Question: Consider the real-world 3D locations of the objects. Which is closer to the sink, the toilet paper or the towel?\nOptions: \nA. toilet paper\nB. towel\nGive me the answer letter directly. The best answer is:"}], "source": "SITE-Bench", "GT": "A"}
{"id": 1, "image": ["./examples/Q2_1.png", "./examples/Q2_2.png"], "conversations": [{"from": "human", "value": "If the landscape painting is on the east side of the bedroom, where is the window located in the bedroom?\nOptions: A. North side, B. South side, C. West side, D. East side\nAnswer with the option's letter from the given choices directly. Enclose the option's letter within ``."}], "source": "MMSI-Bench", "GT": "C"}
{"id": 2, "image": ["./examples/Q3_1.png", "./examples/Q3_2.png", "./examples/Q3_3.png"], "conversations": [{"from": "human", "value": "The robot is making tea. What is the order in which the pictures were taken?"}], "source": "MMSI-Bench", "GT": "Second, first, third"}
{"id": 3, "image": ["./examples/Q4.png"], "conversations": [{"from": "human", "value": "Please provide the bounding box coordinate of the region this sentence describes: [blue shirt lady]"}], "source": "RefCOCO", "GT": "[0.096234, 0.161229, 0.436516, 1.000000]"}
{"id": 4, "image": ["./examples/Q5.png"], "conversations": [{"from": "human", "value": "Identify the minimal distance between the point and the camera, in meters."}], "source": "Ibims", "GT": "4.4"}
{"id": 5, "image": ["./examples/Q6.png"], "conversations": [{"from": "human", "value": "Enclose your thinking process in tags and your final answer in "}], "source": "SolidMath", "GT": "D"}
{"id": 6, "image": ["./examples/Q7.png"], "conversations": [{"from": "human", "value": "请将你的思考过程放在标签内,并将你的最终答案放在标签内。"}], "source": "SolidMath", "GT": "D"}
{"id": 7, "image": ["./examples/Q8_1.jpg", "./examples/Q8_2.jpg", "./examples/Q8_3.jpg", "./examples/Q8_4.jpg"], "conversations": [{"from": "human", "value": "Based on these four images (image 1, 2, 3, and 4) showing the pink bottle from different viewpoints (front, left, back, and right), with each camera aligned with room walls and partially capturing the surroundings: From the viewpoint presented in image 4, what is to the left of the pink bottle?\nOptions: A. Pink plush toy and headboard B. Window and blue curtain C. Closet and door D. White wall\nAnswer with the option's letter from the given choices directly."}], "source": "MindCube-Tiny-Bench", "GT": "C"}
{"id": 8, "image": ["./examples/Q9.jpg"], "conversations": [{"from": "human", "value": "Question: Consider the real-world 3D locations and orientations of the objects. Which side of the bus in the center is facing the bus stop?\nOptions: \nA. front\nB. left\nC. back\nD. right\nGive me the answer letter directly. The best answer is:"}], "source": "SITE-Bench", "GT": "D"}
{"id": 9, "image": ["./examples/Q10.jpg"], "conversations": [{"from": "human", "value": "Question: Consider the real-world 3D orientations of the objects. Are the arrow on street sign and the taxi facing same or similar directions, or very different directions?\nOptions: \nA. same or similar directions\nB. very different directions\nGive me the answer letter directly. The best answer is:"}], "source": "SITE-Bench", "GT": "A"}
{"id": 10, "image": ["./examples/Q11.jpg"], "conversations": [{"from": "human", "value": "Question: What shape are all the men standing in?\nOptions: A. circle B. rectangle C. triangle D. square\nGive me the answer letter directly. The best answer is:"}], "source": "SITE-Bench", "GT": "A"}
{"id": 11, "image": ["./examples/Q12.jpg"], "conversations": [{"from": "human", "value": "From the perspective of this man who doesn't wear glasses, where is the man wearing glasses located beside him?\nOptions: A. left B. back-right C. front D. right\nAnswer with the option's letter from the given choices directly."}], "source": "ViewSpatial-Bench", "GT": "A"}
{"id": 12, "image": ["./examples/Q13_1.png", "./examples/Q13_2.png"], "conversations": [{"from": "human", "value": "The iMac is in the northern part of the room. In which direction is the area where students do their homework?"}], "source": "MMSI-Bench", "GT": "Northwest corner"}
{"id": 13, "image": ["./examples/Q14_1.png", "./examples/Q14_2.png"], "conversations": [{"from": "human", "value": "How many building models are captured in total in these two pictures?"}], "source": "MMSI-Bench", "GT": "4"}
{"id": 14, "image": ["./examples/Q15.png"], "conversations": [{"from": "human", "value": "请将你的思考过程放在标签内,并将你的最终答案放在标签内。"}], "source": "SolidMath", "GT": "B"}
{"id": 15, "image": ["./examples/Q16.png"], "conversations": [{"from": "human", "value": "请将你的思考过程放在标签内,并将你的最终答案放在标签内。"}], "source": "SolidMath", "GT": "C"}
{"id": 16, "image": ["./examples/Q17.png"], "conversations": [{"from": "human", "value": "请将你的思考过程放在标签内,并将你的最终答案放在标签内。"}], "source": "SolidMath", "GT": "C"}
{"id": 17, "image": ["./examples/Q18.png"], "conversations": [{"from": "human", "value": "请将你的思考过程放在标签内,并将你的最终答案放在标签内。"}], "source": "SolidMath", "GT": "A"}