 PROMPT="""You are an expert evaluator of question-answering systems. Your task is to determine if a given answer matches the ground truth answer in meaning and accuracy. You should respond with "yes", "no", or "unknown".
 Guidelines for evaluation:
-1. Focus on semantic meaning rather than exact wording
-2. Consider numerical accuracy when applicable
-3. Account for partial answers that contain the correct information plus additional details
-4. Recognize equivalent phrasings and synonyms
-5. Be lenient with minor grammatical differences
-6. For multi-part questions, all parts must be correct
-7. For questions requiring specific units, check unit correctness
-8. Respond with "unknown" when:
-- The answer is ambiguous and could be interpreted multiple ways
-- There is insufficient context to determine correctness
-- The ground truth is incomplete or unclear
-- The comparison requires external knowledge not provided
+1. For multiple-choice questions, the answer choice letters are enough to determine correctness
+2. Focus on semantic meaning rather than exact wording
+3. Consider numerical accuracy when applicable
+4. Account for partial answers that contain the correct information plus additional details
+5. Recognize equivalent phrasings and synonyms
+6. Be lenient with minor grammatical differences
+7. For multi-part questions, all parts must be correct
+8. For questions requiring specific units, check unit correctness. However, if the answer is correct in all other aspects, you may overlook minor unit errors
 Input format:
 Question: [The question being asked]
 Answer: [The answer given by the system]
 Ground Truth: [The known correct answer]
-Your response must be exactly "yes", "no", or "unknown", with no additional explanation.
+Your response must be exactly "yes" or "no", with no additional explanation.
 Example 1:
 Question: What is the capital of France?
...
@@ -46,15 +42,15 @@ class JudgeFilter(Filter):
 Your response: no
 Example 3:
-Question: What is the GDP of France in 2023?
-Answer: The economic output was substantial.
+Question: What is the GDP of France in 2023?\nA. 2 trillion USD\nB. 3.05 trillion USD\nC. 2.5 trillion USD
+Answer: B.
 Ground Truth: 3.05 trillion USD
-Your response: unknown
+Your response: yes
-Your response must be exactly "yes", "no", or "unknown", with no additional explanation!
+Your response must be exactly "yes" or "no", with no additional explanation!
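After this change the prompt requires the judge to reply with exactly "yes" or "no". Its unchanged opening sentence still lists "unknown", however. The sketch below shows one way a caller might map the reply to a boolean while handling that leftover wording defensively. `parse_verdict` is a hypothetical helper, not `JudgeFilter`'s actual parsing code, which this diff does not show. It returns `None` for anything other than a clean "yes" or "no", so malformed replies can be counted separately instead of being silently treated as "no".

```python
from typing import Optional


def parse_verdict(response: str) -> Optional[bool]:
    """Map a judge reply to True ("yes"), False ("no"), or None (anything else).

    Hypothetical helper for illustration; not taken from JudgeFilter.
    """
    # Normalize surrounding whitespace, case, and trailing "." or "!" (e.g. " No.\n" -> "no")
    token = response.strip().lower().rstrip(".!")
    if token == "yes":
        return True
    if token == "no":
        return False
    # Covers "unknown" (still mentioned in the prompt's first sentence) and any other output
    return None


if __name__ == "__main__":
    for reply in ["yes", " No.\n", "unknown", "Yes, because..."]:
        print(repr(reply), "->", parse_verdict(reply))
```

Returning `None` instead of raising an exception lets a batch evaluation keep running and report the share of unparseable judge replies as its own metric.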
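The new guideline 1 lets the judge accept a bare choice letter, as in the revised Example 3, where "B." is matched against the ground truth "3.05 trillion USD". For illustration only, the sketch below shows a deterministic pre-check a caller could run before invoking the judge. It resolves the letter against the options embedded in the question. The helpers `parse_options` and `letter_matches`, and the one-option-per-line `X. text` format they expect, are assumptions modeled on Example 3. They are not part of `JudgeFilter`.

```python
import re
from typing import Dict, Optional

# Assumed option format, modeled on Example 3: one "X. option text" per line.
OPTION_RE = re.compile(r"^([A-Z])\.\s*(.+)$")
# A bare choice letter at the start of the answer: "B", "B.", "(B)", "B) ...", "B: ...".
ANSWER_LETTER_RE = re.compile(r"^\s*\(?([A-Z])(?:[.):]|\s*$)")


def parse_options(question: str) -> Dict[str, str]:
    """Collect lettered options from a multiple-choice question, one per line."""
    options: Dict[str, str] = {}
    for line in question.splitlines():
        match = OPTION_RE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2).strip()
    return options


def letter_matches(question: str, answer: str, ground_truth: str) -> Optional[bool]:
    """Check whether the answer's choice letter selects the ground-truth option.

    Returns None when no options, no leading letter, or an unknown letter is found,
    so those cases can be left to the LLM judge.
    """
    options = parse_options(question)
    found = ANSWER_LETTER_RE.match(answer)
    if not options or found is None or found.group(1) not in options:
        return None
    letter = found.group(1)
    truth = ground_truth.strip()
    # The ground truth may be a letter ("B") or the option text ("3.05 trillion USD").
    if truth.rstrip(".") in options:
        return letter == truth.rstrip(".")
    return options[letter].lower() == truth.lower()


if __name__ == "__main__":
    q = "What is the GDP of France in 2023?\nA. 2 trillion USD\nB. 3.05 trillion USD\nC. 2.5 trillion USD"
    print(letter_matches(q, "B.", "3.05 trillion USD"))
```

A free-text answer such as "The economic output was substantial." yields `None` here. The semantic comparison is then still left to the judge prompt.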