"vscode:/vscode.git/clone" did not exist on "cb31aeffd6176df1a5da18ef9dd1ec78279a856f"
Unverified Commit 1208afd3 authored by Irina Proskurina, committed by GitHub

Add Histoires Morales task (#2662)

* Add Histoires Morales task

* Histoires Morales task: fix mixed line endings

* Histoires Morales task: fix mixed line endings

* Remove tag for a single task

* Add some MT for Histoires Morales
# Histoires Morales
### Paper
Title: `Histoires Morales: A French Dataset for Assessing Moral Alignment`
Abstract: `https://arxiv.org/pdf/2501.17117`
⚖ Histoires Morales is the first dataset for evaluating the moral alignment of language models in French. It consists of narratives describing normative and norm-divergent actions taken by individuals to achieve certain intentions in concrete situations, along with their respective consequences.
Each of the 12,000 stories (histoires) follows the same seven-sentence structure as the Moral Stories dataset:
Context:
1. Norm: A guideline for social conduct generally observed by most people in everyday situations.
2. Situation: The setting of the story, introducing participants and describing their environment.
3. Intention: A reasonable goal that one of the story participants (the actor) wants to achieve.
Normative path:
4. Normative action: An action by the actor that fulfills the intention while observing the norm.
5. Normative consequence: A possible effect of the normative action on the actor’s environment.
Norm-divergent path:
6. Divergent action: An action by the actor that fulfills the intention but diverges from the norm.
7. Divergent consequence: A possible effect of the divergent action on the actor’s environment.
Histoires Morales is adapted to French from the widely used Moral Stories dataset.
We translated the Moral Stories dataset and refined these translations through manual annotations.
See paper for more details.
Homepage: `https://huggingface.co/datasets/LabHC/histoires_morales`
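
As an orientation sketch (not part of the task code), the raw data can be loaded directly from the Hub. The column names below are the ones consumed by `utils.py` further down, and the split name follows `test_split: train` in the task config; the default dataset configuration is assumed.

```python
# Minimal sketch: load Histoires Morales from the Hugging Face Hub and
# inspect the columns used by this task (names taken from utils.py below).
from datasets import load_dataset

dataset = load_dataset("LabHC/histoires_morales", split="train")

example = dataset[0]
for field in ("norm", "situation", "intention", "moral_action", "immoral_action"):
    print(field, "->", example[field])
```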
### Citation
Coming soon (accepted to NAACL 2025)
### Groups, Tags, and Tasks
#### Groups
* Not part of a group yet
#### Tags
No tags, since there is a single task.
#### Tasks
* `histoires_morales` (defined in `histoires_morales.yaml`)
### Checklist
For adding novel benchmarks/datasets to the library:
* [x] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
task: histoires_morales
dataset_path: LabHC/histoires_morales
output_type: multiple_choice
test_split: train
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{label}}"
doc_to_choice: "choices"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
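
With this config, each processed document is rendered with `doc_to_text: "{{query}}"` as the prompt and the two actions from `doc_to_choice: "choices"` as continuations. As a usage sketch, the task can then be run through the harness's Python entry point `lm_eval.simple_evaluate` (assumed to be available as in current lm-evaluation-harness releases); `gpt2` is only a placeholder model identifier.

```python
# Sketch: evaluate a Hugging Face model on histoires_morales via the harness
# Python API. The model id below is a placeholder; any HF causal LM works.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["histoires_morales"],
    batch_size=8,
)
print(results["results"]["histoires_morales"])
```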
import datasets


def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    def _process_doc(doc):
        # Build the query by concatenating the capitalized norm, situation,
        # and intention sentences.
        ctx = (
            doc["norm"].capitalize()
            + " "
            + doc["situation"].capitalize()
            + " "
            + doc["intention"].capitalize()
        )
        # The moral action is listed first, so index 0 is the gold label.
        choices = [doc["moral_action"], doc["immoral_action"]]
        out_doc = {
            "query": ctx,
            "choices": choices,
            "label": 0,
        }
        return out_doc

    return dataset.map(_process_doc)
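
For reference, a small sketch of applying `process_docs` to the raw data and inspecting one processed document; it assumes it is run from the task directory so that `utils.py` is importable.

```python
# Sketch: run the raw dataset through process_docs and inspect one document.
from datasets import load_dataset

from utils import process_docs  # the module defined above

raw = load_dataset("LabHC/histoires_morales", split="train")
processed = process_docs(raw)

doc = processed[0]
print(doc["query"])    # capitalized norm + situation + intention
print(doc["choices"])  # [moral_action, immoral_action]
print(doc["label"])    # 0 -> the moral action is always the gold answer
```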