Commit 7d242381 authored by Baber Abbasi, committed by GitHub

change glianorex to test split (#2332)

* change glianorex to test set

* nit

* fix test; doc_to_target can be str for multiple_choice

* nit
parent af92448e
@@ -18,3 +18,8 @@ All tasks are multiple choice questions with 4 options, only one correct option.
 - `glianorex_en`: Evaluates the accuracy on 264 questions in English.
 - `glianorex_fr`: Evaluates the accuracy on 264 questions in French.
+#### Change Log
+* (all tasks) 2024-09-23 -- 1.0
+  * Switched the `test_split` from `train` to `test`.
 task: glianorex
 dataset_path: maximegmd/glianorex
 output_type: multiple_choice
-test_split: train
+test_split: test
 doc_to_text: !function preprocess_glianorex.doc_to_text
 doc_to_target: !function preprocess_glianorex.doc_to_target
 doc_to_choice: [ 'A', 'B', 'C', 'D' ]
@@ -12,3 +12,5 @@ metric_list:
   - metric: acc_norm
     aggregation: mean
     higher_is_better: true
+metadata:
+  version: 1.0
 task: glianorex_en
 dataset_path: maximegmd/glianorex
 output_type: multiple_choice
-test_split: train
+test_split: test
 doc_to_text: !function preprocess_glianorex.doc_to_text
 doc_to_target: !function preprocess_glianorex.doc_to_target
 process_docs: !function preprocess_glianorex.filter_english
@@ -13,3 +13,5 @@ metric_list:
   - metric: acc_norm
     aggregation: mean
     higher_is_better: true
+metadata:
+  version: 1.0
 task: glianorex_fr
 dataset_path: maximegmd/glianorex
 output_type: multiple_choice
-test_split: train
+test_split: test
 doc_to_text: !function preprocess_glianorex.doc_to_text
 doc_to_target: !function preprocess_glianorex.doc_to_target
 process_docs: !function preprocess_glianorex.filter_french
@@ -13,3 +13,5 @@ metric_list:
   - metric: acc_norm
     aggregation: mean
     higher_is_better: true
+metadata:
+  version: 1.0
@@ -7,7 +7,8 @@ def doc_to_text(doc) -> str:
     return f"Question: {doc['question']}\n{answers}Answer:"
-def doc_to_target(doc) -> int:
+def doc_to_target(doc) -> str:
+    # answer_idx is `A`, `B`, `C`, `D` etc.
     return doc["answer_idx"]
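
The return annotation changes from `int` to `str` because `answer_idx` holds a letter label rather than a numeric index. As a rough illustration of how a letter target could be mapped back to a position in the `doc_to_choice` list, here is a minimal sketch. The helper `resolve_gold_index` and the sample document are hypothetical and are not taken from the harness source; the harness's actual resolution logic may differ.

```python
from typing import Sequence, Union

CHOICES = ["A", "B", "C", "D"]  # mirrors doc_to_choice in the task config


def resolve_gold_index(target: Union[int, str], choices: Sequence[str]) -> int:
    """Hypothetical helper: turn a doc_to_target value into a choice index."""
    if isinstance(target, str):
        # A letter label such as "B" is looked up by position in the choices.
        # list.index raises ValueError if the label is not a known choice.
        return list(choices).index(target)
    # An integer target is assumed to already be a zero-based index.
    if not 0 <= target < len(choices):
        raise IndexError(f"target {target} is out of range for {len(choices)} choices")
    return target


def doc_to_target(doc: dict) -> str:
    # answer_idx is `A`, `B`, `C`, `D` etc.
    return doc["answer_idx"]


# Made-up document for illustration only; not a real dataset row.
doc = {"question": "placeholder question", "answer_idx": "C"}
gold = resolve_gold_index(doc_to_target(doc), CHOICES)
print(gold)
```

Accepting both types in one place keeps tasks that return an integer index working alongside tasks like this one that return a label.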
@@ -101,7 +101,11 @@ class TestNewTasks:
             )
         _array_target = [task.doc_to_target(doc) for doc in arr]
         if task._config.output_type == "multiple_choice":
-            assert all(isinstance(label, int) for label in _array_target)
+            # TODO<baber>: label can be string or int; add better test conditions
+            assert all(
+                (isinstance(label, int) or isinstance(label, str))
+                for label in _array_target
+            )
     def test_build_all_requests(self, task_class, limit):
         task_class.build_all_requests(rank=1, limit=limit, world_size=1)
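
The TODO in the test hunk asks for better test conditions than a bare type check. One possible direction, sketched under assumptions, is to also validate the label against the task's choices: an integer must be a valid index and a string must be one of the choice labels. The function `is_valid_label` is hypothetical and is not part of the test suite shown in this commit.

```python
from typing import Optional, Sequence


def is_valid_label(label: object, choices: Optional[Sequence[str]] = None) -> bool:
    """Hypothetical check: accept an int index or a str label, reject the rest."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(label, bool):
        return False
    if isinstance(label, int):
        # Without choices, fall back to the type check only.
        return choices is None or 0 <= label < len(choices)
    if isinstance(label, str):
        return choices is None or label in choices
    return False


choices = ["A", "B", "C", "D"]
targets = ["A", 2, "D"]  # made-up targets mixing both accepted types

assert all(is_valid_label(t, choices) for t in targets)
assert not is_valid_label("E", choices)  # unknown label
assert not is_valid_label(4, choices)    # index out of range
assert not is_valid_label(1.0, choices)  # wrong type
```

When `choices` is omitted, the check reduces to the same int-or-str condition the commit introduces.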