Add various social bias tasks (#1185)
* Implementation of Winogender
* Minor fixes to README.md
* Add winogender
* Clean winogender utils.py
* Change dataset to one containing all subsets
* Flesh out README for BBQ task
* Add missing tasks for BBQ
* Add simple cooccurrence bias task
* Fix wrong mask for ambiguated context and rename metrics
* Made generate_until evaluation (following the PaLM paper) the default
  Also replaced the separate per-category config files with per-category metrics computed by a custom function.
  Created a config file for the multiple_choice way of evaluating BBQ.
* Add missing version metadata
* Add missing version metadata for BBQ multiple choice
* Fix metrics and address edge cases
* Made BBQ multiple choice the default version
* Added settings following winogrande
* Add num_fewshot to simple_cooccurrence_bias
* Fixes for BBQ (multiple choice)
* Fix wrong dataset
* CrowS-Pairs: make it easier to use another dataset by removing dataset_name from the subsets.
* Use simplest prompt possible without description
* Merge
* BBQ: Fix np.NaN related bug
* BBQ: Fix wrong aggregation method for disamb accuracy
* BBQ: Make it possible to only evaluate on (dis)ambiguous subset (needed for few shot eval)
* BBQ: Fix showing one target in case of few-shot evals
* BBQ: Fix few-shot example for bbq_generate
* BBQ: simplify subtasks
* BBQ: Minimize number of UNK variations to reduce inference time
* BBQ: Add extra UNK keywords for the generate task
* Add a generate_until version of simple_cooccurrence_bias
* Change system/description prompt to include few-shot examples
* Group agg rework
* Run pre-commit
* Add tasks to README table
* Remove trailing space from simple_cooccurrence_bias_gen.yaml `doc_to_text`
* fix
* fix
* fix version
---------
Co-authored-by: Baber <baber@hey.com>
New files shown:
lm_eval/tasks/bbq/README.md (new file, mode 100644)
lm_eval/tasks/bbq/utils.py (new file, mode 100644)