lm_eval/tasks/unitxt/unitxt_tasks.classification.multi_class · eea16d360291535ffbbe396b654a68dc57cfacf0 · gaoqiong / lm-evaluation-harness

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See https://github.com/IBM/unitxt

The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.
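
The generation step described above can be pictured with a minimal sketch. It assumes the dataset list is plain text with one dataset name per line; the function names, the comment syntax, and the dataset names are hypothetical and are not taken from 'generate_yamls.py' itself.

```python
def parse_dataset_list(text):
    """Return dataset names from a plain-text list, one name per line.

    Blank lines and lines starting with '#' are skipped (an assumed convention).
    """
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def plan_yaml_files(text):
    """Map each output file name to the YAML body that would be written to it."""
    return {f"{name}.yaml": f"task: {name}\n" for name in parse_dataset_list(text)}


# Hypothetical dataset list, for illustration only.
example_list = "ag_news\n\n# a comment\nyelp_review\n"
planned = plan_yaml_files(example_list)
```

Keeping the planning step free of file I/O makes it easy to inspect what would be generated before anything is written to disk.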

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.
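
One way to picture the shared include is as a dictionary merge: the common settings are loaded first, then the per-dataset file adds or overrides keys. The sketch below assumes that override order, and every key and value in it is hypothetical rather than copied from the real include file.

```python
def resolve_config(shared, dataset):
    """Combine a shared include with a per-dataset config.

    Keys in the per-dataset config take precedence (assumed behavior).
    """
    merged = dict(shared)   # copy, so the shared config is never mutated
    merged.update(dataset)  # dataset-specific keys win on conflict
    return merged


# Hypothetical contents, for illustration only.
shared_include = {"metric": "accuracy", "split": "test"}
dataset_config = {"task": "ag_news", "split": "validation"}

resolved = resolve_config(shared_include, dataset_config)
```

With this layout each per-dataset file only needs the few keys that differ, and a change to the common settings is made once in the include.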

* Post code review comments - part 1

1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README
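
The naming rule in item 1 can be illustrated with a small sketch. It assumes task discovery selects files purely by a '.yaml' suffix, which is a simplification of whatever the harness actually does; 'ag_news.yaml' is a hypothetical task file, while the include name is the file shown on this page.

```python
def is_task_config(filename):
    """Treat a file as a task definition only if it ends with '.yaml' (assumed rule)."""
    return filename.endswith(".yaml")


files = [
    "ag_news.yaml",                             # hypothetical per-dataset task file
    "unitxt_tasks.classification.multi_class",  # shared include, no '.yaml' suffix
]

# Only the per-dataset file is picked up; the include is skipped.
task_files = [name for name in files if is_task_config(name)]
```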

* Post code review comments - part 2

1. Added a unitxt install option in pyproject.toml:
pip install 'lm_eval[unitxt]'
2. Added a check that unitxt is installed and print a clear error message if not
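
A check like the one in item 2 could look roughly as follows. This is a sketch under stated assumptions, not the code from 'unitxt_wrapper.py': the helper name 'require_package' and the wording of the message are hypothetical, and only the pip command comes from the commit message.

```python
import importlib.util


def require_package(package, extra):
    """Raise ImportError with an install hint if `package` cannot be found.

    `extra` is the name of the optional install extra to suggest.
    """
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"Package '{package}' is not installed. "
            f"Install it with: pip install 'lm_eval[{extra}]'"
        )
```

Calling `require_package("unitxt", "unitxt")` before any Unitxt import would turn a bare missing-module failure into a message that tells the user which command to run.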

* Committed missing pyproject change

* Added documentation on adding datasets

* More doc changes

* add unitxt extra to readme

* run precommit

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>


unitxt_tasks.classification.multi_class 517 Bytes
