Commits · ca3d86d6b8dea86211ec60b93c3c026ce73c9d60 · gaoqiong / lm-evaluation-harness

19 Aug, 2024 · 1 commit

Add TMLU Benchmark Dataset (#2093) · ca3d86d6

Yen-Ting Lin authored Aug 19, 2024



* add taiwan truthful qa

* add tmlu

* Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script

* add pega eval and legal eval

* add ccp eval

* Update .gitignore and harness_eval.slurm

* Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script

* Add Pega MMLU task and configuration files

* Add new models and update parameters in run_all.sh

* Add UMTCEval tasks and configurations

* Update dataset paths and output path

* Update .gitignore and harness_eval.slurm, and modify _generate_configs.py

* Update SLURM script and add new models

* clean for pr

* Update lm_eval/tasks/tmlu/default/tmlu.yaml
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>

* adjust tag name

* removed group alias from tasks

* format

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>
