Add TMLU Benchmark Dataset (#2093)
* add taiwan truthful qa
* add tmlu
* Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
* add pega eval and legal eval
* add ccp eval
* Update .gitignore and harness_eval.slurm
* Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script (see the usage sketch below)
* Add Pega MMLU task and configuration files
* Add new models and update parameters in run_all.sh
* Add UMTCEval tasks and configurations
* Update dataset paths and output path
* Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
* Update SLURM script and add new models
* clean for pr
* Update lm_eval/tasks/tmlu/default/tmlu.yaml
  Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
* adjust tag name
* removed group alias from tasks
* format

---------

Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>
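For reference, a minimal sketch of how the SLURM wrapper added in this PR might invoke the new task. This is not the actual harness_eval.slurm or run_all.sh from the PR (neither is reproduced in the commit message): the job name, resource lines, model, and wandb project below are placeholders, and only the `lm_eval` flags (`--model_args`, `--tasks`, `--batch_size`, `--output_path`, `--log_samples`, `--wandb_args`) are standard lm-evaluation-harness options.

```bash
#!/bin/bash
# Hypothetical SLURM wrapper in the spirit of harness_eval.slurm.
#SBATCH --job-name=harness_eval
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --output=harness_eval_main_log.txt

# Placeholder model; run_all.sh presumably loops over several models.
MODEL=${1:-"meta-llama/Llama-2-7b-hf"}

lm_eval \
  --model hf \
  --model_args "pretrained=${MODEL},trust_remote_code=True" \
  --tasks tmlu \
  --batch_size auto \
  --output_path evals/ \
  --log_samples \
  --wandb_args "project=harness-eval"  # illustrative wandb project name
```

Passing `trust_remote_code=True` through `--model_args` lets the Hugging Face loader execute the dataset/model repo's custom code, and `--wandb_args` forwards key=value pairs to `wandb.init` for logging, which matches the two options this PR wires into the SLURM script.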
New file: lm_eval/tasks/tmlu/README.md